{"schema_version":"crawldex-agent-experience-rubric/v1","title":"CrawlDex Agent Experience Scoring Rubric","version":"0.1.0","updated_at":"2026-06-09T00:00:00.000Z","summary":"Use AES, freshness, confidence, blockers, and recipe availability to decide whether an agent should proceed, use guardrails, keep the user present, avoid stale evidence, or collect evidence first.","primary_score":{"key":"aes","label":"Agent Experience Score","scale":"0-100","unknown_value":null,"definition":"AES is the task-level agent execution score derived from weighted outcome evidence. Higher scores mean a web agent is more likely to complete the task with clear success signals and low friction."},"score_bands":[{"key":"unknown","label":"Unknown","aes_min":null,"aes_max":null,"health":"unknown","default_recommendation":"collect_evidence_first","risk_level":"unknown","autonomous_attempt":"no","rule":"When AES is null or freshness is unknown, collect evidence before claiming the route is agent-doable."},{"key":"blocked","label":"Blocked","aes_min":0,"aes_max":34,"health":"blocked","default_recommendation":"use_browser_with_user_present","risk_level":"high","autonomous_attempt":"no","rule":"AES below 35, or any blocking friction, means an agent should not attempt the route autonomously."},{"key":"constrained","label":"Constrained","aes_min":35,"aes_max":59,"health":"constrained","default_recommendation":"use_browser_with_user_present","risk_level":"medium","autonomous_attempt":"user_present","rule":"AES from 35 to 59 means the route may work only with a user present, manual handoff, or tight operator review."},{"key":"degraded","label":"Degraded","aes_min":60,"aes_max":79,"health":"degraded","default_recommendation":"proceed_with_guardrails","risk_level":"low","autonomous_attempt":"with_guardrails","rule":"AES from 60 to 79, or AES 80+ with nonblocking friction, can proceed only with explicit guardrails and evidence checks."},{"key":"healthy","label":"Healthy","aes_min":80,"aes_max":100,"health":"healthy","default_recommendation":"proceed_with_recipe","risk_level":"low","autonomous_attempt":"with_recipe","rule":"AES 80+ with no blockers is healthy; autonomous attempts still require fresh evidence and a known-good recipe."}],"freshness_rules":[{"status":"unknown","age_days_min":null,"age_days_max":null,"default_recommendation":"collect_evidence_first","rule":"Unknown freshness is not decision-grade; collect or refresh evidence."},{"status":"fresh","age_days_min":0,"age_days_max":7,"default_recommendation":"proceed_with_guardrails","rule":"Evidence up to 7 days old is fresh enough for normal preflight decisions."},{"status":"aging","age_days_min":8,"age_days_max":30,"default_recommendation":"proceed_with_guardrails","rule":"Evidence from 8 to 30 days old can support decisions, but agents should verify the current page before irreversible steps."},{"status":"stale","age_days_min":31,"age_days_max":null,"default_recommendation":"avoid_until_fresh_evidence","rule":"Evidence older than 30 days should not support autonomous execution until refreshed."}],"confidence_rules":[{"level":"low","score_min":0,"score_max":0.39,"rule":"Low confidence means evidence is too thin, weakly sourced, or inconsistent; collect more evidence before relying on the score."},{"level":"medium","score_min":0.4,"score_max":0.74,"rule":"Medium confidence can support guarded attempts, but the agent should verify blockers and success signals in-browser."},{"level":"high","score_min":0.75,"score_max":1,"rule":"High confidence means evidence quality and agreement are strong enough for normal preflight decisions."}],"score_dimensions":[{"key":"reachability","title":"Reachability","description":"The agent can access the task surface without access blocks."},{"key":"navigability","title":"Navigability","description":"The agent can understand and progress through the interface."},{"key":"task_completability","title":"Task completability","description":"The task reaches a verified completion signal."},{"key":"transactability","title":"Transactability","description":"Payment, booking, or submission steps can complete safely."},{"key":"recoverability","title":"Recoverability","description":"Errors, identity checks, and handoffs are clear and reversible."},{"key":"policy_parseability","title":"Policy parseability","description":"Fees, terms, return rules, and cancellation rules are machine-readable."},{"key":"trust_safety","title":"Trust and safety","description":"The flow avoids hidden costs, dark patterns, and irreversible ambiguity."},{"key":"efficiency","title":"Efficiency","description":"The task completes in few steps and little time, at low agent cost."}],"dimension_penalties":[{"dimension":"reachability","blocker_codes":["bot_blocked","captcha","login_required"],"penalty_per_blocker":12,"rule":"Access blockers reduce the ability to reach the task surface."},{"dimension":"navigability","blocker_codes":["forced_account_creation","dark_pattern_cancel","support_dead_end"],"penalty_per_blocker":12,"rule":"Confusing account gates, retention flows, or dead ends reduce navigation quality."},{"dimension":"task_completability","blocker_codes":["human_handoff_required","captcha","bot_blocked","policy_unparseable"],"penalty_per_blocker":12,"rule":"Completion blockers reduce confidence that the task can reach a verified end state."},{"dimension":"transactability","blocker_codes":["payment_3ds_user_present","price_mismatch","hidden_fee"],"penalty_per_blocker":12,"rule":"Payment, fee, and amount ambiguity reduce safe transaction quality."},{"dimension":"recoverability","blocker_codes":["human_handoff_required","no_confirmation_email","support_dead_end"],"penalty_per_blocker":12,"rule":"Weak confirmation and support paths reduce recovery quality."},{"dimension":"policy_parseability","blocker_codes":["policy_unparseable","refund_policy_unclear","return_policy_unclear","terms_unparseable"],"penalty_per_blocker":12,"rule":"Unclear policies reduce an agent's ability to cite and apply rules safely."},{"dimension":"trust_safety","blocker_codes":["price_mismatch","hidden_fee","payment_3ds_user_present","bot_blocked"],"penalty_per_blocker":12,"rule":"Hidden costs, payment ambiguity, and hostile blocking reduce trust and safety."},{"dimension":"efficiency","blocker_codes":["captcha","2fa_user_present","human_handoff_required","forced_account_creation"],"penalty_per_blocker":12,"rule":"High-step or user-present gates increase agent cost and lower efficiency."}],"outcome_scores":{"success":1,"success_with_handoff":0.8,"partial":0.4,"blocked":0,"failed":0,"abandoned":0},"source_tier_weights":{"seeded_example":0.15,"public_web_observation":0.35,"anonymous_report":0.2,"merchant_report":0.45,"attested_sdk":0.7,"synthetic_canary":1},"source_tier_caps":{"seeded_example":6,"public_web_observation":8,"anonymous_report":1,"merchant_report":4,"attested_sdk":12,"synthetic_canary":20},"blocking_friction_codes":["captcha","bot_blocked","price_mismatch"],"user_presence_friction_codes":["2fa_user_present","payment_3ds_user_present","login_required","forced_account_creation"],"reporting_fields":[{"field":"site","required":true,"rule":"Report the normalized host or canonical URL for the route attempted."},{"field":"task","required":true,"rule":"Use a canonical task key such as subscriptions.cancel or commerce.checkout."},{"field":"outcome","required":true,"rule":"Use one of success, success_with_handoff, partial, blocked, failed, or abandoned."},{"field":"friction","required":false,"rule":"Attach known friction codes for blockers, handoffs, policy ambiguity, fees, auth, and confirmation gaps."},{"field":"steps","required":false,"rule":"Count visible agent actions or browser steps when available."},{"field":"duration_sec","required":false,"rule":"Record elapsed task time in seconds when available."},{"field":"agent_profile","required":false,"rule":"Include stack, model, browser runtime, version, identity class, and capabilities when available."},{"field":"evidence","required":false,"rule":"Submit only redacted evidence IDs, hashes, URIs, and artifact types."},{"field":"reporter","required":false,"rule":"Use an agent key or signed reporter metadata when available; anonymous reports remain score-neutral until trusted."},{"field":"occurred_at","required":false,"rule":"Use the actual task attempt time; unauthenticated submissions are normalized by the server."}],"decision_order":["Collect evidence first when AES or freshness is unknown.","Avoid the route until evidence is refreshed when freshness is stale, regardless of AES.","Require user-present browser use when blockers include auth, 2FA, payment authorization, or forced account creation.","Proceed with a recipe only when AES is at least 80, evidence is not stale, no blocking friction is present, and a recipe exists.","Proceed with guardrails when AES is at least 60 and no blocking friction is present, but the route lacks a known-good recipe or has nonblocking friction.","Do not attempt autonomously when AES is below 60 or blocking friction is present."],"redaction_rules":["Never submit passwords, cookies, session tokens, payment data, private screenshots, medical data, financial account data, or unredacted user content.","Use hashes, redacted trace IDs, public policy URLs, DOM snapshots with private fields removed, and receipt text with personal fields removed.","Stop before irreversible payments, bookings, cancellations, account changes, or submissions unless the user explicitly authorized that exact action."]}
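The rubric above places three readings into discrete buckets: AES into score bands, evidence age into freshness statuses, and confidence into levels. Below is a minimal Python sketch of those lookups. The function names `score_band`, `freshness_status`, and `confidence_level` are hypothetical and are not part of any CrawlDex API. The sketch compares each value against the next bucket's lower bound. That means non-integer values falling in the gaps between the listed bounds (for example AES 34.5 or confidence 0.395) land in the lower bucket, which is an assumption the rubric does not state.

```python
from typing import Optional

# Thresholds mirror the "score_bands", "freshness_rules", and
# "confidence_rules" entries in the rubric (hypothetical helper names).


def score_band(aes: Optional[float]) -> str:
    """Return the score band key for an AES value; None means unknown."""
    if aes is None:
        return "unknown"
    if not 0 <= aes <= 100:
        raise ValueError(f"AES must be between 0 and 100, got {aes}")
    # Compare against each band's lower bound so non-integer scores
    # still land in exactly one band (assumption).
    if aes < 35:
        return "blocked"
    if aes < 60:
        return "constrained"
    if aes < 80:
        return "degraded"
    return "healthy"


def freshness_status(age_days: Optional[int]) -> str:
    """Return the freshness status for evidence of the given age in days."""
    if age_days is None:
        return "unknown"
    if age_days < 0:
        raise ValueError("age_days cannot be negative")
    if age_days <= 7:
        return "fresh"
    if age_days <= 30:
        return "aging"
    return "stale"


def confidence_level(score: float) -> str:
    """Return the confidence level for a 0-1 confidence score."""
    if not 0 <= score <= 1:
        raise ValueError(f"confidence must be between 0 and 1, got {score}")
    if score < 0.4:  # the 0.39-0.4 gap is treated as low (assumption)
        return "low"
    if score < 0.75:
        return "medium"
    return "high"
```

For example, `score_band(72)` returns `"degraded"`, `freshness_status(12)` returns `"aging"`, and `confidence_level(0.8)` returns `"high"`.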
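The rubric says AES is "derived from weighted outcome evidence" and publishes `outcome_scores`, `source_tier_weights`, and `source_tier_caps`, but it does not give the aggregation formula. The Python sketch below is one plausible reading, offered under explicit assumptions:

- AES is the tier-weighted mean of outcome scores, scaled to 0-100.
- Each tier cap limits how many reports from that tier are counted.
- Reports arrive newest first, so the cap keeps the most recent reports from each tier.

The sketch does not model the reporting rule that anonymous reports stay score-neutral until trusted. The name `aggregate_aes` is hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple

# Values copied from the rubric.
OUTCOME_SCORES = {
    "success": 1.0, "success_with_handoff": 0.8, "partial": 0.4,
    "blocked": 0.0, "failed": 0.0, "abandoned": 0.0,
}
SOURCE_TIER_WEIGHTS = {
    "seeded_example": 0.15, "public_web_observation": 0.35,
    "anonymous_report": 0.2, "merchant_report": 0.45,
    "attested_sdk": 0.7, "synthetic_canary": 1.0,
}
SOURCE_TIER_CAPS = {
    "seeded_example": 6, "public_web_observation": 8,
    "anonymous_report": 1, "merchant_report": 4,
    "attested_sdk": 12, "synthetic_canary": 20,
}


def aggregate_aes(reports: Iterable[Tuple[str, str]]) -> Optional[float]:
    """Hypothetical AES: tier-weighted mean outcome score, scaled to 0-100.

    `reports` yields (source_tier, outcome) pairs, assumed newest first,
    and each tier contributes at most SOURCE_TIER_CAPS[tier] reports.
    """
    used = defaultdict(int)
    weighted_sum = 0.0
    total_weight = 0.0
    for tier, outcome in reports:
        if used[tier] >= SOURCE_TIER_CAPS[tier]:
            continue  # cap reached: ignore older reports from this tier
        used[tier] += 1
        weight = SOURCE_TIER_WEIGHTS[tier]
        weighted_sum += weight * OUTCOME_SCORES[outcome]
        total_weight += weight
    if total_weight == 0:
        return None  # no usable evidence: AES stays unknown (null)
    return round(100 * weighted_sum / total_weight, 1)
```

As a worked example, one successful synthetic canary plus one partial attested-SDK report gives (1.0 × 1.0 + 0.7 × 0.4) / (1.0 + 0.7) × 100 ≈ 75.3. That lands in the degraded band.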
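Each `dimension_penalties` entry in the rubric lists blocker codes that cost a dimension 12 points apiece. A single code can hit several dimensions. For example, `captcha` counts against reachability, task completability, and efficiency. The Python sketch below applies those penalties, assuming three things the rubric leaves open:

- Each dimension starts from a base of 100.
- Each distinct friction code is counted once.
- Scores are floored at 0.

The name `dimension_scores` is hypothetical.

```python
from typing import Dict, Iterable

# Blocker codes per dimension, copied from "dimension_penalties".
DIMENSION_BLOCKERS = {
    "reachability": {"bot_blocked", "captcha", "login_required"},
    "navigability": {"forced_account_creation", "dark_pattern_cancel",
                     "support_dead_end"},
    "task_completability": {"human_handoff_required", "captcha",
                            "bot_blocked", "policy_unparseable"},
    "transactability": {"payment_3ds_user_present", "price_mismatch",
                        "hidden_fee"},
    "recoverability": {"human_handoff_required", "no_confirmation_email",
                       "support_dead_end"},
    "policy_parseability": {"policy_unparseable", "refund_policy_unclear",
                            "return_policy_unclear", "terms_unparseable"},
    "trust_safety": {"price_mismatch", "hidden_fee",
                     "payment_3ds_user_present", "bot_blocked"},
    "efficiency": {"captcha", "2fa_user_present", "human_handoff_required",
                   "forced_account_creation"},
}
PENALTY_PER_BLOCKER = 12


def dimension_scores(friction: Iterable[str],
                     base: float = 100.0) -> Dict[str, float]:
    """Subtract 12 points per matching blocker code from each dimension."""
    codes = set(friction)  # each distinct code counts once (assumption)
    scores = {}
    for dimension, blockers in DIMENSION_BLOCKERS.items():
        hits = len(codes & blockers)
        scores[dimension] = max(0.0, base - PENALTY_PER_BLOCKER * hits)
    return scores
```

With `["captcha", "bot_blocked"]`, reachability and task completability drop to 76, trust and safety and efficiency drop to 88, and the other four dimensions stay at 100.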
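The `decision_order` list reads as first-match-wins rules. The Python sketch below walks them in order and returns one of the rubric's recommendation keys. It reconciles two band rules with that list:

- **Guardrails step.** It checks for blocking friction, following the blocked band's rule that "any blocking friction means an agent should not attempt the route autonomously."
- **Recipe step.** It requires no friction at all, following the degraded band's rule that AES 80+ with nonblocking friction proceeds only with guardrails.

The final "do not attempt autonomously" step is mapped to `use_browser_with_user_present`, the blocked and constrained bands' default recommendation. That mapping is an assumption. The name `recommend` and its parameters are hypothetical.

```python
from typing import Iterable, Optional

# Friction code groups copied from the rubric.
BLOCKING_FRICTION = {"captcha", "bot_blocked", "price_mismatch"}
USER_PRESENCE_FRICTION = {
    "2fa_user_present", "payment_3ds_user_present",
    "login_required", "forced_account_creation",
}


def recommend(aes: Optional[float], freshness: str,
              friction: Iterable[str], has_recipe: bool) -> str:
    """Walk decision_order top to bottom; the first matching rule wins."""
    codes = set(friction)
    # 1. Unknown AES or freshness is not decision-grade.
    if aes is None or freshness == "unknown":
        return "collect_evidence_first"
    # 2. Stale evidence overrides any AES value.
    if freshness == "stale":
        return "avoid_until_fresh_evidence"
    # 3. Auth, 2FA, payment authorization, or forced account creation.
    if codes & USER_PRESENCE_FRICTION:
        return "use_browser_with_user_present"
    blocking = bool(codes & BLOCKING_FRICTION)
    # 4. AES 80+, no friction of any kind, and a known-good recipe.
    if aes >= 80 and not codes and has_recipe:
        return "proceed_with_recipe"
    # 5. AES 60+ without blocking friction: missing recipe or nonblocking friction.
    if aes >= 60 and not blocking:
        return "proceed_with_guardrails"
    # 6. AES below 60 or blocking friction: no autonomous attempt (assumed mapping).
    return "use_browser_with_user_present"
```

For example, a route with AES 70, fresh evidence, a `captcha` friction code, and no recipe falls through to the last rule instead of proceeding with guardrails.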