CrawlDex
Agent scoring rubric

Score the experience, then decide the posture.

Use AES, freshness, confidence, blockers, and recipe availability to decide whether an agent should proceed, use guardrails, keep the user present, avoid stale evidence, or collect evidence first.

Primary score: AES, 0-100
Dimensions: 8, scored per route
Freshness: fresh up to 7 days, stale after 30 days
AES bands

Agent execution posture

crawldex-agent-experience-rubric/v1
Band | AES | Health | Recommendation | Posture | Rule
Unknown | null | unknown | collect evidence first | no | When AES is null or freshness is unknown, collect evidence before claiming the route is agent-doable.
Blocked | 0-34 | blocked | use browser with user present | no | AES below 35, or any blocking friction, means an agent should not attempt the route autonomously.
Constrained | 35-59 | constrained | use browser with user present | user present | AES from 35 to 59 means the route may work only with a user present, manual handoff, or tight operator review.
Degraded | 60-79 | degraded | proceed with guardrails | with guardrails | AES from 60 to 79, or AES 80+ with nonblocking friction, can proceed only with explicit guardrails and evidence checks.
Healthy | 80-100 | healthy | proceed with recipe | with recipe | AES 80+ with no blockers is healthy; autonomous attempts still require fresh evidence and a known-good recipe.
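The band rules above can be sketched as a small Python function. This is a hypothetical helper, not a published CrawlDex API: the name `aes_band`, the boolean friction flags, and the reading of band edges as cut-offs (below 35, below 60, below 80) are assumptions for illustration.

```python
from typing import Optional


def aes_band(
    aes: Optional[float],
    *,
    has_blocking_friction: bool = False,
    has_nonblocking_friction: bool = False,
    freshness_known: bool = True,
) -> str:
    """Map an AES score and friction flags to a rubric band (hypothetical helper)."""
    # Unknown band: no score, or freshness that cannot be established.
    if aes is None or not freshness_known:
        return "Unknown"
    if not 0 <= aes <= 100:
        raise ValueError("AES must be between 0 and 100")
    # Any blocking friction forces Blocked, whatever the score.
    if aes < 35 or has_blocking_friction:
        return "Blocked"
    if aes < 60:
        return "Constrained"
    # AES 80+ with nonblocking friction is treated as Degraded, not Healthy.
    if aes < 80 or has_nonblocking_friction:
        return "Degraded"
    return "Healthy"
```

For example, `aes_band(85, has_nonblocking_friction=True)` lands in Degraded, which matches the table's rule that nonblocking friction caps an otherwise healthy score at guardrails.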
Evidence age

Freshness gates

  • unknown (age unknown): Unknown freshness is not decision-grade; collect or refresh evidence. Posture: collect evidence first.
  • fresh (0-7 days): Evidence up to 7 days old is fresh enough for normal preflight decisions. Posture: proceed with guardrails.
  • aging (8-30 days): Evidence from 8 to 30 days old can support decisions, but agents should verify the current page before irreversible steps. Posture: proceed with guardrails.
  • stale (31+ days): Evidence older than 30 days should not support autonomous execution until refreshed. Posture: avoid until fresh evidence.
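A minimal Python sketch of the freshness gates, assuming evidence age is measured in whole elapsed days (floored) between an observation timestamp and "now". The function names `evidence_age_days` and `freshness_gate` are hypothetical. Both timestamps must be either timezone-aware or naive, since Python refuses to subtract one kind from the other.

```python
from datetime import datetime
from typing import Optional, Tuple


def evidence_age_days(observed_at: datetime, now: datetime) -> int:
    """Whole days elapsed since the evidence was observed (timedelta.days floors)."""
    return (now - observed_at).days


def freshness_gate(age_days: Optional[int]) -> Tuple[str, str]:
    """Return (freshness label, posture) for an evidence age in days."""
    if age_days is None:
        return ("unknown", "collect evidence first")
    if age_days < 0:
        # A future timestamp points to a clock or reporting error.
        raise ValueError("evidence age cannot be negative")
    if age_days <= 7:
        return ("fresh", "proceed with guardrails")
    if age_days <= 30:
        return ("aging", "proceed with guardrails")
    return ("stale", "avoid until fresh evidence")
```

Flooring means evidence observed 7 days and 23 hours ago still counts as fresh; a stricter implementation could round up instead.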
Evidence quality

Confidence gates

  • low (0-0.39): Low confidence means evidence is too thin, weakly sourced, or inconsistent; collect more evidence before relying on the score.
  • medium (0.4-0.74): Medium confidence can support guarded attempts, but the agent should verify blockers and success signals in-browser.
  • high (0.75-1): High confidence means evidence quality and agreement are strong enough for normal preflight decisions.
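The confidence gates reduce to two thresholds. In this sketch the listed upper bounds 0.39 and 0.74 are read as "below 0.4" and "below 0.75", an assumption that also covers values such as 0.395 that fall between the published ranges. `confidence_level` is a hypothetical name.

```python
def confidence_level(confidence: float) -> str:
    """Map a 0-1 confidence value to the rubric's low / medium / high gates."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Assumed cut-offs: below 0.4 is low, below 0.75 is medium.
    if confidence < 0.4:
        return "low"
    if confidence < 0.75:
        return "medium"
    return "high"
```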
Dimensions

What agents score

  • reachability (Reachability): The agent can access the task surface without access blocks.
  • navigability (Navigability): The agent can understand and progress through the interface.
  • task_completability (Task completability): The task reaches a verified completion signal.
  • transactability (Transactability): Payment, booking, or submission steps can complete safely.
  • recoverability (Recoverability): Errors, identity checks, and handoffs are clear and reversible.
  • policy_parseability (Policy parseability): Fees, terms, return rules, and cancellation rules are machine-readable.
  • trust_safety (Trust and safety): The flow avoids hidden costs, dark patterns, and irreversible ambiguity.
  • efficiency (Efficiency): The task completes with few steps, little elapsed time, and low agent cost.
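A per-route record for the eight dimensions might look like the Python dataclass below. It is a sketch under stated assumptions: each dimension is taken to be scored on the same 0-100 scale as AES, and the class name and `weakest` helper are hypothetical. The rubric does not say how dimension scores combine into AES, so the sketch deliberately does not compute one.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DimensionScores:
    """Per-route scores for the eight rubric dimensions (0-100 scale assumed)."""

    reachability: float
    navigability: float
    task_completability: float
    transactability: float
    recoverability: float
    policy_parseability: float
    trust_safety: float
    efficiency: float

    def __post_init__(self) -> None:
        # Reject out-of-range values so bad evidence fails loudly.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 0 <= value <= 100:
                raise ValueError(f"{f.name} must be between 0 and 100, got {value}")

    def weakest(self) -> str:
        """Name of the lowest-scoring dimension (first one wins on ties)."""
        return min(fields(self), key=lambda f: getattr(self, f.name)).name
```

Pointing triage at the weakest dimension is one way to decide which evidence to collect next; it is an illustration, not part of the rubric.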

Decision order

How to apply the rubric

  1. Collect evidence first when AES or freshness is unknown.
  2. Avoid stale evidence when freshness is stale, regardless of AES.
  3. Require user-present browser use when blockers include auth, 2FA, payment authorization, or forced account creation.
  4. Proceed with a recipe only when AES is at least 80, evidence is not stale, no blocking or nonblocking friction is present, and a known-good recipe exists.
  5. Proceed with guardrails when AES is at least 60, no blocking friction is present, and the route does not qualify for step 4 (for example, AES from 60 to 79, no known-good recipe, or nonblocking friction).
  6. Do not attempt autonomously when AES is below 60 or blocking friction is present.
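A minimal Python sketch of the decision order, assuming evidence has already been reduced to an AES value, a freshness label, sets of friction codes, and a recipe flag. The `RouteEvidence` type, `decide_posture`, and friction codes such as `2fa` are hypothetical. Where the numbered steps alone leave a case open, the sketch follows the AES bands table: blocking friction is checked before the guardrails step, AES 80+ with nonblocking friction falls to guardrails, and AES 60 to 79 with a recipe and no friction is treated as Degraded (guardrails).

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Hypothetical friction codes for the user-present blockers named in step 3.
USER_PRESENT_BLOCKERS = frozenset(
    {"auth", "2fa", "payment_authorization", "forced_account_creation"}
)


@dataclass(frozen=True)
class RouteEvidence:
    aes: Optional[float]
    freshness: str  # "fresh", "aging", "stale", or "unknown"
    blocking_friction: FrozenSet[str] = frozenset()
    nonblocking_friction: FrozenSet[str] = frozenset()
    has_recipe: bool = False


def decide_posture(ev: RouteEvidence) -> str:
    # Step 1: unknown AES or freshness is not decision-grade.
    if ev.aes is None or ev.freshness == "unknown":
        return "collect evidence first"
    # Step 2: stale evidence wins regardless of AES.
    if ev.freshness == "stale":
        return "avoid until fresh evidence"
    # Step 3: auth-type blockers need a user-present browser session.
    if ev.blocking_friction & USER_PRESENT_BLOCKERS:
        return "use browser with user present"
    # Step 4: healthy route, no friction of any kind, known-good recipe.
    if (
        ev.aes >= 80
        and ev.has_recipe
        and not ev.blocking_friction
        and not ev.nonblocking_friction
    ):
        return "proceed with recipe"
    # Step 6, checked before step 5 so blockers never downgrade to guardrails:
    # no autonomous attempt; the table recommends a user-present browser.
    if ev.aes < 60 or ev.blocking_friction:
        return "use browser with user present"
    # Step 5: everything else at AES 60+ proceeds with guardrails.
    return "proceed with guardrails"
```

Usage: `decide_posture(RouteEvidence(aes=72, freshness="aging", has_recipe=True))` returns `"proceed with guardrails"`, the Degraded posture.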
Reports

Required fields

  • site (required): Report the normalized host or canonical URL for the route attempted.
  • task (required): Use a canonical task key such as subscriptions.cancel or commerce.checkout.
  • outcome (required): Use one of success, success_with_handoff, partial, blocked, failed, or abandoned.
  • friction: Attach known friction codes for blockers, handoffs, policy ambiguity, fees, auth, and confirmation gaps.
  • steps: Count visible agent actions or browser steps when available.
  • duration_sec: Record elapsed task time in seconds when available.
  • agent_profile: Include stack, model, browser runtime, version, identity class, and capabilities when available.
  • evidence: Submit only redacted evidence IDs, hashes, URIs, and artifact types.
  • reporter: Use an agent key or signed reporter metadata when available; anonymous reports remain score-neutral until trusted.
  • occurred_at: Use the actual time of the task attempt; for unauthenticated submissions, the server normalizes this value.
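Before submitting, a reporter could run a local check like the Python sketch below. It covers only what the field list states (the three required fields, the six outcome values, non-negative counts); the function name is hypothetical, and the dotted form of task keys is an assumption inferred from the examples. The wire format and submission endpoint are not described here, so the sketch stops at validation.

```python
from typing import Any, Dict, List

REQUIRED_FIELDS = ("site", "task", "outcome")
OUTCOMES = {"success", "success_with_handoff", "partial", "blocked", "failed", "abandoned"}


def validate_report(report: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means these checks pass."""
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if not report.get(name)
    ]
    outcome = report.get("outcome")
    if outcome and outcome not in OUTCOMES:
        problems.append(f"unknown outcome: {outcome}")
    task = report.get("task")
    # Assumption: canonical task keys are dotted, like subscriptions.cancel.
    if task and "." not in task:
        problems.append(f"task is not a dotted canonical key: {task}")
    # Optional numeric fields must be non-negative when present.
    for key in ("steps", "duration_sec"):
        value = report.get(key)
        if value is not None and (not isinstance(value, (int, float)) or value < 0):
            problems.append(f"{key} must be a non-negative number")
    return problems
```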
Evidence safety

Redaction rules

  • Never submit passwords, cookies, session tokens, payment data, private screenshots, medical data, financial account data, or unredacted user content.
  • Use hashes, redacted trace IDs, public policy URLs, DOM snapshots with private fields removed, and receipt text with personal fields removed.
  • Stop before irreversible payments, bookings, cancellations, account changes, or submissions unless the user explicitly authorized that exact action.
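One way to apply these rules mechanically is an allowlist filter, sketched in Python below under stated assumptions: the key names `id`, `hash`, `uri`, `artifact_type`, and `content` are hypothetical, and raw artifact content is replaced by its SHA-256 digest. An allowlist is only a first line of defense. It cannot tell whether a URI or ID itself embeds personal data, and hashing short or guessable values such as a card number or an email address does not hide them, so those must never be hashed and submitted either.

```python
import hashlib
from typing import Any, Dict

# Hypothetical key names for the metadata the rubric allows in evidence.
ALLOWED_EVIDENCE_KEYS = {"id", "hash", "uri", "artifact_type"}


def redact_evidence(item: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only allowlisted keys; swap raw content for a SHA-256 digest."""
    redacted = {k: v for k, v in item.items() if k in ALLOWED_EVIDENCE_KEYS}
    raw = item.get("content")
    if isinstance(raw, (bytes, str)) and "hash" not in redacted:
        data = raw.encode("utf-8") if isinstance(raw, str) else raw
        redacted["hash"] = "sha256:" + hashlib.sha256(data).hexdigest()
    return redacted
```

Dropping unknown keys by default, rather than removing known-sensitive ones, means a new field such as `cookies` is excluded until someone deliberately allows it.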