Agent scoring rubric
Score the experience, then decide the posture.
Use AES, freshness, confidence, blockers, and recipe availability to decide whether an agent should proceed, use guardrails, keep the user present, avoid stale evidence, or collect evidence first.
- Primary score: AES, 0-100
- Dimensions: 8, scored per route
- Freshness: 7/30d, fresh / stale gates
AES bands
crawldex-agent-experience-rubric/v1
Agent execution posture
| Band | AES | Health | Recommendation | Posture | Rule |
|---|---|---|---|---|---|
| Unknown | null | unknown | collect evidence first | no | When AES is null or freshness is unknown, collect evidence before claiming the route is agent-doable. |
| Blocked | 0-34 | blocked | use browser with user present | no | AES below 35, or any blocking friction, means an agent should not attempt the route autonomously. |
| Constrained | 35-59 | constrained | use browser with user present | user present | AES from 35 to 59 means the route may work only with a user present, manual handoff, or tight operator review. |
| Degraded | 60-79 | degraded | proceed with guardrails | with guardrails | AES from 60 to 79, or AES 80+ with nonblocking friction, can proceed only with explicit guardrails and evidence checks. |
| Healthy | 80-100 | healthy | proceed with recipe | with recipe | AES 80+ with no blockers is healthy; autonomous attempts still require fresh evidence and a known-good recipe. |
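
The band rules in the table can be sketched as a small classifier. This is a minimal illustration, not the rubric's reference implementation: the function name `aes_band` and its boolean parameters are assumptions made for the example, and only the thresholds and band names come from the table.

```python
from typing import Optional


def aes_band(
    aes: Optional[int],
    freshness_known: bool = True,
    blocking_friction: bool = False,
    nonblocking_friction: bool = False,
) -> str:
    """Return the health band for one route, following the table's rules.

    Parameter names are hypothetical; thresholds mirror the AES column.
    """
    # Unknown: no score yet, or no idea how old the evidence is.
    if aes is None or not freshness_known:
        return "unknown"
    if not 0 <= aes <= 100:
        raise ValueError(f"AES must be between 0 and 100, got {aes}")
    # Blocked: AES below 35, or any blocking friction at any score.
    if aes < 35 or blocking_friction:
        return "blocked"
    # Constrained: AES from 35 to 59.
    if aes < 60:
        return "constrained"
    # Degraded: AES from 60 to 79, or AES 80+ with nonblocking friction.
    if aes < 80 or nonblocking_friction:
        return "degraded"
    # Healthy: AES 80+ with no blockers.
    return "healthy"
```

Blocking friction is checked before the score thresholds so that a high AES cannot mask a blocker, which matches the "or any blocking friction" wording in the Blocked row.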
Evidence age
Freshness gates
Evidence quality
Confidence gates
Dimensions
What agents score
- Reachability: The agent can access the task surface without access blocks.
- Navigability: The agent can understand and progress through the interface.
- Task completability: The task reaches a verified completion signal.
- Transactability: Payment, booking, or submission steps can complete safely.
- Recoverability: Errors, identity checks, and handoffs are clear and reversible.
- Policy parseability: Fees, terms, return rules, and cancellation rules are machine-readable.
- Trust and safety: The flow avoids hidden costs, dark patterns, and irreversible ambiguity.
- Efficiency: The task completes in fewer steps, in less time, and at lower agent cost.
Decision order
How to apply the rubric
- Collect evidence first when AES or freshness is unknown.
- Avoid stale evidence when freshness is stale, regardless of AES.
- Require user-present browser use when blockers include auth, 2FA, payment authorization, or forced account creation.
- Proceed with a recipe only when AES is at least 80, evidence is not stale, no blocking friction is present, and a recipe exists.
- Proceed with guardrails when AES is at least 60 and no blocking friction is present, but the route lacks a known-good recipe or has nonblocking friction.
- Do not attempt autonomously when AES is below 60 or blocking friction is present.
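
The decision order above can be sketched as a single function. This is one possible reading, under stated assumptions: the function name `decide_posture`, the freshness labels `"fresh"`, `"stale"`, and `"unknown"`, and the blocker code strings are all hypothetical. The final rule is evaluated before the two "proceed" rules so that those branches can assume AES of at least 60 and no blocking friction, and nonblocking friction rules out the recipe path to agree with the Degraded band.

```python
from typing import Iterable, Optional

# Hypothetical codes for the blockers that require a user to be present.
USER_PRESENT_BLOCKERS = {
    "auth",
    "2fa",
    "payment_authorization",
    "forced_account_creation",
}


def decide_posture(
    aes: Optional[int],
    freshness: str,
    blocking: Iterable[str] = (),
    nonblocking: Iterable[str] = (),
    has_recipe: bool = False,
) -> str:
    """Apply the rubric's rules in order and return the recommendation."""
    blocking_set = set(blocking)
    nonblocking_set = set(nonblocking)

    # 1. Unknown score or unknown freshness: gather evidence first.
    if aes is None or freshness == "unknown":
        return "collect evidence first"
    # 2. Stale evidence overrides any AES value.
    if freshness == "stale":
        return "avoid stale evidence"
    # 3. Auth, 2FA, payment authorization, or forced account creation.
    if blocking_set & USER_PRESENT_BLOCKERS:
        return "use browser with user present"
    # 4. Low score or any remaining blocking friction: no autonomy.
    if aes < 60 or blocking_set:
        return "do not attempt autonomously"
    # 5. Healthy score, known-good recipe, and no friction at all.
    if aes >= 80 and has_recipe and not nonblocking_set:
        return "proceed with recipe"
    # 6. AES of at least 60 without a recipe, or with nonblocking friction.
    return "proceed with guardrails"
```

For example, `decide_posture(90, "fresh", has_recipe=True)` returns `"proceed with recipe"`, while the same call with `freshness="stale"` returns `"avoid stale evidence"`.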
Reports
Required fields
- site*: Report the normalized host or canonical URL for the route attempted.
- task*: Use a canonical task key such as subscriptions.cancel or commerce.checkout.
- outcome*: Use one of success, success_with_handoff, partial, blocked, failed, or abandoned.
- friction: Attach known friction codes for blockers, handoffs, policy ambiguity, fees, auth, and confirmation gaps.
- steps: Count visible agent actions or browser steps when available.
- duration_sec: Record elapsed task time in seconds when available.
- agent_profile: Include stack, model, browser runtime, version, identity class, and capabilities when available.
- evidence: Submit only redacted evidence IDs, hashes, URIs, and artifact types.
- reporter: Use an agent key or signed reporter metadata when available; anonymous reports remain score-neutral until trusted.
- occurred_at: Use the actual task attempt time; unauthenticated submissions are normalized by the server.
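
A report built from these fields might be checked before submission along the following lines. This is a sketch, not the service's validation logic: the helper `validate_report`, the friction code `auth_required`, the host `example.com`, and the ISO 8601 timestamp format are assumptions made for illustration. The field names and the allowed outcome values come from the list above.

```python
from typing import Any, Dict, List

# The three fields marked with an asterisk in the list above.
REQUIRED_FIELDS = ("site", "task", "outcome")

# Allowed values for the outcome field.
OUTCOMES = {
    "success",
    "success_with_handoff",
    "partial",
    "blocked",
    "failed",
    "abandoned",
}


def validate_report(report: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the report looks valid."""
    errors = []
    for field in REQUIRED_FIELDS:
        if not report.get(field):
            errors.append(f"missing required field: {field}")
    outcome = report.get("outcome")
    if outcome and outcome not in OUTCOMES:
        errors.append(f"unknown outcome: {outcome}")
    duration = report.get("duration_sec")
    if duration is not None and (
        not isinstance(duration, (int, float)) or duration < 0
    ):
        errors.append("duration_sec must be a non-negative number")
    return errors


# Illustrative report; every value here is made up for the example.
example_report = {
    "site": "example.com",
    "task": "subscriptions.cancel",
    "outcome": "success_with_handoff",
    "friction": ["auth_required"],  # hypothetical friction code
    "steps": 12,
    "duration_sec": 94,
    "occurred_at": "2025-01-15T10:30:00Z",  # assumed ISO 8601 format
}

print(validate_report(example_report))
```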
Evidence safety
Redaction rules
- Never submit passwords, cookies, session tokens, payment data, private screenshots, medical data, financial account data, or unredacted user content.
- Use hashes, redacted trace IDs, public policy URLs, DOM snapshots with private fields removed, and receipt text with personal fields removed.
- Stop before irreversible payments, bookings, cancellations, account changes, or submissions unless the user explicitly authorized that exact action.
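
The first two redaction rules can be illustrated with a small filter that drops forbidden fields outright and replaces a trace identifier with its hash. This is a rough sketch under assumptions: the key names (`password`, `cookie`, `session_token`, `card_number`, `trace_id`, `trace_hash`) and the helper `redact_evidence` are hypothetical, and a production pipeline would likely prefer an allowlist of permitted keys over the denylist shown here.

```python
import hashlib
from typing import Any, Dict

# Hypothetical key names for data that must never be submitted.
FORBIDDEN_KEYS = {"password", "cookie", "session_token", "card_number"}


def sha256_hex(value: str) -> str:
    """Hash a value so it can be referenced without being disclosed."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def redact_evidence(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the evidence record that is safer to submit."""
    redacted: Dict[str, Any] = {}
    for key, value in raw.items():
        if key in FORBIDDEN_KEYS:
            continue  # drop the field entirely; do not hash secrets
        if key == "trace_id":
            # Submit a hash in place of the raw trace identifier.
            redacted["trace_hash"] = sha256_hex(str(value))
        else:
            redacted[key] = value
    return redacted
```

Secrets are dropped and not hashed because a hash of a low-entropy value such as a password can still be guessed offline.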