Fallbacks, failures, and observability

How RouterDecision, fallback ordering, no-match outcomes, and observability artifacts fit into one inspectable routing story.

The end product of Router is a RouterDecision.

That artifact is not only "who won." It records the policy snapshot, exclusions, ranked candidates, chosen endpoint, and fallback order.

Fallbacks are part of the same ranked result

Fallbacks are not a separate speculative pass.

They are simply the remaining eligible candidates after the same deterministic ranking that produced the winner.

That means a valid decision contains:

one chosen endpoint
zero or more fallback endpoints already in deterministic order
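
The relationship between the winner and the fallback chain can be sketched in a few lines. This is a minimal illustration, not Router's implementation: the Candidate type, the score field, the fallbacks key, and the tie-break on endpoint ID are all assumptions made for the example. Only chosen_endpoint_id comes from the decision artifact described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    endpoint_id: str  # hypothetical field name
    score: float      # hypothetical: higher is better


def rank(candidates):
    # Sort by score descending, then by endpoint_id so that ties break the
    # same way on every run. The tie-break rule is an assumption.
    return sorted(candidates, key=lambda c: (-c.score, c.endpoint_id))


def decide(eligible):
    ranked = rank(eligible)
    if not ranked:
        return {"chosen_endpoint_id": None, "fallbacks": []}
    winner, *rest = ranked
    return {
        "chosen_endpoint_id": winner.endpoint_id,
        # Fallbacks are the remainder of the same ranking, not a second pass.
        "fallbacks": [c.endpoint_id for c in rest],
    }


decision = decide([
    Candidate("remote-b", 0.72),
    Candidate("local-a", 0.91),
    Candidate("remote-c", 0.72),
])
print(decision)
```

Because the fallback list is a slice of one sorted sequence, the same input always yields the same winner and the same fallback order.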

What a healthy decision tells you

A good decision record lets you answer:

what policy was active
which candidates were rejected
which candidates were scored
why the winner beat the others
whether measured evidence, benchmark evidence, or declared-only evidence carried the result

How strategy shows up in a decision

The decision artifact should make strategy visible rather than implicit.

When reading a decision, confirm:

the saved strategy appears in policy_snapshot
the winner matches the kind of tradeoff that strategy should prefer
the fallback order also reflects the same strategy rather than a separate hidden rule
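
These checks can be partly automated. The sketch below assumes a decision serialized as a dict; the keys strategy, ranked, and fallbacks, and the strategy name lowest_latency, are hypothetical, while policy_snapshot and chosen_endpoint_id are the fields named on this page. The second check is only a structural proxy: it confirms the winner is the top-ranked candidate, but whether that candidate reflects the right tradeoff still needs a human read.

```python
def strategy_checks(decision, saved_strategy):
    """Return three structural checks as booleans for one decision dict."""
    ranked = decision.get("ranked", [])  # hypothetical: endpoint ids, winner first
    snapshot = decision.get("policy_snapshot", {})
    return {
        # The saved strategy should be recorded in the policy snapshot.
        "strategy_in_snapshot": snapshot.get("strategy") == saved_strategy,
        # The winner should be the head of the ranking.
        "winner_is_top_ranked": bool(ranked)
        and ranked[0] == decision.get("chosen_endpoint_id"),
        # The fallbacks should be the tail of the same ranking.
        "fallbacks_follow_ranking": ranked[1:] == decision.get("fallbacks", []),
    }


decision = {
    "policy_snapshot": {"strategy": "lowest_latency"},
    "ranked": ["local-a", "remote-b", "remote-c"],
    "chosen_endpoint_id": "local-a",
    "fallbacks": ["remote-b", "remote-c"],
}
checks = strategy_checks(decision, "lowest_latency")
print(checks)
```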

If you need the mode-by-mode explanation first, read /router/strategy-modes-and-tradeoffs.

Success, no-match, and degraded-evidence outcomes

Routing can succeed, fail cleanly, or succeed with thin evidence.

Successful selection

Success means:

at least one candidate was eligible
at least one candidate was scored
chosen_endpoint_id is populated
selection_reasons explain the winning choice

If more than one candidate was eligible, the decision also carries a fallback chain.
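
As a rough sketch, the success conditions above translate into a small validator. The dict layout is assumed for illustration: the eligible flag, the scored list, and the fallbacks key are hypothetical names, while eligibility, chosen_endpoint_id, and selection_reasons are the fields this page mentions.

```python
def success_problems(decision):
    """Return a list of reasons the decision does not look like a success."""
    eligible = [e for e in decision.get("eligibility", []) if e.get("eligible")]
    problems = []
    if not eligible:
        problems.append("no eligible candidate")
    if not decision.get("scored"):
        problems.append("no scored candidate")
    if not decision.get("chosen_endpoint_id"):
        problems.append("chosen_endpoint_id is empty")
    if not decision.get("selection_reasons"):
        problems.append("selection_reasons is empty")
    # With more than one eligible candidate, a fallback chain is expected.
    if len(eligible) > 1 and not decision.get("fallbacks"):
        problems.append("fallback chain is missing")
    return problems


ok = {
    "eligibility": [
        {"endpoint_id": "local-a", "eligible": True},
        {"endpoint_id": "remote-b", "eligible": True},
    ],
    "scored": ["local-a", "remote-b"],
    "chosen_endpoint_id": "local-a",
    "selection_reasons": ["lowest observed latency"],
    "fallbacks": ["remote-b"],
}
print(success_problems(ok))
```

An empty list means every success condition holds; anything else names the condition that failed.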

No eligible endpoint

When nothing is eligible, Router should still return a useful artifact:

all candidates appear in eligibility
each rejected candidate carries exclusion details
no scored candidates
no chosen endpoint
no fallback list

This is not an exception-shaped outcome. It is still a valid protocol artifact.
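
One way to picture this: a caller branches on the shape of the artifact instead of catching an error. The sketch below is illustrative only. The exclusion codes and the eligible, exclusions, scored, and fallbacks keys are hypothetical names chosen for the example.

```python
def is_no_match(decision):
    """True when the artifact has the no-eligible-endpoint shape."""
    return (
        # Every candidate is rejected and says why.
        all(not e["eligible"] and e["exclusions"] for e in decision["eligibility"])
        and not decision["scored"]
        and decision["chosen_endpoint_id"] is None
        and not decision["fallbacks"]
    )


def handle(decision):
    # No try/except: the no-match case is ordinary data, not an exception.
    if is_no_match(decision):
        details = ", ".join(
            f'{e["endpoint_id"]} ({e["exclusions"][0]["code"]})'
            for e in decision["eligibility"]
        )
        return "no eligible endpoint: " + details
    return "route to " + decision["chosen_endpoint_id"]


no_match = {
    "eligibility": [
        {"endpoint_id": "remote-b", "eligible": False,
         "exclusions": [{"code": "context_too_small"}]},  # hypothetical code
        {"endpoint_id": "local-a", "eligible": False,
         "exclusions": [{"code": "endpoint_offline"}]},   # hypothetical code
    ],
    "scored": [],
    "chosen_endpoint_id": None,
    "fallbacks": [],
}
print(handle(no_match))
```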

Policy conflict

Policy conflict usually appears as "all candidates rejected by policy," for example:

every endpoint is denied by allow/deny rules
remote routing is forbidden but no local candidate exists
role-forbidden capabilities eliminate the remaining candidates

The important point is that the protocol communicates this through candidate-level exclusions rather than through one special top-level error code.
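
Since there is no dedicated top-level code, a consumer that wants to label a result as a policy conflict has to derive that from the per-candidate exclusions. A minimal sketch follows, assuming each exclusion carries a source tag; the source and code fields and their values are hypothetical.

```python
def is_policy_conflict(eligibility):
    """True when every candidate was rejected and policy rejected each one."""
    return bool(eligibility) and all(
        not e["eligible"]
        and any(x["source"] == "policy" for x in e["exclusions"])
        for e in eligibility
    )


eligibility = [
    {"endpoint_id": "remote-b", "eligible": False,
     "exclusions": [{"source": "policy", "code": "denied_by_rule"}]},
    {"endpoint_id": "remote-c", "eligible": False,
     "exclusions": [{"source": "policy", "code": "remote_forbidden"}]},
    {"endpoint_id": "local-a", "eligible": False,
     "exclusions": [{"source": "policy", "code": "role_forbidden_capability"}]},
]
print(is_policy_conflict(eligibility))
```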

Degraded selection with defaults

A router can still select an endpoint without rich observed performance evidence. In that case the decision typically carries default-driven scoring signals instead of failing hard: insufficient evidence is usually represented as a weaker, more default-driven decision rather than as a rejection.
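
To make "thin evidence" concrete, a reader could measure how much of a decision's scoring rested on defaults. The sketch below assumes each scoring signal is tagged with an evidence kind; the evidence field, its values, and the signal names are hypothetical and stand in for the measured, benchmark, declared-only, and default-driven categories described on this page.

```python
from collections import Counter


def default_share(signals):
    """Fraction of scoring signals that fell back to default values."""
    if not signals:
        return 0.0
    counts = Counter(s["evidence"] for s in signals)
    return counts["default"] / len(signals)


signals = [
    {"name": "latency", "evidence": "default"},
    {"name": "throughput", "evidence": "default"},
    {"name": "context_window", "evidence": "declared"},
    {"name": "cost", "evidence": "measured"},
]
print(default_share(signals))
```

A high share does not make the decision invalid. It flags that the winner was chosen mostly on defaults and may change once measured evidence arrives.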

Observability extends the decision story

RouterDecision is the summary artifact, but it is not the only one.

RouterDecision: summary of the policy, eligibility, ranking, winner, fallbacks, and reasons.
TraceSpan / TraceEvent: timing and phase-level execution detail.
UsageEvent: outcome and accounting for the request.
Profile sample: recorded benchmark or live-request sample.
ObservedPerformanceProfile: aggregated freshness-weighted, confidence-scored evidence for later routing.

Routing stays inspectable because the decision, trace, usage, and profile-update layers remain linked by shared IDs.
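
In practice, shared IDs mean the artifacts for one request can be gathered with a simple filter. The sketch below assumes a request_id key on every record; that key name and the record layouts are hypothetical, and the actual linking fields may differ.

```python
def collect_by_request(request_id, decisions, spans, usage_events):
    """Gather every artifact that carries the given request ID."""
    return {
        "decision": next(
            (d for d in decisions if d["request_id"] == request_id), None
        ),
        "spans": [s for s in spans if s["request_id"] == request_id],
        "usage": [u for u in usage_events if u["request_id"] == request_id],
    }


decisions = [{"request_id": "req-1", "chosen_endpoint_id": "local-a"}]
spans = [
    {"request_id": "req-1", "phase": "scoring"},
    {"request_id": "req-2", "phase": "scoring"},
]
usage_events = [{"request_id": "req-1", "endpoint_id": "local-a"}]

bundle = collect_by_request("req-1", decisions, spans, usage_events)
print(bundle)
```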

Decision as summary

RouterDecision explains:

policy
eligibility
ranking
winner
fallbacks
reasons

Traces as phase-level explanation

Trace spans and events explain how the request moved through routing and execution:

when eligibility opened and closed
when scoring ran
whether fallback or retry happened
where provider-side latency accumulated
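
For example, summing span durations by phase shows where time went. This is a sketch under assumed field names: phase, start_ms, and end_ms are hypothetical, as are the phase labels.

```python
from collections import defaultdict


def phase_durations(spans):
    """Total milliseconds spent in each phase across all spans."""
    totals = defaultdict(int)
    for span in spans:
        totals[span["phase"]] += span["end_ms"] - span["start_ms"]
    return dict(totals)


spans = [
    {"phase": "eligibility", "start_ms": 0, "end_ms": 2},
    {"phase": "scoring", "start_ms": 2, "end_ms": 5},
    {"phase": "provider_call", "start_ms": 5, "end_ms": 405},    # first attempt
    {"phase": "provider_call", "start_ms": 410, "end_ms": 710},  # retry on a fallback
]
totals = phase_durations(spans)
slowest = max(totals, key=totals.get)
print(totals, slowest)
```

Two provider_call spans for one request also hint that a retry or fallback happened, which the usage record can then confirm.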

Usage as outcome and accounting

UsageEvent answers the practical outcome questions:

which endpoint actually served the request
how many tokens were consumed
how long it took
what it likely cost
whether an error class occurred
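
A usage record answering those questions might look like the sketch below. All field names are hypothetical and chosen to mirror the list above. Comparing the serving endpoint with the decision's winner is one simple way to tell whether a fallback actually handled the request.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UsageEvent:
    # Hypothetical fields mirroring the questions a usage record answers.
    endpoint_id: str            # which endpoint actually served the request
    input_tokens: int
    output_tokens: int
    duration_ms: int
    estimated_cost_usd: float   # an estimate, not a billed amount
    error_class: Optional[str] = None


def served_by_fallback(decision, usage):
    """True when a fallback, not the chosen endpoint, served the request."""
    return (
        usage.endpoint_id != decision["chosen_endpoint_id"]
        and usage.endpoint_id in decision["fallbacks"]
    )


decision = {"chosen_endpoint_id": "local-a", "fallbacks": ["remote-b"]}
usage = UsageEvent("remote-b", 812, 240, 1930, 0.0042)
print(served_by_fallback(decision, usage))
```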

The observability model is not write-only. Usage and benchmark samples become new measured evidence, and the profile aggregator folds them into updated observed profiles with freshness and confidence scores.

That is how routing becomes a feedback system rather than a static policy engine.
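
The aggregation step is not specified here, so the following is only one plausible shape for it. It assumes exponential decay with a half-life for freshness and a saturating function of total weight for confidence. The function name, parameters, sample fields, and both formulas are illustrative assumptions, not the profile aggregator's actual method.

```python
def aggregate(samples, now, half_life_s=3600.0, k=5.0):
    """Fold latency samples into one freshness-weighted estimate.

    Each sample's weight halves every half_life_s seconds of age.
    Confidence grows toward 1.0 as total weight grows; k sets how fast.
    """
    weights = [0.5 ** ((now - s["ts"]) / half_life_s) for s in samples]
    total = sum(weights)
    if total == 0:
        return {"latency_ms": None, "confidence": 0.0}
    latency = sum(w * s["latency_ms"] for w, s in zip(weights, samples)) / total
    return {"latency_ms": latency, "confidence": total / (total + k)}


now = 10_000.0
samples = [
    {"ts": now, "latency_ms": 100.0},           # fresh sample, weight 1.0
    {"ts": now - 3600.0, "latency_ms": 400.0},  # one half-life old, weight 0.5
]
profile = aggregate(samples, now)
print(profile)
```

With these two samples the estimate lands at 200 ms rather than the plain mean of 250 ms, because the fresher sample counts twice as much as the older one.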

Inspect in product, then in reference

Use product surfaces first:

Router -> Decisions
Router -> Decision detail
Observe -> request and telemetry detail

Then use deeper reference when needed: