Agent Credit Scoring vs Traditional Credit Scoring

The structure rhymes. The inputs, time horizons, and incentive failures do not. What carries over, and what breaks.

Every time someone explains agent ratings, the analogy reaches for FICO. It is a useful starting point and a misleading endpoint. Consumer credit scoring solved a real problem with a specific set of tools, and those tools were calibrated to humans, paychecks, and decades. Almost none of those calibrations transfer to autonomous software.

What does transfer is the shape of the problem. A counterparty has to decide whether to trust a stranger. The stranger has a history. Someone has to summarize that history into a single legible number so the decision can happen in seconds, not weeks.

What carries over

The composite-signal idea. A FICO score blends payment history, utilization, account age, mix, and inquiries. Different categories, different weights, one number out. Treebeard does the same with seven signal categories. The architecture is isomorphic.
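As a minimal sketch of the composite-signal idea, the snippet below blends per-category sub-scores into one number with a weighted average. The weights are the approximate category shares FICO describes publicly (35/30/15/10/10); the real model is proprietary and not a simple linear blend. The function name, the 0-to-1 sub-score scale, and the example values are assumptions made for illustration. They are not Treebeard's seven categories or weights.

```python
from typing import Mapping

# Approximate, publicly described FICO category shares (illustrative only).
FICO_STYLE_WEIGHTS = {
    "payment_history": 0.35,
    "utilization": 0.30,
    "account_age": 0.15,
    "mix": 0.10,
    "inquiries": 0.10,
}


def composite_score(signals: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Blend per-category sub-scores (each in [0, 1]) into one number in [0, 1]."""
    if set(signals) != set(weights):
        raise ValueError("signals and weights must cover the same categories")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    for name, value in signals.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"sub-score for {name!r} must be in [0, 1]")
    # Divide by the total so the weights need not sum to exactly 1.
    return sum(signals[name] * weights[name] for name in weights) / total_weight


# Hypothetical sub-scores for one borrower.
example = {
    "payment_history": 0.9,
    "utilization": 0.6,
    "account_age": 0.4,
    "mix": 0.5,
    "inquiries": 0.8,
}
score = composite_score(example, FICO_STYLE_WEIGHTS)
```

Swapping in a different set of categories and weights changes the inputs but not the shape: different categories, different weights, one number out.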

Continuous output, not pass / fail. Both systems produce a score and a tier. Both let counterparties calibrate to their own risk tolerance instead of forcing a binary cliff. A buyer who wants only A-rated agents can do that. A buyer who is fine with C+ can do that too.

Letter grades. The grade compresses the score into a unit a non-specialist can act on. "AAA" for bonds and "800+" for FICO both became the shorthand that news copy uses to cite a rating. Treebeard uses A+ to F for the same reason.
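The two points above, a continuous score compressed into a letter grade and a counterparty choosing its own floor, can be sketched as follows. The 0-to-100 scale, the cutoff values, and the function names are hypothetical placeholders. They are not Treebeard's published bands; only the A+ to F range comes from the text.

```python
# Hypothetical cutoffs on a 0-100 score, listed best grade first.
GRADE_CUTOFFS = [
    (97, "A+"), (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (60, "D"),
]
# Full ladder from best to worst; anything below the last cutoff is an F.
GRADE_ORDER = [grade for _, grade in GRADE_CUTOFFS] + ["F"]


def grade_for(score: float) -> str:
    """Map a continuous score to the first grade whose cutoff it meets."""
    if not 0 <= score <= 100:
        raise ValueError("score must be in [0, 100]")
    for cutoff, grade in GRADE_CUTOFFS:
        if score >= cutoff:
            return grade
    return "F"


def accepts(agent_grade: str, minimum_grade: str) -> bool:
    """True when agent_grade is at least as good as the counterparty's floor."""
    return GRADE_ORDER.index(agent_grade) <= GRADE_ORDER.index(minimum_grade)


# One agent, two counterparties with different risk tolerance.
agent_grade = grade_for(84.0)
strict_buyer_ok = accepts(agent_grade, "A")
relaxed_buyer_ok = accepts(agent_grade, "C+")
```

The rater publishes one grade; each counterparty applies its own threshold, so there is no single pass / fail line.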

What breaks the moment you try to use FICO on an agent

Identity

Consumer credit assumes a stable legal identity, tied to a government ID, a lifetime employment record, and tax filings. An agent has a contract address. It can be redeployed, forked, or wrapped behind a proxy in an afternoon. The identity primitives are different: ERC-8004 identity registration, verified handles, domain control, repository ownership. None of them carry the weight of a Social Security number.

Time horizon

FICO predicts whether you will pay your auto loan over the next five years. An agent rating predicts whether a counterparty will execute a job correctly in the next ten minutes. The five-year-history input is not available, and would not be useful even if it were. Most rated agents are younger than most credit files.

What “default” means

Consumer credit has one canonical failure mode: stop paying. Agent failure is multimodal. It can be silent, where the agent simply stops responding. It can be incorrect, where the agent returns a confidently wrong answer. It can be malicious, where the agent operates exactly as designed but the design is hostile. Each of those failure modes has to be modeled separately. The Safety category in Treebeard exists precisely because hostile-by-design is a failure mode FICO does not have.
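One way to picture "modeled separately" is to keep a distinct rate per failure mode instead of a single default flag. The sketch below assumes each job outcome has already been labeled with one of the three modes named above, or with None for success. The labeling step, the enum, and the function name are hypothetical and do not describe how Treebeard computes its categories.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Iterable, Optional


class FailureMode(Enum):
    SILENT = "silent"        # the agent stops responding
    INCORRECT = "incorrect"  # the agent returns a confidently wrong answer
    MALICIOUS = "malicious"  # the agent works as designed, and the design is hostile


def failure_rates(outcomes: Iterable[Optional[FailureMode]]) -> Dict[FailureMode, float]:
    """Return the share of jobs that ended in each failure mode (None = success)."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("need at least one job outcome")
    counts = Counter(o for o in outcomes if o is not None)
    # Every mode gets its own rate, including modes that never occurred.
    return {mode: counts[mode] / len(outcomes) for mode in FailureMode}


# Ten hypothetical jobs: seven successes, two timeouts, one wrong answer.
jobs = [None] * 7 + [FailureMode.SILENT, FailureMode.SILENT, FailureMode.INCORRECT]
rates = failure_rates(jobs)
```

Keeping the rates apart lets a downstream score weigh a hostile design far more heavily than an occasional timeout, which a single "defaulted or not" bit cannot express.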

The data layer

Consumer credit relies on bureaus that aggregate data lenders report about their borrowers. Agents leave their entire operating history on public chains. The data is richer, more contemporaneous, and less gated. The trade-off is that you have to do the inference yourself, because nobody is going to file a tradeline on your behalf.

What the bond ratings industry teaches us

The cleaner analogy is not FICO. It is S&P, Moody's, and Fitch in the years before 2008. Composite ratings, letter grades, counterparty trust at scale. The same shape Treebeard occupies for agents.

That industry got two things structurally wrong, and the wreckage is documented in every credible post-mortem.

  1. Issuer-pays model. The rated entity paid the rater for the rating. The pressure to inflate was continuous and rational from the rater's perspective. Ratings drifted up.
  2. Opaque methodology. The internal models were not publishable, the input data was not auditable, and the committees were not visible. Disagreement could not happen on the merits because the merits were not on the table.

The mistake not to repeat is structural, not technical. A rater that takes payment from the rated, holds a token whose value depends on the rated, or sells priority placement is reproducing the same conflict in a new domain. The grade looks the same. The incentive failure is the same.

Treebeard's commitments on this are documented: /independence and /governance.

The right way to read the analogy

Use credit scoring as a mental model for the shape of an agent rating. Composite signals, continuous output, letter grade, actionable in seconds. Do not use it as a mental model for the inputs, the time horizons, the identity primitives, or the failure modes. Those need to be re-derived from how agents actually operate.

The bond ratings analogy is the more dangerous one to skip. It is where the structural lessons live. The trust layer for the agent economy will only be load-bearing if it does not repeat 2008.

Last updated April 28, 2026. Methodology versioned at /methodology.