How Treebeard™ Works

Radical transparency is a feature, not a risk. Here is exactly how our system works: what it does, what it doesn't do, and why.

The Rating Pipeline

Every Treebeard rating follows four steps: discover the agent on-chain, collect verifiable signals, run the scoring formula, and publish the result. The entire process is automated and deterministic.

🔗
Step 1

On-Chain Discovery

Treebeard crawls the ERC-8004 Identity Registry across four chains (Ethereum, Base, Arbitrum, and Avalanche) to discover registered AI agents. Every agent with an on-chain identity gets indexed automatically. No application required.

  • Reads directly from ERC-8004 registry smart contracts via RPC
  • Indexes agent ID, registration date, chain, and agentURI metadata
  • Currently tracking 32,000+ registered agents across 4 chains
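The indexed fields listed above could be modeled roughly as follows. This is a minimal sketch, not Treebeard's schema: the `AgentRecord` class, its field names, and the `index_agents` helper are hypothetical, and the actual RPC reads from the registry contracts are left out. Keying the index on the (chain, agent ID) pair is also an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# The four chains named in the text.
SUPPORTED_CHAINS = ("ethereum", "base", "arbitrum", "avalanche")


@dataclass(frozen=True)
class AgentRecord:
    """One indexed agent. Field names are illustrative only."""
    chain: str
    agent_id: int
    registered_at: datetime
    agent_uri: str


def index_agents(records):
    """Build an index keyed by (chain, agent_id), skipping unsupported chains.

    Assumption: the same numeric ID may appear on more than one chain,
    so the chain is part of the key.
    """
    index = {}
    for rec in records:
        if rec.chain not in SUPPORTED_CHAINS:
            continue
        index[(rec.chain, rec.agent_id)] = rec
    return index
```

With this shape, looking up an agent is a plain dictionary access such as `index[("base", 1)]`.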
📊
Step 2

Signal Collection

For each discovered agent, Treebeard collects verifiable signals from public on-chain data. These signals form the raw inputs to the rating formula. All data sources are public and auditable.

  • On-chain age: days since ERC-8004 registration transaction
  • Reputation feedback: count and sentiment from the ERC-8004 Reputation Registry (via The Graph)
  • Chain presence: which networks the agent operates on
  • Additional signals (TVL, code activity, transaction volume) collected when available
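As an illustration of the signals above, on-chain age reduces to a date difference, and the optional signals can simply stay empty when no data exists. The sketch below assumes timezone-aware timestamps. The `Signals` container and its field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Tuple


def on_chain_age_days(registered_at: datetime, now: datetime) -> int:
    """Whole days elapsed between registration and `now`, never negative."""
    return max((now - registered_at).days, 0)


@dataclass
class Signals:
    """Raw inputs for one agent. Optional fields stay None when unavailable."""
    on_chain_age_days: int
    feedback_count: int
    chains: Tuple[str, ...]
    tvl_usd: Optional[float] = None
    transaction_count: Optional[int] = None
```

Keeping missing signals as `None` rather than zero lets a scoring step distinguish "no data" from "measured as zero".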
⚡
Step 3

Automated Scoring

A deterministic scoring engine processes the collected signals into a composite rating. The formula is open and documented: the same inputs always produce the same score. No manual adjustments, no pay-to-play.

  • Six category scores: Economic Viability, Operational Reliability, Code Quality, Autonomy Index, Safety, Community
  • Weighted composite produces a 0–100 numeric score and A+ through F letter grade
  • Safety floor: agents with critical safety concerns cannot score above a threshold regardless of other signals
  • Hysteresis buffer prevents grade oscillation from small score fluctuations
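The weighted composite and letter grade described above can be sketched as a pure function. The weights and grade cutoffs below are placeholders chosen for illustration; the real values are the ones documented on the methodology page. The safety floor and hysteresis buffer are not applied here.

```python
# Placeholder weights in percent (they sum to 100). Not Treebeard's actual weights.
WEIGHTS = {
    "economic_viability": 20,
    "operational_reliability": 20,
    "code_quality": 15,
    "autonomy_index": 15,
    "safety": 20,
    "community": 10,
}

# Placeholder cutoffs, highest first. Not Treebeard's actual thresholds.
GRADE_CUTOFFS = [
    (95, "A+"), (90, "A"), (85, "A-"),
    (80, "B+"), (75, "B"), (70, "B-"),
    (65, "C+"), (60, "C"), (55, "C-"),
    (50, "D"), (0, "F"),
]


def composite_score(category_scores):
    """Weighted average of six 0-100 category scores, as a 0-100 number."""
    total = sum(WEIGHTS[name] * category_scores[name] for name in WEIGHTS)
    return total / 100


def letter_grade(score):
    """Map a 0-100 score to the first cutoff it meets or exceeds."""
    for cutoff, grade in GRADE_CUTOFFS:
        if score >= cutoff:
            return grade
    return "F"
```

Because both functions depend only on their arguments, calling them twice with the same category scores yields the same score and grade, which is the property the text calls deterministic.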
📡
Step 4

Publication

Ratings are published to the Treebeard website and API. Every rating includes its numeric score, letter grade, six category breakdowns, confidence level, and the algorithm version that produced it.

  • Full rating breakdown available on every agent profile page
  • Public REST API at /v1/agents for programmatic access
  • Algorithm version tracked for reproducibility
  • Methodology page documents every weight and threshold
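A published rating carrying the fields listed above might be serialized along these lines. The sketch is an assumption about shape only: the `PublishedRating` class and every field name are hypothetical, and the actual response format of `/v1/agents` is not specified here.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class PublishedRating:
    """Illustrative rating record. Field names are not the real API schema."""
    agent_id: int
    chain: str
    score: float
    grade: str
    categories: Dict[str, float]
    confidence: str
    algorithm_version: str


def to_json(rating: PublishedRating) -> str:
    """Serialize with sorted keys so identical ratings give identical text."""
    return json.dumps(asdict(rating), sort_keys=True)
```

Including the algorithm version in every record is what lets a reader pair a score with the exact formula that produced it.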

Design Principles

Every design decision in Treebeard serves a specific purpose.

Deterministic Scoring

The scoring engine is a pure function: the same inputs always produce the same score. No manual overrides, no subjective adjustments, no hidden factors. If you have the inputs and the formula, you can reproduce any rating.

Public Methodology

Every weight, threshold, normalization curve, and edge case is documented in the methodology page. We don't rely on 'trust us'; we show the math.

Cost-to-Fake Weighting

Signals that are expensive to fake (on-chain history, verified transactions) carry more weight than signals that are cheap to fake (social followers, self-reported metrics).
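One way to picture this principle is to derive weights from a rough cost-to-fake tier per signal. The tiers and the proportional rule below are assumptions made up for illustration, not the published weighting.

```python
# Hypothetical cost-to-fake tiers: a higher number means more expensive to fake.
COST_TO_FAKE = {
    "on_chain_history": 3,
    "verified_transactions": 3,
    "social_followers": 1,
    "self_reported_metrics": 1,
}


def normalized_weights(costs):
    """Turn cost tiers into weights that sum to 1, proportional to cost."""
    total = sum(costs.values())
    return {name: cost / total for name, cost in costs.items()}
```

Under these made-up tiers, `on_chain_history` ends up with three times the weight of `social_followers`.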

On-Chain First

The primary data source is the ERC-8004 Identity Registry, a public, permissionless, on-chain record. Agents don't need to apply or self-report. If you're registered, you're indexed.

Safety Floor

Agents with critical safety concerns cannot score above a defined threshold, regardless of how strong their other signals are. Safety is non-negotiable.
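In code, the rule amounts to capping the composite score when a critical concern is flagged. The cap value of 40 and the boolean flag are hypothetical; the actual threshold and the definition of a critical concern belong to the methodology page.

```python
# Hypothetical cap. The real threshold is defined in the methodology.
SAFETY_CAP = 40.0


def apply_safety_floor(composite, has_critical_safety_concern, cap=SAFETY_CAP):
    """Limit the score to `cap` when a critical safety concern is flagged."""
    if has_critical_safety_concern:
        return min(composite, cap)
    return composite
```

The cap only ever lowers a score: an agent already below the threshold is left unchanged.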

Hysteresis Buffer

Small score fluctuations don't change the letter grade. A 0.1-point wobble shouldn't move an agent from A- to B+. The buffer prevents grade oscillation between rating epochs.
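A hysteresis buffer of this kind can be sketched by keeping the previous grade until the score moves past that grade's range by more than a margin. The simplified five-grade scale, the cutoffs, and the 1-point buffer below are all assumptions chosen to keep the example short.

```python
# Simplified, hypothetical cutoffs (highest first) and buffer width.
CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D"), (0, "F")]
BUFFER = 1.0


def raw_grade(score):
    """Grade with no hysteresis: first cutoff the score meets or exceeds."""
    for cutoff, grade in CUTOFFS:
        if score >= cutoff:
            return grade
    return "F"


def grade_bounds(grade):
    """Return the (low, high) score range covered by a grade."""
    upper = 101.0  # just above the maximum score so 100 falls in the top grade
    for cutoff, name in CUTOFFS:
        if name == grade:
            return cutoff, upper
        upper = cutoff
    raise ValueError(f"unknown grade: {grade}")


def grade_with_hysteresis(score, previous_grade, buffer=BUFFER):
    """Keep the previous grade while the score stays within `buffer` of its range."""
    if previous_grade is None:
        return raw_grade(score)
    low, high = grade_bounds(previous_grade)
    if low - buffer <= score < high + buffer:
        return previous_grade
    return raw_grade(score)
```

With these values, an agent graded A that dips to 89.9 stays at A, while a drop to 88.5 moves it to B.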

Oversight & Quality Control

Treebeard is an early-stage system built and operated by a small founding team. Here is how quality control works today.

Founder Review

The founding team reviews rating outputs, monitors for anomalies, and validates methodology changes before deployment. As the system matures, we plan to formalize this into structured review processes with clear escalation paths.

Open Methodology

The full scoring methodology (weights, thresholds, normalization curves) is published on the methodology page. Anyone can audit the formula and verify how a specific rating was calculated.

Feedback Welcome

If you believe a rating is inaccurate or unfair, we want to hear about it. Reach out at hello@treebeardai.com. We take every report seriously and will investigate.

Known Limitations

Treebeard is early-stage software. We believe in being upfront about what we can and can't do today.

  • We don't have a team of human analysts reviewing every rating; scoring is automated and deterministic.
  • We can't yet detect all forms of gaming or manipulation; our signal coverage is growing but incomplete.
  • Most agents currently have limited signal diversity, which means many scores cluster in the same range.
  • Re-rating cadence is still being tuned, so scores may not reflect recent changes immediately.