How Treebeard™ Works
Radical transparency is a feature, not a risk. Here is exactly how our system works: what it does, what it doesn't do, and why.
The Rating Pipeline
Every Treebeard rating follows four steps: discover the agent on-chain, collect verifiable signals, run the scoring formula, and publish the result. The entire process is automated and deterministic.
On-Chain Discovery
Treebeard crawls the ERC-8004 Identity Registry across four chains (Ethereum, Base, Arbitrum, and Avalanche) to discover registered AI agents. Every agent with an on-chain identity gets indexed automatically. No application required.
- Reads directly from ERC-8004 registry smart contracts via RPC
- Indexes agent ID, registration date, chain, and agentURI metadata
- Currently tracking 32,000+ registered agents across 4 chains
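As an illustration of what one indexed entry might look like, here is a minimal sketch of a record holding the four fields listed above. The class name, field names, and the choice to key entries by (chain, agent ID) are assumptions made for this example, not Treebeard's actual schema.

```python
from dataclasses import dataclass
from datetime import date

# The four chains named in the text above.
SUPPORTED_CHAINS = ("ethereum", "base", "arbitrum", "avalanche")


@dataclass(frozen=True)
class IndexedAgent:
    """One indexed registry entry (hypothetical field names)."""
    agent_id: int
    chain: str
    registered_on: date
    agent_uri: str

    def __post_init__(self):
        if self.chain not in SUPPORTED_CHAINS:
            raise ValueError(f"unsupported chain: {self.chain}")

    @property
    def key(self):
        # Assumption: agent IDs are unique per chain, so the key pairs them.
        return (self.chain, self.agent_id)


def build_index(records):
    """Deduplicate crawled records by (chain, agent_id), keeping the first seen."""
    index = {}
    for record in records:
        index.setdefault(record.key, record)
    return index
```

Keying by chain and ID together means the same numeric ID on two networks is treated as two separate entries, which is the conservative choice when IDs are assigned per registry contract.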
Signal Collection
For each discovered agent, Treebeard collects verifiable signals from public on-chain data. These signals form the raw inputs to the rating formula. All data sources are public and auditable.
- On-chain age: days since ERC-8004 registration transaction
- Reputation feedback: count and sentiment from the ERC-8004 Reputation Registry (via The Graph)
- Chain presence: which networks the agent operates on
- Additional signals (TVL, code activity, transaction volume) collected when available
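The on-chain age signal is simple enough to sketch directly. The function below assumes the registration time is taken from the block timestamp of the registration transaction and that partial days are truncated; both are assumptions for illustration, and the function name is hypothetical.

```python
from datetime import datetime, timezone


def on_chain_age_days(registered_at: datetime, now: datetime) -> int:
    """Whole days elapsed since the registration transaction.

    Both arguments must be timezone-aware so that the subtraction is
    unambiguous. A registration "in the future" is clamped to zero.
    """
    if registered_at.tzinfo is None or now.tzinfo is None:
        raise ValueError("use timezone-aware datetimes")
    return max(0, (now - registered_at).days)


# Usage: an agent registered on 1 January, measured on 31 January.
registered = datetime(2025, 1, 1, tzinfo=timezone.utc)
measured = datetime(2025, 1, 31, tzinfo=timezone.utc)
age = on_chain_age_days(registered, measured)  # 30
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the calculation reproducible from recorded inputs.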
Automated Scoring
A deterministic scoring engine processes the collected signals into a composite rating. The formula is open and documented: the same inputs always produce the same score. No manual adjustments, no pay-to-play.
- Six category scores: Economic Viability, Operational Reliability, Code Quality, Autonomy Index, Safety, Community
- Weighted composite produces a 0–100 numeric score and A+ through F letter grade
- Safety floor: agents with critical safety concerns cannot score above a threshold regardless of other signals
- Hysteresis buffer prevents grade oscillation from small score fluctuations
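The weighted composite and the safety floor can be sketched as a pure function. Every number below (the weights, the cap of 40, and the grade cutoffs) is a placeholder invented for this example; the real values are the ones documented on the methodology page.

```python
CATEGORIES = (
    "economic_viability", "operational_reliability", "code_quality",
    "autonomy_index", "safety", "community",
)

# Hypothetical weights, chosen only so that they sum to 1.0.
WEIGHTS = {
    "economic_viability": 0.25,
    "operational_reliability": 0.20,
    "code_quality": 0.15,
    "autonomy_index": 0.10,
    "safety": 0.20,
    "community": 0.10,
}

# Hypothetical ceiling applied when a critical safety concern is flagged.
SAFETY_CAP = 40.0

# Hypothetical cutoffs, highest first: (minimum score, grade).
GRADE_CUTOFFS = [
    (95, "A+"), (90, "A"), (85, "A-"),
    (80, "B+"), (75, "B"), (70, "B-"),
    (65, "C+"), (60, "C"), (55, "C-"),
    (50, "D"), (0, "F"),
]


def composite_score(category_scores, critical_safety_concern=False):
    """Weighted sum of six 0-100 category scores, capped if safety is flagged."""
    total = sum(WEIGHTS[name] * category_scores[name] for name in CATEGORIES)
    if critical_safety_concern:
        total = min(total, SAFETY_CAP)
    return round(total, 1)


def letter_grade(score):
    """Map a 0-100 score to the first grade whose cutoff it meets."""
    for cutoff, grade in GRADE_CUTOFFS:
        if score >= cutoff:
            return grade
    return "F"
```

Because the function reads nothing but its arguments and module constants, calling it twice with the same category scores always returns the same result.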
Publication
Ratings are published to the Treebeard website and API. Every rating includes its numeric score, letter grade, six category breakdowns, confidence level, and the algorithm version that produced it.
- Full rating breakdown available on every agent profile page
- Public REST API at /v1/agents for programmatic access
- Algorithm version tracked for reproducibility
- Methodology page documents every weight and threshold
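For programmatic access, a client might look roughly like the sketch below. Only the `/v1/agents` path comes from the text above. The base URL is left to the caller, and the JSON field names in `REQUIRED_FIELDS` are guesses that mirror the fields a rating is said to include, not a documented response schema.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Hypothetical field names for the items every rating is said to include.
REQUIRED_FIELDS = ("score", "grade", "categories", "confidence", "algorithm_version")


def agents_url(base_url: str) -> str:
    """Join a caller-supplied API base URL with the documented /v1/agents path."""
    return urljoin(base_url.rstrip("/") + "/", "v1/agents")


def parse_rating(payload: str) -> dict:
    """Parse one rating object and check it carries the expected fields."""
    rating = json.loads(payload)
    missing = [field for field in REQUIRED_FIELDS if field not in rating]
    if missing:
        raise ValueError(f"rating is missing fields: {missing}")
    if len(rating["categories"]) != 6:
        raise ValueError("expected six category scores")
    return rating


def fetch_agents(base_url: str, timeout: float = 10.0):
    """GET the agents listing and return the decoded JSON body as-is."""
    with urlopen(agents_url(base_url), timeout=timeout) as response:
        return json.load(response)
```

Checking for the algorithm version on every parsed rating is what lets a consumer tie a stored score back to the formula that produced it.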
Design Principles
Every design decision in Treebeard serves a specific purpose.
Deterministic Scoring
The scoring engine is a pure function: the same inputs always produce the same score. No manual overrides, no subjective adjustments, no hidden factors. If you have the inputs and the formula, you can reproduce any rating.
Public Methodology
Every weight, threshold, normalization curve, and edge case is documented in the methodology page. We don't rely on 'trust us'; we show the math.
Cost-to-Fake Weighting
Signals that are expensive to fake (on-chain history, verified transactions) carry more weight than signals that are cheap to fake (social followers, self-reported metrics).
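One way to picture this principle is to derive weights from a rough cost-to-fake estimate for each signal. The sketch below is an illustration only: the 1 to 5 cost scale, the specific values, and the proportional rule are all assumptions, not the published weighting.

```python
# Hypothetical cost-to-fake estimates on a 1 (cheap) to 5 (expensive) scale.
COST_TO_FAKE = {
    "on_chain_history": 5,
    "verified_transactions": 4,
    "social_followers": 1,
    "self_reported_metrics": 1,
}


def weights_from_cost(costs):
    """Give each signal a weight proportional to its cost-to-fake estimate."""
    total = sum(costs.values())
    if total <= 0:
        raise ValueError("cost estimates must sum to a positive number")
    return {name: cost / total for name, cost in costs.items()}


weights = weights_from_cost(COST_TO_FAKE)
```

Under these placeholder estimates, on-chain history ends up with five times the weight of social followers, so inflating a follower count moves the result far less than building real history would.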
On-Chain First
The primary data source is the ERC-8004 Identity Registry, a public, permissionless, on-chain record. Agents don't need to apply or self-report. If you're registered, you're indexed.
Safety Floor
Agents with critical safety concerns cannot score above a defined threshold, regardless of how strong their other signals are. Safety is non-negotiable.
Hysteresis Buffer
Small score fluctuations don't change the letter grade. A 0.1-point wobble shouldn't move an agent from A- to B+. The buffer prevents grade oscillation between rating epochs.
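A hysteresis buffer of this kind can be sketched as follows. For brevity the example uses whole letter grades rather than plus and minus steps, and the grade boundaries and the 1-point buffer width are made-up values, not Treebeard's thresholds.

```python
# Hypothetical grade bands, highest first: (grade, minimum score).
GRADE_FLOORS = [("A", 90.0), ("B", 80.0), ("C", 70.0), ("D", 60.0), ("F", 0.0)]

# Hypothetical buffer: a score must clear a boundary by this much to change grade.
BUFFER = 1.0


def raw_grade(score):
    """Grade from the score alone, with no memory of the previous epoch."""
    for grade, floor in GRADE_FLOORS:
        if score >= floor:
            return grade
    return "F"


def grade_with_hysteresis(score, previous_grade=None):
    """Keep the previous grade unless the score leaves its band by more than BUFFER."""
    candidate = raw_grade(score)
    if previous_grade is None or candidate == previous_grade:
        return candidate
    order = [grade for grade, _ in GRADE_FLOORS]
    position = order.index(previous_grade)
    lower = GRADE_FLOORS[position][1]
    upper = GRADE_FLOORS[position - 1][1] if position > 0 else float("inf")
    if lower - BUFFER <= score < upper + BUFFER:
        return previous_grade
    return candidate
```

With these values, an agent graded A whose score slips to 89.9 stays at A, while a drop to 88.5 moves it to B. The same buffer applies on the way up, so a B needs at least 91.0 to be promoted.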
Oversight & Quality Control
Treebeard is an early-stage system built and operated by a small founding team. Here is how quality control works today.
Founder Review
The founding team reviews rating outputs, monitors for anomalies, and validates methodology changes before deployment. As the system matures, we plan to formalize this into structured review processes with clear escalation paths.
Open Methodology
The full scoring methodology (weights, thresholds, normalization curves) is published on the methodology page. Anyone can audit the formula and verify how a specific rating was calculated.
Feedback Welcome
If you believe a rating is inaccurate or unfair, we want to hear about it. Reach out at hello@treebeardai.com. We take every report seriously and will investigate.
Known Limitations
Treebeard is early-stage software. We believe in being upfront about what we can and can't do today.
- We don't have a team of human analysts reviewing every rating; scoring is automated and deterministic.
- We can't yet detect all forms of gaming or manipulation. Our signal coverage is growing but incomplete.
- Most agents currently have limited signal diversity, which means many scores cluster in the same range.
- Re-rating cadence is still being tuned, so scores may not reflect recent changes immediately.