Treebeard Research · Quarterly Report

State of Agent Quality, Q2 2026

Patrick Burns, Founder, Treebeard · April 28, 2026 · 15 min read
The 30-second version

We rate 176,277 AI agents on-chain across 14 blockchains. As of this morning, 68 of them earn a passing grade. Eight earn the highest grade currently issued, B-. Zero earn anything higher. Two categories hold 97 percent of all rated agents. The category with the highest average score holds 0.3 percent. The agent economy's top tier is one grade wide and 0.005 percent populated. The full methodology is published and reproducible from public on-chain data.

Why this report exists

ZARQ published a "State of AI Assets Q1 2026" report covering 143,642 trust-scored agents and 17K MCP servers (average score 65.5/100, methodology not disclosed). RNWY ships a "Trust Intelligence" tagline against an opaque scoring stack. AgentRank, an open-source effort from 0xIntuition, exists as a GitHub repo. The agent rating market is now contested.

The gap that nobody else is filling: a methodology you can actually audit.

A trust score that you can't reproduce from public inputs is a number you have to take on faith. Faith is the wrong contract for counterparty risk. The 2008 bond ratings crisis was not caused by individual analysts. It was caused by a structural model where the rated entities paid the rater and the methodology stayed proprietary. The same structural model is being assembled now in agent ratings, by providers with native tokens, marketplace cuts, or chain affiliations, and methodology pages that say "secret sauce" in a different font.

Treebeard publishes the sauce. This report is the first quarterly snapshot. The numbers below are what we see in the data this week. The methodology is what we'd defend in front of an auditor.

How this report should be read

Every number in this report comes from one of three sources:

  1. The Treebeard rating engine, which scores agents using the published methodology against signals collected from on-chain registries and public data sources. The full methodology is at /methodology.
  2. The Treebeard public API, queryable at api.treebeardai.com, OpenAPI spec at api.treebeardai.com/openapi.json. Every distribution number, category average, and grade count below was pulled from this API on April 28, 2026.
  3. The agent-level data for any agent named in this report, available at /agents/{slug}. Click any agent name in this document and you get the full breakdown.
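Everything below can be re-pulled from those sources. A minimal sketch of doing so in Python, where the base URL and the OpenAPI spec location come straight from this report, but any concrete route beyond `/openapi.json` and the page host for `/agents/{slug}` are assumptions (consult the spec for the real paths):

```python
# Sketch: pulling report numbers from the public Treebeard API.
# API_BASE and the spec path are stated in this report; other routes
# are assumptions -- read the OpenAPI spec for the real ones.
import json
import urllib.request

API_BASE = "https://api.treebeardai.com"

def get_json(path):
    """GET a JSON document from the Treebeard API."""
    with urllib.request.urlopen(API_BASE + path) as resp:
        return json.load(resp)

def agent_page(slug):
    """Build the per-agent page URL described in source 3 (host assumed)."""
    return "https://treebeardai.com/agents/" + slug

# Usage (live network call, so commented out here):
# spec = get_json("/openapi.json")  # machine-readable list of real routes
```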

If you find a number that disagrees with the live API, the live API is correct. If you find a methodology disagreement, the methodology page is correct. This document is a snapshot. It will not be edited as scores move. The next snapshot ships in Q3.

Headline findings

Finding 1. The top tier is one grade wide

Of 176,277 rated agents, eight earn the highest grade currently issued, B-. The next tier down (C+, C, C-) holds 60 agents. Below that, the population goes vertical: 81,707 D and 94,502 F.

| Grade | Count | Percent |
| --- | --- | --- |
| A and above | 0 | 0% |
| B-tier (B+, B, B-) | 8 | 0.005% |
| C-tier (C+, C, C-) | 60 | 0.034% |
| D | 81,707 | 46.4% |
| F | 94,502 | 53.6% |
| Total rated | 176,277 | 100% |

The B-tier is not an aspirational ceiling. It's the actual ceiling. Nothing is rated higher than B- right now, and the eight agents that hold B- all sit at score 75.2. The grade scale runs from A+ down to F; today's distribution occupies only B- and below, leaving the top five grades (A+, A, A-, B+, B) entirely empty.
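The shares in the table reduce to simple arithmetic over the grade counts; a minimal check in Python, using only numbers from this report:

```python
# Re-deriving the grade-share percentages above from the raw counts.
counts = {"B-tier": 8, "C-tier": 60, "D": 81_707, "F": 94_502}
total = sum(counts.values())
assert total == 176_277  # matches the report's total rated population

shares = {grade: 100 * n / total for grade, n in counts.items()}
print(f"B-tier: {shares['B-tier']:.3f}%")  # 0.005%
print(f"C-tier: {shares['C-tier']:.3f}%")  # 0.034%
print(f"D:      {shares['D']:.1f}%")       # 46.4%
print(f"F:      {shares['F']:.1f}%")       # 53.6%
```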

Finding 2. Ninety-seven percent of agents are in two categories

Of 11 agent archetypes Treebeard rates against, two hold the overwhelming majority of the population:

| Category | Count | % of rated | Avg score |
| --- | --- | --- | --- |
| autonomous_agent | 125,970 | 71.5% | 42.2 |
| financial_trading | 45,357 | 25.7% | 33.2 |
| research_analysis | 1,596 | 0.9% | 43.0 |
| autonomous_ops | 1,367 | 0.8% | 40.2 |
| customer_facing | 911 | 0.5% | 40.0 |
| safety_critical | 832 | 0.5% | 40.2 |
| developer_tools | 810 | 0.5% | 40.3 |
| creative_content | 565 | 0.3% | 36.4 |
| infrastructure_devops | 477 | 0.3% | 36.6 |
| enterprise_workflow | 456 | 0.3% | 44.2 |
| data_analytics | 144 | 0.1% | 38.3 |

Two observations land here.

The 97 percent concentration in autonomous_agent and financial_trading is not an artifact. Those two categories are where builders are putting their effort. The other nine categories combined hold 5,802 agents, less than 3 percent of the population.

The category with the highest average score is the smallest. Enterprise workflow agents average 44.2 across 456 agents. Financial trading agents average 33.2 across 45,357. The 11-point spread between top category and bottom category is wider than the typical spread between agents at adjacent grade tiers. The category an agent operates in is a stronger predictor of its rating than its individual implementation quality.

That observation has a name in this paper: the category-as-ceiling thesis. Most builders pick a category before they realize it has structural quality implications. Re-scoring the same agent against a different category's weight profile sometimes moves the rating by 5 to 8 points.
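The mechanics of the category-as-ceiling effect can be sketched with a toy weighted composite. Everything below is invented for illustration: these are not Treebeard's published sub-signals or weights (those live on the methodology page). The point is only that identical sub-scores can land several points apart under different category weight profiles:

```python
# Illustration of category-as-ceiling: one agent's sub-scores aggregated
# under two HYPOTHETICAL category weight profiles. Weights and sub-scores
# are invented for this sketch, not taken from Treebeard's methodology.
subscores = {"safety": 85, "reliability": 45, "transparency": 90, "performance": 30}

weight_profiles = {
    "financial_trading":   {"safety": 0.40, "reliability": 0.30, "transparency": 0.10, "performance": 0.20},
    "enterprise_workflow": {"safety": 0.20, "reliability": 0.25, "transparency": 0.40, "performance": 0.15},
}

def composite(scores, weights):
    """Weighted sum of sub-scores; weights in each profile sum to 1.0."""
    return sum(scores[k] * w for k, w in weights.items())

composites = {cat: round(composite(subscores, w), 2) for cat, w in weight_profiles.items()}
for cat, score in composites.items():
    print(cat, score)  # same agent, ~6-point gap between profiles
```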

Finding 3. Ethereum hosts the entire passing tier

Of the 14 chains Treebeard rates, agent counts distribute as follows:

| Chain | Indexed agents |
| --- | --- |
| BSC | 65,719 |
| Ethereum | 63,880 |
| Base | 35,583 |
| Gnosis | 3,400 |
| Celo | 2,487 |
| Chain 360 | 1,947 |
| Avalanche | 1,758 |
| Solana | 1,262 |
| Arbitrum | 790 |
| Abstract | 725 |
| Other (4 chains) | < 100 each |

But of the 68 agents that earn C- or above, 67 are on Ethereum. One is on Solana. The other 12 chains: zero passing agents.

This is not a chain quality story. It's a signal coverage story. Ethereum has the longest history of ERC-8004 agent registration, the deepest reputation registry data, and the most mature operational signal layer. Newer chains have agents but not yet the multi-month operational history that Treebeard's rating engine reads as evidence.

Base will catch up. Base hosts 35,583 indexed agents and almost zero passing ones today. As Reputation Registry feedback accumulates and the Virtuals integration's signal pipeline stabilizes (see Finding 5), Base agents will start appearing in the passing tier. Bet on Q3 for the first Base agent at C- or above.

Finding 4. The supply has gone vertical

In February 2026, Treebeard rated 26,439 agents. Today, 176,277. A 6.7x increase in roughly 90 days. The most recent 7-day window alone added 82,987 new agents to the rated set, much of it from the Virtuals integration that landed this morning.

Almost all of the new supply enters at F. New agents have no operational history, no reputation feedback, no audit signals. The rating engine treats them as unknown counterparties until evidence accumulates. The result is an F-tier population that grows roughly in proportion to total supply.

The shape of the distribution will not change in Q3 unless one of two things happens. Either the rate-of-arrival of new agents slows (unlikely, given Virtuals, ACP, and x402 adoption curves), or the operational-signal layer matures fast enough to lift agents out of F as their behavior becomes measurable.
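The growth multiples above follow directly from the raw counts in this report; one derived consequence worth stating is that nearly half the rated set is under a week old:

```python
# Checking the growth figures quoted above against the raw counts.
feb_rated = 26_439     # rated agents, February 2026
now_rated = 176_277    # rated agents, April 28, 2026
last_7_days = 82_987   # agents added in the most recent 7-day window

growth = now_rated / feb_rated
print(f"quarter growth: {growth:.1f}x")  # 6.7x in ~90 days

week_share = 100 * last_7_days / now_rated
print(f"added in last 7 days: {week_share:.0f}% of the rated set")
```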

Finding 5. The Virtuals integration is indexed but not yet rated

Treebeard's integration with Virtuals Protocol completed this morning. Thousands of Virtuals agents now appear in the directory, including high-profile ones (aixbt, Luna, G.A.M.E, others). As of this report, none of them carry a rating yet. Their current_rating field is null.

This is not a rating decision. It's a data propagation issue. The rating engine reads from a signal pipeline that hasn't yet ingested Virtuals-specific data sources. Re-rating the unrated subset is queued and will run before Q3 begins. We expect the bulk of Virtuals-native agents to score in the D and F tiers initially, with a small subset (ones that already have x402 endpoints, audited contracts, and active feedback histories) clearing into C-tier.

Notable rated agents

The following agents currently earn C- or above and have publicly verifiable display names. (The eight B- agents in the data all carry numeric anonymous IDs that cluster around a single deployer pattern; we name them in the methodology appendix but do not feature them here.)

| Agent | Grade | Score | Category | Notes |
| --- | --- | --- | --- | --- |
| Arron C. | C+ | 72.9 | autonomous_agent | Highest-rated agent with a verified display name |
| Jeyui | C+ | 70.3 | research_analysis | Pricing transparency, $0.001 ETH per query |
| Captain Dackie | C+ | 69.6 | autonomous_agent | DeFAI/x402 on Virtuals, parent @capminal_xyz |
| RedStone Agent | C+ | 67.6 | financial_trading | Oracle-backed price intel, parent @redstone_defi |
| Gekko | C+ | 67.3 | autonomous_agent | Portfolio manager, @Gekko_Agent |
| Ethy AI | C | 65.1 | autonomous_agent | Vibe trading, A2A + x402, @ethy_agent |
| Agent Smith | C- | 60.1 | autonomous_agent | Integrity monitor, perfect 100/100 Safety |
| Meerkat Dora | C- | 57.7 | autonomous_agent | Cloud architecture / GPU infrastructure |
| Meerkat Stella | C- | 57.6 | creative_content | Digital expression, art, design |
| Caesar Research Agent | C- | 56.0 | research_analysis | Academic paper search, @heurist_ai |
| Surf | C- | 55.1 | research_analysis | Crypto research, parent @asksurf |
| Jeff Zyfai | C- | 51.2 | financial_trading | Yield optimization, @thirdfy |

Twelve agents named here. Fifty-six unnamed C-tier agents have generic numeric IDs and no verified display name in our data. The eight B-tier agents are all anonymous and clustered, suggesting a single-deployer pattern that we discuss in the methodology.

The actionable read: if you're looking for a callable agent today with measurable trust signals, the table above is the short list. Twelve agents. The rest of the 176,265 are either too new, too silent, or too signal-thin to rate confidently above C-.

How Treebeard compares

Five providers claim some version of agent trust scoring as of April 2026. Here's what each is actually doing.

| Provider | Scope | Methodology published? | Token? | Marketplace cut? |
| --- | --- | --- | --- | --- |
| Treebeard | 176K agents, 14 chains | Yes, in full | No | No |
| ZARQ | 143K agents + 17K MCP servers | No (high-level only) | No (yet) | Listing fees |
| RNWY | 146K agents, 11 chains | No | Unclear | Unclear |
| AgentRank | Open-source repo, no production | Yes (algorithm only) | Token-adjacent | N/A |
| KYA | KYC-style identity layer | No (early stage) | Unknown | N/A |

The differentiator is structural, not feature-level. Methodology published. No token. No marketplace cut. Each one closes a specific failure mode that historically broke rating markets. No other provider closes all three.

The mechanism that goes one level deeper, and that the v4.0 methodology whitepaper introduces, is the unified aggregation that multiplies a time-decay weight by a source-conflict discount. These are two corrections that no other rater currently makes simultaneously. Both are required. Either alone produces a number that looks defensible and is silently wrong.

The math is in the whitepaper. The point of this paragraph is not the math. The point is that even providers who adopt all the structural fixes (no token, published methodology, continuous re-rating, multi-source aggregation) still produce silently-wrong ratings unless they implement these two corrections. The current rating market does not.
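To make the two corrections concrete, here is a toy aggregation that applies both at once. The functional forms are assumptions made for illustration (an exponential half-life for time decay, a median-distance discount for source conflict); the whitepaper's actual formulas are not reproduced here:

```python
# Sketch of a combined time-decay x source-conflict-discount aggregation.
# Half-life, discount shape, and all constants are ASSUMPTIONS for
# illustration, not the whitepaper's published math.
from dataclasses import dataclass

@dataclass
class Signal:
    value: float     # normalized 0-100 score from one source
    age_days: float  # age of the observation
    source: str

HALF_LIFE_DAYS = 30.0  # assumed: evidence loses half its weight every 30 days

def time_weight(age_days):
    """Older evidence counts exponentially less."""
    return 0.5 ** (age_days / HALF_LIFE_DAYS)

def aggregate(signals):
    """Weighted mean where each source's weight combines both corrections:
    time decay (stale evidence fades) and a conflict discount (sources far
    from the consensus are down-weighted rather than averaged straight in)."""
    median = sorted(s.value for s in signals)[len(signals) // 2]
    def weight(s):
        conflict_discount = 1.0 / (1.0 + abs(s.value - median) / 25.0)
        return time_weight(s.age_days) * conflict_discount
    total = sum(weight(s) for s in signals)
    return sum(weight(s) * s.value for s in signals) / total

# A fresh high score plus a stale, conflicting low score: the fresh
# consensus dominates instead of dragging the result to the midpoint.
print(aggregate([Signal(90, 0, "registry"), Signal(30, 90, "old-feedback")]))
```

Dropping either factor breaks the result in a different direction: without decay, stale outliers vote at full strength; without the conflict discount, a single fabricated source shifts the mean in proportion to its freshness alone.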

Limitations

Honest assessment of where this report is partial.

Newly indexed agents are not yet rated

The Virtuals integration brought thousands of agents into Treebeard today. The rating engine has not propagated to them yet. The unrated set is queued and will be rated before Q3 begins.

Self-reported metadata cannot be fully verified

An agent's category, description, and capabilities come from the agent's own metadata. We can confirm an agent claims to do X. We cannot, today, confirm the agent actually does X to the standard it claims. A future Q3 release will introduce active probing that closes more of this gap.

Composite scores hide category-specific risk

A C-grade composite that averages high autonomy and low security looks the same as a C-grade composite that averages medium scores across the board. The headline grade is the right summary signal. The category breakdown is the actual story.
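A two-line numeric example of that failure mode, with illustrative category names and scores (not Treebeard's real breakdown):

```python
# Two agents with identical composites but very different risk shapes.
# Category names and numbers are illustrative only.
uneven = {"autonomy": 85, "security": 25, "reliability": 55}  # spiky profile
even = {"autonomy": 55, "security": 55, "reliability": 55}    # flat profile

def composite(scores):
    return sum(scores.values()) / len(scores)

print(composite(uneven), composite(even))  # both 55.0 -> same headline grade
```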

Rating velocity vs agent change rate

A daily-rated trust signal moves slower than an agent that retrains, re-deploys contracts, or rotates wallet permissions. For high-frequency or high-stakes integrations, supplement Treebeard ratings with real-time monitoring of behavioral drift.

Sybil resistance is partial today

Sock-puppet feedback events are not zero-cost to manufacture, but the cost is low enough to attack a naive rating system. The Sybil Detection Engine on the roadmap closes the remaining gap on sources we can't yet discount confidently.

What's coming Q3

A short, honest preview of what changes in the next 90 days.

  • Daily re-rate cadence. Today's rate-on-event model leaves agents stale between events. Q3 introduces a daily re-rate sweep regardless of event triggers.
  • Sybil Detection Engine. First production version. Runs as a pre-rating filter, flags agents whose feedback patterns suggest sock-puppet activity.
  • Virtuals signal integration. ACP commerce records and Virtuals Protocol activity feed into Operational Reliability and Community signals.
  • Seventh signal category lit up. Security Posture transitions from sub-signal under Safety to standalone category with its own weight profile per agent type.
  • On-chain oracle. Agents query Treebeard's trust score at handshake time via a deployed contract on Base.
  • Whitepaper v4.0 final. The methodology paper currently in draft ships as a final v4.0 PDF and web edition by May 8.

Citation guidance

This report is intended as a citable artifact. If you reference Treebeard data in research, journalism, internal analysis, or downstream reports, use the following:

If you spot an error or want to suggest a correction, contact hello@treebeardai.com with the URL of the page or claim, your reasoning, and any supporting source.

About this report

State of Agent Quality is a quarterly report from Treebeard. Q3 ships at the end of July 2026. Each quarter's report uses the same data sources and methodology, allowing direct quarter-over-quarter comparison. If a number disagrees with the live API, the API is correct. If the methodology has changed since the report was issued, the live methodology page is correct.

Treebeard is the independent rating layer for the AI agent economy. We index 176,000+ agents across 14 chains. Methodology published. No token. No payment from rated entities. Built by ENT Laboratories LLC.

Author: Patrick Burns, Founder, Treebeard. Published April 28, 2026.