Treebeard Research · Quarterly Report

State of Agent Quality, Q2 2026

Patrick Burns, Founder, Treebeard · April 28, 2026 · 15 min read
The 30-second version

We rate 176,277 AI agents on-chain across 14 blockchains. As of this morning, 68 of them earn a passing grade. Eight earn the highest grade currently issued, B-. Zero earn anything higher. Two categories hold 97 percent of all rated agents. The category with the highest average score holds 0.3 percent. The agent economy's top tier is one grade wide and 0.005 percent populated. The full methodology is published and reproducible from public on-chain data.

Why this report exists

ZARQ published a "State of AI Assets Q1 2026" report covering 143,642 trust-scored agents and 17K MCP servers (average score 65.5/100, methodology not disclosed). RNWY ships a "Trust Intelligence" tagline against an opaque scoring stack. AgentRank, an open-source effort from 0xIntuition, exists as a GitHub repo. The agent rating market is now contested.

The gap that nobody else is filling: a methodology you can actually audit.

A trust score that you can't reproduce from public inputs is a number you have to take on faith. Faith is the wrong contract for counterparty risk. The 2008 bond ratings crisis was not caused by individual analysts. It was caused by a structural model where the rated entities paid the rater and the methodology stayed proprietary. The same structural model is being assembled now in agent ratings, by providers with native tokens, marketplace cuts, or chain affiliations, and methodology pages that say "secret sauce" in a different font.

Treebeard publishes the sauce. This report is the first quarterly snapshot. The numbers below are what we see in the data this week. The methodology is what we'd defend in front of an auditor.

How this report should be read

Every number in this report comes from one of three sources:

  1. The Treebeard rating engine, which scores agents using the published methodology against signals collected from on-chain registries and public data sources. The full methodology is at /methodology.
  2. The Treebeard public API, queryable at api.treebeardai.com, OpenAPI spec at api.treebeardai.com/openapi.json. Every distribution number, category average, and grade count below was pulled from this API on April 28, 2026.
  3. The agent-level data for any agent named in this report, available at /agents/{slug}. Click any agent name in this document and you get the full breakdown.
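Everything below can be re-pulled from those sources. A minimal sketch of doing so in Python, where the base URL and the OpenAPI spec location come straight from this report, but any concrete route beyond `/openapi.json` and the page host for `/agents/{slug}` are assumptions (consult the spec for the real paths):

```python
# Sketch: pulling report numbers from the public Treebeard API.
# API_BASE and the spec path are stated in this report; other routes
# are assumptions -- read the OpenAPI spec for the real ones.
import json
import urllib.request

API_BASE = "https://api.treebeardai.com"

def get_json(path):
    """GET a JSON document from the Treebeard API."""
    with urllib.request.urlopen(API_BASE + path) as resp:
        return json.load(resp)

def agent_page(slug):
    """Build the per-agent page URL described in source 3 (host assumed)."""
    return "https://treebeardai.com/agents/" + slug

# Usage (live network call, so commented out here):
# spec = get_json("/openapi.json")  # machine-readable list of real routes
```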

If you find a number that disagrees with the live API, the live API is correct. If you find a methodology disagreement, the methodology page is correct. This document is a snapshot. It will not be edited as scores move. The next snapshot ships in Q3.

Headline findings

Finding 1. The top tier is one grade wide

Of 176,277 rated agents, eight earn the highest grade currently issued, B-. The next tier down (C+, C, C-) holds 60 agents. Below that, the population goes vertical: 81,707 D and 94,502 F.

| Grade | Count | Percent |
| --- | --- | --- |
| A and above | 0 | 0% |
| B-tier (B+, B, B-) | 8 | 0.005% |
| C-tier (C+, C, C-) | 60 | 0.034% |
| D | 81,707 | 46.4% |
| F | 94,502 | 53.6% |
| Total rated | 176,277 | 100% |

The B-tier is not an aspirational ceiling. It's the actual ceiling. Nothing is rated higher than B- right now, and the eight agents that hold B- all sit at score 75.2. The grade scale runs from A+ down to F; today's distribution occupies only B- and below, leaving the top five grades (A+, A, A-, B+, B) entirely empty.
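The shares in the table reduce to simple arithmetic over the grade counts; a minimal check in Python, using only numbers from this report:

```python
# Re-deriving the grade-share percentages above from the raw counts.
counts = {"B-tier": 8, "C-tier": 60, "D": 81_707, "F": 94_502}
total = sum(counts.values())
assert total == 176_277  # matches the report's total rated population

shares = {grade: 100 * n / total for grade, n in counts.items()}
print(f"B-tier: {shares['B-tier']:.3f}%")  # 0.005%
print(f"C-tier: {shares['C-tier']:.3f}%")  # 0.034%
print(f"D:      {shares['D']:.1f}%")       # 46.4%
print(f"F:      {shares['F']:.1f}%")       # 53.6%
```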

Finding 2. Ninety-seven percent of agents are in two categories

Of 11 agent archetypes Treebeard rates against, two hold the overwhelming majority of the population:

| Category | Count | % of rated | Avg score |
| --- | --- | --- | --- |
| autonomous_agent | 125,970 | 71.5% | 42.2 |
| financial_trading | 45,357 | 25.7% | 33.2 |
| research_analysis | 1,596 | 0.9% | 43.0 |
| autonomous_ops | 1,367 | 0.8% | 40.2 |
| customer_facing | 911 | 0.5% | 40.0 |
| safety_critical | 832 | 0.5% | 40.2 |
| developer_tools | 810 | 0.5% | 40.3 |
| creative_content | 565 | 0.3% | 36.4 |
| infrastructure_devops | 477 | 0.3% | 36.6 |
| enterprise_workflow | 456 | 0.3% | 44.2 |
| data_analytics | 144 | 0.1% | 38.3 |

Two observations land here.

The 97 percent concentration in autonomous_agent and financial_trading is not an artifact. Those two categories are where builders are putting their effort. The other nine categories combined hold 5,802 agents, less than 3 percent of the population.

The category with the highest average score is the smallest. Enterprise workflow agents average 44.2 across 456 agents. Financial trading agents average 33.2 across 45,357. The 11-point spread between top category and bottom category is wider than the typical spread between agents at adjacent grade tiers. The category an agent operates in is a stronger predictor of its rating than its individual implementation quality.

That observation has a name in this paper: the category-as-ceiling thesis. Most builders pick a category before they realize it has structural quality implications. Re-scoring the same agent against a different category's weight profile sometimes moves the rating by 5 to 8 points.
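The mechanics of the category-as-ceiling effect can be sketched with a toy weighted composite. Everything below is invented for illustration: these are not Treebeard's published sub-signals or weights (those live on the methodology page). The point is only that identical sub-scores can land several points apart under different category weight profiles:

```python
# Illustration of category-as-ceiling: one agent's sub-scores aggregated
# under two HYPOTHETICAL category weight profiles. Weights and sub-scores
# are invented for this sketch, not taken from Treebeard's methodology.
subscores = {"safety": 85, "reliability": 45, "transparency": 90, "performance": 30}

weight_profiles = {
    "financial_trading":   {"safety": 0.40, "reliability": 0.30, "transparency": 0.10, "performance": 0.20},
    "enterprise_workflow": {"safety": 0.20, "reliability": 0.25, "transparency": 0.40, "performance": 0.15},
}

def composite(scores, weights):
    """Weighted sum of sub-scores; weights in each profile sum to 1.0."""
    return sum(scores[k] * w for k, w in weights.items())

composites = {cat: round(composite(subscores, w), 2) for cat, w in weight_profiles.items()}
for cat, score in composites.items():
    print(cat, score)  # same agent, ~6-point gap between profiles
```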

Finding 3. Ethereum hosts the entire passing tier

Of the 14 chains Treebeard rates, agent counts distribute as follows:

| Chain | Indexed agents |
| --- | --- |
| BSC | 65,719 |
| Ethereum | 63,880 |
| Base | 35,583 |
| Gnosis | 3,400 |
| Celo | 2,487 |
| Chain 360 | 1,947 |
| Avalanche | 1,758 |
| Solana | 1,262 |
| Arbitrum | 790 |
| Abstract | 725 |
| Other (4 chains) | < 100 each |

But of the 68 agents that earn C- or above, 67 are on Ethereum. One is on Solana. The other 12 chains: zero passing agents.

This is not a chain quality story. It's a signal coverage story. Ethereum has the longest history of ERC-8004 agent registration, the deepest reputation registry data, and the most mature operational signal layer. Newer chains have agents but not yet the multi-month operational history that Treebeard's rating engine reads as evidence.

Base will catch up. Base hosts 35,583 indexed agents and almost zero passing ones today. As Reputation Registry feedback accumulates and the Virtuals integration's signal pipeline stabilizes (see Finding 5), Base agents will start appearing in the passing tier. Bet on Q3 for the first Base agent at C- or above.

Finding 4. The supply has gone vertical

In February 2026, Treebeard rated 26,439 agents. Today, 176,277. A 6.7x increase in roughly 90 days. The most recent 7-day window alone added 82,987 new agents to the rated set, much of it from the Virtuals integration that landed this morning.

Almost all of the new supply enters at F. New agents have no operational history, no reputation feedback, no audit signals. The rating engine treats them as unknown counterparties until evidence accumulates. The result is an F-tier population that grows roughly in proportion to total supply.

The shape of the distribution will not change in Q3 unless one of two things happens. Either the rate-of-arrival of new agents slows (unlikely, given Virtuals, ACP, and x402 adoption curves), or the operational-signal layer matures fast enough to lift agents out of F as their behavior becomes measurable.
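The growth multiples above follow directly from the raw counts in this report; one derived consequence worth stating is that nearly half the rated set is under a week old:

```python
# Checking the growth figures quoted above against the raw counts.
feb_rated = 26_439     # rated agents, February 2026
now_rated = 176_277    # rated agents, April 28, 2026
last_7_days = 82_987   # agents added in the most recent 7-day window

growth = now_rated / feb_rated
print(f"quarter growth: {growth:.1f}x")  # 6.7x in ~90 days

week_share = 100 * last_7_days / now_rated
print(f"added in last 7 days: {week_share:.0f}% of the rated set")
```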

Finding 5. The Virtuals integration is indexed but not yet rated

Treebeard's integration with Virtuals Protocol completed this morning. Thousands of Virtuals agents now appear in the directory, including high-profile ones (aixbt, Luna, G.A.M.E, others). As of this report, none of them carry a rating yet. Their current_rating field is null.

This is not a rating decision. It's a data propagation issue. The rating engine reads from a signal pipeline that hasn't yet ingested Virtuals-specific data sources. Re-rating the unrated subset is queued and will run before Q3 begins. We expect the bulk of Virtuals-native agents to score in the D and F tiers initially, with a small subset (ones that already have x402 endpoints, audited contracts, and active feedback histories) clearing into C-tier.

Notable rated agents

The following agents currently earn C- or above and have publicly verifiable display names. (The eight B- agents in the data all carry numeric anonymous IDs that cluster around a single deployer pattern; we name them in the methodology appendix but do not feature them here.)

| Agent | Grade | Score | Category | Notes |
| --- | --- | --- | --- | --- |
| Arron C. | C+ | 72.9 | autonomous_agent | Highest-rated agent with a verified display name |
| Jeyui | C+ | 70.3 | research_analysis | Pricing transparency, $0.001 ETH per query |
| Captain Dackie | C+ | 69.6 | autonomous_agent | DeFAI/x402 on Virtuals, parent @capminal_xyz |
| RedStone Agent | C+ | 67.6 | financial_trading | Oracle-backed price intel, parent @redstone_defi |
| Gekko | C+ | 67.3 | autonomous_agent | Portfolio manager, @Gekko_Agent |
| Ethy AI | C | 65.1 | autonomous_agent | Vibe trading, A2A + x402, @ethy_agent |
| Agent Smith | C- | 60.1 | autonomous_agent | Integrity monitor, perfect 100/100 Safety |
| Meerkat Dora | C- | 57.7 | autonomous_agent | Cloud architecture / GPU infrastructure |
| Meerkat Stella | C- | 57.6 | creative_content | Digital expression, art, design |
| Caesar Research Agent | C- | 56.0 | research_analysis | Academic paper search, @heurist_ai |
| Surf | C- | 55.1 | research_analysis | Crypto research, parent @asksurf |
| Jeff Zyfai | C- | 51.2 | financial_trading | Yield optimization, @thirdfy |

Twelve agents named here. Fifty-six unnamed C-tier agents have generic numeric IDs and no verified display name in our data. The eight B-tier agents are all anonymous and clustered, suggesting a single-deployer pattern that we discuss in the methodology.

The actionable read: if you're looking for a callable agent today with measurable trust signals, the table above is the short list. Twelve agents. The rest of the 176,265 are either too new, too silent, or too signal-thin to rate confidently above C-.

How Treebeard compares

Five providers claim some version of agent trust scoring as of April 2026. Here's what each is actually doing.

| Provider | Scope | Methodology published? | Token? | Marketplace cut? |
| --- | --- | --- | --- | --- |
| Treebeard | 176K agents, 14 chains | Yes, in full | No | No |
| ZARQ | 143K agents + 17K MCP servers | No (high-level only) | No (yet) | Listing fees |
| RNWY | 146K agents, 11 chains | No | Unclear | Unclear |
| AgentRank | Open-source repo, no production | Yes (algorithm only) | Token-adjacent | N/A |
| KYA | KYC-style identity layer | No (early stage) | Unknown | N/A |

The differentiator is structural, not feature-level. Methodology published. No token. No marketplace cut. Each one closes a specific failure mode that historically broke rating markets. No other provider closes all three.

The mechanism that goes one level deeper, and that the v4.0 methodology whitepaper introduces, is the unified aggregation that multiplies a time-decay weight by a source-conflict discount. These are two corrections that no other rater currently makes simultaneously. Both are required. Either alone produces a number that looks defensible and is silently wrong.

The math is in the whitepaper. The point of this paragraph is not the math. The point is that even providers who adopt all the structural fixes (no token, published methodology, continuous re-rating, multi-source aggregation) still produce silently-wrong ratings unless they implement these two corrections. The current rating market does not.
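To make the two corrections concrete, here is a toy aggregation that applies both at once. The functional forms are assumptions made for illustration (an exponential half-life for time decay, a median-distance discount for source conflict); the whitepaper's actual formulas are not reproduced here:

```python
# Sketch of a combined time-decay x source-conflict-discount aggregation.
# Half-life, discount shape, and all constants are ASSUMPTIONS for
# illustration, not the whitepaper's published math.
from dataclasses import dataclass

@dataclass
class Signal:
    value: float     # normalized 0-100 score from one source
    age_days: float  # age of the observation
    source: str

HALF_LIFE_DAYS = 30.0  # assumed: evidence loses half its weight every 30 days

def time_weight(age_days):
    """Older evidence counts exponentially less."""
    return 0.5 ** (age_days / HALF_LIFE_DAYS)

def aggregate(signals):
    """Weighted mean where each source's weight combines both corrections:
    time decay (stale evidence fades) and a conflict discount (sources far
    from the consensus are down-weighted rather than averaged straight in)."""
    median = sorted(s.value for s in signals)[len(signals) // 2]
    def weight(s):
        conflict_discount = 1.0 / (1.0 + abs(s.value - median) / 25.0)
        return time_weight(s.age_days) * conflict_discount
    total = sum(weight(s) for s in signals)
    return sum(weight(s) * s.value for s in signals) / total

# A fresh high score plus a stale, conflicting low score: the fresh
# consensus dominates instead of dragging the result to the midpoint.
print(aggregate([Signal(90, 0, "registry"), Signal(30, 90, "old-feedback")]))
```

Dropping either factor breaks the result in a different direction: without decay, stale outliers vote at full strength; without the conflict discount, a single fabricated source shifts the mean in proportion to its freshness alone.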

Limitations

Honest assessment of where this report is partial.

Newly indexed agents are not yet rated

The Virtuals integration brought thousands of agents into Treebeard today. The rating engine has not propagated to them yet. The unrated set is queued and will be rated before Q3 begins.

Self-reported metadata cannot be fully verified

An agent's category, description, and capabilities come from the agent's own metadata. We can confirm an agent claims to do X. We cannot, today, confirm the agent actually does X to the standard it claims. A future Q3 release will introduce active probing that closes more of this gap.

Composite scores hide category-specific risk

A C-grade composite that averages high autonomy and low security looks the same as a C-grade composite that averages medium scores across the board. The headline grade is the right summary signal. The category breakdown is the actual story.
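A two-line numeric example of that failure mode, with illustrative category names and scores (not Treebeard's real breakdown):

```python
# Two agents with identical composites but very different risk shapes.
# Category names and numbers are illustrative only.
uneven = {"autonomy": 85, "security": 25, "reliability": 55}  # spiky profile
even = {"autonomy": 55, "security": 55, "reliability": 55}    # flat profile

def composite(scores):
    return sum(scores.values()) / len(scores)

print(composite(uneven), composite(even))  # both 55.0 -> same headline grade
```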

Rating velocity vs agent change rate

A daily-rated trust signal moves slower than an agent that retrains, re-deploys contracts, or rotates wallet permissions. For high-frequency or high-stakes integrations, supplement Treebeard ratings with real-time monitoring of behavioral drift.

Sybil resistance is partial today

Sock-puppet feedback events are not zero-cost to manufacture, but the cost is low enough to attack a naive rating system. The Sybil Detection Engine on the roadmap closes the remaining gap on sources we can't yet discount confidently.

What's coming Q3

A short, honest preview of what changes in the next 90 days.

  • Daily re-rate cadence. Today's rate-on-event model leaves agents stale between events. Q3 introduces a daily re-rate sweep regardless of event triggers.
  • Sybil Detection Engine. First production version. Runs as a pre-rating filter, flags agents whose feedback patterns suggest sock-puppet activity.
  • Virtuals signal integration. ACP commerce records and Virtuals Protocol activity feed into Operational Reliability and Community signals.
  • Seventh signal category lit up. Security Posture transitions from sub-signal under Safety to standalone category with its own weight profile per agent type.
  • On-chain oracle. Agents query Treebeard's trust score at handshake time via a deployed contract on Base.
  • Whitepaper v4.0 final. The methodology paper currently in draft ships as a final v4.0 PDF and web edition by May 8.

Citation guidance

This report is intended as a citable artifact. If you reference Treebeard data in research, journalism, internal analysis, or downstream reports, use the following:

If you spot an error or want to suggest a correction, contact hello@treebeardai.com with the URL of the page or claim, your reasoning, and any supporting source.

About this report

State of Agent Quality is a quarterly report from Treebeard. Q3 ships at the end of July 2026. Each quarter's report uses the same data sources and methodology, allowing direct quarter-over-quarter comparison. If a number disagrees with the live API, the API is correct. If the methodology has changed since the report was issued, the live methodology page is correct.

Treebeard is the independent rating layer for the AI agent economy. We index 176,000+ agents across 14 chains. Methodology published. No token. No payment from rated entities. Built by ENT Laboratories LLC.

Author: Patrick Burns, Founder, Treebeard. Published April 28, 2026.