The Trust Deficit in the Agent Economy: A Case for Independent Ratings Infrastructure

Patrick Burns · February 2026 · 12 min read
Abstract

The AI agent ecosystem has grown by an order of magnitude in under two years, yet no independent, methodology-driven evaluation infrastructure exists for assessing agent quality. This paper examines the structural information asymmetry between agent producers and agent consumers, identifies the recurring historical pattern by which transparency infrastructure emerges in maturing markets, and outlines the design requirements for a credible agent rating system. We propose Treebeard as a first contribution toward closing this gap.

1. Origin and Motivation

My background is in stablecoin lending infrastructure. In 2025, I began studying the emerging opportunity space around extending credit to — and facilitating lending by — AI agents. The thesis was straightforward: as agents increasingly participate in economic transactions, they will require access to credit facilities, working capital, and liquidity management tools analogous to those available to human economic actors.

The fundamental challenge surfaced immediately. Credit extension requires risk assessment. Risk assessment requires standardized, comparable quality signals. For AI agents, no such signal infrastructure exists. There is no credit bureau, no rating agency, no normalized data layer that allows a counterparty to evaluate whether a given agent is operationally reliable, economically sustainable, architecturally sound, or behaviorally safe.

This is not merely a lending problem. The same information deficit affects every decision point in the agent economy: marketplace operators selecting which agents to list, developers choosing integration partners, investors conducting due diligence on agent tokens, and enterprises evaluating autonomous workflow candidates. The absence of independent evaluation infrastructure is a systemic bottleneck constraining the maturation of the entire market.

2. The Information Asymmetry Problem

The core issue is a well-documented market failure: asymmetric information between producers and consumers of a complex good.

Agent builders possess granular knowledge of their systems — source code quality, security posture, economic model viability, team depth, known failure modes. Agent evaluators — the participants assuming financial and operational risk — have access to none of this in standardized form. The available inputs for evaluation decisions are typically limited to marketing collateral, self-reported metrics, community sentiment, and token price action. None of these constitute independently verified quality signals.

Akerlof's “lemons problem” applies directly: when buyers cannot distinguish high-quality agents from low-quality agents, the market underprices quality and overweights noise. Builders of genuinely capable agents are penalized by the inability to credibly signal their quality. Builders of low-quality or fraudulent agents exploit the opacity. The result is adverse selection that suppresses adoption, depresses valuations for legitimate projects, and erodes trust across the ecosystem.

The resolution, as in every analogous market, requires introducing a trusted intermediary that reduces the information gap through standardized, independent evaluation.

3. Historical Precedent: Transparency Infrastructure in Maturing Markets

The emergence of independent evaluation infrastructure is a recurring structural feature of market maturation. Three precedents are particularly instructive.

Credit markets

Prior to 1909, bond evaluation was conducted through relationship-based assessment and qualitative judgment. Moody’s introduction of standardized letter-grade ratings created comparability across issuers, geographies, and sectors for the first time. The resulting transparency infrastructure became foundational to modern capital markets — not by recommending investments, but by normalizing the informational substrate upon which investment decisions are made.

Digital asset markets

The 2017–2018 token proliferation produced thousands of assets with no standardized comparative framework. CoinGecko and CoinMarketCap addressed this by aggregating and normalizing metadata — market capitalization, trading volume, circulating supply, exchange coverage — into a common data layer. These platforms made no investment recommendations; they made information accessible.

Decentralized finance

As DeFi protocols assumed custody of significant user capital, specialized risk evaluation frameworks emerged: process quality audits, pessimistic scoring methodologies for safety-critical parameters, and on-chain creditworthiness models. Each addressed a specific dimension of the trust problem in a market operating beyond the reach of traditional regulatory oversight.

In each case, the pattern is identical: rapid market growth produces an information asymmetry that suppresses efficient capital allocation. An independent intermediary emerges to provide standardized evaluation. That evaluation layer becomes foundational infrastructure. AI agents have reached the equivalent inflection point.

4. Why Agent Evaluation Is a Non-Trivial Problem

The absence of agent ratings infrastructure is not attributable to lack of demand. Rather, the problem is technically difficult in ways that distinguish it from prior evaluation challenges.

Agents are dynamic systems, not static instruments

A bond has a balance sheet and a coupon schedule. A token has on-chain metadata and trading history. An AI agent is evolving software that makes autonomous decisions, operates across heterogeneous environments, and exhibits emergent behavior under novel conditions. Meaningful evaluation requires synthesizing signals from fundamentally different domains: source code repositories, on-chain transaction records, infrastructure monitoring systems, economic performance data, and behavioral assessment under adversarial conditions.

Individual signals are gameable

Synthetic GitHub activity is inexpensive to produce. Wash trading inflates volume metrics at known cost. Community engagement metrics are trivially manufactured. Any unidimensional rating methodology would face exploitation within weeks of publication. Credible evaluation requires multi-dimensional assessment with deliberate weighting toward signals whose cost-to-fabricate is prohibitively high — verified on-chain transaction histories, independent code audits, observed production integration evidence, longitudinal user retention data.
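The weighting principle can be made concrete with a minimal sketch: assign each signal a weight proportional to its estimated cost-to-fabricate, so that cheap-to-fake signals contribute little to the final assessment. The signal names and cost figures below are illustrative assumptions, not Treebeard's actual inputs or parameters.

```python
def robust_weights(fabrication_cost: dict[str, float]) -> dict[str, float]:
    """Weight each signal in proportion to its estimated cost-to-fabricate,
    so trivially manufactured signals contribute little to the composite."""
    total = sum(fabrication_cost.values())
    return {name: cost / total for name, cost in fabrication_cost.items()}

# Hypothetical relative fabrication costs (arbitrary units).
weights = robust_weights({
    "github_stars": 1,                  # trivially manufactured
    "reported_volume": 10,              # wash trading has known, modest cost
    "audited_codebase": 500,            # requires a real independent audit
    "verified_onchain_history": 1000,   # cannot be retroactively forged
})
```

Under this scheme, a verified on-chain history carries three orders of magnitude more weight than repository stars, which is the intended property: an attacker must spend real resources to move the rating.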

Cross-referencing is essential

A robust methodology must correlate signals across independent data sources such that inconsistencies surface algorithmically. If an agent’s reported GitHub activity diverges from its observable on-chain footprint, or its claimed user base is inconsistent with its transaction volume, the model should flag the discrepancy without manual intervention.
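As a sketch of how such cross-referencing might work, the function below compares signals drawn from independent sources and emits flags when they diverge. The schema, thresholds, and field names are hypothetical, chosen only to illustrate the two discrepancy cases mentioned above.

```python
from dataclasses import dataclass

@dataclass
class AgentSignals:
    """Normalized signals from independent sources (hypothetical schema)."""
    weekly_commits: float      # from repository analysis
    weekly_onchain_txs: float  # from blockchain monitoring
    claimed_users: float       # self-reported by the agent's team
    observed_txs_30d: float    # independently observed, last 30 days

def flag_discrepancies(s: AgentSignals) -> list[str]:
    """Surface inconsistencies across independent sources algorithmically.
    Thresholds are illustrative, not Treebeard's actual parameters."""
    flags = []
    # Heavy development activity with zero on-chain footprint is suspicious
    # for an agent that claims to transact autonomously.
    if s.weekly_commits > 20 and s.weekly_onchain_txs == 0:
        flags.append("high repo activity but no on-chain footprint")
    # A large claimed user base should leave a proportionate transaction trail.
    if s.claimed_users > 0 and s.observed_txs_30d / s.claimed_users < 0.01:
        flags.append("claimed user base inconsistent with transaction volume")
    return flags
```

The point is structural: each check pairs a self-reported or cheap signal with an independently observed one, so a fabricated claim must be consistent with evidence the claimant does not control.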

5. Treebeard: Design Requirements and Architecture

Treebeard is an independent rating and intelligence platform for the AI agent economy. The system architecture comprises three layers:

Directory. A comprehensive, searchable index of AI agents across major ecosystems — blockchain-native, enterprise, developer tools, and consumer — populated through proactive discovery via blockchain monitoring, marketplace API ingestion, and source code repository analysis. Discovery is not contingent on self-submission.
Rating engine. A proprietary quality assessment producing a composite score (0–100) with per-category breakdowns, a confidence percentage, and a trend indicator. The methodology draws on institutional risk frameworks and is designed around six signal categories spanning economic viability, operational reliability, code quality, autonomy assessment, safety, and ecosystem integration.
Intelligence layer. Analytics, leaderboards, trend data, and programmatic API access enabling downstream consumption by marketplace operators, orchestration platforms, institutional research teams, and developer toolchains.
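To make the rating engine's output shape concrete, here is a minimal sketch of a composite calculation over the six signal categories. The category names come from the description above; the weights, the coverage-based confidence model, and the function signature are assumptions for illustration, not Treebeard's published methodology.

```python
# Hypothetical category weights (must sum to 1.0); not the actual methodology.
CATEGORY_WEIGHTS = {
    "economic_viability": 0.20,
    "operational_reliability": 0.20,
    "code_quality": 0.15,
    "autonomy": 0.15,
    "safety": 0.20,
    "ecosystem_integration": 0.10,
}

def composite_score(category_scores: dict[str, float],
                    coverage: dict[str, float]) -> tuple[float, float]:
    """Return (composite 0-100, confidence 0-100).

    category_scores: per-category scores on a 0-100 scale.
    coverage: fraction of each category's signals actually observed (0-1).
    Missing data lowers confidence rather than silently lowering the score.
    """
    score = sum(CATEGORY_WEIGHTS[c] * category_scores[c]
                for c in CATEGORY_WEIGHTS)
    confidence = 100 * sum(CATEGORY_WEIGHTS[c] * coverage.get(c, 0.0)
                           for c in CATEGORY_WEIGHTS)
    return round(score, 1), round(confidence, 1)
```

Separating the score from a coverage-driven confidence figure lets a consumer distinguish "this agent rated poorly" from "we could not observe enough of this agent to rate it," which matters for downstream credit and listing decisions.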

The business model is predicated on independence. Treebeard does not offer sponsored listings, sell favorable ratings, or operate a token. Revenue derives from intelligence products — white-label data licensing, API access tiers, and institutional research — ensuring that the incentive structure aligns with evaluation accuracy rather than issuer relationships.

6. Scope and Trajectory

Initial deployment targets the crypto-native agent ecosystem, where the combination of real capital at risk, pseudonymous participants, and minimal regulatory oversight produces the most acute information asymmetry. However, the methodology architecture is intentionally modular and ecosystem-agnostic: the core rating engine accepts scored signals and produces composites, while signal-gathering modules are swappable per agent type and environment.
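The swappable-module design described above can be sketched as an interface contract: the core engine consumes any object that yields a normalized score for its category, while per-ecosystem gatherers live behind that interface. The class and method names here are hypothetical, intended only to illustrate the separation of concerns.

```python
from typing import Protocol

class SignalModule(Protocol):
    """Interface a per-ecosystem signal gatherer would implement
    (hypothetical; illustrates the swappable-module design)."""
    category: str  # e.g. "operational_reliability"

    def collect(self, agent_id: str) -> float:
        """Return a normalized 0-100 score for this category."""
        ...

class OnChainActivityModule:
    """Example module for crypto-native agents (stubbed data source)."""
    category = "operational_reliability"

    def collect(self, agent_id: str) -> float:
        # In production this would query chain-monitoring infrastructure;
        # here we return a fixed placeholder score.
        return 72.0

def score_agent(agent_id: str,
                modules: list[SignalModule]) -> dict[str, float]:
    """Core engine: ecosystem-agnostic, consumes whatever modules are plugged in."""
    return {m.category: m.collect(agent_id) for m in modules}
```

Extending coverage to a new vertical then means writing new modules against the same interface, not rewriting the engine.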

This design enables extension into enterprise AI agents, developer tool agents, consumer-facing agents, and hybrid systems without rewriting the evaluation framework. As agent deployment accelerates across industries, the same fundamental demand for independent quality assessment will emerge in each vertical.

Every significant market develops its evaluation infrastructure. The agent economy is no exception. The relevant question is whether that infrastructure will be constructed by platforms with inherent conflicts of interest, by regulatory bodies operating at insufficient velocity, or by independent intermediaries with no financial stake in any particular evaluation outcome.

Treebeard is designed to be the third option. This paper represents our first public contribution toward that objective.

Patrick Burns
Founder & CEO, Treebeard — Ent Laboratories LLC