Ent Review Panel
A 25-persona AI simulation that debates whether an agent's quantitative score is fair, produces a qualitative verdict, and suggests concrete improvements to builders.
What It Is
The Ent Review Panel is a secondary evaluation layer that sits on top of Treebeard's quantitative rating engine. Where the rating engine produces a score from raw signals (on-chain data, code analysis, community metrics), the Review Panel asks a harder question: is that score fair?
Twenty-five AI personas — each with distinct expertise, biases, and evaluation criteria — review an agent's profile and debate its score. The panel produces a consensus verdict (its own letter grade and numeric score), a confidence level, and a list of concrete actions the builder can take to improve.
Agents that have been reviewed carry the Reviewed badge on their profile and in the directory.
Why It Exists
Quantitative scores are blind to context. An agent with 122 endpoints and meaningful x402 revenue might score poorly because the formula doesn't capture operational complexity or novel revenue models. Conversely, an agent with high on-chain activity but no real utility might score well on metrics alone.
The Review Panel catches these disconnects. It adds a qualitative judgment layer that can identify when a score underrates or overrates an agent — and explain why in plain language.
Panel Composition
The panel consists of 25 personas drawn from eight professional archetypes. Each persona has a defined background, geographic location, expertise area, and known biases (e.g., "skeptical of agents without on-chain revenue" or "bullish on novel use cases"). These biases are intentional — they create genuine debate tension.
Personas are anonymous. They are referred to collectively as "The Ents" — no individual names or biographies are published. This prevents anthropomorphization and keeps the focus on the collective judgment.
| Archetype | Seats | Perspective |
|---|---|---|
| Protocol Engineers | 8 | Solidity developers, infrastructure architects, security auditors, MEV researchers |
| VC Partners | 4 | Crypto-native venture investors evaluating market fit, defensibility, and growth signals |
| Security Researchers | 3 | Smart contract auditors, penetration testers, formal verification specialists |
| Business Users | 3 | Enterprise buyers, SMB operators, and consumer power users assessing practical value |
| Product & UX | 2 | Web3 product leads and developer experience specialists |
| Economists | 2 | DeFi economists and mechanism designers evaluating tokenomics and sustainability |
| Regulatory & Compliance | 2 | Crypto counsel and compliance officers assessing legal risk |
| End User | 1 | Retail user perspective — accessibility, documentation clarity, trust signals |
| Total | 25 | |
Diversity requirements: personas span 10 geographic regions (US West, US East, London, Singapore, Lagos, São Paulo, Berlin, Seoul, Dubai, Bangalore), a 24–58 age range (weighted toward 28–40), and mixed gender representation.
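For illustration, the seat allocation above can be captured as a small configuration object. The archetype identifiers and the `PANEL_SEATS` structure below are assumptions made for this sketch, not a published Treebeard schema.

```typescript
// Illustrative seat map for the 25-persona panel. Identifiers and structure
// are assumptions for this sketch, not Treebeard's actual schema.
type Archetype =
  | "protocol-engineer"
  | "vc-partner"
  | "security-researcher"
  | "business-user"
  | "product-ux"
  | "economist"
  | "regulatory-compliance"
  | "end-user";

const PANEL_SEATS: Record<Archetype, number> = {
  "protocol-engineer": 8,
  "vc-partner": 4,
  "security-researcher": 3,
  "business-user": 3,
  "product-ux": 2,
  "economist": 2,
  "regulatory-compliance": 2,
  "end-user": 1,
};

// Sanity check: the seats must total 25.
const totalSeats = Object.values(PANEL_SEATS).reduce((a, b) => a + b, 0);
console.assert(totalSeats === 25, `expected 25 seats, got ${totalSeats}`);
```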
Evaluation Process
Each review follows a structured dialectical process designed to surface disagreement rather than converge prematurely on consensus:
Input Assembly
The panel receives the agent's current Treebeard Score, all six category scores, discovery signals, agent metadata (chains, registration date, agentURI-parsed data), and any available service endpoint information.
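As a rough sketch, the assembled input might be modelled as a typed record like the one below. Every field name here is an assumption inferred from the description above, not an actual API shape.

```typescript
// Hypothetical shape of the assembled review input; field names are
// assumptions drawn from the prose, not a published Treebeard schema.
interface ReviewInput {
  treebeardScore: number;                   // current quantitative score
  categoryScores: Record<string, number>;   // all six category scores
  discoverySignals: string[];               // signals surfaced by the rating engine
  metadata: {
    chains: string[];
    registrationDate: string;               // ISO 8601 date
    agentUri?: string;                      // source of agentURI-parsed data
  };
  serviceEndpoints?: { url: string; spec?: string }[]; // if available
}
```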
Bull Case (3 panelists)
Three randomly selected panelists argue that the agent's score is too low. They identify signals the quantitative model may be underweighting or missing entirely.
Bear Case (3 panelists)
Three different panelists argue the score is too high or approximately fair. They identify risks, missing capabilities, or inflated signals that may be boosting the score beyond what the agent deserves.
Panel Vote (25 panelists)
All 25 panelists — having reviewed the bull and bear arguments — cast a vote: underrated, fair, or overrated. Each panelist may add brief commentary explaining their vote.
Consensus Verdict
The panel produces a consensus verdict: a letter grade, numeric score, confidence level (low / medium / high), and a plain-English summary explaining the reasoning. The vote split is published alongside the verdict.
Only the summary verdict, vote split, and improvement suggestions are shown to users. The full bull/bear debate transcript is stored internally for quality assurance and prompt tuning but is not published.
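Taken together, the flow could be orchestrated roughly as sketched below, building on the `ReviewInput` and `Archetype` sketches above. The `ReviewBackend` helpers stand in for the per-persona prompts and are entirely hypothetical; this is a sketch of the documented steps, not the actual implementation.

```typescript
// Simplified orchestration of the dialectical review flow.
type Vote = "underrated" | "fair" | "overrated";

interface Persona {
  id: string;
  archetype: Archetype;
  biases: string[];
}

interface PanelVerdict {
  grade: string;                       // letter grade, e.g. "C+"
  score: number;                       // panel's numeric score
  confidence: "low" | "medium" | "high";
  summary: string;                     // plain-English reasoning
  voteSplit: Record<Vote, number>;     // published alongside the verdict
}

// Hypothetical backend abstracting the per-persona LLM calls.
interface ReviewBackend {
  argueBullCase(p: Persona, input: ReviewInput): Promise<string>;
  argueBearCase(p: Persona, input: ReviewInput): Promise<string>;
  castVote(p: Persona, input: ReviewInput, debate: { bull: string[]; bear: string[] }): Promise<Vote>;
  synthesizeVerdict(input: ReviewInput, votes: Vote[]): Promise<PanelVerdict>;
}

// Random selection without replacement (partial Fisher-Yates shuffle).
function sample<T>(items: T[], n: number): T[] {
  const pool = [...items];
  for (let i = 0; i < n && i < pool.length; i++) {
    const j = i + Math.floor(Math.random() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, n);
}

async function runReview(
  input: ReviewInput,
  personas: Persona[],   // all 25 panelists
  backend: ReviewBackend
): Promise<PanelVerdict> {
  // 1. Bull case: three randomly selected panelists argue the score is too low.
  const bullPanel = sample(personas, 3);
  const bull = await Promise.all(bullPanel.map((p) => backend.argueBullCase(p, input)));

  // 2. Bear case: three different panelists argue it is too high or roughly fair.
  const bearPanel = sample(personas.filter((p) => !bullPanel.includes(p)), 3);
  const bear = await Promise.all(bearPanel.map((p) => backend.argueBearCase(p, input)));

  // 3. All 25 panelists vote after reviewing both cases.
  const votes = await Promise.all(
    personas.map((p) => backend.castVote(p, input, { bull, bear }))
  );

  // 4. Consensus verdict: grade, score, confidence, summary, and the published
  //    vote split. The full debate transcript stays internal.
  return backend.synthesizeVerdict(input, votes);
}
```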
How Scoring Works
The panel's verdict is independent of the quantitative score. The panel may agree with the rating engine (verdict matches the current score), or it may diverge — producing a higher or lower score with an explanation of why.
The panel verdict does not override the Treebeard Score. Both scores are displayed side by side: the quantitative score (from the rating engine) and the panel verdict (from the Review Panel). Users can see where they agree and where they diverge.
Confidence levels reflect the degree of consensus among panelists:
- High — strong majority (20+ of 25) agree on direction
- Medium — clear majority (15–19 of 25) with notable dissent
- Low — split panel, significant disagreement on fair value
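Assuming the `Vote` type from the orchestration sketch above, the mapping from vote split to confidence might look like this; the thresholds follow the bands listed, while the handling of direction and ties is an assumption.

```typescript
// Confidence from the degree of consensus on direction. Thresholds follow the
// documented bands (20+, 15–19, otherwise); treating "direction" as the single
// most common vote is an assumption of this sketch.
function confidenceFromVotes(votes: Vote[]): "low" | "medium" | "high" {
  const tally: Record<Vote, number> = { underrated: 0, fair: 0, overrated: 0 };
  for (const v of votes) tally[v]++;

  const top = Math.max(tally.underrated, tally.fair, tally.overrated);
  if (top >= 20) return "high";    // strong majority (20+ of 25)
  if (top >= 15) return "medium";  // clear majority (15–19) with notable dissent
  return "low";                    // split panel
}
```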
Improvement Suggestions
Every review includes concrete, actionable improvement suggestions for the agent's builder. Each suggestion specifies:
- Action — what to do (e.g., "Publish an OpenAPI spec")
- Impact — estimated score improvement and which category benefits
- Difficulty — low, medium, or high implementation effort
This turns the Review Panel from a passive rating into active coaching. Builders can prioritize improvements by impact-to-effort ratio and track their progress across re-reviews.
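For instance, a builder could rank suggestions by impact-to-effort ratio as in the sketch below. The `Suggestion` field names and the effort weights are illustrative placeholders, not part of the published output format.

```typescript
// Improvement suggestion as described above; effort weights are arbitrary
// placeholders used only to compute an impact-to-effort ratio.
interface Suggestion {
  action: string;                          // what to do
  impactPoints: number;                    // estimated score improvement
  category: string;                        // which category benefits
  difficulty: "low" | "medium" | "high";   // implementation effort
}

const EFFORT_WEIGHT = { low: 1, medium: 2, high: 4 } as const;

function prioritize(suggestions: Suggestion[]): Suggestion[] {
  return [...suggestions].sort(
    (a, b) =>
      b.impactPoints / EFFORT_WEIGHT[b.difficulty] -
      a.impactPoints / EFFORT_WEIGHT[a.difficulty]
  );
}

// Example: the low-effort OpenAPI spec (15 pts per unit effort) ranks ahead of
// the medium-effort on-chain history item (10 pts per unit effort).
const ranked = prioritize([
  { action: "Publish an OpenAPI spec", impactPoints: 15, category: "Code Quality", difficulty: "low" },
  { action: "Add on-chain transaction history", impactPoints: 20, category: "Economic Viability", difficulty: "medium" },
  { action: "Register reputation feedback", impactPoints: 10, category: "Community", difficulty: "low" },
]);
```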
Eligibility & Frequency
Not every agent receives a Review Panel evaluation. Current eligibility criteria:
- Agent must have a Treebeard Score of B− (70) or higher
- Reviews are re-triggered when an agent's score changes by ±10 points or crosses a letter grade boundary
- Score upgrades are auto-published; score downgrades are queued for human review before publication
- Agents below B− are not currently reviewed (threshold may be lowered as the system matures)
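A minimal sketch of how these rules might be checked is shown below. The ±10-point comparison, the grade inputs, and the publication routing names are assumptions based on the bullets above, not a documented interface.

```typescript
// Eligibility threshold: Treebeard Score of B− (70) or higher.
const REVIEW_THRESHOLD = 70;

function isEligible(treebeardScore: number): boolean {
  return treebeardScore >= REVIEW_THRESHOLD;
}

// Re-review triggers on a ±10-point move or a letter-grade boundary crossing
// since the last review (grades are passed in rather than derived here).
function shouldReReview(
  last: { score: number; grade: string },
  current: { score: number; grade: string }
): boolean {
  const bigMove = Math.abs(current.score - last.score) >= 10;
  const gradeChanged = current.grade !== last.grade;
  return bigMove || gradeChanged;
}

// Upgrades auto-publish; downgrades are queued for human review first.
type PublicationRoute = "auto-publish" | "queue-for-human-review";

function routeVerdict(lastScore: number, currentScore: number): PublicationRoute {
  return currentScore >= lastScore ? "auto-publish" : "queue-for-human-review";
}
```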
Sample Review
The following is an abbreviated example of a Review Panel output for a hypothetical agent. Real reviews follow the same structure.
Panel verdict: 68, 30 points higher than the agent's current Treebeard Score of 38 (F).
Vote split: 18 underrated · 5 fair · 2 overrated (of 25 panelists).
"This agent operates a 10-agent autonomous system with 122 endpoints on minimal infrastructure cost ($4.09/month). The quantitative score underweights operational complexity and novel architecture. While code quality and documentation gaps are real, the economic efficiency and system design suggest a score significantly above F. Panel consensus: C+ with medium confidence."
How to improve
- Publish an OpenAPI spec → Code Quality +15 pts (low difficulty)
- Add on-chain transaction history → Economic Viability +20 pts (medium difficulty)
- Register reputation feedback → Community +10 pts (low difficulty)
AI simulation · Not financial advice · Panel v1.0 · Reviewed Mar 20, 2026
Limitations & Disclaimers
The Ent Review Panel is an experimental feature. Users should understand its limitations:
- All 25 personas are AI-generated simulations, not real individuals or organizations
- Panel verdicts may be inaccurate, biased, or inconsistent across reviews
- The panel does not have access to private data, off-chain communications, or team context beyond what is publicly available
- Panel verdicts do not constitute financial, investment, or professional advice
- Treebeard / ENT Laboratories LLC assumes no liability for decisions made based on Ent Reviews
This analysis is generated by AI simulation and does not represent the views of real individuals or organizations. It is experimental and provided "as-is" for informational purposes only. It does not constitute financial, investment, or professional advice. Use at your own risk. See Terms of Service.