Ent Review Panel
A 25-persona AI simulation that debates whether an agent's quantitative score is fair, produces a qualitative verdict, and suggests concrete improvements to builders.
What It Is
The Ent Review Panel is a secondary evaluation layer that sits on top of Treebeard's quantitative rating engine. Where the rating engine produces a score from raw signals (on-chain data, code analysis, community metrics), the Review Panel asks a harder question: is that score fair?
Twenty-five AI personas — each with distinct expertise, biases, and evaluation criteria — review an agent's profile and debate its score. The panel produces a consensus verdict (its own letter grade and numeric score), a confidence level, and a list of concrete actions the builder can take to improve.
Agents that have been reviewed carry the Reviewed badge on their profile and in the directory.
Why It Exists
Quantitative scores are blind to context. An agent with 122 endpoints and meaningful x402 revenue might score poorly because the formula doesn't capture operational complexity or novel revenue models. Conversely, an agent with high on-chain activity but no real utility might score well on metrics alone.
The Review Panel catches these disconnects. It adds a qualitative judgment layer that can identify when a score underrates or overrates an agent — and explain why in plain language.
Panel Composition
The panel consists of 25 personas drawn from eight professional archetypes. Each persona has a defined background, geographic location, expertise area, and known biases (e.g., "skeptical of agents without on-chain revenue" or "bullish on novel use cases"). These biases are intentional — they create genuine debate tension.
Personas are anonymous. They are referred to collectively as "The Ents" — no individual names or biographies are published. This prevents anthropomorphization and keeps the focus on the collective judgment.
| Archetype | Seats | Perspective |
|---|---|---|
| Protocol Engineers | 8 | Solidity developers, infrastructure architects, security auditors, MEV researchers |
| VC Partners | 4 | Crypto-native venture investors evaluating market fit, defensibility, and growth signals |
| Security Researchers | 3 | Smart contract auditors, penetration testers, formal verification specialists |
| Business Users | 3 | Enterprise buyers, SMB operators, and consumer power users assessing practical value |
| Product & UX | 2 | Web3 product leads and developer experience specialists |
| Economists | 2 | DeFi economists and mechanism designers evaluating tokenomics and sustainability |
| Regulatory & Compliance | 2 | Crypto counsel and compliance officers assessing legal risk |
| End User | 1 | Retail user perspective — accessibility, documentation clarity, trust signals |
| Total | 25 | |
Diversity requirements: personas span 10 geographic regions (US West, US East, London, Singapore, Lagos, São Paulo, Berlin, Seoul, Dubai, Bangalore), a 24–58 age range (weighted toward 28–40), and mixed gender representation.
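For illustration, the seat allocation above can be captured as a small configuration object. The archetype identifiers and the `PANEL_SEATS` structure below are assumptions made for this sketch, not a published Treebeard schema.

```typescript
// Illustrative seat map for the 25-persona panel. Identifiers and structure
// are assumptions for this sketch, not Treebeard's actual schema.
type Archetype =
  | "protocol-engineer"
  | "vc-partner"
  | "security-researcher"
  | "business-user"
  | "product-ux"
  | "economist"
  | "regulatory-compliance"
  | "end-user";

const PANEL_SEATS: Record<Archetype, number> = {
  "protocol-engineer": 8,
  "vc-partner": 4,
  "security-researcher": 3,
  "business-user": 3,
  "product-ux": 2,
  "economist": 2,
  "regulatory-compliance": 2,
  "end-user": 1,
};

// Sanity check: the seats must total 25.
const totalSeats = Object.values(PANEL_SEATS).reduce((a, b) => a + b, 0);
console.assert(totalSeats === 25, `expected 25 seats, got ${totalSeats}`);
```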
Evaluation Process
Each review follows a structured dialectical process designed to surface disagreement rather than converge prematurely on consensus:
Input Assembly
The panel receives the agent's current Treebeard Score, all six category scores, discovery signals, agent metadata (chains, registration date, agentURI-parsed data), and any available service endpoint information.
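As a rough sketch, the assembled input might be modelled as a typed record like the one below. Every field name here is an assumption inferred from the description above, not an actual API shape.

```typescript
// Hypothetical shape of the assembled review input; field names are
// assumptions drawn from the prose, not a published Treebeard schema.
interface ReviewInput {
  treebeardScore: number;                   // current quantitative score
  categoryScores: Record<string, number>;   // all six category scores
  discoverySignals: string[];               // signals surfaced by the rating engine
  metadata: {
    chains: string[];
    registrationDate: string;               // ISO 8601 date
    agentUri?: string;                      // source of agentURI-parsed data
  };
  serviceEndpoints?: { url: string; spec?: string }[]; // if available
}
```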
Bull Case (3 panelists)
Three randomly selected panelists argue that the agent's score is too low. They identify signals the quantitative model may be underweighting or missing entirely.
Bear Case (3 panelists)
Three different panelists argue the score is too high or approximately fair. They identify risks, missing capabilities, or inflated signals that may be boosting the score beyond what the agent deserves.
Panel Vote (25 panelists)
All 25 panelists — having reviewed the bull and bear arguments — cast a vote: underrated, fair, or overrated. Each panelist may add brief commentary explaining their vote.
Consensus Verdict
The panel produces a consensus verdict: a letter grade, numeric score, confidence level (low / medium / high), and a plain-English summary explaining the reasoning. The vote split is published alongside the verdict.
Only the summary verdict, vote split, and improvement suggestions are shown to users. The full bull/bear debate transcript is stored internally for quality assurance and prompt tuning but is not published.
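Taken together, the flow could be orchestrated roughly as sketched below, building on the `ReviewInput` and `Archetype` sketches above. The `ReviewBackend` helpers stand in for the per-persona prompts and are entirely hypothetical; this is a sketch of the documented steps, not the actual implementation.

```typescript
// Simplified orchestration of the dialectical review flow.
type Vote = "underrated" | "fair" | "overrated";

interface Persona {
  id: string;
  archetype: Archetype;
  biases: string[];
}

interface PanelVerdict {
  grade: string;                       // letter grade, e.g. "C+"
  score: number;                       // panel's numeric score
  confidence: "low" | "medium" | "high";
  summary: string;                     // plain-English reasoning
  voteSplit: Record<Vote, number>;     // published alongside the verdict
}

// Hypothetical backend abstracting the per-persona LLM calls.
interface ReviewBackend {
  argueBullCase(p: Persona, input: ReviewInput): Promise<string>;
  argueBearCase(p: Persona, input: ReviewInput): Promise<string>;
  castVote(p: Persona, input: ReviewInput, debate: { bull: string[]; bear: string[] }): Promise<Vote>;
  synthesizeVerdict(input: ReviewInput, votes: Vote[]): Promise<PanelVerdict>;
}

// Random selection without replacement (partial Fisher-Yates shuffle).
function sample<T>(items: T[], n: number): T[] {
  const pool = [...items];
  for (let i = 0; i < n && i < pool.length; i++) {
    const j = i + Math.floor(Math.random() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, n);
}

async function runReview(
  input: ReviewInput,
  personas: Persona[],   // all 25 panelists
  backend: ReviewBackend
): Promise<PanelVerdict> {
  // 1. Bull case: three randomly selected panelists argue the score is too low.
  const bullPanel = sample(personas, 3);
  const bull = await Promise.all(bullPanel.map((p) => backend.argueBullCase(p, input)));

  // 2. Bear case: three different panelists argue it is too high or roughly fair.
  const bearPanel = sample(personas.filter((p) => !bullPanel.includes(p)), 3);
  const bear = await Promise.all(bearPanel.map((p) => backend.argueBearCase(p, input)));

  // 3. All 25 panelists vote after reviewing both cases.
  const votes = await Promise.all(
    personas.map((p) => backend.castVote(p, input, { bull, bear }))
  );

  // 4. Consensus verdict: grade, score, confidence, summary, and the published
  //    vote split. The full debate transcript stays internal.
  return backend.synthesizeVerdict(input, votes);
}
```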
How Scoring Works
The panel's verdict is independent of the quantitative score. The panel may agree with the rating engine (verdict matches the current score), or it may diverge — producing a higher or lower score with an explanation of why.
The panel verdict does not override the Treebeard Score. Both scores are displayed side by side: the quantitative score (from the rating engine) and the panel verdict (from the Review Panel). Users can see where they agree and where they diverge.
Confidence levels reflect the degree of consensus among panelists:
- High — strong majority (20+ of 25) agree on direction
- Medium — clear majority (15–19 of 25) with notable dissent
- Low — split panel, significant disagreement on fair value
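Assuming the `Vote` type from the orchestration sketch above, the mapping from vote split to confidence might look like this; the thresholds follow the bands listed, while the handling of direction and ties is an assumption.

```typescript
// Confidence from the degree of consensus on direction. Thresholds follow the
// documented bands (20+, 15–19, otherwise); treating "direction" as the single
// most common vote is an assumption of this sketch.
function confidenceFromVotes(votes: Vote[]): "low" | "medium" | "high" {
  const tally: Record<Vote, number> = { underrated: 0, fair: 0, overrated: 0 };
  for (const v of votes) tally[v]++;

  const top = Math.max(tally.underrated, tally.fair, tally.overrated);
  if (top >= 20) return "high";    // strong majority (20+ of 25)
  if (top >= 15) return "medium";  // clear majority (15–19) with notable dissent
  return "low";                    // split panel
}
```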
Improvement Suggestions
Every review includes concrete, actionable improvement suggestions for the agent's builder. Each suggestion specifies:
- Action — what to do (e.g., "Publish an OpenAPI spec")
- Impact — estimated score improvement and which category benefits
- Difficulty — low, medium, or high implementation effort
This turns the Review Panel from a passive rating into active coaching. Builders can prioritize improvements by impact-to-effort ratio and track their progress across re-reviews.
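For instance, a builder could rank suggestions by impact-to-effort ratio as in the sketch below. The `Suggestion` field names and the effort weights are illustrative placeholders, not part of the published output format.

```typescript
// Improvement suggestion as described above; effort weights are arbitrary
// placeholders used only to compute an impact-to-effort ratio.
interface Suggestion {
  action: string;                          // what to do
  impactPoints: number;                    // estimated score improvement
  category: string;                        // which category benefits
  difficulty: "low" | "medium" | "high";   // implementation effort
}

const EFFORT_WEIGHT = { low: 1, medium: 2, high: 4 } as const;

function prioritize(suggestions: Suggestion[]): Suggestion[] {
  return [...suggestions].sort(
    (a, b) =>
      b.impactPoints / EFFORT_WEIGHT[b.difficulty] -
      a.impactPoints / EFFORT_WEIGHT[a.difficulty]
  );
}

// Example: the low-effort OpenAPI spec (15 pts per unit effort) ranks ahead of
// the medium-effort on-chain history item (10 pts per unit effort).
const ranked = prioritize([
  { action: "Publish an OpenAPI spec", impactPoints: 15, category: "Code Quality", difficulty: "low" },
  { action: "Add on-chain transaction history", impactPoints: 20, category: "Economic Viability", difficulty: "medium" },
  { action: "Register reputation feedback", impactPoints: 10, category: "Community", difficulty: "low" },
]);
```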
Eligibility & Frequency
Not every agent receives a Review Panel evaluation. Current eligibility criteria:
- Agent must have a Treebeard Score of B− (70) or higher
- Reviews are re-triggered when an agent's score changes by ±10 points or crosses a letter grade boundary
- Score upgrades are auto-published; score downgrades are queued for human review before publication
- Agents below B− are not currently reviewed (threshold may be lowered as the system matures)
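A minimal sketch of how these rules might be checked is shown below. The ±10-point comparison, the grade inputs, and the publication routing names are assumptions based on the bullets above, not a documented interface.

```typescript
// Eligibility threshold: Treebeard Score of B− (70) or higher.
const REVIEW_THRESHOLD = 70;

function isEligible(treebeardScore: number): boolean {
  return treebeardScore >= REVIEW_THRESHOLD;
}

// Re-review triggers on a ±10-point move or a letter-grade boundary crossing
// since the last review (grades are passed in rather than derived here).
function shouldReReview(
  last: { score: number; grade: string },
  current: { score: number; grade: string }
): boolean {
  const bigMove = Math.abs(current.score - last.score) >= 10;
  const gradeChanged = current.grade !== last.grade;
  return bigMove || gradeChanged;
}

// Upgrades auto-publish; downgrades are queued for human review first.
type PublicationRoute = "auto-publish" | "queue-for-human-review";

function routeVerdict(lastScore: number, currentScore: number): PublicationRoute {
  return currentScore >= lastScore ? "auto-publish" : "queue-for-human-review";
}
```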
Sample Review
The following is an abbreviated example of a Review Panel output for a hypothetical agent. Real reviews follow the same structure.
Panel verdict: 68, 30 points higher than the agent's current Treebeard Score of 38 (F).
Vote split: 18 underrated · 5 fair · 2 overrated (of 25 panelists).
"This agent operates a 10-agent autonomous system with 122 endpoints on minimal infrastructure cost ($4.09/month). The quantitative score underweights operational complexity and novel architecture. While code quality and documentation gaps are real, the economic efficiency and system design suggest a score significantly above F. Panel consensus: C+ with medium confidence."
How to improve
- Publish an OpenAPI spec → Code Quality +15 pts (low difficulty)
- Add on-chain transaction history → Economic Viability +20 pts (medium difficulty)
- Register reputation feedback → Community +10 pts (low difficulty)
AI simulation · Not financial advice · Panel v1.0 · Reviewed Mar 20, 2026
Limitations & Disclaimers
The Ent Review Panel is an experimental feature. Users should understand its limitations:
- All 25 personas are AI-generated simulations, not real individuals or organizations
- Panel verdicts may be inaccurate, biased, or inconsistent across reviews
- The panel does not have access to private data, off-chain communications, or team context beyond what is publicly available
- Panel verdicts do not constitute financial, investment, or professional advice
- Treebeard / ENT Laboratories LLC assumes no liability for decisions made based on Ent Reviews
This analysis is generated by AI simulation and does not represent the views of real individuals or organizations. It is experimental and provided "as-is" for informational purposes only. It does not constitute financial, investment, or professional advice. Use at your own risk. See Terms of Service.