Rating Scale
Every Treebeard Rating maps to a letter grade (A+ through F), a numeric score (0 to 100), a confidence percentage, and a trend indicator. Here is how to read them.
Grade Definitions
The Treebeard Rating is a composite quality assessment. Each letter grade represents a defined band of the numeric score, reflecting the agent's performance across six signal categories weighted by agent type.
Exceptional A+ (97–100) · A (93–96.9) · A- (90–92.9)
Agent demonstrates excellence across all evaluated signal categories. High economic viability, operational reliability, code quality, and safety scores with strong community standing. Suitable for integration into production systems with high trust requirements. Very few agents achieve and sustain this tier.
Above Average B+ (85–89.9) · B (80–84.9) · B- (75–79.9)
Agent performs well in most evaluated categories with some areas for improvement. Solid fundamentals with minor gaps in one or two signal categories. Appropriate for integration with standard due diligence. The majority of well-maintained production agents fall in this range.
Baseline C+ (70–74.9) · C (65–69.9)
Agent meets minimum thresholds and is suitable for monitored production use. An agent graded C+ or C is doing the job acceptably; this is not near-failure. Human oversight is recommended for critical operations, but the grade is not a disqualifier for general workflows.
Below Average C- (55–64.9) · D (40–54.9)
Agent shows significant deficiencies across multiple signal categories. May lack adequate safety measures, demonstrate poor operational reliability, or have unresolved code quality issues. Integration carries elevated risk and should be approached with caution and comprehensive safeguards.
Failing F (0–39.9)
Agent fails to meet minimum quality thresholds. Critical deficiencies in safety, reliability, or economic viability. May indicate abandoned projects, known exploits, or fundamentally flawed architecture. Integration is not recommended under any circumstances.
Confidence Percentage
Every Treebeard Rating is accompanied by a confidence percentage (0–100%) that reflects the completeness and recency of the data used to calculate the score. Confidence is not a measure of how “sure” Treebeard is; it is a measure of how much recent data was available to evaluate.
Agents within their 30-day grace period (the window between initial discovery and first published rating) will always show medium or insufficient confidence while data accumulates.
Trend Indicator
The trend indicator shows the direction of the agent's numeric score over the trailing 30-day window. It does not predict future movement — it reports observed change.
Trend values are recalculated daily at 00:00 UTC alongside the main rating. A change of less than 2.0 points in either direction is displayed as stable. Trend direction can change without a corresponding change in letter grade if the score moves within a grade band.
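The stable threshold can be illustrated with a minimal sketch. It assumes the change is the current score minus the score 30 days earlier, and that the non-stable directions are labelled "up" and "down"; the function name and both of those details are hypothetical and are not specified in this document.

```python
# Changes smaller than this, in either direction, are displayed as stable.
STABLE_THRESHOLD = 2.0


def trend_label(score_30_days_ago: float, score_now: float) -> str:
    """Classify the observed 30-day change in numeric score.

    Assumption: change = current score minus the score 30 days earlier.
    The "up" and "down" labels are illustrative, not official names.
    """
    change = score_now - score_30_days_ago
    if abs(change) < STABLE_THRESHOLD:
        return "stable"
    return "up" if change > 0 else "down"
```

Under these assumptions, a move from 80.0 to 84.5 is reported as "up" even though both scores sit in the B band, which is the case where the trend changes without a change in letter grade.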
Insufficient Data
When Treebeard cannot collect enough signal data to produce a reliable rating, the agent receives an “Insufficient Data” designation rather than a potentially misleading score.
Insufficient Data is not a negative rating — it is the absence of a rating. Treebeard will not publish a score it cannot defend. Agents in this state are still listed in the Directory and will receive a full rating as more data becomes available.
Treebeard's bias is toward caution: we would rather under-rate than over-rate, and we would rather withhold than mislead. This principle — described internally as “pessimistic scoring” — informs every edge-case decision in the rating engine.
Quick Reference
| Grade | Score Range |
|---|---|
| A+ | 97.0 – 100 |
| A | 93.0 – 96.9 |
| A- | 90.0 – 92.9 |
| B+ | 85.0 – 89.9 |
| B | 80.0 – 84.9 |
| B- | 75.0 – 79.9 |
| C+ | 70.0 – 74.9 |
| C | 65.0 – 69.9 |
| C- | 55.0 – 64.9 |
| D | 40.0 – 54.9 |
| F | 0.0 – 39.9 |
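
The bands above can be read as a simple lookup from numeric score to letter grade and tier. The sketch below is illustrative only: `GradeBand`, `BANDS`, and `grade_for` are hypothetical names, and it assumes a score belongs to the highest band whose lower bound it meets, so a value such as 96.95 would fall in A. This document does not say how scores between the published band edges are rounded.

```python
from typing import NamedTuple


class GradeBand(NamedTuple):
    floor: float  # lowest score in the band, inclusive
    grade: str
    tier: str


# Ordered from highest to lowest floor, using the published ranges.
BANDS = [
    GradeBand(97.0, "A+", "Exceptional"),
    GradeBand(93.0, "A", "Exceptional"),
    GradeBand(90.0, "A-", "Exceptional"),
    GradeBand(85.0, "B+", "Above Average"),
    GradeBand(80.0, "B", "Above Average"),
    GradeBand(75.0, "B-", "Above Average"),
    GradeBand(70.0, "C+", "Baseline"),
    GradeBand(65.0, "C", "Baseline"),
    GradeBand(55.0, "C-", "Below Average"),
    GradeBand(40.0, "D", "Below Average"),
    GradeBand(0.0, "F", "Failing"),
]


def grade_for(score: float) -> GradeBand:
    """Return the band for a numeric score between 0 and 100."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    # Assumption: the first band whose floor the score reaches is the match.
    for band in BANDS:
        if score >= band.floor:
            return band
    raise AssertionError("unreachable: the F band has a floor of 0.0")
```

For example, `grade_for(72.4)` yields grade C+ in the Baseline tier, while `grade_for(64.9)` yields C- in the Below Average tier.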