Rating Scale

Every Treebeard Rating maps to a letter grade (A+ through F), a numeric score (0 to 100), a confidence percentage, and a trend indicator. Here is how to read them.

A+
A
A-
B+
B
B-
C+
C
C-
D
F

Grade Definitions

The Treebeard Rating is a composite quality assessment. Each letter grade represents a defined band of the numeric score, reflecting the agent's performance across six signal categories weighted by agent type.
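This page does not name all six signal categories or publish their per-agent-type weights. The following is therefore only a minimal sketch of what a weighted composite could look like. It assumes a weighted arithmetic mean of per-category scores that share the 0 to 100 scale. The function name composite_score, the category names (five borrowed from the A-band description plus a placeholder for the sixth), and the example weights are all hypothetical.

```python
from typing import Mapping


def composite_score(category_scores: Mapping[str, float],
                    weights: Mapping[str, float]) -> float:
    """Weighted mean of per-category scores, each on a 0-100 scale.

    Hypothetical sketch: the real per-agent-type weights are not published here.
    Weights are relative, so they do not need to sum to 1.
    """
    if set(category_scores) != set(weights):
        raise ValueError("category_scores and weights must cover the same categories")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(category_scores[c] * weights[c] for c in weights)
    # Normalise by total weight and keep one decimal, as the grade bands do.
    return round(weighted / total_weight, 1)


# Hypothetical category names and relative weights for one agent type.
example_scores = {
    "economic_viability": 90, "operational_reliability": 80, "code_quality": 85,
    "safety": 95, "community_standing": 70, "sixth_category": 75,
}
example_weights = {
    "economic_viability": 4, "operational_reliability": 4, "code_quality": 3,
    "safety": 5, "community_standing": 2, "sixth_category": 2,
}
print(composite_score(example_scores, example_weights))  # 85.0
```

Under these made-up weights the agent would land in the B+ band. Changing the weights for a different agent type changes the composite without changing any individual category score.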

A
90 – 100

Exceptional: A+ (97–100) · A (93–96.9) · A- (90–92.9)

Agent demonstrates excellence across all evaluated signal categories. High economic viability, operational reliability, code quality, and safety scores with strong community standing. Suitable for integration into production systems with high trust requirements. Very few agents achieve and sustain this tier.

B
75 – 89.9

Above Average: B+ (85–89.9) · B (80–84.9) · B- (75–79.9)

Agent performs well in most evaluated categories with some areas for improvement. Solid fundamentals with minor gaps in one or two signal categories. Appropriate for integration with standard due diligence. The majority of well-maintained production agents fall in this range.

C
65 – 74.9

Baseline: C+ (70–74.9) · C (65–69.9)

Agent meets minimum thresholds and is suitable for monitored production use. A C or C+ agent is doing the job acceptably; this grade does not indicate near-failure. Human oversight is recommended for critical operations, but a grade in this band is not a disqualifier for general workflows.

D
40 – 64.9

Below Average: C- (55–64.9) · D (40–54.9)

Agent shows significant deficiencies across multiple signal categories. May lack adequate safety measures, demonstrate poor operational reliability, or have unresolved code quality issues. Integration carries elevated risk and should be approached with caution and comprehensive safeguards.

F
0 – 39.9

Failing: F (0–39.9)

Agent fails to meet minimum quality thresholds. Critical deficiencies in safety, reliability, or economic viability. May indicate abandoned projects, known exploits, or fundamentally flawed architecture. Integration is not recommended under any circumstances.

Confidence Percentage

Every Treebeard Rating is accompanied by a confidence percentage (0–100%) that reflects the completeness and recency of the data used to calculate the score. Confidence is not a measure of how “sure” Treebeard is — it is a measure of how much data was available to evaluate.
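As a rough illustration of "completeness and recency", the sketch below computes confidence as the share of expected signals that have been collected and are still current. The seven-day freshness window, the function name confidence_pct, and the signal names are hypothetical. This page does not define how old a signal may be before it stops counting.

```python
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional

# Hypothetical freshness window; not specified on this page.
MAX_SIGNAL_AGE = timedelta(days=7)


def confidence_pct(last_updated: Mapping[str, Optional[datetime]],
                   now: datetime,
                   max_age: timedelta = MAX_SIGNAL_AGE) -> float:
    """Percentage (0-100) of expected signals that are present and current.

    last_updated maps each expected signal to when it was last collected,
    or None if it has never been collected for this agent.
    """
    if not last_updated:
        return 0.0
    current = sum(
        1 for ts in last_updated.values()
        if ts is not None and now - ts <= max_age
    )
    return round(100.0 * current / len(last_updated), 1)


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
signals = {
    "signal_a": now - timedelta(days=1),   # present and current
    "signal_b": now - timedelta(days=30),  # present but stale
    "signal_c": None,                      # never collected
    "signal_d": now,                       # present and current
}
print(confidence_pct(signals, now))  # 50.0
```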

High
80–100% signal coverage. All or nearly all signals current and verified.
Medium
40–79% signal coverage. Rating produced; some data gaps noted.
Insufficient
Below 40%. No score published. See "Insufficient Data" below.

Agents within their 30-day grace period (the window between initial discovery and first published rating) will always show medium or insufficient confidence as data accumulates.
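The tiers above, together with the grace-period rule, can be sketched as a small classifier. The 80% and 40% cut-offs and the 30-day window come from this page. Treating the grace period as a cap at Medium is an assumption about how that rule is applied, and the names Confidence and confidence_tier are hypothetical.

```python
from enum import Enum

GRACE_PERIOD_DAYS = 30  # from this page: discovery to first published rating


class Confidence(Enum):
    HIGH = "High"
    MEDIUM = "Medium"
    INSUFFICIENT = "Insufficient"


def confidence_tier(coverage_pct: float, days_since_discovery: int) -> Confidence:
    """Map signal coverage (0-100) to a confidence tier."""
    if not 0.0 <= coverage_pct <= 100.0:
        raise ValueError("coverage_pct must be between 0 and 100")
    if coverage_pct < 40.0:
        return Confidence.INSUFFICIENT  # no score published
    in_grace_period = days_since_discovery < GRACE_PERIOD_DAYS
    # Assumption: agents still in the grace period are capped at Medium.
    if coverage_pct >= 80.0 and not in_grace_period:
        return Confidence.HIGH
    return Confidence.MEDIUM


print(confidence_tier(85.0, 60).value)  # High
print(confidence_tier(85.0, 10).value)  # Medium (still in grace period)
```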

Trend Indicator

The trend indicator shows the direction of the agent's numeric score over the trailing 30-day window. It does not predict future movement — it reports observed change.

+3.4 · Score increased by 3.4 points over 30 days
0.0 · No meaningful change in score
-2.1 · Score decreased by 2.1 points over 30 days

Trend values are recalculated daily at 00:00 UTC alongside the main rating. A change smaller than 2.0 points in either direction is displayed as stable. Trend direction can change without a corresponding change in letter grade if the score moves within a grade band.
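The trend rule can be sketched as follows. This assumes the trend is simply the newest daily score minus the score from 30 days earlier, not a fitted slope, and that the stability check uses the value rounded to one decimal, so the displayed number and the direction always agree. The function name trend is hypothetical.

```python
from typing import Sequence, Tuple

STABLE_THRESHOLD = 2.0  # changes smaller than this in either direction show as stable
WINDOW_DAYS = 30


def trend(daily_scores: Sequence[float],
          window_days: int = WINDOW_DAYS) -> Tuple[float, str]:
    """Return (delta, direction) over the trailing window.

    daily_scores holds one score per daily 00:00 UTC run, oldest to newest.
    Assumption: delta = newest score minus the score window_days runs earlier.
    """
    if len(daily_scores) < window_days + 1:
        raise ValueError("need at least window_days + 1 daily scores")
    # Round first so the displayed value and the direction agree.
    delta = round(daily_scores[-1] - daily_scores[-1 - window_days], 1)
    if abs(delta) < STABLE_THRESHOLD:
        return delta, "stable"
    return delta, ("up" if delta > 0 else "down")


history = [80.0] * 30 + [83.4]
print(trend(history))  # (3.4, 'up')
```

With this rule a score that rises from 80.0 to 84.5 moves the trend up while the letter grade stays B, which matches the note above about trend moving within a grade band.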

Insufficient Data

When Treebeard cannot collect enough signal data to produce a reliable rating, the agent receives an “Insufficient Data” designation rather than a potentially misleading score.

When does this apply? An agent receives this designation when signal coverage falls below 40%. This typically occurs during the initial 30-day grace period after discovery, for agents on chains where certain data sources are not yet supported, or for agents with extremely limited public presence.

Insufficient Data is not a negative rating — it is the absence of a rating. Treebeard will not publish a score it cannot defend. Agents in this state are still listed in the Directory and will receive a full rating as more data becomes available.
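One way to keep "absence of a rating" distinct from a low rating is to model the score as optional rather than defaulting it to zero. The sketch below is hypothetical (PublishedRating, publish, and the agent IDs are illustrative names, not a Treebeard API). It reuses the 40% coverage cut-off from this page.

```python
from dataclasses import dataclass
from typing import Optional

MIN_COVERAGE_PCT = 40.0  # below this, no score is published


@dataclass(frozen=True)
class PublishedRating:
    agent_id: str
    coverage_pct: float
    score: Optional[float]  # None means "Insufficient Data"; never stored as 0.0

    @property
    def insufficient_data(self) -> bool:
        return self.score is None


def publish(agent_id: str, computed_score: float, coverage_pct: float) -> PublishedRating:
    # Withhold rather than publish a score the data cannot support.
    if coverage_pct < MIN_COVERAGE_PCT:
        return PublishedRating(agent_id, coverage_pct, None)
    return PublishedRating(agent_id, coverage_pct, computed_score)


listing = [publish("agent-1", 72.0, 35.0), publish("agent-2", 72.0, 55.0)]
for r in listing:
    # Both agents stay listed; only agent-2 carries a score.
    print(r.agent_id, "Insufficient Data" if r.insufficient_data else r.score)
```

Using None instead of 0.0 means downstream code cannot accidentally treat a withheld rating as an F or average it into a portfolio score.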

Treebeard's bias is toward caution: we would rather under-rate than over-rate, and we would rather withhold than mislead. This principle — described internally as “pessimistic scoring” — informs every edge-case decision in the rating engine.

Quick Reference

Grade   Score Range
A+      97.0 – 100
A       93.0 – 96.9
A-      90.0 – 92.9
B+      85.0 – 89.9
B       80.0 – 84.9
B-      75.0 – 79.9
C+      70.0 – 74.9
C       65.0 – 69.9
C-      55.0 – 64.9
D       40.0 – 54.9
F       0.0 – 39.9
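The quick-reference table can be turned into a lookup by treating each grade's lower bound as a threshold. Because the bands are written to one decimal place, using lower bounds is an assumption about how any value between two listed bands (for example 92.95) would be handled. The BAND mapping mirrors the grade definitions above, including C- sitting in the Below Average band. The names letter_grade and BAND are hypothetical.

```python
from bisect import bisect_right

# Lower bound of each grade, ascending, from the Quick Reference table.
LOWER_BOUNDS = [0.0, 40.0, 55.0, 65.0, 70.0, 75.0, 80.0, 85.0, 90.0, 93.0, 97.0]
GRADES = ["F", "D", "C-", "C", "C+", "B-", "B", "B+", "A-", "A", "A+"]

# Band labels from the grade definitions; note that C- falls under Below Average.
BAND = {
    "A+": "Exceptional", "A": "Exceptional", "A-": "Exceptional",
    "B+": "Above Average", "B": "Above Average", "B-": "Above Average",
    "C+": "Baseline", "C": "Baseline",
    "C-": "Below Average", "D": "Below Average",
    "F": "Failing",
}


def letter_grade(score: float) -> str:
    """Return the letter grade for a numeric score between 0 and 100."""
    if not 0.0 <= score <= 100.0:
        raise ValueError("score must be between 0 and 100")
    # bisect_right counts how many lower bounds are <= score.
    return GRADES[bisect_right(LOWER_BOUNDS, score) - 1]


print(letter_grade(64.9), BAND[letter_grade(64.9)])  # C- Below Average
```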

Methodology

Transparent, versioned, and open to scrutiny.
