@glowsenior glowsenior commented Jan 29, 2026

Bug: Missing Stake-Weighted Aggregation in Consensus Score Calculation

Summary

The update_leaderboard function in crates/platform-server/src/db/queries.rs calculates consensus scores using a simple arithmetic mean instead of the documented stake-weighted average. This undermines the security model and doesn't match the documented behavior.

Severity

High - Security and correctness issue

Location

  • File: crates/platform-server/src/db/queries.rs
  • Line: 341 (original bug), now fixed at lines 330-413

Description

The consensus score calculation was using a simple arithmetic mean:

let consensus_score = scores.iter().sum::<f64>() / scores.len() as f64;

However, the README and documentation specify that scores should be aggregated using stake-weighted averaging with outlier detection:

$$\bar{s}_i = \frac{\sum_{v \in \mathcal{V}'} S_v \cdot s_i^v}{\sum_{v \in \mathcal{V}'} S_v}$$

Where:

  • $S_v$ is the stake of validator $v$
  • $\mathcal{V}'$ is the set of validators remaining after outlier removal (validators with $|z_v| > 2.0$ are excluded)
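
As a small worked illustration with made-up numbers (not taken from the repository): suppose three validators remain after outlier removal, with stakes 100, 50 and 10 and scores 0.9, 0.8 and 0.2. The arithmetic mean gives every validator an equal say, while the stake-weighted formula lets the low-stake validator's 0.2 move the result far less:

$$\frac{0.9 + 0.8 + 0.2}{3} \approx 0.633 \qquad \text{versus} \qquad \bar{s}_i = \frac{100 \cdot 0.9 + 50 \cdot 0.8 + 10 \cdot 0.2}{100 + 50 + 10} = \frac{132}{160} = 0.825$$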

Impact

  1. Security: Undermines Sybil resistance - all validators are treated equally regardless of stake
  2. Correctness: Implementation doesn't match documented behavior
  3. Fairness: High-stake validators should have more influence but currently don't
  4. Manipulation Risk: Missing outlier detection allows anomalous scores to influence results

Expected Behavior

According to README.md (lines 254-270):

Stake-Weighted Aggregation

Each validator's score is weighted by their stake:

$$\bar{s}_i = \frac{\sum_{v \in \mathcal{V}} S_v \cdot s_i^v}{\sum_{v \in \mathcal{V}} S_v}$$

Outlier Detection

Validators with anomalous scores are detected using z-score:

$$z_v = \frac{s_i^v - \mu_i}{\sigma_i}$$

Validators with $|z_v| > z_{threshold}$ (default 2.0) are excluded from aggregation.
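
A minimal sketch of the z-score step, written as a standalone function for illustration. The name `z_scores` is hypothetical, and the sketch assumes the population standard deviation (dividing by n) computed over unweighted scores; the README excerpt does not say which variant the project intends.

```rust
/// Population z-score of each value: (x - mean) / std_dev.
/// Returns None when there are no scores or the standard deviation is
/// effectively zero, since the z-score is undefined in those cases.
/// Hypothetical helper, not the function added in this PR.
fn z_scores(scores: &[f64]) -> Option<Vec<f64>> {
    if scores.is_empty() {
        return None;
    }
    let n = scores.len() as f64;
    let mean = scores.iter().sum::<f64>() / n;
    let variance = scores.iter().map(|s| (s - mean).powi(2)).sum::<f64>() / n;
    let std_dev = variance.sqrt();
    if std_dev < f64::EPSILON {
        return None;
    }
    Some(scores.iter().map(|s| (s - mean) / std_dev).collect())
}

fn main() {
    // Six made-up scores: five agree near 0.80, one reports 0.10.
    let scores = [0.80, 0.82, 0.78, 0.81, 0.79, 0.10];
    let threshold = 2.0;
    let z = z_scores(&scores).expect("scores are non-empty and not all equal");
    for (s, z) in scores.iter().zip(&z) {
        let verdict = if z.abs() > threshold { "excluded" } else { "kept" };
        println!("score {s:.2}  z = {z:+.3}  {verdict}");
    }
}
```

The example uses six scores on purpose. With a population standard deviation, $|z_v|$ can never exceed $\sqrt{n-1}$ for $n$ scores, so a threshold of 2.0 cannot exclude anyone until at least six evaluations are present.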

Actual Behavior

The code was calculating a simple arithmetic mean without:

  • Stake weighting
  • Outlier detection
  • Z-score filtering

Root Cause

  1. The Evaluation struct doesn't include validator stake information
  2. The function doesn't look up validator stakes from the database
  3. No outlier detection logic was implemented

Solution

Implemented a calculate_stake_weighted_consensus_score() function that:

  1. ✅ Looks up validator stakes from the database for each evaluation
  2. ✅ Calculates mean and standard deviation for outlier detection
  3. ✅ Filters outliers using z-score threshold (2.0)
  4. ✅ Calculates stake-weighted average: sum(stake * score) / sum(stake)
  5. ✅ Handles edge cases (missing validators, zero stake, empty evaluations)
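
The steps above can be sketched as a pure function over in-memory data. This is not the code from the PR: `StakedScore`, `weighted_mean` and `consensus_score` are hypothetical names, the database lookup is replaced by a stake already attached to each entry (a missing validator is assumed to arrive with stake 0.0), and the fallback to an unweighted mean when no stake is available is an assumption about what the edge-case handling might look like.

```rust
/// One validator's score paired with its stake (hypothetical type).
#[derive(Debug, Clone, Copy)]
struct StakedScore {
    stake: f64,
    score: f64,
}

const Z_THRESHOLD: f64 = 2.0;

/// sum(stake * score) / sum(stake), or None if the total stake is not positive.
fn weighted_mean(entries: &[StakedScore]) -> Option<f64> {
    let total_stake: f64 = entries.iter().map(|e| e.stake).sum();
    if total_stake <= 0.0 {
        return None;
    }
    Some(entries.iter().map(|e| e.stake * e.score).sum::<f64>() / total_stake)
}

/// Filters z-score outliers, then takes the stake-weighted average.
/// Returns None only for empty input.
fn consensus_score(entries: &[StakedScore]) -> Option<f64> {
    if entries.is_empty() {
        return None;
    }
    // Mean and population standard deviation of the raw (unweighted) scores.
    let n = entries.len() as f64;
    let mean = entries.iter().map(|e| e.score).sum::<f64>() / n;
    let variance = entries.iter().map(|e| (e.score - mean).powi(2)).sum::<f64>() / n;
    let std_dev = variance.sqrt();

    // Keep everything when scores are (nearly) identical; otherwise drop |z| > threshold.
    let kept: Vec<StakedScore> = entries
        .iter()
        .copied()
        .filter(|e| std_dev < f64::EPSILON || ((e.score - mean) / std_dev).abs() <= Z_THRESHOLD)
        .collect();

    // Assumed fallback: with no positive stake, use the plain mean of the kept scores.
    weighted_mean(&kept)
        .or_else(|| Some(kept.iter().map(|e| e.score).sum::<f64>() / kept.len() as f64))
}

fn main() {
    // Made-up data: five validators agree near 0.80, one reports 0.10.
    let entries = [
        StakedScore { stake: 100.0, score: 0.80 },
        StakedScore { stake: 50.0, score: 0.82 },
        StakedScore { stake: 10.0, score: 0.78 },
        StakedScore { stake: 10.0, score: 0.81 },
        StakedScore { stake: 10.0, score: 0.79 },
        StakedScore { stake: 20.0, score: 0.10 },
    ];
    println!("consensus = {:?}", consensus_score(&entries));
}
```

In this sketch the outlier test looks only at scores, not stakes, so a validator is filtered or kept regardless of how much it has staked. Whether the actual implementation behaves the same way should be checked against the diff.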

Code Changes

Before

let scores: Vec<f64> = evaluations.iter().map(|e| e.score).collect();
let consensus_score = scores.iter().sum::<f64>() / scores.len() as f64;

After

// Calculate stake-weighted consensus score with outlier detection
let consensus_score = calculate_stake_weighted_consensus_score(pool, &evaluations).await?;

Testing Recommendations

  1. Test with validators having different stakes to verify weighting
  2. Test outlier detection with scores that deviate significantly
  3. Test edge cases (missing validators, zero stake, single evaluation)
  4. Verify that high-stake validators have more influence on consensus scores

Related Documentation

  • README.md lines 152-180 (Validator operations)
  • README.md lines 252-279 (Score Aggregation)
  • AGENTS.md line 87 (Stake-weighted averaging)

Summary by CodeRabbit

Release Notes

  • Improvements
    • Enhanced leaderboard consensus scoring calculation with stake-weighted averaging and outlier detection for more robust and fair ranking computations.


…ection

Replace simple arithmetic mean with stake-weighted consensus score calculation
as documented in README. This fix addresses a critical security and correctness
issue where all validators were treated equally regardless of stake.

Changes:
- Add calculate_stake_weighted_consensus_score() function
- Implement stake-weighted average: sum(stake * score) / sum(stake)
- Add outlier detection using z-score threshold (2.0)
- Look up validator stakes from database for each evaluation
- Handle edge cases (missing validators, zero stake, empty evaluations)

Security Impact:
- Restores Sybil resistance by weighting validators by stake
- Prevents manipulation through outlier detection
- High-stake validators now have appropriate influence

Fixes: Missing stake-weighted aggregation in consensus score calculation
Related: README.md lines 254-270 (Score Aggregation section)

Before: Simple mean - scores.iter().sum() / scores.len()
After: Stake-weighted with outlier filtering
coderabbitai bot commented Jan 29, 2026

Walkthrough

A new private function calculate_stake_weighted_consensus_score is added to compute consensus scores using stake weighting and statistical outlier detection via z-scores. The update_leaderboard function now calls this new calculation instead of simple averaging, while maintaining the existing leaderboard update control flow.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Stake-Weighted Consensus Scoring (crates/platform-server/src/db/queries.rs) | Adds new private function implementing stake-weighted consensus score calculation with z-score based outlier detection (threshold 2.0), validator stake integration, and fallback logic. Integrates into update_leaderboard to replace previous simple average calculation. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

A rabbit hops through validator stakes,
Removes the outliers for consensus's sake,
With z-scores sharp and weights that sing,
The leaderboard now has a weighted wing! 🐰✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and specifically describes the main change: implementing stake-weighted aggregation with outlier detection in consensus score calculation, which is the primary focus of the PR. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
