
Comments

Add per-peer profit scoring#3247

Open
t-bast wants to merge 3 commits into master from peer-scorer

Conversation

@t-bast (Member) commented Feb 5, 2026

We create a new set of actors that keep track of payment statistics across our peers and rank them to identify the top profit earners. Based on those statistics, the actors issue recommendations to:

  • allocate more liquidity towards nodes that are generating revenue and may run out of liquidity in the next few days
  • reclaim liquidity from inactive channels
  • change our relay fees to optimize increases or decreases in outgoing flow and volume

This is disabled by default, and the actors aren't created unless eclair.conf is modified to explicitly enable them.

@codecov-commenter

Codecov Report

❌ Patch coverage is 55.40541% with 99 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.00%. Comparing base (f93d02f) to head (3473f10).
⚠️ Report is 78 commits behind head on master.

Files with missing lines | Patch % | Lines
...main/scala/fr/acinq/eclair/profit/PeerScorer.scala | 44.69% | 99 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3247      +/-   ##
==========================================
+ Coverage   86.43%   88.00%   +1.56%     
==========================================
  Files         242      219      -23     
  Lines       22607    20759    -1848     
  Branches      832      821      -11     
==========================================
- Hits        19541    18268    -1273     
+ Misses       3066     2491     -575     
Files with missing lines | Coverage Δ
...ir-core/src/main/scala/fr/acinq/eclair/Setup.scala | 71.97% <100.00%> (-1.40%) ⬇️
.../scala/fr/acinq/eclair/channel/ChannelEvents.scala | 100.00% <100.00%> (ø)
...c/main/scala/fr/acinq/eclair/channel/Helpers.scala | 92.85% <100.00%> (+0.21%) ⬆️
...acinq/eclair/channel/fsm/DualFundingHandlers.scala | 88.37% <100.00%> (ø)
...la/fr/acinq/eclair/channel/fsm/ErrorHandlers.scala | 82.84% <100.00%> (-0.58%) ⬇️
...inq/eclair/channel/fsm/SingleFundingHandlers.scala | 78.37% <100.00%> (-1.63%) ⬇️
...cinq/eclair/channel/publish/MempoolTxMonitor.scala | 87.20% <100.00%> (ø)
...main/scala/fr/acinq/eclair/db/DbEventHandler.scala | 86.95% <100.00%> (-2.33%) ⬇️
...c/main/scala/fr/acinq/eclair/db/pg/PgAuditDb.scala | 80.42% <100.00%> (-17.84%) ⬇️
...cala/fr/acinq/eclair/db/sqlite/SqliteAuditDb.scala | 79.16% <100.00%> (-18.99%) ⬇️
... and 5 more

... and 26 files with indirect coverage changes


@pm47 (Member) left a comment

LGTM.

Instead of a detailed review of the calculations, I intend to compare the resulting values with my own db requests from the outside. Easier and better, I think.

@t-bast t-bast marked this pull request as ready for review February 11, 2026 18:22
@t-bast t-bast force-pushed the peer-scorer branch 3 times, most recently from bbed0e5 to 1f06cf7 Compare February 19, 2026 13:43

t-bast commented Feb 20, 2026

Claude has the following very good suggestions to go further (but this should wait for next steps):

---

  Heuristics Analysis

  What I think works well
  Bucketing (3h buckets, 7-day window) is a good tradeoff. It smooths out short-lived noise (a single large payment shouldn't trigger a funding decision) while still capturing weekly traffic patterns.
  Aligning buckets to UTC clock boundaries also makes the rolling stats predictable and debuggable.
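
A UTC-aligned bucket index falls out of integer division of the epoch timestamp by the bucket width. A small Python sketch (illustrative only, not the actual PeerScorer code):

```python
import datetime as dt

# Illustrative sketch (not eclair code) of UTC-aligned bucketing:
# 3-hour buckets over a 7-day rolling window = 56 buckets per peer.
BUCKET_SECONDS = 3 * 3600
WINDOW_BUCKETS = 7 * 8  # 7 days x 8 three-hour buckets per day

def bucket_index(ts: dt.datetime) -> int:
    """Index of the 3h bucket containing ts, aligned to UTC clock boundaries."""
    return int(ts.timestamp()) // BUCKET_SECONDS

def bucket_start(ts: dt.datetime) -> dt.datetime:
    """Start of the bucket containing ts (00:00, 03:00, 06:00, ... UTC)."""
    return dt.datetime.fromtimestamp(bucket_index(ts) * BUCKET_SECONDS, tz=dt.timezone.utc)
```

Because the Unix epoch starts at a UTC midnight and 86400 is a multiple of 10800, every bucket boundary lands on a clock boundary, which is what makes the rolling stats predictable and debuggable.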

  Probabilistic rate-limiting is smart. Using random coin flips (~3 peers per round, 1-per-3-days for small peers) avoids the thundering-herd problem where all nodes would simultaneously react to the same
  network conditions. This is better than a simple "fund once every N hours" counter.
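
The coin-flip idea can be sketched as follows (Python, not eclair code; `avg_per_round` is a hypothetical parameter): each eligible peer is selected independently with probability `target / candidates`, so roughly 3 peers are picked per round without any global counter.

```python
import random

# Illustrative sketch (not eclair code) of probabilistic rate-limiting:
# rather than acting on every eligible peer, flip a coin per candidate so
# that roughly avg_per_round peers are selected each scoring round.
# Independent coin flips decorrelate nodes that all see the same network
# conditions, avoiding a thundering herd.
def select_peers(candidates, avg_per_round=3.0, rng=random):
    if not candidates:
        return []
    p = min(1.0, avg_per_round / len(candidates))
    return [peer for peer in candidates if rng.random() < p]
```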

  Five distinct selection strategies correctly recognize that different peers need different interventions. A peer that ran out of liquidity needs to be treated differently from a peer that hasn't been
  funded yet. Separating "revive" from "needs liquidity" from "good small peer" shows good domain understanding.

  Feerate and balance guards are the right safety valves — liquidity management during a fee spike would be counterproductive.

  ---
  Concerns and questions

  1. Outgoing flow as the primary funding signal

  Using outboundFlow = out - in to identify peers that need more liquidity makes intuitive sense — if a channel is draining outward, that direction is in demand. However, it only looks at one side of the
  balance sheet. A peer with high incoming flow is also valuable (it replenishes your outgoing capacity on other channels). I'd expect the optimal metric to be something closer to gross volume in the
  direction you're capacity-constrained, not just net flow.
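
A toy comparison of the two signals (Python sketch, not eclair code; function names are hypothetical) shows how they can rank peers very differently:

```python
# Toy comparison (not eclair code) of two funding signals for a peer.
def net_outbound_flow(out_msat: int, in_msat: int) -> int:
    """The net-flow signal: how fast the channel drains outward."""
    return out_msat - in_msat

def gross_directional_volume(out_msat: int, in_msat: int, outbound_constrained: bool) -> int:
    """Gross volume in the capacity-constrained direction: a high-traffic
    peer with balanced flows nets to ~0 but still earns fees and replenishes
    outgoing capacity on other channels."""
    return out_msat if outbound_constrained else in_msat
```

A busy peer relaying 1M msat each way nets to almost zero flow, while a quiet peer draining 50k msat outward scores higher on net flow despite generating far less fee revenue.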

  2. Volume → fee adjustment direction

  The fee update heuristics look at volume trends. I'd want to double-check the direction of the adjustment here:
  - If volume is increasing, does the code lower fees (to capture more flow) or raise them (to extract more margin)?
  - In routing, raising fees when demand is high is often more profitable, but lowering fees when demand is low makes sense to attract flow. Which model does this implement?

  3. Cold start: 2 buckets = 6 hours

  The hasPastData check requires at least 2 buckets of history before acting. That's 6 hours of real-time data after each restart. Given the two TODOs about reading from AuditDb, this means the system is
  blind for 6 hours after every restart — which could be a significant gap in production if restarts happen regularly.

  4. 10%/20% volume change thresholds

  The magic numbers (0.1, 0.2, 0.8) for volume change detection and fee adjustment are used inline. Beyond the readability issue, I'd want to know: were these chosen empirically, or are they initial
  guesses? If the latter, they might need tuning once the feature is running in production. Making them configurable (or at least named constants) would make that easier.
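
Lifting the inline constants into a named, tunable structure could look like this (Python sketch, not eclair code; all names and the exact classification logic are hypothetical):

```python
from dataclasses import dataclass

# Illustrative sketch (not eclair code): naming the inline 0.1/0.2/0.8 magic
# numbers so they can be tuned, or made configurable, once the feature runs
# in production. Field names and semantics are hypothetical.
@dataclass(frozen=True)
class VolumeThresholds:
    small_change: float = 0.1    # ratio above 1.1: mild increase
    large_change: float = 0.2    # ratio above 1.2: significant increase
    decrease_floor: float = 0.8  # ratio below 0.8: significant decrease

def classify_trend(current: float, previous: float,
                   t: VolumeThresholds = VolumeThresholds()) -> str:
    if previous <= 0:
        return "no-data"
    ratio = current / previous
    if ratio < t.decrease_floor:
        return "decreasing"
    if ratio > 1 + t.large_change:
        return "strongly-increasing"
    if ratio > 1 + t.small_change:
        return "increasing"
    return "stable"
```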

  5. No feedback loop

  The scoring runs every hour, but there's no signal about whether previous decisions had the intended effect. For example: after funding a peer that was "needs liquidity", does the system check whether
  routing volume actually increased? Without a feedback loop, the heuristics could repeatedly fund the same peer if the root cause is something else (e.g., the peer has poor connectivity, not low
  liquidity).

  6. Bucket granularity vs. burst sensitivity

  3-hour buckets mean a sudden traffic burst that fills a channel in 20 minutes won't show up as a problem until the next scoring round. For high-traffic nodes this might be fine, but it's worth
  knowing the system isn't designed for real-time reactivity.

  ---
  Summary

  The overall approach is sound — event-sourced stats, conservative defaults, probabilistic rate-limiting, multiple selection strategies. The main things I'd want to think more about are: (1) whether
  outgoing flow alone is the right funding signal, (2) confirming the fee adjustment direction is correct, and (3) planning the AuditDb integration to eliminate the 6-hour blind window on restart.


t-bast commented Feb 20, 2026

When asked how it would have done things differently, it provides some useful directions as well, which I think are worth considering (some of them were already on my todo-list for improvements):

A few things I'd approach differently:                                                                                                                                                       
                                                                                                                                                                                                              
  ---
  1. Prioritize DB initialization, not defer it                                                                                                                                                               
                                                        
  The two TODOs about reading from AuditDb on startup feel like they should have been the starting point, not a follow-up. The 6-hour blind window after every restart is a real operational gap, and         
  designing around it (the hasPastData guard) adds complexity that wouldn't be needed if history was loaded at boot. I'd have built the DB query first and skipped the workaround entirely.

  2. Closed-loop control

  The current system is open-loop — it funds a peer and moves on with no record of whether it worked. I'd track funding decisions with timestamps (per peer) and require that a previous decision had time to
  be evaluated before making another one for the same peer. Something like: "don't fund the same peer twice within 72h, and check if volume actually increased before the second funding." Without this, you
  can end up repeatedly funding a peer whose problem isn't liquidity.
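
The "72h cooldown plus evidence" rule could be tracked per peer like this (Python sketch, not eclair code; all names hypothetical):

```python
import time

# Illustrative sketch (not eclair code) of a closed-loop funding tracker:
# remember when each peer was last funded and its volume at that time, and
# only fund again once a cooldown has elapsed AND the previous funding
# measurably increased volume.
COOLDOWN_SECONDS = 72 * 3600  # "don't fund the same peer twice within 72h"

class FundingTracker:
    def __init__(self):
        self._last = {}  # peer -> (timestamp, volume_at_funding_time)

    def record(self, peer, volume, now=None):
        self._last[peer] = (time.time() if now is None else now, volume)

    def may_fund(self, peer, current_volume, now=None):
        if peer not in self._last:
            return True  # never funded: no previous decision to evaluate
        ts, volume_then = self._last[peer]
        now = time.time() if now is None else now
        if now - ts < COOLDOWN_SECONDS:
            return False  # previous decision hasn't had time to play out
        # Require evidence that the previous funding helped before repeating it.
        return current_volume > volume_then
```

A peer whose volume never rises after funding is filtered out, which is exactly the "root cause isn't liquidity" case described above.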

  3. Different primary metric for funding

  Outgoing flow (out - in) is a reasonable proxy, but the better signal is capacity exhaustion events — how often a channel hits near-zero balance and how much volume was lost as a result. That directly
  answers "would more liquidity here generate more fees?" rather than inferring it from net flow. This would require tracking balance-at-capacity-hit events in PeerStatsTracker, which is more work, but more
   directly correlated with the decision you're making.

  4. EMA instead of fixed buckets

  3-hour buckets are simple but have aliasing effects (payments at 2:59 and 3:01 land in different buckets) and require storing 56 values per peer. An exponential moving average over multiple
  timescales (short, medium, long) would be more memory-efficient, eliminate aliasing, and give you trend detection "for free" by comparing the short and long EMAs — without needing the
  (bucket[n] / bucket[n-2]) * 1.1 > threshold comparisons.
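
A two-timescale EMA can be sketched as follows (Python, not eclair code; half-lives and the normalization are illustrative assumptions). Each EMA decays continuously between events, so there are no bucket boundaries to alias against; multiplying each EMA by its decay constant turns it into a comparable rate estimate:

```python
import math

# Illustrative sketch (not eclair code): exponential moving averages over a
# short and a long timescale; comparing the two gives trend detection.
class VolumeEma:
    def __init__(self, short_halflife_h=6.0, long_halflife_h=72.0):
        self.short = 0.0
        self.long = 0.0
        self.k_short = math.log(2) / short_halflife_h
        self.k_long = math.log(2) / long_halflife_h
        self.last_t = None

    def add(self, t_hours, amount):
        """Record a payment of `amount` at time t_hours (monotonic)."""
        if self.last_t is not None:
            elapsed = t_hours - self.last_t
            self.short *= math.exp(-self.k_short * elapsed)
            self.long *= math.exp(-self.k_long * elapsed)
        self.last_t = t_hours
        self.short += amount
        self.long += amount

    def trending_up(self):
        # value * k approximates the arrival rate each EMA has settled to,
        # so the comparison asks: is recent volume above the long-run rate?
        return self.short * self.k_short > self.long * self.k_long
```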

  5. Separate scoring from policy

  PeerScorer currently handles both "rank peers" and "decide what to do about them." I'd split these: a pure PeerRanker that outputs a sorted, annotated peer list, and separate configurable policy
  components (FundingPolicy, FeePolicy, ClosePolicy). This makes it easier to test the ranking logic independently, swap out policies, and eventually expose the rankings via API for manual inspection or
  override — without coupling that to the action-taking code.
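
The shape of that split might look like this (Python sketch, not eclair code; `PeerScore`, `rank_peers`, and `FundingPolicy` are all hypothetical names):

```python
from dataclasses import dataclass

# Illustrative sketch (not eclair code) of separating ranking from policy:
# a pure ranker produces an annotated, sorted peer list; policies consume it.
@dataclass(frozen=True)
class PeerScore:
    node_id: str
    fees_earned_msat: int
    outbound_volume_msat: int

def rank_peers(scores):
    """Pure ranking: sort by fees earned, descending. Trivial to unit-test."""
    return sorted(scores, key=lambda s: s.fees_earned_msat, reverse=True)

class FundingPolicy:
    """One of several swappable policy components consuming the ranking."""
    def __init__(self, top_n=3):
        self.top_n = top_n

    def recommend(self, ranked):
        return [s.node_id for s in ranked[: self.top_n]]
```

Because `rank_peers` has no side effects, the same ranking can feed a fee policy, a close policy, or an API endpoint for manual inspection.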

  6. Be careful about fee oscillation

  Threshold-based fee adjustment can oscillate: lower fees → more volume → trigger "raise fees" → less volume → trigger "lower fees" again. I'd require a sustained trend across multiple consecutive scoring
  periods before adjusting, not just a single-bucket comparison.
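
A minimal hysteresis check along those lines (Python sketch, not eclair code): only report a trend when the last N scoring rounds agree.

```python
# Illustrative sketch (not eclair code) of requiring a sustained trend before
# adjusting fees, to dampen the lower-raise-lower oscillation described above.
def sustained_trend(trends, periods=3):
    """Return 'up'/'down' only if the last `periods` rounds agree, else 'none'."""
    if len(trends) < periods:
        return "none"
    recent = trends[-periods:]
    if all(t == "up" for t in recent):
        return "up"
    if all(t == "down" for t in recent):
        return "down"
    return "none"
```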

  ---
  The bucketing approach and probabilistic rate-limiting are things I'd keep — those are genuinely good ideas. The core gap is the lack of a feedback loop and the deferred DB initialization.

---

We create a new set of actors that keep track of payment statistics
across our peers and rank them to identify the top profit earners.
Based on those statistics, the actors issue recommendations to:

- allocate more liquidity towards nodes that are generating revenue and
  may run out of liquidity in the next few days
- reclaim liquidity from inactive channels
- change our relay fees to optimize increases or decreases in outgoing
  flow and volume
---

Otherwise lines are wrapped, which hurts readability in the terminal.

---

We improve the funding algorithm by separately selecting peers that
are performing well in absolute terms from peers that are performing
well in relative terms, and peers that *probably* performed well in
the past but ran out of liquidity.