Enable deferred writing #782

Draft
joostjager wants to merge 3 commits into lightningdevkit:main from joostjager:chain-mon-deferred-writes

@joostjager
Contributor

@joostjager joostjager commented Feb 3, 2026

Integrate deferred chain monitor writes (rust-lightning#4345) into ldk-node.

  • Patch LDK dependencies to use the chain-mon-internal-deferred-writes branch and enable deferred writes mode in ChainMonitor.
  • Switch from the sync MonitorUpdatingPersister to MonitorUpdatingPersisterAsync.
  • Re-claim inbound payments when the preimage is already known.

@ldk-reviews-bot

ldk-reviews-bot commented Feb 3, 2026

👋 Thanks for assigning @tnull as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@tnull
Collaborator

tnull commented Feb 3, 2026

> VSS continues to use the regular ChainMonitor. While this isn't safe against force-closes, it avoids introducing potentially high-latency channel manager writes into the critical path.

Hmm, that's unfortunate. I imagine mobile and VSS-driven nodes in particular would benefit the most from any change that improves the ChannelManager/ChannelMonitor inconsistency situation?

@TheBlueMatt
Contributor

Yea, I think it's an open question what we should do - on the one hand, nodes with remote persistence are going to be the most impacted by the increase in sending latency (which is probably something where we're currently in an unacceptably bad state, given how single-threaded some of LDK's logic is around the background processor!). OTOH, they are also somewhat more likely to hit the force-close-due-to-out-of-sync issues because they have high-latency persistence.

I've mentioned this to Joost, but another option we have is to do the chanman and monitor writes at the same time but spawn them in order, which will at least give us likely protection. We should maybe discuss live which option we want to go with.

In any case, since this is now using the async pipeline for monitor persistence anyway, we should probably switch to actual async persistence for monitors at the same time.

@joostjager
Contributor Author

Parallel writes started in order still don't fully close the gap though. We'd remain in "mostly works" territory where the race window is smaller but not eliminated.

As discussed offline, for high-latency backends, an option to avoid unnecessary round trips is batched writes. Doesn't need to be atomic (which would require all KV stores to support transactions), just ordered: write chanman first, then monitors, but send them together. This would fix the FC problem without being unnecessarily slow for remote storage.

The downside is extending the KVStore interface with a batch write method, but we could provide a blanket implementation for existing KV stores that just iterates through the writes sequentially. For VSS specifically we'd implement actual batch sending to get the latency benefit.
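
A minimal sketch of the idea, under stated assumptions: `SimpleKvStore`, `write_batch`, `RecordingStore`, `persist_in_order`, and the `"manager"` key are hypothetical names for illustration and do not match LDK's actual `KVStore` signature. The sequential fallback is modelled here as a default trait method rather than a separate blanket impl, so that a remote store could still override it with a real single-round-trip batch.

```rust
use std::cell::RefCell;

/// Hypothetical, simplified stand-in for a key-value store trait.
/// This is NOT LDK's actual `KVStore` signature.
pub trait SimpleKvStore {
    fn write(&self, key: &str, value: &[u8]) -> Result<(), String>;

    /// Ordered, non-atomic batch write. The default issues the writes one at a
    /// time in the given order and stops at the first error, so a later entry
    /// is never written unless all earlier entries succeeded.
    fn write_batch(&self, entries: &[(&str, &[u8])]) -> Result<(), String> {
        for &(key, value) in entries {
            self.write(key, value)?;
        }
        Ok(())
    }
}

/// In-memory store that records the order of writes and relies on the
/// default `write_batch`.
#[derive(Default)]
pub struct RecordingStore {
    pub log: RefCell<Vec<String>>,
}

impl SimpleKvStore for RecordingStore {
    fn write(&self, key: &str, _value: &[u8]) -> Result<(), String> {
        if key.is_empty() {
            return Err("empty key".to_string());
        }
        self.log.borrow_mut().push(key.to_string());
        Ok(())
    }
}

/// Sends the channel manager first and the monitor updates after it,
/// all in one batch call.
pub fn persist_in_order<S: SimpleKvStore>(
    store: &S,
    manager: &[u8],
    monitors: &[(&str, &[u8])],
) -> Result<(), String> {
    let mut batch: Vec<(&str, &[u8])> = vec![("manager", manager)];
    batch.extend_from_slice(monitors);
    store.write_batch(&batch)
}
```

With the default method, existing stores keep working unchanged; only a high-latency backend would need to override `write_batch` to get the latency benefit.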

@joostjager
Contributor Author

Illustration of the code changes for batch writes: lightningdevkit/rust-lightning#4379

@joostjager
Contributor Author

Updated lightningdevkit/rust-lightning#4379 to also show how the batch writes can be used in the background processor. Note that this is based on MonitorUpdatingPersister doing the queueing.

@joostjager joostjager self-assigned this Feb 5, 2026
@joostjager joostjager force-pushed the chain-mon-deferred-writes branch from aa8bf83 to a48ba95 Compare February 9, 2026 11:14
@joostjager
Contributor Author

After offline discussion, we've landed on using DeferredChainMonitor for all backends (including VSS). Pushed the simplified version.

Next step is further parallelization of the background processor loop to reduce the impact of deferred writing. If more performance improvements are needed after that, we can again look into multi-key writes to bundle everything in a single round trip.

@joostjager
Contributor Author

Pushed a fix for a potential payment store desync with deferred monitor writes.

With deferred writes, there's a window where the payment store records an inbound payment as Succeeded but the corresponding channel monitor update (from claim_funds) hasn't been flushed to disk yet. If the node crashes in that window and restarts, LDK replays PaymentClaimable from the stale monitor state. Previously, the event handler would see the Succeeded status and fail the HTLC backwards, causing the sender to lose funds while the receiver's store incorrectly shows a successful payment.

The fix re-claims using the stored preimage instead of failing backwards. This is safe and idempotent.

Note that without deferred writes this scenario doesn't occur, because the monitor persists the claim synchronously before the payment store is updated.
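
The decision described above can be sketched roughly as follows. This is an illustration only: `Status`, `StoredPayment`, `Action`, and `handle_payment_claimable` are hypothetical names, not ldk-node's actual types or event-handler code, and the handling of a `Succeeded` entry without a stored preimage is an assumption.

```rust
/// Hypothetical, simplified payment-store status (not ldk-node's actual types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Pending,
    Succeeded,
    Failed,
}

/// Hypothetical payment-store record.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StoredPayment {
    pub status: Status,
    pub preimage: Option<[u8; 32]>,
}

/// What the event handler decides to do with the incoming HTLC.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    /// First claim of a pending payment.
    Claim([u8; 32]),
    /// The store already says `Succeeded`; claim again with the stored preimage.
    Reclaim([u8; 32]),
    /// Give up on the HTLC and fail it back to the sender.
    FailBackwards,
}

/// Decide how to react to a `PaymentClaimable` event for a known payment.
pub fn handle_payment_claimable(stored: &StoredPayment) -> Action {
    match (stored.status, stored.preimage) {
        (Status::Pending, Some(preimage)) => Action::Claim(preimage),
        // Replayed event after a restart from stale monitor state:
        // re-claim instead of failing backwards.
        (Status::Succeeded, Some(preimage)) => Action::Reclaim(preimage),
        // Assumption for this sketch: without a preimage nothing can be claimed.
        _ => Action::FailBackwards,
    }
}
```

Because the re-claim uses the same preimage as the original claim, repeating it after a crash does not change the outcome.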

@joostjager
Contributor Author

joostjager commented Feb 17, 2026

I benchmarked payment latency across configurations using a single BOLT11 payment with a 150ms simulated write delay on the sender's KVStore. Three independent dimensions were varied: whether the KVStore trait is async (via new_async_beta) or sync (via ChainMonitor::new), whether the background processor persists channel manager state asynchronously (async-cm-persist branch) or synchronously (chain-mon-internal-deferred-writes branch), and whether deferred monitor writes are enabled. The attached trace files can be opened with Perfetto to visualize the persistence timeline during the payment flow.

| Async KVStore | Async CM Persist | Deferred | Latency (ms) | Trace File |
|---|---|---|---|---|
| yes | yes | yes | 1499.37 | async-kvstore-async-cm-persist-deferred.json |
| yes | yes | no | 879.11 | async-kvstore-async-cm-persist-no-deferred.json |
| yes | no | yes | 1431.47 | async-kvstore-no-async-cm-persist-deferred.json |
| yes | no | no | 1117.23 | async-kvstore-no-async-cm-persist-no-deferred.json |
| no | yes | yes | 1680.72 | sync-kvstore-async-cm-persist-deferred.json |
| no | yes | no | 1163.03 | sync-kvstore-async-cm-persist-no-deferred.json |
| no | no | yes | 1580.34 | sync-kvstore-no-async-cm-persist-deferred.json |
| no | no | no | 1153.13 | sync-kvstore-no-async-cm-persist-no-deferred.json |
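
As a rough reading aid, the extra latency attributable to deferred writes can be derived from the latencies reported above by pairing each deferred run with its non-deferred counterpart. The snippet below only does that arithmetic; the type and function names are made up for illustration, and since each configuration was measured with a single payment the differences should be treated as indicative.

```rust
/// Simulated write delay used in the benchmark, in milliseconds.
pub const WRITE_DELAY_MS: f64 = 150.0;

/// (async KVStore, async CM persist, latency with deferred writes,
/// latency without deferred writes), latencies in milliseconds.
pub const RESULTS: [(bool, bool, f64, f64); 4] = [
    (true, true, 1499.37, 879.11),
    (true, false, 1431.47, 1117.23),
    (false, true, 1680.72, 1163.03),
    (false, false, 1580.34, 1153.13),
];

/// Deferred-write overhead for one configuration pair.
#[derive(Debug, Clone, Copy)]
pub struct Overhead {
    pub async_kvstore: bool,
    pub async_cm_persist: bool,
    /// Extra latency with deferred writes enabled, in milliseconds.
    pub extra_ms: f64,
    /// The same difference expressed in units of the simulated write delay.
    pub extra_write_delays: f64,
}

/// Compute the overhead of deferred writes for every configuration pair.
pub fn deferred_overhead() -> Vec<Overhead> {
    RESULTS
        .iter()
        .map(|&(async_kvstore, async_cm_persist, with, without)| {
            let extra_ms = with - without;
            Overhead {
                async_kvstore,
                async_cm_persist,
                extra_ms,
                extra_write_delays: extra_ms / WRITE_DELAY_MS,
            }
        })
        .collect()
}
```

The differences come out at roughly 314 ms to 620 ms, that is, about two to four simulated write delays depending on the configuration.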

@joostjager
Contributor Author

Opened lightningdevkit/rust-lightning#4424 to take one non-essential channel manager persist out of the critical path.

@joostjager
Contributor Author

joostjager commented Feb 18, 2026

What still stands out in the trace below is that the payment and event persists largely block the background processor loop. So even though the channel manager is persisted asynchronously, we still need to wait for those two.

[Image: trace of the background processor loop]

Also, when a sender completes a payment, the PaymentSent event handler in event.rs performs two sequential persistence operations: first a payment store write (updating the status to Succeeded), then an event queue write (adding PaymentSuccessful). Is that an unnecessary round trip?

joostjager and others added 2 commits February 26, 2026 10:57
Patch LDK dependencies to use the chain-mon-internal-deferred-writes
branch and enable deferred writes by passing `true` to the
ChainMonitor constructor.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

When a PaymentClaimable event arrives for a payment already marked as
Succeeded or Spontaneous in the payment store, re-claim using the
stored preimage instead of failing the HTLC backwards. This prevents
fund loss in scenarios where the channel monitor state was not yet
persisted (e.g. with deferred monitor writes) but the payment store
already recorded the claim as successful.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@joostjager joostjager force-pushed the chain-mon-deferred-writes branch from 92bd6a4 to 6528f7c Compare February 26, 2026 10:56

Use `ChainMonitor::new_async_beta` with `MonitorUpdatingPersisterAsync`
for chain monitor persistence.

Add `DynStoreRef`, a newtype wrapper that bridges the object-safe
`DynStoreTrait` (boxed futures) to LDK's generic `KVStore` trait
(`impl Future`), as required by `MonitorUpdatingPersisterAsync`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@joostjager joostjager requested a review from tnull February 26, 2026 11:52
@joostjager joostjager changed the title from "Use DeferredChainMonitor for non-VSS storage backends" to "Enable deferred writing" Feb 26, 2026
4 participants