channeldb: flatten the htlc attempts bucket #5635
Roasbeef merged 3 commits into lightningnetwork:master from
Conversation
0320066 to
e2ace14
Compare
yyforyongyu
left a comment
Interestingly I had thought about this too. I think the current approach will potentially increase the data size of each db tx. If we have many attempts inside a payment, will that be an issue?
Previously I thought we could add a state to the payment so that we wouldn't need to fetch the htlc attempts in every query. All the attempt info could be abstracted using a state. If the payment is in a pending state, we need the attempt info. Otherwise, we won't bother fetching them. WDYT?
channeldb/payments.go
Outdated
channeldb/migration23/migration.go
Outdated
Could the paymentsBucket be empty?
It could, but then ForEach will not iterate.
Actually, what I wanted to ask was whether the above bucket could be nil or not... If it could be, it seems we shouldn't return an error below.
Sorry, I misunderstood what you meant. The semantics of NestedRead(Write)Bucket are a bit tricky since it never returns an error. When it returns nil it means that the bucket doesn't exist. In this case the payments-root-bucket can either be empty (no payments yet) or only contain buckets, which we ensure with this check.
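The nil-versus-empty distinction discussed here can be sketched with a toy stand-in for the bucket API (the types below are illustrative, not the real kvdb interface): a missing bucket comes back as nil with no error, while an existing-but-empty bucket is non-nil and simply has nothing to iterate.

```go
package main

import "fmt"

// Toy stand-in for bbolt's NestedReadBucket semantics: lookups return
// nil rather than an error when the child bucket does not exist, while
// an existing bucket may simply be empty.
type bucket map[string]bucket

func (b bucket) nestedReadBucket(key string) bucket {
	child, ok := b[key]
	if !ok {
		return nil // missing bucket: nil, never an error
	}
	return child
}

func main() {
	root := bucket{"payments-root-bucket": {}}

	// Exists but empty: non-nil, and a ForEach over it would not iterate.
	fmt.Println(root.nestedReadBucket("payments-root-bucket") != nil)
	// Doesn't exist at all: nil.
	fmt.Println(root.nestedReadBucket("missing-bucket") != nil)
}
```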
channeldb/migration23/migration.go
Outdated
Just to be safe, I think we need to do it in three steps:
- get the old keys
- put the new keys
- delete the old keys iff the second step succeeded
This all happens in one huge write transaction. Therefore the order here doesn't play a huge role; if anything goes wrong, the whole transaction is rolled back.
But that is actually the main concern. Because we need every migration to be capable of running atomically, we'll end up migrating all payments in one transaction. For some nodes that might be a huge chunk of data.
Not sure what we can realistically do about it here...
I only have two ideas:
- Just go with it and hope it will be fine for all users. If anyone runs into a problem, we can advise them to delete failed payments first (we might need [WIP] Batched payment deletion #5368 and a CLI tool for using that RPC as well).
- Do the migration in four steps: one for each key (info/settle/fail), where we just copy the data, and then a last step that deletes the attempt ID buckets. That way the individual transactions would still be atomic (meaning either all payments are copied or none are), but lnd would only start once all 4 steps have run. But if, let's say, step 3 fails, you're in a bad state as well, so this sounds like a bad idea...
Okay, PTAL. I think we should be safe from over-allocating ourselves. The only issue may be that bbolt is unable to handle such a large txn. Also, due to multiple bugs in bbolt cursors we can't safely Delete [1] or Put [2] inside the cursor as it turns out, so I rewrote the migration to iterate in Go instead.
[1] boltdb/bolt#357
[2] boltdb/bolt#268
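The collect-then-mutate pattern that works around those cursor bugs can be sketched as follows, with a plain Go map standing in for the bucket (illustrative only, not the actual migration code): snapshot all keys during iteration, then Delete/Put only after the iteration has finished.

```go
package main

import (
	"fmt"
	"sort"
)

// migrateKeys sketches the "iterate in Go" workaround: because of bbolt
// cursor bugs (boltdb/bolt#357 and #268) it isn't safe to Delete or Put
// while a cursor iteration is in flight, so we snapshot the keys first
// and mutate afterwards. A plain map stands in for the bucket here.
func migrateKeys(b map[string][]byte) []string {
	// Step 1: collect the keys while "iterating" the bucket.
	keys := make([]string, 0, len(b))
	for k := range b {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	// Step 2: mutate only after iteration has finished.
	for _, k := range keys {
		delete(b, k)
	}
	return keys
}

func main() {
	b := map[string][]byte{"0001": []byte("a"), "0002": []byte("b")}
	fmt.Println(migrateKeys(b), len(b)) // [0001 0002] 0
}
```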
This all happens in one huge write transaction.
Yeah, I was thinking we should do it in separate transactions, at least for the deletion.
Another idea is to limit how many records we do at once. Maybe we could update N payments each time, and perform this update several times?
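The "update N payments each time" idea could look roughly like this; numBatches and batchSize are hypothetical names for illustration, not lnd code.

```go
package main

import "fmt"

// numBatches returns how many fixed-size DB transactions would be
// needed to migrate `pending` payments, batchSize at a time: the
// batching idea floated in the thread, sketched without any real DB.
func numBatches(pending, batchSize int) int {
	txCount := 0
	for pending > 0 {
		n := pending
		if n > batchSize {
			n = batchSize
		}
		// One hypothetical write transaction migrates n payments here.
		pending -= n
		txCount++
	}
	return txCount
}

func main() {
	// 2500 payments in batches of 1000 take 3 transactions.
	fmt.Println(numBatches(2500, 1000)) // 3
}
```

The trade-off discussed above still applies: with multiple transactions the migration as a whole is no longer atomic, so a crash mid-way leaves the DB in a partially migrated state that the migration must be able to resume from.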
channeldb/migration23/migration.go
Outdated
Why do we convert it into a string?
Migration code is now very different, so we avoid this conversion.
guggero
left a comment
Great PR, very easy to follow the diff!
Added a comment about the DB transaction size, but not sure what we can realistically do about it. Knowing how many round trips this will save us, I think we do want this change to be implemented.
channeldb/migration23/migration.go
Outdated
Do we need to care about nil vs. empty slice here, depending on the backend used? Maybe use len() > 0 instead? They should never be empty if they exist, right?
They should never be empty, but I agree len() > 0 is safer. Done.
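For reference, a minimal demonstration of why len(v) > 0 is the safer check: a nil slice and an empty non-nil slice (which different backends may return for the same stored value) differ under == nil but behave identically under len.

```go
package main

import "fmt"

// Depending on the KV backend, a stored value may come back as a nil
// slice or as an empty non-nil slice. len(v) > 0 treats both the same,
// which is why it is the safer presence check.
func main() {
	var nilSlice []byte
	emptySlice := []byte{}

	fmt.Println(nilSlice == nil, emptySlice == nil)     // true false
	fmt.Println(len(nilSlice) > 0, len(emptySlice) > 0) // false false
}
```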
channeldb/migration23/migration.go
Outdated
Depending on the decision we make concerning the potentially large DB transaction, we might need to add some more information here what the user can do if they run into problems.
538ff09 to
6fd0cab
Compare
bhandras
left a comment
Thanks for the reviews @guggero and @yyforyongyu. Changed migration logic to be more memory friendly and added more tests to cover all cases. PTAL.
guggero
left a comment
LGTM 🎉
Tested the migration with 790k payments and it took about 40 seconds on my machine which isn't too bad at all!
channeldb/payments.go
Outdated
I'm wondering if there should be a default case that returns an error? Since there shouldn't be any other keys in this bucket anymore.
Yes indeed, that's a good point. Done.
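A sketch of what such a default case might look like; the prefix byte values and the function name below are illustrative, not the actual channeldb code.

```go
package main

import (
	"bytes"
	"fmt"
)

// Illustrative key prefixes for the flattened htlc bucket; the actual
// byte values lnd uses may differ.
var (
	htlcAttemptInfoKey = []byte("ai")
	htlcSettleInfoKey  = []byte("si")
	htlcFailInfoKey    = []byte("fi")
)

// classifyHtlcKey maps a flattened key to its kind and, as suggested in
// the review, errors out in the default case since no other keys should
// exist in this bucket anymore.
func classifyHtlcKey(k []byte) (string, error) {
	switch {
	case bytes.HasPrefix(k, htlcAttemptInfoKey):
		return "attempt", nil
	case bytes.HasPrefix(k, htlcSettleInfoKey):
		return "settle", nil
	case bytes.HasPrefix(k, htlcFailInfoKey):
		return "fail", nil
	default:
		return "", fmt.Errorf("unknown htlc bucket key: %x", k)
	}
}

func main() {
	kind, err := classifyHtlcKey([]byte("si\x00\x00\x00\x00\x00\x00\x00\x01"))
	fmt.Println(kind, err == nil) // settle true

	_, err = classifyHtlcKey([]byte("bogus"))
	fmt.Println(err != nil) // true
}
```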
channeldb/payments.go
Outdated
We're slightly changing the logic here. Now we won't fail anymore if there isn't an attempt info key for a payment. But I guess that's probably okay?
Yes, you're right. Added back a sanity check to ensure that all attempts have attempt info.
channeldb/migration23/migration.go
Outdated
nit: could use a read bucket.
channeldb/migration23/migration.go
Outdated
How many attempts are there for those 100 million payments?
As far as I understand there's no hard upper bound, but it's certainly limited by the graph itself, so we can say it's constant per payment? Also, since we migrate per payment, we won't pre-allocate for each attempt.
Gotcha. I asked because I was curious about the 3 GiB memory usage.
What happens when the program exits during those 40 seconds? Will the migration continue again after restart?
Since it's one large DB transaction (which is what makes this tricky memory-wise for both bolt and remote DBs), it's only committed at the end. So if the process exits, the DB is rolled back and the full migration retries on next startup.
6fd0cab to
f023631
Compare
yyforyongyu
left a comment
LGTM 👍
More thoughts on this topic. I think the htlc attempts are only needed when the payment is not in a terminal state. Once the payment is done (fail/succeed), the extra bandwidth used to carry htlc attempts is unnecessary. And since a payment is usually completed in a short time (60 seconds default timeout, iirc), most of the time those htlc attempts could be avoided. It would be nice if we could somehow specify what to query in that one big db transaction, kinda like constructing SQL queries beforehand the old-fashioned way. Of course this is some future work/discussion.
f023631 to
f2cc783
Compare
Visit https://dashboard.github.orijtech.com?back=0&pr=5635&remote=true&repo=bhandras%2Flnd to see benchmark details.
Roasbeef
left a comment
LGTM
Just some small nits that can be cleaned up, but can be tacked onto another of the dependent PRs, want to keep this PR train moving along 🚄
// failure information, if any.
htlcFailInfoKey = []byte("htlc-fail-info")
// htlcFailInfoKey is the key used as the prefix of an HTLC attempt
// failure information, if any.The HTLC attempt ID is concatenated at
Missing space after the period here, also some extra spaces after The.
htlcBucket := bucket.NestedReadBucket(k)
attemptInfoCount := 0
err := bucket.ForEach(func(k, v []byte) error {
	aid := byteOrder.Uint64(k[len(k)-8:])
8 seems to be the attempt ID itself; could be lifted to a constant somewhere above.
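Lifting the 8 into a named constant might look like this; the constant and function names are hypothetical, not what lnd ended up using.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// htlcAttemptIDLen lifts the magic "8" from the reviewed snippet into a
// named constant: the attempt ID is a big-endian uint64 appended to the
// end of each flattened key.
const htlcAttemptIDLen = 8

// attemptIDFromKey extracts the attempt ID from the last 8 bytes of a
// flattened key such as "htlc-fail-info<id>".
func attemptIDFromKey(k []byte) uint64 {
	return binary.BigEndian.Uint64(k[len(k)-htlcAttemptIDLen:])
}

func main() {
	key := append([]byte("htlc-fail-info"), 0, 0, 0, 0, 0, 0, 0, 42)
	fmt.Println(attemptIDFromKey(key)) // 42
}
```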
Congrats @bhandras and well done! Indeed, to chime in on your report of a 25% increase, this change showed good improvements across the benchmarks per https://dashboard.github.orijtech.com/benchmark/8dcabf559b0b435c9a05c203224c9fdd /cc @cuonglm @kirbyquerby @willpoint Thanks @Roasbeef for adopting and using bencher :-) We plan on introducing better UX to improve the workflow and results.


This PR is a split from #5392 where, in order to be able to fetch the htlc attempts of a payment in one backend transaction, we flatten the payment-htlcs-bucket such that it won't include a sub-bucket for each individual attempt; instead we postfix the attempt info keys with the attempt ID. Furthermore, we shorten the key prefixes for convenience.
Why is this flattening important?
When using a remote DB backend (etcd or Postgres), fetching a payment can be especially heavy since each DB operation results in a round-trip to the DB. By flattening the htlc bucket, where each attempt uses a global ID, we may be able to fetch the whole bucket in one go. An implementation of such a strategy can be found here: #5640
The speedup is roughly 25% when using the async-payments-benchmark integration test, and it helps with achieving stable performance using the bottlepay benchmark suite.