
channeldb: flatten the htlc attempts bucket #5635

Merged
Roasbeef merged 3 commits into lightningnetwork:master from bhandras:flatten-htlc-bucket
Sep 8, 2021

Conversation

@bhandras
Collaborator

@bhandras bhandras commented Aug 17, 2021

This PR is split out from #5392. To be able to fetch the htlc attempts of a payment in one backend transaction, we flatten the payment-htlcs-bucket so that it no longer contains a sub-bucket for each individual attempt; instead, we postfix the attempt info keys with the attempt ID. Furthermore, we shorten the key prefixes for convenience.

Why is this flattening important?

When using a remote DB backend (etcd or Postgres), fetching a payment can be especially heavy since each DB operation results in a round trip to the DB. By flattening the htlc bucket so that each attempt uses a global ID, we may be able to fetch the bucket in one go. An implementation of this strategy can be found here: #5640

The speedup is roughly 25% when using the async-payments-benchmark integration test, and the change helps with achieving stable performance in the bottlepay benchmark suite.
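For illustration, the flattened key layout can be sketched in a few lines of Go. The prefix name and helper functions below are assumptions for the sketch, not the actual channeldb identifiers:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Illustrative only: the real (shortened) key prefixes live in
// channeldb; "ai" is an assumed stand-in for an attempt-info prefix.
var htlcAttemptInfoKey = []byte("ai")

// attemptKey builds a flattened key by appending the big-endian
// attempt ID to the prefix, instead of nesting a per-attempt bucket.
func attemptKey(prefix []byte, attemptID uint64) []byte {
	key := make([]byte, len(prefix)+8)
	copy(key, prefix)
	binary.BigEndian.PutUint64(key[len(prefix):], attemptID)
	return key
}

// attemptID recovers the ID from the last 8 bytes of a flattened key.
func attemptID(key []byte) uint64 {
	return binary.BigEndian.Uint64(key[len(key)-8:])
}

func main() {
	k := attemptKey(htlcAttemptInfoKey, 42)
	fmt.Println(len(k), attemptID(k)) // 10 42
}
```

Because all attempt keys now live directly in one bucket, a single range read over that bucket returns every attempt of the payment, which is what saves the per-sub-bucket round trips on remote backends.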

Member

@yyforyongyu yyforyongyu left a comment

Interestingly, I had thought about this too. I think the current approach will potentially increase the data size of each DB transaction. If we have many attempts inside a payment, will that be an issue?
Previously I thought we could add a state to the payment so that we wouldn't need to fetch the htlc attempts in every query. All the attempt info could be abstracted behind a state: if the payment is in a pending state, we need the attempt info; otherwise, we don't bother fetching it. WDYT?

Member

nit: extra new line

Collaborator Author

done

Member

Could the paymentsBucket be empty?

Collaborator Author

It could, but then ForEach will not iterate.

Member

Actually, what I wanted to ask was whether the above bucket could be nil or not... If it could be, it seems we shouldn't return an error below.

Collaborator Author

Sorry, I misunderstood what you meant. The semantics of NestedRead(Write)Bucket are a bit tricky since it never returns an error. When it returns nil, it means the bucket doesn't exist. In this case the payments-root-bucket can either be empty (no payments yet) or contain only buckets, which we ensure with this check.

Member

Just to be safe, I think we need to do it in three steps,

  • get the old keys
  • put the new keys
  • delete the old keys iff the second step succeeded
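A minimal sketch of that copy-then-delete ordering, using a plain map as a stand-in for the DB bucket (the helper and its name are hypothetical, not lnd code):

```go
package main

import "fmt"

// migrateKeys copies every old key to its new key and deletes the old
// keys only after all puts have succeeded. rename maps old keys to new
// ones and is assumed to always produce a different key.
func migrateKeys(store map[string][]byte, rename func(string) string) {
	// Step 1: collect the old keys up front.
	oldKeys := make([]string, 0, len(store))
	for k := range store {
		oldKeys = append(oldKeys, k)
	}
	// Step 2: put the new keys.
	for _, k := range oldKeys {
		store[rename(k)] = store[k]
	}
	// Step 3: delete the old keys (only reached if step 2 succeeded).
	for _, k := range oldKeys {
		delete(store, k)
	}
}

func main() {
	store := map[string][]byte{"htlc-attempt-info/7": []byte("x")}
	migrateKeys(store, func(k string) string { return "ai/7" })
	fmt.Println(len(store), string(store["ai/7"])) // 1 x
}
```

Inside a single atomic transaction this ordering matters less (as noted below), but it becomes the safe pattern if the steps ever run in separate transactions.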

Collaborator

This all happens in one huge write transaction, so the order here doesn't play a huge role; if anything goes wrong, the whole transaction is rolled back.
But that is actually the main concern: because we need every migration to be capable of running atomically, we'll end up migrating all payments in one transaction. For some nodes that might be a huge chunk of data.

Not sure what we can realistically do about it here...
I only have two ideas:

  1. Just go with it and hope it will be fine for all users. If anyone runs into a problem, we can advise them to delete failed payments first (we might need [WIP] Batched payment deletion #5368 and a CLI tool for using that RPC as well).
  2. Do the migration in four steps: one for each key (info/settle/fail), where we just copy the data, and then a last step that deletes the attempt ID buckets. That way the individual transactions would still be atomic (meaning either all payments are copied or none are), but lnd would only start once all 4 steps have run. But if, say, step 3 fails, you're in a bad state as well, so this sounds like a bad idea...

Collaborator Author

Okay, PTAL. I think we should be safe from over-allocating. The only issue may be that bbolt is unable to handle such a large transaction. Also, due to multiple bugs in bbolt cursors, it turns out we can't safely Delete [1] or Put [2] inside the cursor, so I rewrote the migration to iterate in Go instead.

[1] boltdb/bolt#357
[2] boltdb/bolt#268
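The workaround can be sketched as: collect the keys during a ForEach-style walk and only mutate once the iteration has finished. The toy bucket type below is a stand-in for illustration, not the bbolt API:

```go
package main

import "fmt"

// bucket is a toy stand-in for a bbolt bucket. In real bbolt, mixing
// Delete/Put with an in-flight cursor walk is unsafe (see the linked
// issues), which is why the migration remembers keys first and
// mutates afterwards.
type bucket map[string][]byte

func (b bucket) ForEach(fn func(k string, v []byte) error) error {
	for k, v := range b {
		if err := fn(k, v); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	b := bucket{"a": []byte("1"), "b": []byte("2")}

	// Safe pattern: remember the keys during iteration...
	var toDelete []string
	_ = b.ForEach(func(k string, _ []byte) error {
		toDelete = append(toDelete, k)
		return nil
	})
	// ...then mutate once the iteration has ended.
	for _, k := range toDelete {
		delete(b, k)
	}
	fmt.Println(len(b)) // 0
}
```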

Member

This all happens in one huge write transaction.
Yeah I was thinking we should do it in separate transactions, at least for the deletion.

Another idea is to limit how many records we process at once: maybe we could update N payments each time and perform this update several times?
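That batching idea can be sketched as follows, assuming a hypothetical transactional update callback. Note the caveat discussed in this thread: earlier batches stay committed if a later one fails, so restart handling of the partially migrated state would still be needed:

```go
package main

import "fmt"

// migrateInBatches migrates at most batchSize payments per transaction,
// committing between batches. update is a stand-in for the DB's
// transactional Update call and receives the size of the current batch.
func migrateInBatches(total, batchSize int, update func(n int) error) (int, error) {
	done := 0
	for done < total {
		n := batchSize
		if rem := total - done; rem < n {
			n = rem
		}
		if err := update(n); err != nil {
			// Earlier batches stay committed; caller must cope
			// with a partially migrated DB on retry.
			return done, err
		}
		done += n
	}
	return done, nil
}

func main() {
	batches := 0
	done, _ := migrateInBatches(790000, 100000, func(n int) error {
		batches++
		return nil
	})
	fmt.Println(done, batches) // 790000 8
}
```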

Member

Why do we convert it into string?

Collaborator Author

Migration code is now very different, so we avoid this conversion.

Collaborator

@guggero guggero left a comment

Great PR, very easy to follow the diff!

Added a comment about the DB transaction size, but I'm not sure what we can realistically do about it. Knowing how many round trips this will save us, I think we do want this change to be implemented.

Collaborator

Do we need to care about nil vs. empty slice here, depending on the backend used? Maybe use len() > 0 instead? They should never be empty if they exist, right?

Collaborator Author

They should never be empty, but I agree len() > 0 is safer. Done.


Collaborator

Depending on the decision we make concerning the potentially large DB transaction, we might need to add some more information here about what the user can do if they run into problems.

Collaborator Author

ptal

@bhandras bhandras force-pushed the flatten-htlc-bucket branch 2 times, most recently from 538ff09 to 6fd0cab Compare August 24, 2021 17:09
Collaborator Author

@bhandras bhandras left a comment

Thanks for the reviews @guggero and @yyforyongyu. Changed migration logic to be more memory friendly and added more tests to cover all cases. PTAL.


Collaborator

@guggero guggero left a comment

LGTM 🎉

Tested the migration with 790k payments and it took about 40 seconds on my machine which isn't too bad at all!

Collaborator

I'm wondering if there should be a default case that returns an error? Since there shouldn't be any other keys in this bucket anymore.

Collaborator Author

Yes indeed, that's a good point. Done.

Collaborator

We're slightly changing the logic here: now we won't fail anymore if there isn't an attempt info key for a payment. But I guess that's probably okay?

Collaborator Author

Yes, you're right. Added back a sanity check to ensure that all attempts have attempt info.

Member

nit: could use a read bucket.

Collaborator Author

done

Member

How many attempts are there for those 100 million payments?

Collaborator Author

As far as I understand, there's no hard upper bound, but it's certainly limited by the graph itself, so we can say it's constant per payment. Also, since we migrate per payment, we won't pre-allocate for each attempt.

Member

Gotcha. I asked because I was curious about the 3 GiB memory usage.


@yyforyongyu
Member

Tested the migration with 790k payments and it took about 40 seconds on my machine which isn't too bad at all!

What happens when the program exits during those 40 seconds? Will the migration continue again after restart?

@guggero
Collaborator

guggero commented Aug 27, 2021

What happens when the program exits during those 40 seconds? Will the migration continue again after restart?

Since it's one large DB transaction (which is what makes this tricky memory-wise for both bolt and remote DBs), it's only committed at the end. So if the process exits, the DB is rolled back and the full migration retries on the next startup.

@bhandras bhandras force-pushed the flatten-htlc-bucket branch from 6fd0cab to f023631 Compare August 27, 2021 13:12
@bhandras bhandras requested a review from yyforyongyu August 27, 2021 13:14
Member

@yyforyongyu yyforyongyu left a comment

LGTM👍

More thoughts on this topic: I think the htlc attempts are only needed when the payment is not in a terminal state. Once the payment is done (failed/succeeded), the extra bandwidth used to carry htlc attempts is unnecessary. And since a payment is usually completed in a short time (60 seconds default timeout, IIRC), most of the time those htlc attempts could be avoided. It would be nice if we could somehow specify what is to be queried in that one big DB transaction, kind of like constructing SQL queries beforehand the old-fashioned way. Of course this is future work/discussion.


Member

Very nice docs!

@bhandras bhandras force-pushed the flatten-htlc-bucket branch from f023631 to f2cc783 Compare September 7, 2021 08:47
@orijbot

orijbot commented Sep 7, 2021

Member

@Roasbeef Roasbeef left a comment

LGTM

Just some small nits that can be cleaned up, but they can be tacked onto one of the dependent PRs; want to keep this PR train moving along 🚄

// failure information, if any.
htlcFailInfoKey = []byte("htlc-fail-info")
// htlcFailInfoKey is the key used as the prefix of an HTLC attempt
// failure information, if any.The HTLC attempt ID is concatenated at
Member

Missing space after the period here, also some extra spaces after "The".

htlcBucket := bucket.NestedReadBucket(k)
attemptInfoCount := 0
err := bucket.ForEach(func(k, v []byte) error {
aid := byteOrder.Uint64(k[len(k)-8:])
Member

8 seems to be the length of the attempt ID itself; it could be lifted to a constant somewhere above.
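One possible shape for that refactor, with the 8 lifted into a named constant and a length guard added before slicing; the names here are illustrative, not the actual lnd code:

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// attemptIDLength is the byte length of the big-endian HTLC attempt
// ID appended to each flattened key (the magic 8 from the review nit).
const attemptIDLength = 8

// parseAttemptID validates the key length before slicing, instead of
// indexing with a bare literal.
func parseAttemptID(k []byte) (uint64, error) {
	if len(k) < attemptIDLength {
		return 0, errors.New("key too short for attempt ID")
	}
	return binary.BigEndian.Uint64(k[len(k)-attemptIDLength:]), nil
}

func main() {
	key := append([]byte("ai"), 0, 0, 0, 0, 0, 0, 0, 9)
	id, err := parseAttemptID(key)
	fmt.Println(id, err) // 9 <nil>
}
```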

@Roasbeef Roasbeef merged commit 75f5b40 into lightningnetwork:master Sep 8, 2021
@odeke-em

odeke-em commented Sep 8, 2021

Congrats @bhandras, and well done! Indeed, to chime in on your report of a 25% improvement, this change showed good results across the benchmarks per https://dashboard.github.orijtech.com/benchmark/8dcabf559b0b435c9a05c203224c9fdd
[benchmark result screenshots]

/cc @cuonglm @kirbyquerby @willpoint

Thanks @Roasbeef for adopting and using bencher :-) We plan on introducing better UX to improve the workflow and results.
