-
-
Notifications
You must be signed in to change notification settings - Fork 22
Description
Current Hashing Approach
Lodestar's architecture relies heavily on maintaining a full merkle tree of
the beacon state. We represent the tree as a linked data structure, where each node is 1. immutable, 2. lazily computing but caching the hash of its children.
This allows us to minimize the number of hashes we need to perform during state transition. Since all hashes of a prestate are maintained, only the paths thru the tree to the "diff" need to be re-hashed. This reduces the computational cost of hashing.
Also, it allows us perform structural sharing, sharing the memory for beacon states with shared subtrees. For example, between epochs in a sync period, where sync committees remain constant, if we maintain a reference to two beacon states, both states will share the same underlying subtrees for the sync committees. This reduces the memory cost of maintaining several related states.
Hashing Function
We use as-sha256 with several optimizations
-
we rely on hash inputs always being 64 bytes - this allows us to precompute part of the sha256 internals for a decent gain
-
we avoid allocations inside library, only using fixed input/output buffers
-
Related hashing perf analysis Lodestar tree hashing performance lodestar#2206
Hash Cache Representation
The memory usage for each type of object in Javascript is not very efficient coming from a systems language intuition. We store hash objects NOT as 32-byte Uint8Array. A 32-byte Uint8Array takes 223 total bytes! There's a bunch of pointers and additional bookkeeping that's being stored behind the scenes.
We store hash objects as objects with 8 uint32 numbers. eg: {h0: 0, h1: 0, ..., h7: 0}. This takes somewhere between 88 bytes and 216 bytes, depending on the sizes of the indiviudal component numbers. Smaller numbers are represented as Smi (small integer) as an immediate value, while larger numbers are stored on the heap. In practice, this happens TODO.
- Add digestObjects method as-sha256#58
- Research reducing memory footprint lodestar#2885
- Benchmark memory #219
How to improve?
The hashing speed in lodestar is quite low compared to other implementations.
- Investigate hashtree implementation, which hashes multiple hashes at once.
- Investigate using a systems language implementation, eg using napi-rs. This could be tried using popular libraries, which likely attempt some hardware acceleration. Also can be tried using a transliteration of as-sha256.
The memory of our hashes in a lodestar node constitute a lot of a running beacon node. And our memory usage, measured per-hash, is still very large compared to systems languages.
- Investigate using a systems language implementation, eg using napi-rs.
- Investigate using an alternative memory layout in javascript.
Results
Some experiments were made, results below
cayman/hash-object
- This branch has some experiments using napi-rs to store hash objects 'in rust' and using napi-rs for hashing
hashtreedoes exceptionally well operating on largeUint8Arrays, not as well on rustHashObjects. (seehashtree uint8arrayrow) (hashtreecode has since been pulled into a repo here:@chainsafe/hashtree-js)- Using rust libraries for hashing was slower (see
rustrow), also using a rust port of as-sha256 was slower (seerust object rs-sha256)
node -r ts-node/register node_modules/.bin/benchmark packages/ssz/test/perf/hash.test.ts
✓ hashing - hashtreeOne 1643.022 ops/s 608.6346 us/op - 45 runs 3.24 s
✓ hashing - hashtree 1564.410 ops/s 639.2188 us/op - 25 runs 2.13 s
✓ hashing - hashtree uint8array 46076.71 ops/s 21.70294 us/op - 468 runs 1.52 s
✓ hashing - rust 1789.679 ops/s 558.7596 us/op - 27 runs 2.03 s
✓ hashing - rust object rs-sha256 1438.692 ops/s 695.0758 us/op - 21 runs 2.01 s
✓ hashing - rust object as-sha256 550.4754 ops/s 1.816612 ms/op - 7 runs 1.87 s
✓ hashing - as-sha256 2276.654 ops/s 439.2412 us/op - 10 runs 0.959 s
Unfortunately, a rust HashObject is more expensive, memory-wise, than the status quo.
node -r ts-node/register --expose-gc packages/ssz/test/memory/hash.test.ts
hash-object - rust - 1 - 101.7 bytes / instance
hash-object - js - 1 - 91.8 bytes / instance
cayman/hash-cache
- This branch replaces hash objects ({h0, h1, ..., h7}) with a hash cache and hash ids (a Smi reference to a memory offset)
- Unfortunately is slower and heavier :(.
node -r ts-node/register node_modules/.bin/benchmark packages/ssz/test/perf/eth2/hashTreeRoot.test.ts
hash cache
✓ hashTreeRoot Attestation - struct 37929.07 ops/s 26.36500 us/op - 16753 runs 0.505 s
✓ hashTreeRoot Attestation - tree 46307.02 ops/s 21.59500 us/op - 22688 runs 0.912 s
✓ hashTreeRoot SignedAggregateAndProof - struct 25187.65 ops/s 39.70200 us/op - 13416 runs 0.606 s
✓ hashTreeRoot SignedAggregateAndProof - tree 30281.92 ops/s 33.02300 us/op - 14954 runs 1.01 s
✓ hashTreeRoot SyncCommitteeMessage - struct 96609.02 ops/s 10.35100 us/op - 72446 runs 0.808 s
✓ hashTreeRoot SyncCommitteeMessage - tree 136072.9 ops/s 7.349000 us/op - 43560 runs 0.611 s
✓ hashTreeRoot SignedContributionAndProof - struct 35928.57 ops/s 27.83300 us/op - 26315 runs 0.808 s
✓ hashTreeRoot SignedContributionAndProof - tree 45607.95 ops/s 21.92600 us/op - 12233 runs 0.505 s
✓ hashTreeRoot SignedBeaconBlock - struct 374.1080 ops/s 2.673025 ms/op - 223 runs 1.13 s
✓ hashTreeRoot SignedBeaconBlock - tree 510.8943 ops/s 1.957352 ms/op - 82 runs 1.42 s
✓ hashTreeRoot Validator - struct 58159.82 ops/s 17.19400 us/op - 43131 runs 0.808 s
✓ hashTreeRoot Validator - tree 54773.51 ops/s 18.25700 us/op - 54961 runs 1.11 s
master
✓ hashTreeRoot Attestation - struct 37378.99 ops/s 26.75300 us/op - 16472 runs 0.505 s
✓ hashTreeRoot Attestation - tree 48546.05 ops/s 20.59900 us/op - 14022 runs 0.404 s
✓ hashTreeRoot SignedAggregateAndProof - struct 24892.96 ops/s 40.17200 us/op - 11015 runs 0.505 s
✓ hashTreeRoot SignedAggregateAndProof - tree 32836.41 ops/s 30.45400 us/op - 10084 runs 0.404 s
✓ hashTreeRoot SyncCommitteeMessage - struct 102343.7 ops/s 9.771000 us/op - 56987 runs 0.606 s
✓ hashTreeRoot SyncCommitteeMessage - tree 148104.3 ops/s 6.752000 us/op - 42104 runs 0.404 s
✓ hashTreeRoot SignedContributionAndProof - struct 36937.17 ops/s 27.07300 us/op - 9497 runs 0.303 s
✓ hashTreeRoot SignedContributionAndProof - tree 47938.64 ops/s 20.86000 us/op - 11127 runs 0.303 s
✓ hashTreeRoot SignedBeaconBlock - struct 383.4154 ops/s 2.608137 ms/op - 268 runs 1.24 s
✓ hashTreeRoot SignedBeaconBlock - tree 530.1876 ops/s 1.886125 ms/op - 283 runs 1.20 s
✓ hashTreeRoot Validator - struct 59245.22 ops/s 16.87900 us/op - 78104 runs 1.41 s
✓ hashTreeRoot Validator - tree 85623.77 ops/s 11.67900 us/op - 55251 runs 0.707 s
node -r ts-node/register --expose-gc packages/ssz/test/memory/eth2Objects.test.ts
hash cache
Attestation struct - 1421.0 bytes / instance
Attestation tree - 5272.3 bytes / instance
SignedAggregateAndProof struct - 2077.4 bytes / instance
SignedAggregateAndProof tree - 8099.1 bytes / instance
AggregationBits struct - 255.9 bytes / instance
AggregationBits tree - 1383.0 bytes / instance
SignedBeaconBlockPhase0 struct - 131438.2 bytes / instance
SignedBeaconBlockPhase0 tree - 484786.4 bytes / instance
master
Attestation struct - 1421.2 bytes / instance
Attestation tree - 3028.2 bytes / instance
SignedAggregateAndProof struct - 2077.7 bytes / instance
SignedAggregateAndProof tree - 4679.8 bytes / instance
AggregationBits struct - 265.5 bytes / instance
AggregationBits tree - 941.2 bytes / instance
SignedBeaconBlockPhase0 struct - 131435.5 bytes / instance
SignedBeaconBlockPhase0 tree - 277766.9 bytes / instance
cayman/napi-merkle-node
- This branch pulls "BranchNode" and "LeafNode" into rust and exposes a napi wrapper
Nodeto javascript. This design allows most of the tree to exist in rust, with napi pointers into the tree as navigation demands. - Current testing shows this to be extremely slow, with the beacon node unable to stay synced. Have not had time to diagnose.