Skip to content

Investigating Tree Hashing in Lodestar #355

@wemeetagain

Description

@wemeetagain

Current Hashing Approach

Lodestar's architecture relies heavily on maintaining a full merkle tree of
the beacon state. We represent the tree as a linked data structure, where each node is 1. immutable, 2. lazily computing but caching the hash of its children.

This allows us to minimize the number of hashes we need to perform during state transition. Since all hashes of a prestate are maintained, only the paths thru the tree to the "diff" need to be re-hashed. This reduces the computational cost of hashing.

Also, it allows us perform structural sharing, sharing the memory for beacon states with shared subtrees. For example, between epochs in a sync period, where sync committees remain constant, if we maintain a reference to two beacon states, both states will share the same underlying subtrees for the sync committees. This reduces the memory cost of maintaining several related states.

Hashing Function

We use as-sha256 with several optimizations

Hash Cache Representation

The memory usage for each type of object in Javascript is not very efficient coming from a systems language intuition. We store hash objects NOT as 32-byte Uint8Array. A 32-byte Uint8Array takes 223 total bytes! There's a bunch of pointers and additional bookkeeping that's being stored behind the scenes.
We store hash objects as objects with 8 uint32 numbers. eg: {h0: 0, h1: 0, ..., h7: 0}. This takes somewhere between 88 bytes and 216 bytes, depending on the sizes of the indiviudal component numbers. Smaller numbers are represented as Smi (small integer) as an immediate value, while larger numbers are stored on the heap. In practice, this happens TODO.

How to improve?

The hashing speed in lodestar is quite low compared to other implementations.

  • Investigate hashtree implementation, which hashes multiple hashes at once.
  • Investigate using a systems language implementation, eg using napi-rs. This could be tried using popular libraries, which likely attempt some hardware acceleration. Also can be tried using a transliteration of as-sha256.

The memory of our hashes in a lodestar node constitute a lot of a running beacon node. And our memory usage, measured per-hash, is still very large compared to systems languages.

  • Investigate using a systems language implementation, eg using napi-rs.
  • Investigate using an alternative memory layout in javascript.

Results

Some experiments were made, results below

cayman/hash-object

  • This branch has some experiments using napi-rs to store hash objects 'in rust' and using napi-rs for hashing
  • hashtree does exceptionally well operating on large Uint8Arrays, not as well on rust HashObjects. (see hashtree uint8array row) (hashtree code has since been pulled into a repo here: @chainsafe/hashtree-js)
  • Using rust libraries for hashing was slower (see rust row), also using a rust port of as-sha256 was slower (see rust object rs-sha256)

node -r ts-node/register node_modules/.bin/benchmark packages/ssz/test/perf/hash.test.ts

    ✓ hashing - hashtreeOne                                               1643.022 ops/s    608.6346 us/op        -         45 runs   3.24 s
    ✓ hashing - hashtree                                                  1564.410 ops/s    639.2188 us/op        -         25 runs   2.13 s
    ✓ hashing - hashtree uint8array                                       46076.71 ops/s    21.70294 us/op        -        468 runs   1.52 s
    ✓ hashing - rust                                                      1789.679 ops/s    558.7596 us/op        -         27 runs   2.03 s
    ✓ hashing - rust object rs-sha256                                     1438.692 ops/s    695.0758 us/op        -         21 runs   2.01 s
    ✓ hashing - rust object as-sha256                                     550.4754 ops/s    1.816612 ms/op        -          7 runs   1.87 s
    ✓ hashing - as-sha256                                                 2276.654 ops/s    439.2412 us/op        -         10 runs  0.959 s

Unfortunately, a rust HashObject is more expensive, memory-wise, than the status quo.

node -r ts-node/register --expose-gc packages/ssz/test/memory/hash.test.ts

hash-object - rust - 1                   - 101.7 bytes / instance
hash-object - js - 1                     - 91.8 bytes / instance

cayman/hash-cache

  • This branch replaces hash objects ({h0, h1, ..., h7}) with a hash cache and hash ids (a Smi reference to a memory offset)
  • Unfortunately is slower and heavier :(.

node -r ts-node/register node_modules/.bin/benchmark packages/ssz/test/perf/eth2/hashTreeRoot.test.ts

hash cache

    ✓ hashTreeRoot Attestation - struct                                   37929.07 ops/s    26.36500 us/op        -      16753 runs  0.505 s
    ✓ hashTreeRoot Attestation - tree                                     46307.02 ops/s    21.59500 us/op        -      22688 runs  0.912 s
    ✓ hashTreeRoot SignedAggregateAndProof - struct                       25187.65 ops/s    39.70200 us/op        -      13416 runs  0.606 s
    ✓ hashTreeRoot SignedAggregateAndProof - tree                         30281.92 ops/s    33.02300 us/op        -      14954 runs   1.01 s
    ✓ hashTreeRoot SyncCommitteeMessage - struct                          96609.02 ops/s    10.35100 us/op        -      72446 runs  0.808 s
    ✓ hashTreeRoot SyncCommitteeMessage - tree                            136072.9 ops/s    7.349000 us/op        -      43560 runs  0.611 s
    ✓ hashTreeRoot SignedContributionAndProof - struct                    35928.57 ops/s    27.83300 us/op        -      26315 runs  0.808 s
    ✓ hashTreeRoot SignedContributionAndProof - tree                      45607.95 ops/s    21.92600 us/op        -      12233 runs  0.505 s
    ✓ hashTreeRoot SignedBeaconBlock - struct                             374.1080 ops/s    2.673025 ms/op        -        223 runs   1.13 s
    ✓ hashTreeRoot SignedBeaconBlock - tree                               510.8943 ops/s    1.957352 ms/op        -         82 runs   1.42 s
    ✓ hashTreeRoot Validator - struct                                     58159.82 ops/s    17.19400 us/op        -      43131 runs  0.808 s
    ✓ hashTreeRoot Validator - tree                                       54773.51 ops/s    18.25700 us/op        -      54961 runs   1.11 s

master

    ✓ hashTreeRoot Attestation - struct                                   37378.99 ops/s    26.75300 us/op        -      16472 runs  0.505 s
    ✓ hashTreeRoot Attestation - tree                                     48546.05 ops/s    20.59900 us/op        -      14022 runs  0.404 s
    ✓ hashTreeRoot SignedAggregateAndProof - struct                       24892.96 ops/s    40.17200 us/op        -      11015 runs  0.505 s
    ✓ hashTreeRoot SignedAggregateAndProof - tree                         32836.41 ops/s    30.45400 us/op        -      10084 runs  0.404 s
    ✓ hashTreeRoot SyncCommitteeMessage - struct                          102343.7 ops/s    9.771000 us/op        -      56987 runs  0.606 s
    ✓ hashTreeRoot SyncCommitteeMessage - tree                            148104.3 ops/s    6.752000 us/op        -      42104 runs  0.404 s
    ✓ hashTreeRoot SignedContributionAndProof - struct                    36937.17 ops/s    27.07300 us/op        -       9497 runs  0.303 s
    ✓ hashTreeRoot SignedContributionAndProof - tree                      47938.64 ops/s    20.86000 us/op        -      11127 runs  0.303 s
    ✓ hashTreeRoot SignedBeaconBlock - struct                             383.4154 ops/s    2.608137 ms/op        -        268 runs   1.24 s
    ✓ hashTreeRoot SignedBeaconBlock - tree                               530.1876 ops/s    1.886125 ms/op        -        283 runs   1.20 s
    ✓ hashTreeRoot Validator - struct                                     59245.22 ops/s    16.87900 us/op        -      78104 runs   1.41 s
    ✓ hashTreeRoot Validator - tree                                       85623.77 ops/s    11.67900 us/op        -      55251 runs  0.707 s

node -r ts-node/register --expose-gc packages/ssz/test/memory/eth2Objects.test.ts

hash cache

Attestation struct             - 1421.0 bytes / instance
Attestation tree               - 5272.3 bytes / instance
SignedAggregateAndProof struct - 2077.4 bytes / instance
SignedAggregateAndProof tree   - 8099.1 bytes / instance
AggregationBits struct         - 255.9 bytes / instance
AggregationBits tree           - 1383.0 bytes / instance
SignedBeaconBlockPhase0 struct - 131438.2 bytes / instance
SignedBeaconBlockPhase0 tree   - 484786.4 bytes / instance

master

Attestation struct             - 1421.2 bytes / instance
Attestation tree               - 3028.2 bytes / instance
SignedAggregateAndProof struct - 2077.7 bytes / instance
SignedAggregateAndProof tree   - 4679.8 bytes / instance
AggregationBits struct         - 265.5 bytes / instance
AggregationBits tree           - 941.2 bytes / instance
SignedBeaconBlockPhase0 struct - 131435.5 bytes / instance
SignedBeaconBlockPhase0 tree   - 277766.9 bytes / instance

cayman/napi-merkle-node

  • This branch pulls "BranchNode" and "LeafNode" into rust and exposes a napi wrapper Node to javascript. This design allows most of the tree to exist in rust, with napi pointers into the tree as navigation demands.
  • Current testing shows this to be extremely slow, with the beacon node unable to stay synced. Have not had time to diagnose.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions