Update toolchain #11

kocolosk · 2021-11-26T19:12:10Z

Following the pattern from b64url

* update an incomplete type spec for khash:del/2 * use assertEqual instead of assertMatch to silence a spurious warning * khash:keys/1 was never implemented; use khash:fold/3 in the test suite instead and enable the missing test

nickva · 2021-11-26T19:17:51Z

A thought I had is that we had created khash before maps were available. With optimized maps available since OTP 18, does khash still perform better than maps? Perhaps the easiest here might be to switch to maps and have one less C dependency.

kocolosk · 2021-11-26T19:48:20Z

Oh, good idea! Better to not need the extra code at all.

kocolosk · 2021-11-27T15:31:04Z

I wrote a short Escript to benchmark khash against Erlang maps and attached it to this PR. It's not my finest demonstration of scientific rigor, but it does provide some guidance. General observations:

Maps are nearly always faster than khash for lookups, sometimes significantly so
Maps are typically faster at inserts when the total dataset is ~100 MB or smaller

Larger datasets exhibit a fair amount of noise; I expect this is largely due to garbage collection. Using maps instead of khash instances inside e.g. the ddoc_cache_lru gen_server might introduce some undesirable GC pauses for environments where the cache size is quite large. I'm not sure how a map-of-maps would behave from a GC perspective.

On the topic of LRUs, I noticed that we have different implements for the authentication cache (ets_lru) and the ddoc cache (which replaced ets_lru with the current implementation in apache/couchdb@939761b7183). Meanwhile, the Erlang Solutions folks seem to have started a new project to create a high-performance caching library at esl/segmented_cache that plays tricks with new OTP features like atomics. It seems like it'd be nice to arrive at a common solution over time there.

kocolosk · 2021-11-28T02:16:44Z

I was thinking about this a bit more in between various college football games today and remembered the obvious answer for large maps: just use ETS.

I also went back and reviewed the two places where we are using khash in the CouchDB codebase today. The first is the couch_event application, which is responsible for notifying processes about updates that happen to DBs of interest. It mantains Listener -> Db and Db -> Listener mappings in separate khash instances. This would seem to be an ideal case to replace with maps: I believe the churn rate for listeners is relatively low, while lookups can be quite frequent.

The other current use case is in the ddoc_cache application. This one is a bit more subtle. There are two mappings:

Pids: khash(Cache Entry Manager Pid -> Cache Key)
Dbs: khash(DbName -> khash(DDocId -> khash(Cache Key -> Cache Entry Manager Pid)))

That second mapping showcases a nice property of the khash NIF, namely that nesting the hashes allows one to update a "leaf" hash without needing to update the parent hash. Nested maps would have rather different runtime characteristics. Given that I wouldn't feel confident about refactoring ddoc_cache to use maps internally today.

kocolosk added 6 commits November 17, 2021 10:23

Add a dev container config

8399dee

Add rebar3 compatibility

383aaa4

Align to EUnit naming standards

3873fa0

Format code with erlfmt

791091c

Fix issues uncovered by dialyzer

8fb3d80

* update an incomplete type spec for khash:del/2 * use assertEqual instead of assertMatch to silence a spurious warning * khash:keys/1 was never implemented; use khash:fold/3 in the test suite instead and enable the missing test

Add branch protection and switch to GHA CI

460c5e1

kocolosk force-pushed the update-toolchain branch from 1134d86 to 460c5e1 Compare November 26, 2021 19:16

kocolosk added 2 commits November 27, 2021 14:34

Add a naive benchmark comparing khash to maps

a7aa31b

Fix asf.yaml indentation

f318cc2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Update toolchain #11

Update toolchain #11

Uh oh!

kocolosk commented Nov 26, 2021

Uh oh!

nickva commented Nov 26, 2021

Uh oh!

kocolosk commented Nov 26, 2021

Uh oh!

kocolosk commented Nov 27, 2021

Uh oh!

kocolosk commented Nov 28, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Update toolchain #11

Are you sure you want to change the base?

Update toolchain #11

Uh oh!

Conversation

kocolosk commented Nov 26, 2021

Uh oh!

nickva commented Nov 26, 2021

Uh oh!

kocolosk commented Nov 26, 2021

Uh oh!

kocolosk commented Nov 27, 2021

Uh oh!

kocolosk commented Nov 28, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants