This repository was archived by the owner on Sep 19, 2019. It is now read-only.

Conversation

@ksnavely

Here I implement a new fabric module for calculating a database's quality factor. The quality factor is an upper bound of missing updates between shard replicas for all shard ranges in the database. The current logic is in snippet.
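
For intuition, here is a hypothetical sketch (the function name is made up; this is not the code in the PR): for a single shard range, the bound is the largest gap between a replica's update sequence and the sequence it last checkpointed to a peer via internal replication.

%% Hypothetical sketch only, not the fabric module in this PR.
%% Pairs :: [{UpdateSeq, CheckpointSeq}] for one shard range, one entry per
%% (replica, peer) internal replication. The quality factor is bounded by
%% the worst gap between a replica's update_seq and its last checkpoint.
range_quality_factor(Pairs) ->
    lists:foldl(fun({UpdateSeq, CheckpointSeq}, Worst) ->
        max(UpdateSeq - CheckpointSeq, Worst)
    end, 0, Pairs).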

I looked at other fabric modules for examples and spent time flipping between chttpd, fabric, and rexi to make sure I had my bearings. However, I'm sure this needs more updating. I've opened this PR against a working branch so I can hopefully get some comments and refine this, and maybe we can eventually merge it in along with some chttpd work.

@chewbranca @kocolosk @rnewson @davisp

Verbose %% comments on for great learning.

Contributor

A bit unconventional to call back into the coordinating module but that's OK.

@kocolosk
Contributor

Nice first pass @ksnavely.

I can, however, think of a significantly more efficient implementation which eliminates a lot of the extra RPC traffic and sets us up to be able to do more sophisticated processing over time. The key to this implementation would be to avoid trying to compute the QF in the RPC worker directly and instead transmit all the "raw materials" back to the coordinator for, y'know, coordination. Let's think about the bits of data that are needed from each shard here:

  • Current update sequence
  • Current UUID
  • Checkpoint sequences for all relevant internal replications

Figuring out the list of "relevant" internal replications can be a bit tricky. In your current approach you're asking for all the UUIDs of the peer replicas of the shard and then pulling up the checkpoint docs directly. An alternative that might be worth exploring is to just walk the local docs tree (using something like couch_btree:foldl(Db#db.local_tree, Fun, Acc)) and pull up all the internal replication docs stored on that shard (those prefixed by _local/shard-sync). Then the data returned from the worker could be a proplist that looks something like

[
  {current_seq, Sequence},
  {uuid, UUID},
  {checkpoints, [Doc1, Doc2, ... DocN]}
]

The checkpoint docs are supposed to be pretty compact, but if this turned out to be too large a payload we could strip out just the necessary bits of each checkpoint and return those. Then the coordinator would have all the information necessary to do whatever computation we needed in local memory. We might even consider exposing all of that information as an admin API endpoint and pushing the logic out to Python or JavaScript, where we've got better libraries for manipulation and visualization.
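
As a concrete example of that coordinator-side computation, here is a hypothetical sketch (not part of this PR; couch_util:get_value is real, but the function name and the assumption that each checkpoint entry is an ejson body carrying a <<"seq">> field are mine):

%% Hypothetical sketch: reduce one worker's proplist (shaped as above) into
%% the worst-case sequence gap across its checkpoints. Assumes each
%% checkpoint entry is an ejson body {Props} with a <<"seq">> field holding
%% the last internally-replicated sequence.
worst_seq_gap(InfoList) ->
    CurrentSeq = couch_util:get_value(current_seq, InfoList, 0),
    Checkpoints = couch_util:get_value(checkpoints, InfoList, []),
    lists:foldl(fun({Props}, Worst) ->
        CheckpointSeq = couch_util:get_value(<<"seq">>, Props, 0),
        max(CurrentSeq - CheckpointSeq, Worst)
    end, 0, Checkpoints).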

I realize I'm throwing a lot at you here but I think this is a place where we can make a pretty nice improvement and where we currently have pretty poor visibility.

@kocolosk
Contributor

Oh, the other bit I wanted to comment on is that a difference of sequences is not an immediately meaningful value for users. I think in 99% of cases the number of documents that are not up-to-date is the relevant quantity. We can compute this using couch_db:count_changes_since(Db, Sequence). We could include that in the detail returned by the RPC worker, e.g. for each checkpoint doc where the source is the shard we're operating on we can grab the <<"seq">> value from the doc and compute the number of modified documents since that sequence.

@kocolosk
Contributor

Because I had some extra free time this afternoon, here's a stab at an implementation of what I'm talking about for the RPC worker:

get_quality_factor_data(Db) ->
    #db{update_seq = CurrentSeq, local_tree = Bt} = Db,
    UUID = couch_db:get_uuid(Db),

    % Compute the checkpoint ID for replications where we are the source
    % size(<<"_local/shard-sync-base64(md5(UUID))">>) == 40
    <<MyPrefix:40/binary, _/binary>> = mem3_rep:make_local_id(UUID, nil),

    % Function to grab all checkpoint docs where we are the source, and for
    % each doc compute the number of documents in the database that have been
    % modified since the checkpointed sequence.
    Fun = fun({Id, {_Rev, {Body}}}, _Offset, Acc) ->
        case Id of <<MyPrefix:40/binary, _/binary>> ->                
            CheckpointSeq = couch_util:get_value(<<"seq">>, Body, 0),
            ModifiedDocCount = couch_db:count_changes_since(Db, CheckpointSeq),
            NewBody = {[
                {<<"_id">>, Id},
                {<<"docs_remaining">>, ModifiedDocCount} |
                Body
            ]},
            {ok, [NewBody | Acc]};
        _Else ->
            % We've reached a checkpoint doc with a different source node
            {stop, Acc}
        end
    end,

    % Execute the function above against the local docs btree. The combination
    % of start_key in the Options list and the pattern match for MyPrefix in
    % the function ensures that we only grab documents where we are the source
    {ok, _, AccOut} = couch_btree:foldl(Bt, Fun, [], [{start_key, MyPrefix}]),

    InfoList = [
        {uuid, UUID},
        {update_seq, CurrentSeq},
        {checkpoints, AccOut}
    ],
    {ok, InfoList}.

In this code I specifically avoided any checkpoint docs that concern sources other than our current shard file. I think this is OK for our purposes. One could make the argument to return all of them, because as far as the internal replicator is concerned a checkpoint sequence is only valid if both source and target agree on the sequence number. Here we're throwing away any information on whether the target agrees with us, the source.

Like I said, I think it's OK for the purposes of this exercise. In very rare scenarios it could cause us to underestimate the QF temporarily, but the current implementation has this challenge as well.

@ksnavely
Author

@kocolosk TYVM for the comments! I knew I was going to end up with an under-performant implementation the first time around -- most of the algorithm I yanked verbatim from snippet. The best part of the hackday (for me at least) was just sitting down with fabric and rexi, reading the source, and coming away with a better understanding of that part of the system.

@kocolosk
Contributor

Sure, makes a ton of sense. Hopefully we're "speaking the same language" now 😃
