Description
Section 4.2 of the Viewstamped Replication Revisited paper requires that:
> When a replica recovers after a crash it cannot participate in request processing and view changes until it has a state at least as recent as when it failed. If it could participate sooner than this, the system can fail. For example, if it forgets that it prepared some operation, this operation might then be known to fewer than a quorum of replicas even though it committed, which could cause the operation to be forgotten in a view change.
However, I believe there may be a bug at https://github.com/UWSysLab/tapir/blob/master/replication/vr/replica.cc#L833-L835: the implementation appears to allow a replica in recovering status to participate in a view change to a higher view, which could lead to data loss.
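To make the concern concrete, here is a minimal sketch of the kind of guard I would expect, written under my own assumptions. The `Status` enum, the `Replica` struct and `HandleStartViewChange` are hypothetical simplifications for illustration and are not copied from Tapir's `replica.cc`; quorum counting and message sending are left out entirely.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified replica state. These names are illustrative
// assumptions, not Tapir's actual types.
enum class Status { Normal, ViewChange, Recovering };

struct Replica {
    Status status = Status::Normal;
    uint64_t view = 0;

    // Returns true if the message was acted on, false if it was ignored.
    bool HandleStartViewChange(uint64_t msgView) {
        // A recovering replica may have forgotten operations it prepared,
        // so it must not take part in a view change until recovery is done.
        if (status == Status::Recovering) {
            return false;
        }
        // Ignore messages for views older than the current one.
        if (msgView < view) {
            return false;
        }
        // Otherwise move to the proposed view and enter view-change status.
        view = msgView;
        status = Status::ViewChange;
        return true;
    }
};

// Usage: a recovering replica leaves its view and status untouched.
inline void Example() {
    Replica r;
    r.status = Status::Recovering;
    r.view = 3;
    assert(!r.HandleStartViewChange(5));
    assert(r.view == 3);
}
```

The point of the sketch is only the ordering: the recovering check comes before any comparison of view numbers, so a message for a higher view cannot pull a recovering replica into the view change.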
I found this while working on TigerBeetle's implementation of Viewstamped Replication, as I was doing a survey of existing implementations. By the way, Tapir's implementation of VSR is really nice and clean.
On a similar note, if anyone is interested, we just launched a $20k consensus challenge over at https://github.com/coilhq/viewstamped-replication-made-famous, where if you can find a correctness bug in an implementation of VSR you could earn bounties of up to $3,000.
The live launch event on Saturday also featured special interviews with Brian Oki and James Cowling, if you're a fan of the pioneering protocol and would like to watch: https://www.youtube.com/watch?v=_Jlikdtm4OA