Description
Describe the bug
Recently, we discovered an issue where Deli can produce duplicate operations. Scriptorium handles these duplicates, but Scribe does not. Scribe's logic checks that sequence numbers are incremental, and that check fails when a duplicate operation arrives. Consequently, Scribe marks the document as corrupted.
Reason for duplicate ops
Duplicate operations occurred because the Deli system failed during checkpointing but successfully produced data to Kafka. The checkpoint contains the last processed sequence number, essential for preventing the reprocessing of operations. Since the checkpointing failed, the last processed sequence number was not updated, causing Deli to reprocess operations that were already processed and produced to Kafka.
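The failure mode described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToySequencer`, `SequencedOp`, and `processBatch` are hypothetical names invented for illustration and are not Deli's actual code or API. The only behavior taken from this report is that ops are produced first and the checkpoint is written afterwards, so a failure between the two steps causes a replay.

```typescript
// Hypothetical model: the names below are illustrative, not Deli's actual API.
interface SequencedOp {
    sequenceNumber: number;
    contents: string;
}

class ToySequencer {
    // Last sequence number durably recorded in the checkpoint.
    private checkpointedSequenceNumber = 0;

    constructor(private readonly outputTopic: SequencedOp[]) {}

    // Processes a batch of raw ops. When `failCheckpoint` is true, the ops are
    // produced to the topic but the checkpoint is never written, which models
    // a crash between the produce step and the checkpoint step.
    public processBatch(rawOps: string[], failCheckpoint: boolean): void {
        // Sequencing always resumes from the last checkpointed value.
        let next = this.checkpointedSequenceNumber;
        for (const contents of rawOps) {
            next += 1;
            this.outputTopic.push({ sequenceNumber: next, contents });
        }
        if (!failCheckpoint) {
            this.checkpointedSequenceNumber = next;
        }
    }
}

const topic: SequencedOp[] = [];
const sequencer = new ToySequencer(topic);
const batch = ["a", "b", "c"];

// First attempt: produce succeeds, checkpoint fails.
sequencer.processBatch(batch, true);
// After restart: the checkpoint is unchanged, so the same batch is replayed.
sequencer.processBatch(batch, false);

// Sequence numbers 1, 2, 3 now appear twice in the output topic.
const producedSequenceNumbers = topic.map((op) => op.sequenceNumber);
console.log(producedSequenceNumbers);
```

In this model a downstream consumer sees sequence numbers 1 to 3 and then 1 to 3 again, which is the pattern that trips Scribe's incremental check.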
To Reproduce
Steps to reproduce the behavior:
- Crash the Deli service after it has produced ops to Kafka but before it checkpoints the state of the document
- On restart, Deli reprocesses those ops and produces duplicate ops (deltas) to Kafka
- Scribe marks the document that received the duplicate ops as corrupted
Expected behavior
Duplicate operations are always possible, because any checkpointing failure causes operations to be reprocessed. Therefore, Scribe needs robust logic that identifies duplicate operations and handles them correctly, so that the document is not erroneously marked as corrupted. Implementing duplicate handling similar to Scriptorium's would make the service more reliable and resilient, preventing potential data corruption and ensuring consistent document processing.
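One possible shape for such handling is sketched below. It is an assumption, not Scribe's or Scriptorium's actual implementation: `DuplicateTolerantTracker`, `SequencedDelta`, and `OpDisposition` are hypothetical names. The idea is to treat any op whose sequence number is at or below the last one processed as a replay and skip it, while still reporting a true gap in the sequence separately.

```typescript
// Hypothetical sketch: these names are illustrative, not Scribe's actual code.
interface SequencedDelta {
    sequenceNumber: number;
    contents: string;
}

type OpDisposition = "applied" | "duplicate" | "gap";

class DuplicateTolerantTracker {
    private lastSequenceNumber: number;
    public readonly applied: SequencedDelta[] = [];

    constructor(lastProcessedSequenceNumber: number) {
        this.lastSequenceNumber = lastProcessedSequenceNumber;
    }

    public handle(op: SequencedDelta): OpDisposition {
        // Already processed: a replay after a failed checkpoint. Skip it
        // instead of treating the document as corrupted.
        if (op.sequenceNumber <= this.lastSequenceNumber) {
            return "duplicate";
        }
        // A jump forward means ops are missing; report it to the caller.
        if (op.sequenceNumber !== this.lastSequenceNumber + 1) {
            return "gap";
        }
        this.applied.push(op);
        this.lastSequenceNumber = op.sequenceNumber;
        return "applied";
    }
}

const tracker = new DuplicateTolerantTracker(0);
const incoming: SequencedDelta[] = [1, 2, 3, 1, 2, 3, 4].map((n) => ({
    sequenceNumber: n,
    contents: `op${n}`,
}));

const dispositions = incoming.map((op) => tracker.handle(op));
console.log(dispositions);
console.log(tracker.applied.map((op) => op.sequenceNumber));
```

This sketch assumes a replayed op is identical to the one originally produced under the same sequence number. If that cannot be guaranteed, comparing the contents before skipping would be a safer design.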