Fix/faketime race condition in header verifier tests#5108
Fix/faketime race condition in header verifier tests#510815168316096 wants to merge 4 commits intonervosnetwork:developfrom
Conversation
| ci-${{ runner.os }}-cargo-regression- | ||
| - name: Run verification regression tests | ||
| run: | | ||
| cargo nextest run --locked -p ckb-verification --verbose |
There was a problem hiding this comment.
our current make test don't invokes this test?
There was a problem hiding this comment.
Yes, this file is irrelevant to this PR. I will revise the description of this PR, including the failed test cases and the purpose of the fix. I will remove the content and commits that are not related to the PR.
The background is as follows: I added test cases in this branch and found that the newly added cases failed when running them (see https://github.com/15168316096/ckb/blob/copilot/split-modules-and-add-workflows/verification/src/tests/header_verifier.rs#L41). I then checked out this fix branch, whose purpose is to fix the mutex lock for FAKETIME_LOCK.
| ci-${{ runner.os }}-cargo-regression- | ||
| - name: Run tx-pool regression tests | ||
| run: | | ||
| cargo nextest run --locked -p ckb-tx-pool --verbose |
There was a problem hiding this comment.
I believe make test will run all these tests already.
… condition Add 11 new test cases for block, header, and transaction verifiers: Block verifier (5 tests): - test_block_with_only_cellbase - test_proposals_hash_mismatch - test_cellbase_with_type_script - test_block_no_duplicated_transactions - test_block_no_duplicated_proposals Header verifier (2 tests): - test_timestamp_at_boundary - test_correct_number Transaction verifier (4 tests): - test_empty_outputs - test_non_empty_transaction - test_valid_version - test_transaction_size_within_limit Fix a race condition in header verifier timestamp tests: `ckb_systemtime::faketime()` manipulates process-wide global statics (`FAKETIME` / `FAKETIME_ENABLED`). When a `FaketimeGuard` is dropped it disables faketime for the entire process, causing flaky failures when multiple timestamp tests run in parallel. Introduce a module-level `FAKETIME_LOCK` mutex to serialize these tests. Co-authored-by: Cursor <cursoragent@cursor.com>
8e0da44 to
38976f1
Compare
|
I have simplified the PR description and submitted the corresponding fix along with the unit test cases that support this fix. |
| // `ckb_systemtime::faketime()` manipulates global statics (`FAKETIME` / `FAKETIME_ENABLED`). | ||
| // When a `FaketimeGuard` is dropped it disables faketime **process-wide**, which causes | ||
| // data races when multiple timestamp tests run in parallel. Serialise them with a mutex. | ||
| static FAKETIME_LOCK: Mutex<()> = Mutex::new(()); |
There was a problem hiding this comment.
should we move this lock into ckb_systemtime? maybe something like:
pub fn faketime() -> FaketimeGuard {
let lock = FAKETIME_MUTEX.lock().expect("FAKETIME_MUTEX poisoned");
FaketimeGuard { _lock: lock }
}then all the api used ckb_systemtime don't need to change?
There was a problem hiding this comment.
should we move this lock into ckb_systemtime? maybe something like:
pub fn faketime() -> FaketimeGuard { let lock = FAKETIME_MUTEX.lock().expect("FAKETIME_MUTEX poisoned"); FaketimeGuard { _lock: lock } }then all the api used ckb_systemtime don't need to change?
Thanks for the suggestion! I've moved the mutex lock into ckb_systemtime so that faketime() now automatically acquires a process-wide lock:
static FAKETIME_MUTEX: Mutex<()> = Mutex::new(());
pub fn faketime() -> FaketimeGuard {
let lock = FAKETIME_MUTEX.lock().expect("FAKETIME_MUTEX poisoned");
FaketimeGuard { _lock: lock }
}
Changes:
- util/systemtime/src/lib.rs: Embedded MutexGuard into FaketimeGuard, lock acquired automatically in faketime()
- verification/src/tests/header_verifier.rs: Removed the test-side FAKETIME_LOCK and all _lock lines — no longer needed
- sync/src/tests/types.rs: Fixed ttl_filter test which called faketime() twice in the same scope (would deadlock with the new non-reentrant mutex) — now calls it once and uses set_faketime() to change the value
All other faketime() call sites only call it once per scope, so no changes needed.
There was a problem hiding this comment.
. If multiple timestamp tests run in parallel,
No, ckb use cargo-nextest to run unit test:https://nexte.st/ Each test is executed in a separate process.
Running with --test-threads=1
Yes. Using --test-threads=1 is a valid workaround for cargo test, but running with only one thread is too slow. That’s exactly why I chose cargo-nextest for ckb.
I recommend running ckb’s unit tests using make test rather than cargo test.
❯ make -n test
cargo nextest run --features deadlock_detection,with_sentry --workspace --no-fail-fast --hide-progress-bar --success-output immediate-final --failure-output immediate-final --run-ignored all|
if so we can keep the new tests and don't keep the change for faketime? |
sure |
Developers supplement the single test. If you want to run a single case, you are more used to using cargo test. You will encounter this problem. So how to guide the developer to use make -n test to cover and solve this scenario? I understand that the purpose of the change is to run a single test. Cargo test can be more common. |
What problem does this PR solve?
Problem Summary:
While adding supplementary verification tests on the branch, the existing test
test_timestamp_too_oldstarted failing intermittently when run in parallel with the newly addedtest_timestamp_at_boundary.Root cause:
ckb_systemtime::faketime()manipulates process-wide global statics (FAKETIME/FAKETIME_ENABLED). When aFaketimeGuardis dropped, it disables faketime for the entire process. If multiple timestamp tests run in parallel, one test's guard drop can disable faketime while another test is still relying on it — causingunix_time_as_millis()to return real system time instead of the expected fake value, which makes the verifier produce an unexpectedOk(())result.Running with
--test-threads=1consistently passes, confirming this is a concurrency issue.What is changed and how it works?
What's Changed:
Introduce a module-level
FAKETIME_LOCKmutex inheader_verifier.rsthat all four timestamp-related tests acquire before touching faketime, ensuring mutual exclusion without adding any external dependencies.Add 11 supplementary verification test cases across 3 files:
Block verifier (5 tests):
test_block_with_only_cellbase— verify a block with only a cellbase transactiontest_proposals_hash_mismatch— verify proposals hash mismatch is rejectedtest_cellbase_with_type_script— verify cellbase with type script is rejectedtest_block_no_duplicated_transactions— verify block with unique transactions passestest_block_no_duplicated_proposals— verify block with unique proposals passesHeader verifier (2 tests):
test_timestamp_at_boundary— verify timestamp exactly at allowed future blocktime boundary passestest_correct_number— verify correct block number passesTransaction verifier (4 tests):
test_empty_outputs— verify transaction with empty outputs is rejectedtest_non_empty_transaction— verify a valid non-empty transaction passestest_valid_version— verify valid transaction version passestest_transaction_size_within_limit— verify transaction within size limit passesRelated changes
Check List
Tests
Side effects