Skip to content

Fix/faketime race condition in header verifier tests#5108

Open
15168316096 wants to merge 4 commits intonervosnetwork:developfrom
15168316096:fix/faketime-race-condition-in-header-verifier-tests
Open

Fix/faketime race condition in header verifier tests#5108
15168316096 wants to merge 4 commits intonervosnetwork:developfrom
15168316096:fix/faketime-race-condition-in-header-verifier-tests

Conversation

@15168316096
Copy link
Contributor

@15168316096 15168316096 commented Feb 11, 2026

What problem does this PR solve?

Problem Summary:

While adding supplementary verification tests on the branch, the existing test test_timestamp_too_old started failing intermittently when run in parallel with the newly added test_timestamp_at_boundary.

Root cause: ckb_systemtime::faketime() manipulates process-wide global statics (FAKETIME / FAKETIME_ENABLED). When a FaketimeGuard is dropped, it disables faketime for the entire process. If multiple timestamp tests run in parallel, one test's guard drop can disable faketime while another test is still relying on it — causing unix_time_as_millis() to return real system time instead of the expected fake value, which makes the verifier produce an unexpected Ok(()) result.

Running with --test-threads=1 consistently passes, confirming this is a concurrency issue.

What is changed and how it works?

What's Changed:

Introduce a module-level FAKETIME_LOCK mutex in header_verifier.rs that all four timestamp-related tests acquire before touching faketime, ensuring mutual exclusion without adding any external dependencies.

Add 11 supplementary verification test cases across 3 files:

Block verifier (5 tests):

  • test_block_with_only_cellbase — verify a block with only a cellbase transaction
  • test_proposals_hash_mismatch — verify proposals hash mismatch is rejected
  • test_cellbase_with_type_script — verify cellbase with type script is rejected
  • test_block_no_duplicated_transactions — verify block with unique transactions passes
  • test_block_no_duplicated_proposals — verify block with unique proposals passes

Header verifier (2 tests):

  • test_timestamp_at_boundary — verify timestamp exactly at allowed future blocktime boundary passes
  • test_correct_number — verify correct block number passes

Transaction verifier (4 tests):

  • test_empty_outputs — verify transaction with empty outputs is rejected
  • test_non_empty_transaction — verify a valid non-empty transaction passes
  • test_valid_version — verify valid transaction version passes
  • test_transaction_size_within_limit — verify transaction within size limit passes

Related changes

  • No related PRs
  • No need to cherry-pick

Check List

Tests

  • Unit tests added for block, header, and transaction verifiers
  • All 11 new tests pass; faketime race condition fix verified by running header verifier tests 5 times in parallel mode with 0 failures

Side effects

  • No performance regression
  • No breaking backward compatibility

@15168316096 15168316096 requested a review from a team as a code owner February 11, 2026 09:11
@15168316096 15168316096 requested review from quake and removed request for a team February 11, 2026 09:11
ci-${{ runner.os }}-cargo-regression-
- name: Run verification regression tests
run: |
cargo nextest run --locked -p ckb-verification --verbose
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

our current make test don't invokes this test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this file is irrelevant to this PR. I will revise the description of this PR, including the failed test cases and the purpose of the fix. I will remove the content and commits that are not related to the PR.

The background is as follows: I added test cases in this branch and found that the newly added cases failed when running them (see https://github.com/15168316096/ckb/blob/copilot/split-modules-and-add-workflows/verification/src/tests/header_verifier.rs#L41). I then checked out this fix branch, whose purpose is to fix the mutex lock for FAKETIME_LOCK.

ci-${{ runner.os }}-cargo-regression-
- name: Run tx-pool regression tests
run: |
cargo nextest run --locked -p ckb-tx-pool --verbose
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe make test will run all these tests already.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes,i agree it

… condition

Add 11 new test cases for block, header, and transaction verifiers:

Block verifier (5 tests):
- test_block_with_only_cellbase
- test_proposals_hash_mismatch
- test_cellbase_with_type_script
- test_block_no_duplicated_transactions
- test_block_no_duplicated_proposals

Header verifier (2 tests):
- test_timestamp_at_boundary
- test_correct_number

Transaction verifier (4 tests):
- test_empty_outputs
- test_non_empty_transaction
- test_valid_version
- test_transaction_size_within_limit

Fix a race condition in header verifier timestamp tests:
`ckb_systemtime::faketime()` manipulates process-wide global statics
(`FAKETIME` / `FAKETIME_ENABLED`). When a `FaketimeGuard` is dropped it
disables faketime for the entire process, causing flaky failures when
multiple timestamp tests run in parallel. Introduce a module-level
`FAKETIME_LOCK` mutex to serialize these tests.

Co-authored-by: Cursor <cursoragent@cursor.com>
@15168316096 15168316096 force-pushed the fix/faketime-race-condition-in-header-verifier-tests branch from 8e0da44 to 38976f1 Compare February 11, 2026 15:50
@15168316096
Copy link
Contributor Author

I have simplified the PR description and submitted the corresponding fix along with the unit test cases that support this fix.

// `ckb_systemtime::faketime()` manipulates global statics (`FAKETIME` / `FAKETIME_ENABLED`).
// When a `FaketimeGuard` is dropped it disables faketime **process-wide**, which causes
// data races when multiple timestamp tests run in parallel. Serialise them with a mutex.
static FAKETIME_LOCK: Mutex<()> = Mutex::new(());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we move this lock into ckb_systemtime? maybe something like:

pub fn faketime() -> FaketimeGuard {
    let lock = FAKETIME_MUTEX.lock().expect("FAKETIME_MUTEX poisoned");
    FaketimeGuard { _lock: lock }
}

then all the api used ckb_systemtime don't need to change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes,i agree

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we move this lock into ckb_systemtime? maybe something like:

pub fn faketime() -> FaketimeGuard {
    let lock = FAKETIME_MUTEX.lock().expect("FAKETIME_MUTEX poisoned");
    FaketimeGuard { _lock: lock }
}

then all the api used ckb_systemtime don't need to change?

Thanks for the suggestion! I've moved the mutex lock into ckb_systemtime so that faketime() now automatically acquires a process-wide lock:

static FAKETIME_MUTEX: Mutex<()> = Mutex::new(());

pub fn faketime() -> FaketimeGuard {
    let lock = FAKETIME_MUTEX.lock().expect("FAKETIME_MUTEX poisoned");
    FaketimeGuard { _lock: lock }
}

Changes:

  • util/systemtime/src/lib.rs: Embedded MutexGuard into FaketimeGuard, lock acquired automatically in faketime()
  • verification/src/tests/header_verifier.rs: Removed the test-side FAKETIME_LOCK and all _lock lines — no longer needed
  • sync/src/tests/types.rs: Fixed ttl_filter test which called faketime() twice in the same scope (would deadlock with the new non-reentrant mutex) — now calls it once and uses set_faketime() to change the value

All other faketime() call sites only call it once per scope, so no changes needed.

Copy link
Collaborator

@eval-exec eval-exec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

. If multiple timestamp tests run in parallel,

No, ckb use cargo-nextest to run unit test:https://nexte.st/ Each test is executed in a separate process.

Running with --test-threads=1

Yes. Using --test-threads=1 is a valid workaround for cargo test, but running with only one thread is too slow. That’s exactly why I chose cargo-nextest for ckb.

I recommend running ckb’s unit tests using make test rather than cargo test.

❯ make -n test
cargo nextest run  --features deadlock_detection,with_sentry --workspace --no-fail-fast --hide-progress-bar --success-output immediate-final --failure-output immediate-final --run-ignored all

@chenyukang
Copy link
Collaborator

if so we can keep the new tests and don't keep the change for faketime?

@eval-exec
Copy link
Collaborator

if so we can keep the new tests and don't keep the change for faketime?

sure

@15168316096
Copy link
Contributor Author

. If multiple timestamp tests run in parallel,

No, ckb use cargo-nextest to run unit test:https://nexte.st/ Each test is executed in a separate process.

Running with --test-threads=1

Yes. Using --test-threads=1 is a valid workaround for cargo test, but running with only one thread is too slow. That’s exactly why I chose cargo-nextest for ckb.

I recommend running ckb’s unit tests using make test rather than cargo test.

❯ make -n test

cargo nextest run  --features deadlock_detection,with_sentry --workspace --no-fail-fast --hide-progress-bar --success-output immediate-final --failure-output immediate-final --run-ignored all

Developers supplement the single test. If you want to run a single case, you are more used to using cargo test. You will encounter this problem. So how to guide the developer to use make -n test to cover and solve this scenario? I understand that the purpose of the change is to run a single test. Cargo test can be more common.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments