Contributor

@LykxSassinator LykxSassinator commented Jul 2, 2025

Background

PR #370 fixed a panic issue caused by concurrent updates to Memtable followed by reads on stale indexes. However, we observed that this fix introduced performance regressions under concurrent read/write workloads.

Issue

When fetch_entries_to retrieves a large batch of entries from disk, write operations are blocked until the read completes, delaying Memtable updates and degrading throughput.

Solution

This PR optimizes fetch_entries_to to reduce contention and improve performance under mixed workloads.
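
Going by the review discussion in this thread, the idea is to avoid holding the Memtable lock while entries are read from disk. The following is a minimal sketch of that pattern, not the actual raft-engine code: `Memtable`, `EntryIndex`, `snapshot`, and the `read` callback are hypothetical stand-ins, and only the name `fetch_entries_to` is taken from the PR.

```rust
use std::collections::BTreeMap;
use std::sync::RwLock;

/// Hypothetical on-disk location of one entry (not raft-engine's real type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct EntryIndex {
    pub file_seq: u64,
    pub offset: u64,
}

/// Hypothetical stand-in for the Memtable: log index -> on-disk location.
pub struct Memtable {
    indexes: RwLock<BTreeMap<u64, EntryIndex>>,
}

impl Memtable {
    pub fn new() -> Self {
        Memtable { indexes: RwLock::new(BTreeMap::new()) }
    }

    /// Writers take the write lock only for the duration of the map update.
    pub fn insert(&self, log_idx: u64, entry: EntryIndex) {
        self.indexes.write().unwrap().insert(log_idx, entry);
    }

    /// Copies the indexes in [begin, end) under the read lock.
    /// The guard is dropped when this function returns.
    pub fn snapshot(&self, begin: u64, end: u64) -> Vec<EntryIndex> {
        let guard = self.indexes.read().unwrap();
        guard.range(begin..end).map(|(_, e)| *e).collect()
    }
}

/// Reads entries while holding no Memtable lock: the lock is taken only
/// inside `snapshot`, and the (potentially slow) `read` calls run after it
/// has been released, so writers are not blocked by disk I/O.
pub fn fetch_entries_to<F, E>(
    memtable: &Memtable,
    begin: u64,
    end: u64,
    mut read: F,
) -> Result<Vec<Vec<u8>>, E>
where
    F: FnMut(&EntryIndex) -> Result<Vec<u8>, E>,
{
    let ents_idx = memtable.snapshot(begin, end);
    let mut vec = Vec::with_capacity(ents_idx.len());
    for i in ents_idx.iter() {
        vec.push(read(i)?);
    }
    Ok(vec)
}
```

The trade-off in this sketch is that the copied indexes can become stale while the reads are in flight, so the reader has to be able to detect that and fetch again.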

Results

Branch     Status
Master     (image)
This PR    (image)

Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>

ti-chi-bot bot commented Jul 2, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. dco-signoff: yes Indicates the PR's author has signed the dco. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 2, 2025
Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>
@LykxSassinator LykxSassinator marked this pull request as ready for review July 2, 2025 11:29
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 2, 2025
  for i in ents_idx.iter() {
-     vec.push(read_entry_from_file::<M, _>(self.pipe_log.as_ref(), i)?);
+     vec.push({
+         match read_entry_from_file::<M, _>(self.pipe_log.as_ref(), i) {
Contributor

Can we ensure it is safe to read an entry without holding the lock? Is it possible that some entries are truncated and the target WAL files are GCed (or reused) before this read, so that the result of the read is undefined behavior?

Contributor Author

Yep. It's ensured.

Is it possible that some entries are truncated and the target WAL files are GCed (or reused) before this read?

  • File Access Atomicity in Raft-Engine
    In raft-engine, file access (.raftlog) is atomic. The GC operation acquires a mutex and deletes the entire file before allowing other accesses. If a read operation successfully returns the target bytes, the result is guaranteed valid.

  • Index Consistency Handling
    This PR further ensures consistency by returning Error(e) if the Memtable index is updated (by either background rewrite or foreground write threads) during a read. Then it will automatically retry with the latest index to fetch the correct entry bytes.
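
The retry behavior described above could be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `ReadError`, `fetch_with_retry`, the two callbacks, and the `max_retries` bound are all hypothetical names introduced here.

```rust
/// Hypothetical error type for this sketch (not raft-engine's `Error`).
#[derive(Debug, PartialEq)]
pub enum ReadError {
    /// The index used for the read no longer matches what is on disk.
    StaleIndex,
    /// Any other failure; never retried in this sketch.
    Io(String),
}

/// Takes a fresh copy of the indexes, reads every entry, and starts over
/// with a new copy if a read reports that its index has gone stale.
/// `snapshot` is expected to return the latest indexes each time it is
/// called; `max_retries` is an assumption added to keep the loop bounded.
pub fn fetch_with_retry<I, S, R>(
    mut snapshot: S,
    mut read: R,
    max_retries: usize,
) -> Result<Vec<Vec<u8>>, ReadError>
where
    S: FnMut() -> Vec<I>,
    R: FnMut(&I) -> Result<Vec<u8>, ReadError>,
{
    let mut attempt = 0;
    loop {
        let ents_idx = snapshot();
        // Stops at the first failing read and surfaces that error.
        let res: Result<Vec<Vec<u8>>, ReadError> =
            ents_idx.iter().map(|i| read(i)).collect();
        match res {
            Err(ReadError::StaleIndex) if attempt < max_retries => attempt += 1,
            other => return other,
        }
    }
}
```

In this sketch a partially filled result from a failed attempt is discarded, so the caller only ever sees entries that were all read with one consistent copy of the indexes.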

Contributor

@glorv glorv Jul 3, 2025

Index Consistency Handling

Say an entry is rewritten and the old WAL file is purged before the actual access. It seems we would still try to read it with the old entry info, and because the file handle has changed, we would still return an error even though the entry is actually valid.

Contributor Author

If it uses a stale file handle to get the entry, it will get the Error(OutOfSeq), ref:
https://github.com/LykxSassinator/raft-engine/blob/392f5e66f8286dc1b6d7cf69f2bc20ed72d40123/src/file_pipe_log/pipe.rs#L238

So, if the first fetch uses a stale index to get the entry, it will return an Error and trigger the second retry, where it will use the latest index to access the entry.
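
To illustrate why a stale index fails cleanly instead of reading the wrong bytes, here is a simplified sketch of a file table keyed by sequence number. It is not the code behind the link above: `ActiveFiles`, `OutOfRange`, `get`, and `purge_to` are hypothetical names, and plain strings stand in for file handles.

```rust
use std::collections::VecDeque;

/// Hypothetical error returned when a sequence number is no longer tracked.
#[derive(Debug, PartialEq)]
pub struct OutOfRange;

/// Hypothetical table of live log files, ordered by sequence number.
/// `files[0]` has sequence number `first_seq`, `files[1]` has `first_seq + 1`, etc.
pub struct ActiveFiles {
    first_seq: u64,
    files: VecDeque<String>,
}

impl ActiveFiles {
    pub fn new(first_seq: u64, files: Vec<String>) -> Self {
        ActiveFiles { first_seq, files: files.into() }
    }

    /// Looks up the file for `seq`; fails if it was purged or never existed.
    pub fn get(&self, seq: u64) -> Result<&str, OutOfRange> {
        if seq < self.first_seq || seq >= self.first_seq + self.files.len() as u64 {
            return Err(OutOfRange);
        }
        Ok(self.files[(seq - self.first_seq) as usize].as_str())
    }

    /// Drops every file with a sequence number below `up_to` and returns
    /// how many files were removed.
    pub fn purge_to(&mut self, up_to: u64) -> usize {
        let mut removed = 0;
        while self.first_seq < up_to && !self.files.is_empty() {
            self.files.pop_front();
            self.first_seq += 1;
            removed += 1;
        }
        removed
    }
}
```

A reader that still holds an index pointing at a purged sequence number gets `Err(OutOfRange)` from `get` in this sketch, which is the signal to re-read the index and try again.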

Contributor

@glorv glorv left a comment

LGTM

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Jul 3, 2025
@LykxSassinator LykxSassinator added needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. labels Jul 4, 2025
Member

@overvenus overvenus left a comment

Rest LGTM

Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>
@LykxSassinator LykxSassinator requested a review from overvenus July 4, 2025 07:07
@ti-chi-bot ti-chi-bot bot added the lgtm label Jul 4, 2025

ti-chi-bot bot commented Jul 4, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: glorv, overvenus

@ti-chi-bot ti-chi-bot bot removed the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Jul 4, 2025

ti-chi-bot bot commented Jul 4, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-07-03 07:36:16.133028097 +0000 UTC m=+1553228.856207081: ☑️ agreed by glorv.
  • 2025-07-04 08:22:18.919925679 +0000 UTC m=+1642391.643104660: ☑️ agreed by overvenus.

@ti-chi-bot ti-chi-bot bot merged commit 03f77d9 into tikv:master Jul 4, 2025
7 checks passed
LykxSassinator added a commit to LykxSassinator/raft-engine that referenced this pull request Jul 4, 2025
…ng. (tikv#382)

 

Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>
ti-chi-bot bot pushed a commit that referenced this pull request Jul 4, 2025
…e` too long. (#382) (#384)

 

Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>
ti-chi-bot bot pushed a commit that referenced this pull request Jul 4, 2025
…e` too long. (#382) (#383)

 

Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>
LykxSassinator added a commit that referenced this pull request Jul 12, 2025
…ng. (#382)

Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>
Labels

approved dco-signoff: yes Indicates the PR's author has signed the dco. lgtm needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

3 participants