-
Notifications
You must be signed in to change notification settings - Fork 1
WC memops #174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WC memops #174
Conversation
|
Caution Review failedThe pull request is closed. WalkthroughCMake run target expands QEMU options (debug console, serial, parallel, network, USB, audio) and reorders explicit disk device/drives. New CMake option VF_CONFIG_MEMCPY_NT and corresponding compile definition added. Three write-combining memcpy variants (SSE2/AVX2/AVX512) and external wrappers introduced; FastMemcpy routed conditionally. arch/x86_64 adds VM-host conditional around OSXSAVE handling. Network: FastStrCopy used, Net_UnregisterDevice() added, and IpSend scans devices. Minor README and kernel message text edits. Poem
Pre-merge checks❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: ASSERTIVE Plan: Pro 📒 Files selected for processing (2)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (9)
CMakeLists.txt(1 hunks)arch/x86_64/features/x64.c(1 hunks)cmake/configuration.cmake(1 hunks)cmake/features.cmake(1 hunks)drivers/ethernet/Network.c(3 hunks)drivers/ethernet/interface/Ip.c(1 hunks)kernel/core/Kernel.c(1 hunks)mm/MemOps.c(2 hunks)mm/asm/memcpy.asm(8 hunks)
🧰 Additional context used
🧠 Learnings (11)
📓 Common learnings
Learnt from: assembler-0
Repo: assembler-0/VoidFrame PR: 82
File: kernel/process/Process.h:0-0
Timestamp: 2025-08-14T02:11:58.963Z
Learning: In VoidFrame kernel's FastMemcpy implementation (kernel/memory/MemOps.c), memory corruption occurs under high load due to interrupts occurring between SSE2 movdqu load and store instructions, corrupting XMM register state. The fix requires disabling interrupts around SSE operations and adding memory barriers (sfence) to ensure stores complete atomically.
Learnt from: assembler-0
Repo: assembler-0/VoidFrame PR: 101
File: kernel/etc/Shell.c:829-837
Timestamp: 2025-08-31T07:06:37.003Z
Learning: assembler-0 is working on optimizing ramfs performance in VoidFrame kernel and wants to implement proper filesystem characteristics like faster lookups, better memory management, and advanced features like copy-on-write and caching.
Learnt from: assembler-0
Repo: assembler-0/VoidFrame PR: 82
File: kernel/process/Process.h:0-0
Timestamp: 2025-08-14T02:11:27.395Z
Learning: In VoidFrame kernel, FastMemcpy was using SSE2 movdqu instructions without preserving XMM registers, causing crashes when called from interrupt context (like the scheduler's FastSchedule function). The fix is to use only general-purpose registers with 64-bit unrolled copies to maintain performance while being interrupt-safe.
Learnt from: assembler-0
Repo: assembler-0/VoidFrame PR: 132
File: arch/x86_64/syscall/Syscall.c:12-19
Timestamp: 2025-09-19T11:55:43.328Z
Learning: In the VoidFrame kernel project, assembler-0 is implementing basic POSIX-compliant syscalls in ring 0 without user/kernel memory protection, prioritizing getting basic functionality working before adding full security features.
Learnt from: assembler-0
Repo: assembler-0/VoidFrame PR: 103
File: docs/ARCHITECHTURE.md:1-3
Timestamp: 2025-08-31T14:19:11.366Z
Learning: assembler-0 wants to implement deep fault tracing for VoidFrame kernel exceptions (Page Fault, General Protection Fault, Invalid Opcode) instead of just panicking. The current interrupt handler has complete register state available in the Registers struct (defined in arch/x86_64/cpu/Cpu.h) including RBP for stack walking, RIP for fault location, and error codes for analysis.
Learnt from: assembler-0
Repo: assembler-0/autocc PR: 33
File: include/utils.hpp:138-190
Timestamp: 2025-08-04T13:24:48.984Z
Learning: The user assembler-0 tends to create wrapper functions during refactoring and sometimes accidentally leaves duplicate implementations. They have a casual, self-aware attitude about coding mistakes and are receptive to feedback about code duplication issues.
Learnt from: assembler-0
Repo: assembler-0/VoidFrame PR: 80
File: README.md:7-8
Timestamp: 2025-08-13T08:43:54.750Z
Learning: User assembler-0 is working on VoidFrame kernel with a tight deadline (2-3 days before summer ends) and specifically requested runtime safety checks similar to __stack_chk_fail but for deadlocks and other kernel issues. The kernel already has extensive safety mechanisms including stack canaries, guard pages, heap integrity checks, and process security validation.
Learnt from: assembler-0
Repo: assembler-0/VoidFrame PR: 72
File: arch/x86_64/syscall/SyscallEntry.asm:24-30
Timestamp: 2025-08-12T02:13:46.266Z
Learning: User assembler-0 has existing ELF infrastructure in VoidFrame kernel with Elf.h header definitions and CreateProcess() function that properly sets up process context including stack allocation, ProcessExitStub return address, and register initialization. The issue was missing ELF loader implementation, not compilation problems.
📚 Learning: 2025-08-14T02:11:27.395Z
Learnt from: assembler-0
Repo: assembler-0/VoidFrame PR: 82
File: kernel/process/Process.h:0-0
Timestamp: 2025-08-14T02:11:27.395Z
Learning: In VoidFrame kernel, FastMemcpy was using SSE2 movdqu instructions without preserving XMM registers, causing crashes when called from interrupt context (like the scheduler's FastSchedule function). The fix is to use only general-purpose registers with 64-bit unrolled copies to maintain performance while being interrupt-safe.
Applied to files:
mm/asm/memcpy.asmarch/x86_64/features/x64.cmm/MemOps.c
📚 Learning: 2025-08-14T02:11:58.963Z
Learnt from: assembler-0
Repo: assembler-0/VoidFrame PR: 82
File: kernel/process/Process.h:0-0
Timestamp: 2025-08-14T02:11:58.963Z
Learning: In VoidFrame kernel's FastMemcpy implementation (kernel/memory/MemOps.c), memory corruption occurs under high load due to interrupts occurring between SSE2 movdqu load and store instructions, corrupting XMM register state. The fix requires disabling interrupts around SSE operations and adding memory barriers (sfence) to ensure stores complete atomically.
Applied to files:
mm/asm/memcpy.asmarch/x86_64/features/x64.cmm/MemOps.c
📚 Learning: 2025-09-01T01:10:22.123Z
Learnt from: assembler-0
Repo: assembler-0/VoidFrame PR: 103
File: README.md:9-9
Timestamp: 2025-09-01T01:10:22.123Z
Learning: The current VoidFrame Registers struct in arch/x86_64/cpu/Cpu.h only saves basic GPRs, segment registers, and exception context (interrupt_number, error_code, rip, cs, rflags, rsp, ss). It's missing critical registers for deep fault tracing: SIMD registers (XMM/YMM/ZMM), control registers (CR0/CR2/CR3/CR4), debug registers (DR0-DR7), and extended FPU state. The struct has a "DO NOT TOUCH" comment indicating it must match interrupt ASM stack layout.
Applied to files:
mm/asm/memcpy.asmarch/x86_64/features/x64.c
📚 Learning: 2025-09-14T11:01:02.698Z
Learnt from: assembler-0
Repo: assembler-0/VoidFrame PR: 130
File: kernel/core/InitRD.c:25-35
Timestamp: 2025-09-14T11:01:02.698Z
Learning: In VoidFrame kernel's InitRD.c, multiboot modules should be mapped using VMemMap() with PAGE_PRESENT | PAGE_WRITABLE flags instead of VMemMapMMIO(), as the latter applies cache-disabling flags (PAGE_NOCACHE | PAGE_WRITETHROUGH) that can corrupt data when reading regular RAM content.
Applied to files:
kernel/core/Kernel.c
📚 Learning: 2025-09-01T01:10:22.123Z
Learnt from: assembler-0
Repo: assembler-0/VoidFrame PR: 103
File: README.md:9-9
Timestamp: 2025-09-01T01:10:22.123Z
Learning: VoidFrame's interrupt handlers in arch/x86_64/interrupts/Interrupts.asm only save basic GPRs and segment registers, missing critical registers for deep fault tracing: control registers (CR0/CR2/CR3/CR4), SIMD state (XMM/YMM/ZMM), debug registers (DR0-DR7), and extended FPU state. The CPU initialization code enables SSE/AVX/XSAVE features but interrupt handlers don't preserve this extended state. Enhanced fault tracing requires selective extended state saving for critical exceptions (ISR 6, 8, 13, 14) while keeping lightweight handling for regular interrupts.
Applied to files:
arch/x86_64/features/x64.c
📚 Learning: 2025-08-17T08:16:43.539Z
Learnt from: assembler-0
Repo: assembler-0/VoidFrame PR: 93
File: kernel/memory/VMem.c:72-79
Timestamp: 2025-08-17T08:16:43.539Z
Learning: In VoidFrame kernel, VMemGetPhysAddr and related page table walking functions were causing panics because they assumed identity mapping existed for physical addresses < IDENTITY_MAP_SIZE (4GB), but the bootloader doesn't provide this 1:1 mapping. The fix is to remove all identity mapping assumptions and always use PHYS_TO_VIRT() for page table access in functions like VMemGetPageTablePhys, VMemMap, VMemMapHuge, VMemGetPhysAddr, VMemUnmap, VMemMapMMIO, and VMemUnmapMMIO.
Applied to files:
arch/x86_64/features/x64.c
📚 Learning: 2025-08-31T14:19:11.366Z
Learnt from: assembler-0
Repo: assembler-0/VoidFrame PR: 103
File: docs/ARCHITECHTURE.md:1-3
Timestamp: 2025-08-31T14:19:11.366Z
Learning: assembler-0 wants to implement deep fault tracing for VoidFrame kernel exceptions (Page Fault, General Protection Fault, Invalid Opcode) instead of just panicking. The current interrupt handler has complete register state available in the Registers struct (defined in arch/x86_64/cpu/Cpu.h) including RBP for stack walking, RIP for fault location, and error codes for analysis.
Applied to files:
arch/x86_64/features/x64.c
📚 Learning: 2025-08-12T03:19:52.873Z
Learnt from: assembler-0
Repo: assembler-0/VoidFrame PR: 73
File: kernel/elf/ELFloader.c:0-0
Timestamp: 2025-08-12T03:19:52.873Z
Learning: VoidFrame kernel now has complete UEFI boot support with hybrid BIOS/UEFI ISO creation capability. The existing VBE framebuffer implementation requires no changes as it uses Multiboot2 framebuffer tags and direct memory-mapped access, making it inherently UEFI-compatible. Build process: ninja + ./scripts/build_hybrid_iso.sh creates dual-platform bootable ISO.
Applied to files:
CMakeLists.txt
📚 Learning: 2025-08-12T03:19:52.873Z
Learnt from: assembler-0
Repo: assembler-0/VoidFrame PR: 73
File: kernel/elf/ELFloader.c:0-0
Timestamp: 2025-08-12T03:19:52.873Z
Learning: User assembler-0 has successfully implemented UEFI boot support for VoidFrame kernel. The VBE framebuffer implementation is fully UEFI-compatible using Multiboot2 framebuffer tags and direct memory-mapped access, requiring no kernel code changes - only bootloader configuration updates.
Applied to files:
CMakeLists.txt
📚 Learning: 2025-08-17T05:53:48.040Z
Learnt from: assembler-0
Repo: assembler-0/VoidFrame PR: 92
File: meson.build:31-34
Timestamp: 2025-08-17T05:53:48.040Z
Learning: The VoidFrame kernel's memory detection was reporting incorrect memory amounts (like 9200MB) because it used the highest physical address (including memory holes and reserved regions) as total memory size, rather than summing only the MULTIBOOT2_MEMORY_AVAILABLE regions. The fix involves counting only available memory regions and using the highest available address only for bitmap sizing.
Applied to files:
CMakeLists.txt
🧬 Code graph analysis (4)
drivers/ethernet/interface/Ip.c (1)
drivers/ethernet/Network.c (1)
Net_GetDevice(48-53)
kernel/core/Kernel.c (1)
kernel/etc/Console.c (1)
PrintKernelSuccess(225-230)
arch/x86_64/features/x64.c (1)
kernel/etc/Console.c (1)
PrintKernelSuccess(225-230)
drivers/ethernet/Network.c (1)
kernel/etc/StringOps.c (1)
FastStrCopy(18-20)
🪛 Clang (14.0.6)
drivers/ethernet/interface/Ip.c
[warning] 27-27: variable 'net_dev' is not initialized
(cppcoreguidelines-init-variables)
[warning] 30-30: statement should be inside braces
(readability-braces-around-statements)
drivers/ethernet/Network.c
[warning] 32-32: 256 is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (4)
drivers/ethernet/Network.c (1)
32-32: LGTM - Good use of FastStrCopy.Replacing manual string copy with
FastStrCopyimproves consistency with the codebase's string operation patterns.cmake/features.cmake (1)
100-102: LGTM - Consistent option propagation.The
VF_CONFIG_MEMCPY_NToption propagation follows the established pattern for other feature flags in this file.cmake/configuration.cmake (1)
39-39: LGTM - Conservative default for new optimization.The
VF_CONFIG_MEMCPY_NToption is properly declared with a descriptive string and safe default (OFF).CMakeLists.txt (1)
180-201: LGTM - Enhanced QEMU testing configuration.The verbose device configuration improves the testing environment by adding:
- Debug console and serial output for logging
- Network device for networking tests
- USB and audio devices for peripheral testing
- Well-organized disk configuration with clear comments
This makes the
runtarget more comprehensive for development and testing.
| #ifndef VF_CONFIG_VM_HOST | ||
| if (!cpu_features.osxsave) { | ||
| PrintKernelWarning("System: CPU: OSXSAVE not supported. AVX/2/512F will be disabled.\n"); | ||
| PrintKernel("System: CPU: OSXSAVE not supported. AVX/2/512F will be disabled.\n"); | ||
| cpu_features.avx = false; | ||
| cpu_features.avx2 = false; | ||
| cpu_features.avx512f = false; | ||
| CPUFeatureValidation(); | ||
| return; | ||
| } | ||
| PrintKernelSuccess("System: CPU: OSXSAVE supported.\n"); | ||
| #else | ||
| PrintKernel("System: CPU: OSXSAVE not supported. AVX/2/512F will be disabled.\n"); | ||
| cpu_features.avx = false; | ||
| cpu_features.avx2 = false; | ||
| cpu_features.avx512f = false; | ||
| CPUFeatureValidation(); | ||
| return; | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Reduce code duplication in AVX disabling logic.
Both branches of the VF_CONFIG_VM_HOST conditional contain identical AVX/AVX2/AVX512F disabling code and validation calls. Consider extracting this into a helper function or consolidating the logic.
Apply this diff to reduce duplication:
+ bool should_disable_avx = false;
#ifndef VF_CONFIG_VM_HOST
if (!cpu_features.osxsave) {
PrintKernel("System: CPU: OSXSAVE not supported. AVX/2/512F will be disabled.\n");
- cpu_features.avx = false;
- cpu_features.avx2 = false;
- cpu_features.avx512f = false;
- CPUFeatureValidation();
- return;
+ should_disable_avx = true;
}
PrintKernelSuccess("System: CPU: OSXSAVE supported.\n");
#else
PrintKernel("System: CPU: OSXSAVE not supported. AVX/2/512F will be disabled.\n");
+ should_disable_avx = true;
+#endif
+
+ if (should_disable_avx) {
+ cpu_features.avx = false;
+ cpu_features.avx2 = false;
+ cpu_features.avx512f = false;
+ CPUFeatureValidation();
+ return;
+ }
+
+#ifndef VF_CONFIG_VM_HOST
- cpu_features.avx = false;
- cpu_features.avx2 = false;
- cpu_features.avx512f = false;
- CPUFeatureValidation();
- return;
-#endifCommittable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In arch/x86_64/features/x64.c around lines 65 to 82, the AVX/AVX2/AVX512F
disabling and CPUFeatureValidation calls are duplicated in both branches of the
VF_CONFIG_VM_HOST #ifdef; extract that repeated block into a single helper
function (e.g., DisableAVXFeatures or ConsolidateAVXFallback) or move the shared
logic after the #endif so it runs from one place, then replace both duplicated
blocks with a single call to the helper (ensure any necessary
includes/prototypes are added and preserve the existing
PrintKernel/PrintKernelSuccess messages semantics).
| NetworkDevice* net_dev = NULL; | ||
| for (int i = 0; i < MAX_NETWORK_DEVICES; i++) { | ||
| net_dev = Net_GetDevice(i); | ||
| if (net_dev) break; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add NULL check after device selection.
The device selection loop leaves net_dev as NULL if no devices are found. Line 72 dereferences net_dev->get_mac_address() without checking, which will cause a kernel crash.
Apply this diff to add a NULL check:
NetworkDevice* net_dev = NULL;
for (int i = 0; i < MAX_NETWORK_DEVICES; i++) {
net_dev = Net_GetDevice(i);
- if (net_dev) break;
+ if (net_dev) {
+ break;
+ }
}
+
+ if (!net_dev) {
+ PrintKernel("IP: No network device available.\n");
+ return;
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| NetworkDevice* net_dev = NULL; | |
| for (int i = 0; i < MAX_NETWORK_DEVICES; i++) { | |
| net_dev = Net_GetDevice(i); | |
| if (net_dev) break; | |
| } | |
| NetworkDevice* net_dev = NULL; | |
| for (int i = 0; i < MAX_NETWORK_DEVICES; i++) { | |
| net_dev = Net_GetDevice(i); | |
| if (net_dev) { | |
| break; | |
| } | |
| } | |
| if (!net_dev) { | |
| PrintKernel("IP: No network device available.\n"); | |
| return; | |
| } |
🧰 Tools
🪛 Clang (14.0.6)
[warning] 27-27: variable 'net_dev' is not initialized
(cppcoreguidelines-init-variables)
[warning] 30-30: statement should be inside braces
(readability-braces-around-statements)
🤖 Prompt for AI Agents
In drivers/ethernet/interface/Ip.c around lines 27 to 31, the loop that selects
a NetworkDevice can leave net_dev NULL; add a NULL check immediately after the
loop and before any dereference (e.g., before calling
net_dev->get_mac_address()), and handle the error path by returning an
appropriate error code or logging and aborting initialization to avoid a kernel
crash.
| void Net_UnregisterDevice() { | ||
|
|
||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Implement or track the empty stub.
Net_UnregisterDevice is currently an empty stub. If device unregistration is planned, consider implementing it or adding a TODO comment.
Do you want me to generate an implementation for device unregistration, or would you prefer to open an issue to track this?
🤖 Prompt for AI Agents
In drivers/ethernet/Network.c around lines 44–46, Net_UnregisterDevice is an
empty stub and must either be implemented or tracked; implement it to undo
everything Net_RegisterDevice does: check for NULL, acquire the same lock(s)
used by the device list, remove the device from global lists, cancel
timers/workqueues, disable/unregister netdev ops, free/unregister IRQs, drop
reference counts and free device memory, and log errors; ensure safe ordering
and concurrency (locks/RCU) and that calls are idempotent; if you don’t want to
implement now, add a clear TODO comment with a TODO/ISSUE reference (create an
issue) explaining what unregistration must do and why, plus a FIXME tag and an
explicit owner.
kernel/core/Kernel.c
Outdated
| PrintKernel("Info: Loading multiboot modules...\n"); | ||
| InitRDLoad(); | ||
| PrintKernelSuccess("System: Multiboot modules loaded\n"); | ||
| PrintKernelSuccess("System: Multiboot modules wed\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix typo in success message.
The message text contains "wed" which should be "loaded".
Apply this diff to fix the typo:
- PrintKernelSuccess("System: Multiboot modules wed\n");
+ PrintKernelSuccess("System: Multiboot modules loaded\n");📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| PrintKernelSuccess("System: Multiboot modules wed\n"); | |
| PrintKernelSuccess("System: Multiboot modules loaded\n"); |
🤖 Prompt for AI Agents
In kernel/core/Kernel.c around line 763, the success message contains a typo
("System: Multiboot modules wed"); update the string to the correct text
("System: Multiboot modules loaded") so the printed message accurately reports
that multiboot modules were loaded.
| .wc_copy_done: | ||
| jmp .wc_done | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reintroduce SFENCE in all WC memcpy epilogues
The new _wc paths never execute sfence before returning. WC regions are weakly ordered: without an sfence, the write-combining buffers can still hold data when the caller proceeds (e.g., ringing a device doorbell), leading to stale or partially copied payloads. This was the source of the earlier FastMemcpy corruption we fixed by fencing the streaming stores. Please add the fence back to each _wc epilogue.
.wc_copy_done:
+ sfence
jmp .wc_done
...
.avx2_wc_copy_done:
+ sfence
jmp .avx2_wc_done
...
.avx512_wc_copy_done:
+ sfence
jmp .avx512_wc_doneBased on learnings
Also applies to: 1137-1139, 1841-1843
🤖 Prompt for AI Agents
In mm/asm/memcpy.asm around lines 501-503 (and similarly at 1137-1139 and
1841-1843), the WC (_wc) memcpy epilogues jump straight to .wc_done without
issuing an sfence; reintroduce an sfence instruction immediately before the jmp
(or before returning from the _wc path) so write-combining buffers are drained
before the caller proceeds, ensuring ordered visibility of streaming stores;
apply the same change to the other two epilogues mentioned.
| #ifdef VF_CONFIG_MEMCPY_NT | ||
| if (features->avx512f) return memcpy_internal_avx512(d, s, size); | ||
| if (features->avx2) return memcpy_internal_avx2(d, s, size); | ||
| if (features->sse2) return memcpy_internal_sse2(d, s, size); | ||
| #else | ||
| if (features->avx512f) return memcpy_internal_avx512_wc(d, s, size); | ||
| if (features->avx2) return memcpy_internal_avx2_wc(d, s, size); | ||
| if (features->sse2) return memcpy_internal_sse2_wc(d, s, size); | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Clarify VF_CONFIG_MEMCPY_NT flag semantics.
The conditional logic appears inverted: when VF_CONFIG_MEMCPY_NT (described as "Enable non-temporal memcpy optimizations") is defined, the code uses non-WC variants. When undefined, it uses WC (write-combining) variants, which typically employ non-temporal stores.
This suggests either:
- The flag name should be
VF_CONFIG_MEMCPY_WCorVF_CONFIG_DISABLE_MEMCPY_NT - The
#ifdeflogic is inverted
Please clarify the intended behavior.
Based on learnings, previous FastMemcpy implementations had interrupt-safety issues with SSE operations. Ensure the WC variants properly handle interrupt context and XMM register preservation.
🏁 Script executed:
#!/bin/bash
# Explore repository structure and locate key files
echo "=== Repository structure ==="
find . -type f -name "*.c" -o -name "*.h" -o -name "CMakeLists.txt" | grep -E "(MemOps|CMake)" | head -20Length of output: 100
🏁 Script executed:
#!/bin/bash
# Inspect mm/MemOps.c around lines 67-75 and full context
echo "=== mm/MemOps.c content around lines 67-75 ==="
cat mm/MemOps.c | head -100Length of output: 3825
🏁 Script executed:
#!/bin/bash
# Search for memcpy variant implementations
echo "=== Searching for memcpy_internal implementations ==="
rg -n "memcpy_internal_(avx512|avx2|sse2)" mm/MemOps.c -A 5 | head -80Length of output: 1443
🏁 Script executed:
#!/bin/bash
# Check for CMakeLists.txt or cmake config describing VF_CONFIG_MEMCPY_NT
echo "=== Searching for VF_CONFIG_MEMCPY_NT in cmake/config files ==="
find . -name "CMakeLists.txt" -o -name "*.cmake" | xargs rg -l "MEMCPY_NT" 2>/dev/null
rg -n "MEMCPY_NT" --type cmake 2>/dev/null | head -20Length of output: 341
🏁 Script executed:
#!/bin/bash
# Search for assembly implementations of memcpy_internal functions
echo "=== Looking for assembly/source files with memcpy implementations ==="
find . -type f \( -name "*.s" -o -name "*.asm" -o -name "*.S" \) | head -20
rg -l "memcpy_internal" --type-not cmake --type-not c 2>/dev/null | head -10Length of output: 504
🏁 Script executed:
#!/bin/bash
# Search within C files for actual memcpy_internal implementations (might be inline)
echo "=== Searching for memcpy_internal definitions (not just externs) ==="
rg -n "^void\*\s*memcpy_internal_" mm/
rg -n "memcpy_internal_sse2\s*\{" .
rg -n "memcpy_internal_avx2\s*\{" .Length of output: 68
🏁 Script executed:
#!/bin/bash
# Check if there are separate memcpy source files or directories
echo "=== Directory structure in mm/ and related paths ==="
ls -la mm/
find . -type d -name "*memcpy*" -o -type d -name "*copy*" 2>/dev/null
find . -type f -name "*memcpy*" 2>/dev/nullLength of output: 1287
🏁 Script executed:
#!/bin/bash
# Search for movntdq or other non-temporal instructions to understand _wc variants
echo "=== Searching for non-temporal instructions in codebase ==="
rg "movntdq|movnti|movntp|_wc" . --type c --type asm 2>/dev/null | head -30Length of output: 1856
Fix VF_CONFIG_MEMCPY_NT flag logic or rename it for clarity.
The conditional at lines 67-75 is inverted relative to the cmake description. The VF_CONFIG_MEMCPY_NT flag is described as "Enable non-temporal memcpy optimizations," but:
- When ON: uses non-_wc variants (regular temporal stores)
- When OFF: uses _wc variants with
movntdqinstructions (non-temporal stores)
Either:
- Swap the
#ifdefand#elsebranches, or - Rename the flag to
VF_CONFIG_MEMCPY_TEMPORALorVF_CONFIG_DISABLE_MEMCPY_NTto match the inverted logic
Summary by CodeRabbit
New Features
Chores
Bug Fixes