Skip to content

Conversation

@assembler-0
Copy link
Owner

@assembler-0 assembler-0 commented Nov 1, 2025

Summary by CodeRabbit

  • New Features

    • Expanded emulation run configuration with richer network, USB, audio and disk/device I/O options
    • Improved network device discovery and registration behavior
  • Chores

    • Added an opt-in non-temporal memcpy performance option and corresponding optimized implementations
    • Updated build/configuration guidance and enhanced debug/output routing for runs
  • Bug Fixes

    • Fixed kernel startup message text typo

@coderabbitai
Copy link

coderabbitai bot commented Nov 1, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

CMake run target expands QEMU options (debug console, serial, parallel, network, USB, audio) and reorders explicit disk device/drives. New CMake option VF_CONFIG_MEMCPY_NT and corresponding compile definition added. Three write-combining memcpy variants (SSE2/AVX2/AVX512) and external wrappers introduced; FastMemcpy routed conditionally. arch/x86_64 adds VM-host conditional around OSXSAVE handling. Network: FastStrCopy used, Net_UnregisterDevice() added, and IpSend scans devices. Minor README and kernel message text edits.

Poem

🐇 I hopped through code with eager paws,
New memcpy hops in tuned, clever jaws,
QEMU gardens sprout ports, disks, and sound,
Networks register, devices found,
VM-host steady, features abound ✨

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 22.22% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title "WC memops" refers to a significant and real aspect of the changeset—specifically the addition of write-combining memory operations (memcpy variants with SSE2, AVX2, and AVX512 implementations) and the corresponding configuration logic in mm/MemOps.c and cmake. While the PR includes other changes (network driver modifications, CPU feature handling, QEMU configuration updates, and a kernel message fix), the WC memcpy work represents a primary focus with substantial code additions in mm/asm/memcpy.asm. However, the title uses the abbreviation "WC" and truncated term "memops," which may be unclear to teammates unfamiliar with the internal terminology and makes it somewhat cryptic for history scanning.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ccb4e9c and c9ccaf5.

📒 Files selected for processing (2)
  • README.md (1 hunks)
  • kernel/core/Kernel.c (1 hunks)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@assembler-0 assembler-0 merged commit bb0f547 into main Nov 1, 2025
1 check passed
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 80de4df and ccb4e9c.

📒 Files selected for processing (9)
  • CMakeLists.txt (1 hunks)
  • arch/x86_64/features/x64.c (1 hunks)
  • cmake/configuration.cmake (1 hunks)
  • cmake/features.cmake (1 hunks)
  • drivers/ethernet/Network.c (3 hunks)
  • drivers/ethernet/interface/Ip.c (1 hunks)
  • kernel/core/Kernel.c (1 hunks)
  • mm/MemOps.c (2 hunks)
  • mm/asm/memcpy.asm (8 hunks)
🧰 Additional context used
🧠 Learnings (11)
📓 Common learnings
Learnt from: assembler-0
Repo: assembler-0/VoidFrame PR: 82
File: kernel/process/Process.h:0-0
Timestamp: 2025-08-14T02:11:58.963Z
Learning: In VoidFrame kernel's FastMemcpy implementation (kernel/memory/MemOps.c), memory corruption occurs under high load due to interrupts occurring between SSE2 movdqu load and store instructions, corrupting XMM register state. The fix requires disabling interrupts around SSE operations and adding memory barriers (sfence) to ensure stores complete atomically.
Learnt from: assembler-0
Repo: assembler-0/VoidFrame PR: 101
File: kernel/etc/Shell.c:829-837
Timestamp: 2025-08-31T07:06:37.003Z
Learning: assembler-0 is working on optimizing ramfs performance in VoidFrame kernel and wants to implement proper filesystem characteristics like faster lookups, better memory management, and advanced features like copy-on-write and caching.
Learnt from: assembler-0
Repo: assembler-0/VoidFrame PR: 82
File: kernel/process/Process.h:0-0
Timestamp: 2025-08-14T02:11:27.395Z
Learning: In VoidFrame kernel, FastMemcpy was using SSE2 movdqu instructions without preserving XMM registers, causing crashes when called from interrupt context (like the scheduler's FastSchedule function). The fix is to use only general-purpose registers with 64-bit unrolled copies to maintain performance while being interrupt-safe.
Learnt from: assembler-0
Repo: assembler-0/VoidFrame PR: 132
File: arch/x86_64/syscall/Syscall.c:12-19
Timestamp: 2025-09-19T11:55:43.328Z
Learning: In the VoidFrame kernel project, assembler-0 is implementing basic POSIX-compliant syscalls in ring 0 without user/kernel memory protection, prioritizing getting basic functionality working before adding full security features.
Learnt from: assembler-0
Repo: assembler-0/VoidFrame PR: 103
File: docs/ARCHITECHTURE.md:1-3
Timestamp: 2025-08-31T14:19:11.366Z
Learning: assembler-0 wants to implement deep fault tracing for VoidFrame kernel exceptions (Page Fault, General Protection Fault, Invalid Opcode) instead of just panicking. The current interrupt handler has complete register state available in the Registers struct (defined in arch/x86_64/cpu/Cpu.h) including RBP for stack walking, RIP for fault location, and error codes for analysis.
Learnt from: assembler-0
Repo: assembler-0/autocc PR: 33
File: include/utils.hpp:138-190
Timestamp: 2025-08-04T13:24:48.984Z
Learning: The user assembler-0 tends to create wrapper functions during refactoring and sometimes accidentally leaves duplicate implementations. They have a casual, self-aware attitude about coding mistakes and are receptive to feedback about code duplication issues.
Learnt from: assembler-0
Repo: assembler-0/VoidFrame PR: 80
File: README.md:7-8
Timestamp: 2025-08-13T08:43:54.750Z
Learning: User assembler-0 is working on VoidFrame kernel with a tight deadline (2-3 days before summer ends) and specifically requested runtime safety checks similar to __stack_chk_fail but for deadlocks and other kernel issues. The kernel already has extensive safety mechanisms including stack canaries, guard pages, heap integrity checks, and process security validation.
Learnt from: assembler-0
Repo: assembler-0/VoidFrame PR: 72
File: arch/x86_64/syscall/SyscallEntry.asm:24-30
Timestamp: 2025-08-12T02:13:46.266Z
Learning: User assembler-0 has existing ELF infrastructure in VoidFrame kernel with Elf.h header definitions and CreateProcess() function that properly sets up process context including stack allocation, ProcessExitStub return address, and register initialization. The issue was missing ELF loader implementation, not compilation problems.
📚 Learning: 2025-08-14T02:11:27.395Z
Learnt from: assembler-0
Repo: assembler-0/VoidFrame PR: 82
File: kernel/process/Process.h:0-0
Timestamp: 2025-08-14T02:11:27.395Z
Learning: In VoidFrame kernel, FastMemcpy was using SSE2 movdqu instructions without preserving XMM registers, causing crashes when called from interrupt context (like the scheduler's FastSchedule function). The fix is to use only general-purpose registers with 64-bit unrolled copies to maintain performance while being interrupt-safe.

Applied to files:

  • mm/asm/memcpy.asm
  • arch/x86_64/features/x64.c
  • mm/MemOps.c
📚 Learning: 2025-08-14T02:11:58.963Z
Learnt from: assembler-0
Repo: assembler-0/VoidFrame PR: 82
File: kernel/process/Process.h:0-0
Timestamp: 2025-08-14T02:11:58.963Z
Learning: In VoidFrame kernel's FastMemcpy implementation (kernel/memory/MemOps.c), memory corruption occurs under high load due to interrupts occurring between SSE2 movdqu load and store instructions, corrupting XMM register state. The fix requires disabling interrupts around SSE operations and adding memory barriers (sfence) to ensure stores complete atomically.

Applied to files:

  • mm/asm/memcpy.asm
  • arch/x86_64/features/x64.c
  • mm/MemOps.c
📚 Learning: 2025-09-01T01:10:22.123Z
Learnt from: assembler-0
Repo: assembler-0/VoidFrame PR: 103
File: README.md:9-9
Timestamp: 2025-09-01T01:10:22.123Z
Learning: The current VoidFrame Registers struct in arch/x86_64/cpu/Cpu.h only saves basic GPRs, segment registers, and exception context (interrupt_number, error_code, rip, cs, rflags, rsp, ss). It's missing critical registers for deep fault tracing: SIMD registers (XMM/YMM/ZMM), control registers (CR0/CR2/CR3/CR4), debug registers (DR0-DR7), and extended FPU state. The struct has a "DO NOT TOUCH" comment indicating it must match interrupt ASM stack layout.

Applied to files:

  • mm/asm/memcpy.asm
  • arch/x86_64/features/x64.c
📚 Learning: 2025-09-14T11:01:02.698Z
Learnt from: assembler-0
Repo: assembler-0/VoidFrame PR: 130
File: kernel/core/InitRD.c:25-35
Timestamp: 2025-09-14T11:01:02.698Z
Learning: In VoidFrame kernel's InitRD.c, multiboot modules should be mapped using VMemMap() with PAGE_PRESENT | PAGE_WRITABLE flags instead of VMemMapMMIO(), as the latter applies cache-disabling flags (PAGE_NOCACHE | PAGE_WRITETHROUGH) that can corrupt data when reading regular RAM content.

Applied to files:

  • kernel/core/Kernel.c
📚 Learning: 2025-09-01T01:10:22.123Z
Learnt from: assembler-0
Repo: assembler-0/VoidFrame PR: 103
File: README.md:9-9
Timestamp: 2025-09-01T01:10:22.123Z
Learning: VoidFrame's interrupt handlers in arch/x86_64/interrupts/Interrupts.asm only save basic GPRs and segment registers, missing critical registers for deep fault tracing: control registers (CR0/CR2/CR3/CR4), SIMD state (XMM/YMM/ZMM), debug registers (DR0-DR7), and extended FPU state. The CPU initialization code enables SSE/AVX/XSAVE features but interrupt handlers don't preserve this extended state. Enhanced fault tracing requires selective extended state saving for critical exceptions (ISR 6, 8, 13, 14) while keeping lightweight handling for regular interrupts.

Applied to files:

  • arch/x86_64/features/x64.c
📚 Learning: 2025-08-17T08:16:43.539Z
Learnt from: assembler-0
Repo: assembler-0/VoidFrame PR: 93
File: kernel/memory/VMem.c:72-79
Timestamp: 2025-08-17T08:16:43.539Z
Learning: In VoidFrame kernel, VMemGetPhysAddr and related page table walking functions were causing panics because they assumed identity mapping existed for physical addresses < IDENTITY_MAP_SIZE (4GB), but the bootloader doesn't provide this 1:1 mapping. The fix is to remove all identity mapping assumptions and always use PHYS_TO_VIRT() for page table access in functions like VMemGetPageTablePhys, VMemMap, VMemMapHuge, VMemGetPhysAddr, VMemUnmap, VMemMapMMIO, and VMemUnmapMMIO.

Applied to files:

  • arch/x86_64/features/x64.c
📚 Learning: 2025-08-31T14:19:11.366Z
Learnt from: assembler-0
Repo: assembler-0/VoidFrame PR: 103
File: docs/ARCHITECHTURE.md:1-3
Timestamp: 2025-08-31T14:19:11.366Z
Learning: assembler-0 wants to implement deep fault tracing for VoidFrame kernel exceptions (Page Fault, General Protection Fault, Invalid Opcode) instead of just panicking. The current interrupt handler has complete register state available in the Registers struct (defined in arch/x86_64/cpu/Cpu.h) including RBP for stack walking, RIP for fault location, and error codes for analysis.

Applied to files:

  • arch/x86_64/features/x64.c
📚 Learning: 2025-08-12T03:19:52.873Z
Learnt from: assembler-0
Repo: assembler-0/VoidFrame PR: 73
File: kernel/elf/ELFloader.c:0-0
Timestamp: 2025-08-12T03:19:52.873Z
Learning: VoidFrame kernel now has complete UEFI boot support with hybrid BIOS/UEFI ISO creation capability. The existing VBE framebuffer implementation requires no changes as it uses Multiboot2 framebuffer tags and direct memory-mapped access, making it inherently UEFI-compatible. Build process: ninja + ./scripts/build_hybrid_iso.sh creates dual-platform bootable ISO.

Applied to files:

  • CMakeLists.txt
📚 Learning: 2025-08-12T03:19:52.873Z
Learnt from: assembler-0
Repo: assembler-0/VoidFrame PR: 73
File: kernel/elf/ELFloader.c:0-0
Timestamp: 2025-08-12T03:19:52.873Z
Learning: User assembler-0 has successfully implemented UEFI boot support for VoidFrame kernel. The VBE framebuffer implementation is fully UEFI-compatible using Multiboot2 framebuffer tags and direct memory-mapped access, requiring no kernel code changes - only bootloader configuration updates.

Applied to files:

  • CMakeLists.txt
📚 Learning: 2025-08-17T05:53:48.040Z
Learnt from: assembler-0
Repo: assembler-0/VoidFrame PR: 92
File: meson.build:31-34
Timestamp: 2025-08-17T05:53:48.040Z
Learning: The VoidFrame kernel's memory detection was reporting incorrect memory amounts (like 9200MB) because it used the highest physical address (including memory holes and reserved regions) as total memory size, rather than summing only the MULTIBOOT2_MEMORY_AVAILABLE regions. The fix involves counting only available memory regions and using the highest available address only for bitmap sizing.

Applied to files:

  • CMakeLists.txt
🧬 Code graph analysis (4)
drivers/ethernet/interface/Ip.c (1)
drivers/ethernet/Network.c (1)
  • Net_GetDevice (48-53)
kernel/core/Kernel.c (1)
kernel/etc/Console.c (1)
  • PrintKernelSuccess (225-230)
arch/x86_64/features/x64.c (1)
kernel/etc/Console.c (1)
  • PrintKernelSuccess (225-230)
drivers/ethernet/Network.c (1)
kernel/etc/StringOps.c (1)
  • FastStrCopy (18-20)
🪛 Clang (14.0.6)
drivers/ethernet/interface/Ip.c

[warning] 27-27: variable 'net_dev' is not initialized

(cppcoreguidelines-init-variables)


[warning] 30-30: statement should be inside braces

(readability-braces-around-statements)

drivers/ethernet/Network.c

[warning] 32-32: 256 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (4)
drivers/ethernet/Network.c (1)

32-32: LGTM - Good use of FastStrCopy.

Replacing manual string copy with FastStrCopy improves consistency with the codebase's string operation patterns.

cmake/features.cmake (1)

100-102: LGTM - Consistent option propagation.

The VF_CONFIG_MEMCPY_NT option propagation follows the established pattern for other feature flags in this file.

cmake/configuration.cmake (1)

39-39: LGTM - Conservative default for new optimization.

The VF_CONFIG_MEMCPY_NT option is properly declared with a descriptive string and safe default (OFF).

CMakeLists.txt (1)

180-201: LGTM - Enhanced QEMU testing configuration.

The verbose device configuration improves the testing environment by adding:

  • Debug console and serial output for logging
  • Network device for networking tests
  • USB and audio devices for peripheral testing
  • Well-organized disk configuration with clear comments

This makes the run target more comprehensive for development and testing.

Comment on lines +65 to +82
#ifndef VF_CONFIG_VM_HOST
if (!cpu_features.osxsave) {
PrintKernelWarning("System: CPU: OSXSAVE not supported. AVX/2/512F will be disabled.\n");
PrintKernel("System: CPU: OSXSAVE not supported. AVX/2/512F will be disabled.\n");
cpu_features.avx = false;
cpu_features.avx2 = false;
cpu_features.avx512f = false;
CPUFeatureValidation();
return;
}
PrintKernelSuccess("System: CPU: OSXSAVE supported.\n");
#else
PrintKernel("System: CPU: OSXSAVE not supported. AVX/2/512F will be disabled.\n");
cpu_features.avx = false;
cpu_features.avx2 = false;
cpu_features.avx512f = false;
CPUFeatureValidation();
return;
#endif
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Reduce code duplication in AVX disabling logic.

Both branches of the VF_CONFIG_VM_HOST conditional contain identical AVX/AVX2/AVX512F disabling code and validation calls. Consider extracting this into a helper function or consolidating the logic.

Apply this diff to reduce duplication:

+    bool should_disable_avx = false;
 #ifndef VF_CONFIG_VM_HOST
     if (!cpu_features.osxsave) {
         PrintKernel("System: CPU: OSXSAVE not supported. AVX/2/512F will be disabled.\n");
-        cpu_features.avx = false;
-        cpu_features.avx2 = false;
-        cpu_features.avx512f = false;
-        CPUFeatureValidation();
-        return;
+        should_disable_avx = true;
     }
     PrintKernelSuccess("System: CPU: OSXSAVE supported.\n");
 #else
     PrintKernel("System: CPU: OSXSAVE not supported. AVX/2/512F will be disabled.\n");
+    should_disable_avx = true;
+#endif
+
+    if (should_disable_avx) {
+        cpu_features.avx = false;
+        cpu_features.avx2 = false;
+        cpu_features.avx512f = false;
+        CPUFeatureValidation();
+        return;
+    }
+
+#ifndef VF_CONFIG_VM_HOST
-    cpu_features.avx = false;
-    cpu_features.avx2 = false;
-    cpu_features.avx512f = false;
-    CPUFeatureValidation();
-    return;
-#endif

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In arch/x86_64/features/x64.c around lines 65 to 82, the AVX/AVX2/AVX512F
disabling and CPUFeatureValidation calls are duplicated in both branches of the
VF_CONFIG_VM_HOST #ifdef; extract that repeated block into a single helper
function (e.g., DisableAVXFeatures or ConsolidateAVXFallback) or move the shared
logic after the #endif so it runs from one place, then replace both duplicated
blocks with a single call to the helper (ensure any necessary
includes/prototypes are added and preserve the existing
PrintKernel/PrintKernelSuccess messages semantics).

Comment on lines +27 to 31
NetworkDevice* net_dev = NULL;
for (int i = 0; i < MAX_NETWORK_DEVICES; i++) {
net_dev = Net_GetDevice(i);
if (net_dev) break;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Add NULL check after device selection.

The device selection loop leaves net_dev as NULL if no devices are found. Line 72 dereferences net_dev->get_mac_address() without checking, which will cause a kernel crash.

Apply this diff to add a NULL check:

     NetworkDevice* net_dev = NULL;
     for (int i = 0; i < MAX_NETWORK_DEVICES; i++) {
         net_dev = Net_GetDevice(i);
-        if (net_dev) break;
+        if (net_dev) {
+            break;
+        }
     }
+
+    if (!net_dev) {
+        PrintKernel("IP: No network device available.\n");
+        return;
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
NetworkDevice* net_dev = NULL;
for (int i = 0; i < MAX_NETWORK_DEVICES; i++) {
net_dev = Net_GetDevice(i);
if (net_dev) break;
}
NetworkDevice* net_dev = NULL;
for (int i = 0; i < MAX_NETWORK_DEVICES; i++) {
net_dev = Net_GetDevice(i);
if (net_dev) {
break;
}
}
if (!net_dev) {
PrintKernel("IP: No network device available.\n");
return;
}
🧰 Tools
🪛 Clang (14.0.6)

[warning] 27-27: variable 'net_dev' is not initialized

(cppcoreguidelines-init-variables)


[warning] 30-30: statement should be inside braces

(readability-braces-around-statements)

🤖 Prompt for AI Agents
In drivers/ethernet/interface/Ip.c around lines 27 to 31, the loop that selects
a NetworkDevice can leave net_dev NULL; add a NULL check immediately after the
loop and before any dereference (e.g., before calling
net_dev->get_mac_address()), and handle the error path by returning an
appropriate error code or logging and aborting initialization to avoid a kernel
crash.

Comment on lines +44 to +46
void Net_UnregisterDevice() {

}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Implement or track the empty stub.

Net_UnregisterDevice is currently an empty stub. If device unregistration is planned, consider implementing it or adding a TODO comment.

Do you want me to generate an implementation for device unregistration, or would you prefer to open an issue to track this?

🤖 Prompt for AI Agents
In drivers/ethernet/Network.c around lines 44–46, Net_UnregisterDevice is an
empty stub and must either be implemented or tracked; implement it to undo
everything Net_RegisterDevice does: check for NULL, acquire the same lock(s)
used by the device list, remove the device from global lists, cancel
timers/workqueues, disable/unregister netdev ops, free/unregister IRQs, drop
reference counts and free device memory, and log errors; ensure safe ordering
and concurrency (locks/RCU) and that calls are idempotent; if you don’t want to
implement now, add a clear TODO comment with a TODO/ISSUE reference (create an
issue) explaining what unregistration must do and why, plus a FIXME tag and an
explicit owner.

PrintKernel("Info: Loading multiboot modules...\n");
InitRDLoad();
PrintKernelSuccess("System: Multiboot modules loaded\n");
PrintKernelSuccess("System: Multiboot modules wed\n");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix typo in success message.

The message text contains "wed" which should be "loaded".

Apply this diff to fix the typo:

-    PrintKernelSuccess("System: Multiboot modules wed\n");
+    PrintKernelSuccess("System: Multiboot modules loaded\n");
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
PrintKernelSuccess("System: Multiboot modules wed\n");
PrintKernelSuccess("System: Multiboot modules loaded\n");
🤖 Prompt for AI Agents
In kernel/core/Kernel.c around line 763, the success message contains a typo
("System: Multiboot modules wed"); update the string to the correct text
("System: Multiboot modules loaded") so the printed message accurately reports
that multiboot modules were loaded.

Comment on lines +501 to +503
.wc_copy_done:
jmp .wc_done
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Reintroduce SFENCE in all WC memcpy epilogues

The new _wc paths never execute sfence before returning. WC regions are weakly ordered: without an sfence, the write-combining buffers can still hold data when the caller proceeds (e.g., ringing a device doorbell), leading to stale or partially copied payloads. This was the source of the earlier FastMemcpy corruption we fixed by fencing the streaming stores. Please add the fence back to each _wc epilogue.

 .wc_copy_done:
+    sfence
     jmp .wc_done
 ...
 .avx2_wc_copy_done:
+    sfence
     jmp .avx2_wc_done
 ...
 .avx512_wc_copy_done:
+    sfence
     jmp .avx512_wc_done

Based on learnings

Also applies to: 1137-1139, 1841-1843

🤖 Prompt for AI Agents
In mm/asm/memcpy.asm around lines 501-503 (and similarly at 1137-1139 and
1841-1843), the WC (_wc) memcpy epilogues jump straight to .wc_done without
issuing an sfence; reintroduce an sfence instruction immediately before the jmp
(or before returning from the _wc path) so write-combining buffers are drained
before the caller proceeds, ensuring ordered visibility of streaming stores;
apply the same change to the other two epilogues mentioned.

Comment on lines +67 to +75
#ifdef VF_CONFIG_MEMCPY_NT
if (features->avx512f) return memcpy_internal_avx512(d, s, size);
if (features->avx2) return memcpy_internal_avx2(d, s, size);
if (features->sse2) return memcpy_internal_sse2(d, s, size);
#else
if (features->avx512f) return memcpy_internal_avx512_wc(d, s, size);
if (features->avx2) return memcpy_internal_avx2_wc(d, s, size);
if (features->sse2) return memcpy_internal_sse2_wc(d, s, size);
#endif
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Clarify VF_CONFIG_MEMCPY_NT flag semantics.

The conditional logic appears inverted: when VF_CONFIG_MEMCPY_NT (described as "Enable non-temporal memcpy optimizations") is defined, the code uses non-WC variants. When undefined, it uses WC (write-combining) variants, which typically employ non-temporal stores.

This suggests either:

  1. The flag name should be VF_CONFIG_MEMCPY_WC or VF_CONFIG_DISABLE_MEMCPY_NT
  2. The #ifdef logic is inverted

Please clarify the intended behavior.

Based on learnings, previous FastMemcpy implementations had interrupt-safety issues with SSE operations. Ensure the WC variants properly handle interrupt context and XMM register preservation.


🏁 Script executed:

#!/bin/bash
# Explore repository structure and locate key files
echo "=== Repository structure ==="
find . -type f -name "*.c" -o -name "*.h" -o -name "CMakeLists.txt" | grep -E "(MemOps|CMake)" | head -20

Length of output: 100


🏁 Script executed:

#!/bin/bash
# Inspect mm/MemOps.c around lines 67-75 and full context
echo "=== mm/MemOps.c content around lines 67-75 ==="
cat mm/MemOps.c | head -100

Length of output: 3825


🏁 Script executed:

#!/bin/bash
# Search for memcpy variant implementations
echo "=== Searching for memcpy_internal implementations ==="
rg -n "memcpy_internal_(avx512|avx2|sse2)" mm/MemOps.c -A 5 | head -80

Length of output: 1443


🏁 Script executed:

#!/bin/bash
# Check for CMakeLists.txt or cmake config describing VF_CONFIG_MEMCPY_NT
echo "=== Searching for VF_CONFIG_MEMCPY_NT in cmake/config files ==="
find . -name "CMakeLists.txt" -o -name "*.cmake" | xargs rg -l "MEMCPY_NT" 2>/dev/null
rg -n "MEMCPY_NT" --type cmake 2>/dev/null | head -20

Length of output: 341


🏁 Script executed:

#!/bin/bash
# Search for assembly implementations of memcpy_internal functions
echo "=== Looking for assembly/source files with memcpy implementations ==="
find . -type f \( -name "*.s" -o -name "*.asm" -o -name "*.S" \) | head -20
rg -l "memcpy_internal" --type-not cmake --type-not c 2>/dev/null | head -10

Length of output: 504


🏁 Script executed:

#!/bin/bash
# Search within C files for actual memcpy_internal implementations (might be inline)
echo "=== Searching for memcpy_internal definitions (not just externs) ==="
rg -n "^void\*\s*memcpy_internal_" mm/
rg -n "memcpy_internal_sse2\s*\{" .
rg -n "memcpy_internal_avx2\s*\{" .

Length of output: 68


🏁 Script executed:

#!/bin/bash
# Check if there are separate memcpy source files or directories
echo "=== Directory structure in mm/ and related paths ==="
ls -la mm/
find . -type d -name "*memcpy*" -o -type d -name "*copy*" 2>/dev/null
find . -type f -name "*memcpy*" 2>/dev/null

Length of output: 1287


🏁 Script executed:

#!/bin/bash
# Search for movntdq or other non-temporal instructions to understand _wc variants
echo "=== Searching for non-temporal instructions in codebase ==="
rg "movntdq|movnti|movntp|_wc" . --type c --type asm 2>/dev/null | head -30

Length of output: 1856


Fix VF_CONFIG_MEMCPY_NT flag logic or rename it for clarity.

The conditional at lines 67-75 is inverted relative to the cmake description. The VF_CONFIG_MEMCPY_NT flag is described as "Enable non-temporal memcpy optimizations," but:

  • When ON: uses non-_wc variants (regular temporal stores)
  • When OFF: uses _wc variants with movntdq instructions (non-temporal stores)

Either:

  1. Swap the #ifdef and #else branches, or
  2. Rename the flag to VF_CONFIG_MEMCPY_TEMPORAL or VF_CONFIG_DISABLE_MEMCPY_NT to match the inverted logic

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants