GPU-driven rendering: merge GPU buffers to eliminate per-VItem CPU submission overhead #139

@AzurIce

Background

The current rendering pipeline maintains separate GPU buffers and bind groups for each VItem. Every frame requires per-VItem calls to write_buffer, set_bind_group, and draw.

Benchmarks from benches/benches/gpu_render.rs on the main branch show that CPU submission is the dominant bottleneck, not the GPU:

| VItem count | GPU total (submit+wait) | CPU submit (no GPU wait) | GPU pure (diff) |
| --- | --- | --- | --- |
| 25 | 5.6 ms | 1.6 ms | ~4.0 ms |
| 100 | 8.9 ms | 4.1 ms | ~4.8 ms |
| 400 | 22.3 ms | 25.2 ms | — (CPU already exceeds GPU) |
| 1600 | 88.8 ms | 92.7 ms | |
| 3600 | 256 ms | 220 ms | ~36 ms |

CPU cost scales roughly linearly at ~55 μs per VItem. At 1600 VItems, CPU accounts for ~89% of total frame time. Root cause: at 3600 VItems, 7 write_buffer calls per VItem add up to 25,200 API calls per frame, each with fixed overhead (lock, validation, staging allocation).
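The arithmetic behind these figures can be checked with a small sketch. The helper names `cpu_us_per_item` and `write_buffer_calls` are hypothetical and exist only for illustration; the inputs are the values from the benchmark table above.

```rust
/// CPU submit time per VItem, in microseconds.
/// `cpu_submit_ms` is the "CPU submit (no GPU wait)" column of the table.
fn cpu_us_per_item(cpu_submit_ms: f64, item_count: u32) -> f64 {
    cpu_submit_ms * 1000.0 / item_count as f64
}

/// Total write_buffer calls issued per frame when every VItem
/// owns its own buffers (hypothetical helper, not part of the codebase).
fn write_buffer_calls(item_count: u64, calls_per_item: u64) -> u64 {
    item_count * calls_per_item
}
```

For example, `write_buffer_calls(3600, 7)` gives the 25,200 calls quoted above, and `cpu_us_per_item(92.7, 1600)` comes out at roughly 58 μs, in line with the ~55 μs estimate.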

Approach

Adopt a GPU-driven rendering strategy that merges all VItem data into a single set of contiguous GPU buffers:

  1. Every-frame rebuild: each frame, pack all VItem points, fill_rgbas, stroke_rgbas, and stroke_widths into large contiguous buffers, with an ItemInfo index table recording each item's offset and count
  2. Instanced drawing: use draw(0..4, 0..N) to render all N VItems in a single draw call; the vertex shader uses instance_index to look up per-item clip box and plane data
  3. Binary search in compute shader: each compute thread binary-searches item_infos to determine which item it belongs to, then performs 3D→2D projection and atomic clip box updates
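Steps 1 and 3 can be illustrated with a minimal CPU-side sketch in plain Rust. It is not the actual implementation: the `ItemInfo` field names, the `Packed` struct, and the functions `pack_items` and `find_item` are assumptions made for illustration. Only points are packed here; fill_rgbas, stroke_rgbas, and stroke_widths would presumably follow the same offset-and-count pattern. `find_item` mirrors on the CPU the lookup that each compute thread would perform in the shader.

```rust
/// Hypothetical index-table entry: where one item's points live in the merged buffer.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ItemInfo {
    point_offset: u32,
    point_count: u32,
}

/// Merged data for one frame (assumed layout, for illustration only).
struct Packed {
    points: Vec<[f32; 4]>,
    item_infos: Vec<ItemInfo>,
}

/// Concatenate every item's points into one contiguous vector and
/// record each item's offset and count in the index table.
fn pack_items(items: &[Vec<[f32; 4]>]) -> Packed {
    let total: usize = items.iter().map(Vec::len).sum();
    let mut points = Vec::with_capacity(total);
    let mut item_infos = Vec::with_capacity(items.len());
    for item in items {
        item_infos.push(ItemInfo {
            point_offset: points.len() as u32,
            point_count: item.len() as u32,
        });
        points.extend_from_slice(item);
    }
    Packed { points, item_infos }
}

/// Binary-search the index table for the item that owns `point_index`.
/// Returns None when the index is past the end of the merged buffer.
fn find_item(item_infos: &[ItemInfo], point_index: u32) -> Option<usize> {
    // Offsets are non-decreasing, so this counts the items whose offset is <= point_index.
    let n = item_infos.partition_point(|info| info.point_offset <= point_index);
    let idx = n.checked_sub(1)?;
    let info = item_infos[idx];
    (point_index < info.point_offset + info.point_count).then_some(idx)
}
```

With this layout the whole `points` vector and the whole `item_infos` table can each be uploaded in one write, and the lookup costs O(log N) per thread. Empty items share an offset with their successor; the search above resolves such ties to the last item with that offset, which is the one that actually holds the point.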

This reduces the number of draw calls, bind group switches, and write_buffer calls per frame from O(N) in the VItem count to O(1).
