Closed
Description
Background
The current rendering pipeline maintains separate GPU buffers and bind groups for each VItem. Every frame requires per-VItem calls to `write_buffer`, `set_bind_group`, and `draw`.
Benchmarks from `benches/benches/gpu_render.rs` on the main branch show that CPU submission, not the GPU, is the dominant bottleneck:
| VItem count | GPU total, submit + wait (ms) | CPU submit only, no GPU wait (ms) | GPU-only estimate, total − CPU (ms) |
|---|---|---|---|
| 25 | 5.6 | 1.6 | ~4.0 |
| 100 | 8.9 | 4.1 | ~4.8 |
| 400 | 22.3 | 25.2 | n/a (CPU submit exceeds total) |
| 1600 | 88.8 | 92.7 | n/a |
| 3600 | 256 | 220 | ~36 |
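The last column is derived by subtraction, and it is only meaningful when the CPU-only run is faster than the full run. A minimal sketch of that derivation using the table's own numbers (the function name `gpu_only_estimate_ms` is hypothetical, not part of the benchmark code):

```rust
/// Estimate GPU-only time as (submit + wait) minus (submit only).
/// Returns None when the CPU-only measurement is not smaller than the total,
/// because the difference is then not a meaningful GPU estimate.
fn gpu_only_estimate_ms(gpu_total_ms: f64, cpu_submit_ms: f64) -> Option<f64> {
    if cpu_submit_ms >= gpu_total_ms {
        None
    } else {
        Some(gpu_total_ms - cpu_submit_ms)
    }
}

fn main() {
    // (VItem count, GPU total ms, CPU submit ms), copied from the benchmark table.
    let rows = [
        (25, 5.6, 1.6),
        (100, 8.9, 4.1),
        (400, 22.3, 25.2),
        (1600, 88.8, 92.7),
        (3600, 256.0, 220.0),
    ];
    for (n, total, cpu) in rows {
        match gpu_only_estimate_ms(total, cpu) {
            Some(gpu) => println!("{n:>5}: ~{gpu:.1} ms GPU-only"),
            None => println!("{n:>5}: n/a (CPU submit >= total)"),
        }
    }
}
```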
CPU submission cost scales roughly linearly, at ~55 μs per VItem. At 1600 VItems, CPU accounts for ~89% of total frame time. The root cause is call volume: at 3600 VItems, 7 `write_buffer` calls per VItem add up to 25,200 API calls per frame, each paying a fixed overhead (locking, validation, staging allocation).
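The per-frame arithmetic behind the per-VItem upload path can be written out as a rough linear model. This is a sketch only: the 7 calls per VItem and the ~55 μs slope are taken from the figures above, and the helper names are hypothetical. The model is approximate; for example, it gives ~198 ms at 3600 VItems against the measured 220 ms.

```rust
/// Figures quoted in the issue (illustrative constants, not read from the codebase).
const WRITES_PER_VITEM: u64 = 7; // write_buffer calls issued per VItem per frame
const CPU_US_PER_VITEM: f64 = 55.0; // approximate CPU submission cost per VItem

/// Number of write_buffer calls per frame when every VItem uploads its own buffers.
fn write_calls_per_frame(vitems: u64) -> u64 {
    vitems * WRITES_PER_VITEM
}

/// Linear estimate of CPU submission time, in milliseconds.
fn estimated_cpu_ms(vitems: u64) -> f64 {
    vitems as f64 * CPU_US_PER_VITEM / 1000.0
}

fn main() {
    for n in [25u64, 100, 400, 1600, 3600] {
        println!(
            "{n:>5} VItems: {:>6} write_buffer calls, ~{:.1} ms CPU (linear model)",
            write_calls_per_frame(n),
            estimated_cpu_ms(n)
        );
    }
}
```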
Approach
Adopt a GPU-driven rendering strategy that merges all VItem data into a single set of contiguous GPU buffers:
- Every-frame rebuild: each frame, pack all VItem `points`, `fill_rgbas`, `stroke_rgbas`, and `stroke_widths` into large contiguous buffers, with an `ItemInfo` index table recording each item's offset and count.
- Instanced drawing: use `draw(0..4, 0..N)` to render all N VItems in a single draw call; the vertex shader uses `instance_index` to look up per-item clip box and plane data.
- Binary search in compute shader: each compute thread binary-searches `item_infos` to determine which item it belongs to, then performs 3D→2D projection and atomic clip box updates.
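The packing step and the per-thread lookup can be mirrored on the CPU to show how they fit together. This is a minimal Rust sketch under stated assumptions: the `VItem` and `PackedBuffers` types are hypothetical simplifications, attributes are assumed to be per point, only `points` and `fill_rgbas` are packed (stroke data would share the same offsets), and projection and atomic clip box updates are omitted. In the real pipeline the search would run in the compute shader, one invocation per point.

```rust
/// Index-table entry: where an item's data starts in the packed buffers and how long it is.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ItemInfo {
    offset: u32, // index of the item's first point in the packed buffers
    count: u32,  // number of points the item owns
}

/// Hypothetical, simplified VItem with one fill color per point.
struct VItem {
    points: Vec<[f32; 3]>,
    fill_rgbas: Vec<[f32; 4]>,
}

/// Contiguous buffers rebuilt every frame; all share the offsets in `item_infos`.
#[derive(Default)]
struct PackedBuffers {
    points: Vec<[f32; 3]>,
    fill_rgbas: Vec<[f32; 4]>,
    item_infos: Vec<ItemInfo>,
}

/// Concatenate every item's per-point data and record its (offset, count).
fn pack(items: &[VItem]) -> PackedBuffers {
    let mut packed = PackedBuffers::default();
    for item in items {
        assert_eq!(item.points.len(), item.fill_rgbas.len());
        packed.item_infos.push(ItemInfo {
            offset: packed.points.len() as u32,
            count: item.points.len() as u32,
        });
        packed.points.extend_from_slice(&item.points);
        packed.fill_rgbas.extend_from_slice(&item.fill_rgbas);
    }
    packed
}

/// What one compute thread would do: binary-search `item_infos` (sorted by offset)
/// for the item that owns `point_index`. Empty items are skipped naturally.
fn find_item(item_infos: &[ItemInfo], point_index: u32) -> Option<usize> {
    // Number of items whose data starts at or before point_index.
    let i = item_infos.partition_point(|info| info.offset <= point_index);
    if i == 0 {
        return None;
    }
    let info = item_infos[i - 1];
    if point_index < info.offset + info.count {
        Some(i - 1)
    } else {
        None
    }
}

fn main() {
    let red: [f32; 4] = [1.0, 0.0, 0.0, 1.0];
    let items = vec![
        VItem { points: vec![[0.0; 3]; 3], fill_rgbas: vec![red; 3] },
        VItem { points: vec![], fill_rgbas: vec![] }, // an empty item
        VItem { points: vec![[1.0; 3]; 2], fill_rgbas: vec![red; 2] },
    ];
    let packed = pack(&items);
    for p in 0..packed.points.len() as u32 {
        println!("point {p} -> item {:?}", find_item(&packed.item_infos, p));
    }
}
```

Binary search keeps the per-thread lookup at O(log N) without a per-point item-id buffer; a per-point id table would trade memory for an O(1) lookup.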
This reduces the number of draw calls, bind group switches, and `write_buffer` calls per frame from O(N) (one set per VItem) to O(1).
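To illustrate how a single instanced `draw(0..4, 0..N)` call can cover every VItem, here is a CPU-side Rust mirror of what the vertex shader might compute. It is a sketch under assumptions not stated in the issue: a triangle-strip topology, a screen-space clip box stored as min/max corners, and the hypothetical names `ClipBox` and `quad_vertex`. Plane data is omitted.

```rust
/// Hypothetical per-item clip box in screen space.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ClipBox {
    min: [f32; 2],
    max: [f32; 2],
}

/// Mirror of a vertex shader invoked by draw(0..4, 0..N):
/// `vertex_index` (0..4) picks a quad corner, `instance_index` picks the VItem.
/// Corners follow Z order so a 4-vertex triangle strip covers the whole box.
fn quad_vertex(clip_boxes: &[ClipBox], vertex_index: u32, instance_index: u32) -> [f32; 2] {
    let b = clip_boxes[instance_index as usize];
    let x = if vertex_index & 1 == 0 { b.min[0] } else { b.max[0] };
    let y = if vertex_index & 2 == 0 { b.min[1] } else { b.max[1] };
    [x, y]
}

fn main() {
    let clip_boxes = vec![
        ClipBox { min: [0.0, 0.0], max: [1.0, 1.0] },
        ClipBox { min: [2.0, 2.0], max: [4.0, 3.0] },
    ];
    let n = clip_boxes.len() as u32;
    // One logical draw call: every instance reuses the same 4 vertex indices.
    for instance in 0..n {
        let quad: Vec<[f32; 2]> = (0..4).map(|v| quad_vertex(&clip_boxes, v, instance)).collect();
        println!("instance {instance}: {quad:?}");
    }
}
```

Because the per-item data is fetched by `instance_index` from shared buffers, no bind group has to change between items, which is what removes the per-VItem `set_bind_group` and `draw` calls.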