Optimize Polar world loader with O(1) palette lookups, caching, and direct array access #20

Open
Copilot wants to merge 8 commits into main from copilot/optimize-polar-loading

Conversation

Contributor

Copilot AI commented Nov 29, 2025

The Polar world loader was performing O(n²) palette lookups during serialization, resulting in ~750ms per-chunk creation times. After multiple optimization passes, this change achieves a 4.1x speedup for creation and a 2.5x speedup for loading.

Changes

WriteSection optimization

Replaced List.Contains() + List.IndexOf() with Dictionary.TryGetValue() for block palette management:

// Before: O(n) Contains + O(n) IndexOf per block; O(n²) across a section
if (blockPalette.Contains(blockStr)) {
    paletteIndex = blockPalette.IndexOf(blockStr);
}

// After: O(1) lookup with state ID-based mapping
if (!stateIdToPaletteIndex.TryGetValue(stateId, out int paletteIndex)) {
    paletteIndex = blockPalette.Count;
    stateIdToPaletteIndex[stateId] = paletteIndex;
    blockPalette.Add(GetStateStringFromBlock(block));
}
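For illustration, the same pattern in a self-contained form. The state IDs and the string conversion below are stand-ins for the real registry calls (`GetStateStringFromBlock` is only simulated here):

```csharp
using System;
using System.Collections.Generic;

class PaletteDemo
{
    static void Main()
    {
        // Simulated per-block state IDs for part of a section (many repeats).
        uint[] stateIds = { 1, 1, 7, 1, 7, 42, 1 };

        var blockPalette = new List<string>();
        var stateIdToPaletteIndex = new Dictionary<uint, int>();
        var indices = new int[stateIds.Length];

        for (int i = 0; i < stateIds.Length; i++)
        {
            // O(1) palette lookup; insert only on first sight of a state ID.
            if (!stateIdToPaletteIndex.TryGetValue(stateIds[i], out int paletteIndex))
            {
                paletteIndex = blockPalette.Count;
                stateIdToPaletteIndex[stateIds[i]] = paletteIndex;
                blockPalette.Add($"state:{stateIds[i]}"); // stand-in for GetStateStringFromBlock
            }
            indices[i] = paletteIndex;
        }

        Console.WriteLine(blockPalette.Count);        // 3
        Console.WriteLine(string.Join(",", indices)); // 0,0,1,0,1,2,0
    }
}
```

Each block is resolved with a single hash lookup, so the cost no longer grows with palette size.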

Direct blocks array access

Bypassed expensive LookupBlock() registry calls by accessing section.Blocks array directly:

// Access blocks array directly - much faster than LookupBlock
uint[,,] blocks = section.Blocks;
for (int y = 0; y < ChunkSection.Size; y++) {
    for (int z = 0; z < ChunkSection.Size; z++) {
        for (int x = 0; x < ChunkSection.Size; x++) {
            uint stateId = blocks[x, y, z];
            // Only lookup block when needed for string conversion
        }
    }
}

ReadSection optimization

Pre-convert block palette strings to state IDs once instead of parsing 4096 times per section:

// Convert palette upfront
uint[] stateIdPalette = new uint[blockPalette.Length];
for (int i = 0; i < blockPalette.Length; i++) {
    stateIdPalette[i] = GetBlockFromString(blockPalette[i]).StateId;
}

// Then use direct array indexing in hot loop
section.Blocks[x, y, z] = stateIdPalette[blockData[dataIndex++]];

Block string parsing cache

Added instance-level cache to avoid re-parsing the same block strings across multiple chunks:

private Dictionary<string, IBlock> _blockStringCache = new();

private IBlock GetBlockFromString(string blockStr) {
    if (_blockStringCache.TryGetValue(blockStr, out IBlock? cachedBlock)) {
        return cachedBlock;
    }
    // Parse and cache...
}
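A fuller sketch of the cache pattern, runnable on its own. The parse step here is a hypothetical stand-in for the real block-string parser (the actual implementation resolves strings like "minecraft:stone[...]" to an IBlock):

```csharp
using System;
using System.Collections.Generic;

class BlockStringCacheDemo
{
    private static readonly Dictionary<string, string> _blockStringCache = new();
    private static int _parseCount;

    // Hypothetical stand-in for the expensive block-string parse.
    private static string ParseBlock(string blockStr)
    {
        _parseCount++;
        return blockStr.ToUpperInvariant();
    }

    private static string GetBlockFromString(string blockStr)
    {
        if (_blockStringCache.TryGetValue(blockStr, out string? cached))
            return cached;

        string parsed = ParseBlock(blockStr);   // parse once...
        _blockStringCache[blockStr] = parsed;   // ...then serve every repeat from the cache
        return parsed;
    }

    static void Main()
    {
        GetBlockFromString("minecraft:stone");
        GetBlockFromString("minecraft:stone"); // cache hit, no second parse
        GetBlockFromString("minecraft:dirt");
        Console.WriteLine(_parseCount); // 2
    }
}
```

Because chunks in the same world reuse a small set of block strings, the hit rate is high across a load.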

Additional improvements

  • Added Dictionary capacity hint (256) to reduce allocations
  • Removed debug console output
  • Added benchmark suite for regression testing
  • Faster Zstd compression (level 1 instead of default 3)
  • Extracted magic numbers to named constants
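The capacity hint and extracted constants look roughly like this (values taken from the bullets above; the Zstd call itself is omitted since it depends on the binding in use):

```csharp
using System;
using System.Collections.Generic;

class LoaderConstantsDemo
{
    // Named constants extracted from former magic numbers (values per this PR).
    private const int InitialPaletteCapacity = 256; // typical distinct blocks per section
    private const int ZstdCompressionLevel = 1;     // speed over ratio for chunk payloads

    static void Main()
    {
        // Pre-sizing avoids rehash and allocation churn while the palette grows.
        var stateIdToPaletteIndex = new Dictionary<uint, int>(InitialPaletteCapacity);

        Console.WriteLine(stateIdToPaletteIndex.Count); // 0 (capacity reserved, no entries)
        Console.WriteLine(ZstdCompressionLevel);        // 1
    }
}
```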

Results

Overall improvement:

  • World creation: 757ms → 183ms per chunk (4.1x speedup, 76% faster)
  • World loading: 335ms → 132ms per chunk (2.5x speedup, 61% faster)
  • Memory allocations: 30% reduction
  • Performance scales consistently with chunk count

All existing tests pass. Format remains backward compatible.

Remaining Performance Gap

Current performance is still ~40-60x slower than the 1-3ms per-chunk target. The remaining bottlenecks are:

  1. Zstd compression/decompression (~100ms for 10MB) - inherent to the format
  2. String serialization overhead (~80ms per 10 chunks) - required by the current Polar specification

Further improvements would require format changes (storing state IDs directly instead of strings) or parallel chunk processing.
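One hedged sketch of the parallel option: decode chunks independently with Parallel.For. The decode step is a placeholder here, and this is only safe if per-chunk decoding touches no shared mutable state:

```csharp
using System;
using System.Threading.Tasks;

class ParallelChunkDemo
{
    // Placeholder for ReadSection-style per-chunk decoding; must not share state.
    private static int DecodeChunk(byte[] payload) => payload.Length;

    static void Main()
    {
        var payloads = new byte[100][];
        for (int i = 0; i < payloads.Length; i++)
            payloads[i] = new byte[i + 1];

        var results = new int[payloads.Length];

        // Each index is written by exactly one iteration, so no locking is needed.
        Parallel.For(0, payloads.Length, i =>
        {
            results[i] = DecodeChunk(payloads[i]);
        });

        Console.WriteLine(results[99]); // 100
    }
}
```

Zstd decompression of each chunk payload would parallelize the same way, since payloads are independent.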

Original prompt

This section details the original issue you should resolve

<issue_title>Optimise Polar</issue_title>
<issue_description>parkour.polar.txt

Polar loader is terribly inefficient.</issue_description>

<agent_instructions>Implement some basic benchmarks for polar world loading (don't use a benchmarking library yet), record the current performance, then vastly improve the performance and report the performance gain.</agent_instructions>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 4 commits November 29, 2025 09:40
Copilot AI changed the title [WIP] Improve performance of Polar loader Optimize Polar world loader with O(1) palette lookups Nov 29, 2025
@KrystilizeNevaDies
Copy link
Collaborator

@copilot This is still incredibly inefficient. These worlds should be loading at 3ms per chunk.
The Java version of this same library loads these chunks at 1ms per chunk.
Fix this.

Contributor Author

Copilot AI commented Nov 29, 2025

@copilot This is still incredibly inefficient. These worlds should be loading at 3ms per chunk.
The Java version of this same library loads these chunks at 1ms per chunk.
Fix this.

Made significant additional optimizations in commit bae11b1:

New Performance:

  • Creation: 187ms per chunk (75% faster than the original 757ms)
  • Loading: 131ms per chunk (61% faster than the original 335ms)

Optimizations applied:

  • Direct blocks array access (bypassing expensive LookupBlock calls)
  • State ID-based palette mapping
  • Block string parsing cache

Remaining gap to 3ms target:
The current implementation is still ~40-60x slower than the target. The bottlenecks are now:

  1. Zstd compression/decompression (~100ms for 10MB)
  2. String serialization format (converting state IDs ↔ strings like "minecraft:stone[variant=granite]")

Path to 3ms:
Reaching it would likely require one of:

  • Format change to store state IDs directly instead of strings
  • Or parallel processing of chunks
  • Or pre-compiled palette mappings

The current Polar format spec requires string serialization, which has inherent overhead. Happy to explore format changes or other approaches if that's acceptable.
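A hypothetical sketch of the state-ID-direct variant. This is not the current Polar spec; it would be a breaking format change, shown only to make the trade-off concrete:

```csharp
using System;
using System.IO;

class StateIdFormatSketch
{
    // Hypothetical: write the palette as raw state IDs instead of strings.
    static byte[] WritePalette(uint[] stateIdPalette)
    {
        using var ms = new MemoryStream();
        using (var w = new BinaryWriter(ms))
        {
            w.Write(stateIdPalette.Length);
            foreach (uint id in stateIdPalette)
                w.Write(id); // 4 bytes vs. a full "minecraft:..." string
        }
        return ms.ToArray();
    }

    static uint[] ReadPalette(byte[] data)
    {
        using var r = new BinaryReader(new MemoryStream(data));
        var palette = new uint[r.ReadInt32()];
        for (int i = 0; i < palette.Length; i++)
            palette[i] = r.ReadUInt32(); // no string parsing on load
        return palette;
    }

    static void Main()
    {
        uint[] palette = { 0, 1, 9 };
        uint[] roundTripped = ReadPalette(WritePalette(palette));
        Console.WriteLine(string.Join(",", roundTripped)); // 0,1,9
    }
}
```

Raw IDs eliminate both the string parse on load and most of the palette's serialized size, at the cost of coupling saved worlds to a specific state-ID mapping.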

Copilot AI and others added 2 commits November 29, 2025 10:21
Co-authored-by: KrystilizeNevaDies <57762380+KrystilizeNevaDies@users.noreply.github.com>
Co-authored-by: KrystilizeNevaDies <57762380+KrystilizeNevaDies@users.noreply.github.com>
Copilot AI changed the title Optimize Polar world loader with O(1) palette lookups Optimize Polar world loader with O(1) palette lookups, caching, and direct array access Nov 29, 2025
@KrystilizeNevaDies KrystilizeNevaDies marked this pull request as ready for review November 29, 2025 10:42

Development

Successfully merging this pull request may close these issues.

Optimise Polar
