(feat): Bitmask-Aware Untracked Tracking for @with_pool #16

Merged
mgyoo86 merged 7 commits into master from feat/bit_mask
Feb 17, 2026

@mgyoo86 mgyoo86 commented Feb 17, 2026

Summary

Replace the boolean _untracked_flags system with fine-grained per-type bitmask tracking, enabling @with_pool to keep the fast typed checkpoint/rewind path even when untracked acquire! calls occur in helper functions — as long as those types are already covered by the macro's tracked set.

Previously, any untracked acquire! call forced a full checkpoint/rewind over all 8 fixed-slot types. Now, each untracked call records which type it touched via a UInt16 bitmask, and the macro performs a subset check at runtime: if untracked ⊆ tracked, the typed (fast) path is preserved.

Key metrics:

  • 6x faster for the most common false-positive scenario (untracked same type)
  • 6.5x faster in hot loops with untracked helpers
  • Zero allocations preserved across all scenarios

Motivation

The @with_pool macro analyzes its AST to extract which types are used, enabling typed (fast) checkpoint/rewind that only saves/restores those specific type pools (~77% faster than full). However, acquire! calls inside helper functions are invisible to the macro and were marked as "untracked" with a single boolean flag per depth level.

This caused false-positive full rewinds in common patterns:

# Helper function — invisible to @with_pool
@noinline function compute_helper!(pool)
    v = acquire!(pool, Float64, 100)  # ← sets _untracked_flags[depth] = true
    sum(v)
end

@with_pool pool begin
    A = acquire!(pool, Float64, 1000)  # ← macro tracks Float64
    result = compute_helper!(pool)      # ← triggers full rewind (was 213 ns)
    A .+ result
end
# The helper acquires Float64 (same type the macro already tracks),
# but the boolean flag can't distinguish this from an unknown type.

Design

Bitmask Subset Check

Each of the 8 fixed-slot types maps to a bit in a UInt16 via _fixed_slot_bit(T):

| Bit | Type | Bit | Type |
|-----|------|-----|------|
| 0 | Float64 | 4 | ComplexF64 |
| 1 | Float32 | 5 | ComplexF32 |
| 2 | Int64 | 6 | Bool |
| 3 | Int32 | 7 | Bit (BitArray) |

Non-fixed-slot types (e.g. UInt8) set a separate _untracked_has_others flag, which always forces the full path. UInt16 supports up to 16 fixed-slot types (8 currently used, 8 reserved); if more are needed, widening to UInt32/UInt64 requires only a type alias change — all bitwise operations remain identical.
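
As a rough illustration of the mapping in the table above, the dispatch could look like the sketch below. The names `Mask`, `BitTag`, `slot_bit`, and `mask_of` are hypothetical and are not taken from the package; the sketch also assumes the mapping returns a one-hot mask, with zero for types that have no fixed slot.

```julia
# Sketch only: `Mask`, `BitTag`, `slot_bit`, and `mask_of` are hypothetical names,
# not the package's API.
const Mask = UInt16            # assumed alias; a wider unsigned type would be swapped in here

struct BitTag end              # hypothetical marker standing in for the "Bit (BitArray)" slot

# One method per fixed-slot type, each returning a one-hot mask.
slot_bit(::Type{Float64})    = Mask(1) << 0
slot_bit(::Type{Float32})    = Mask(1) << 1
slot_bit(::Type{Int64})      = Mask(1) << 2
slot_bit(::Type{Int32})      = Mask(1) << 3
slot_bit(::Type{ComplexF64}) = Mask(1) << 4
slot_bit(::Type{ComplexF32}) = Mask(1) << 5
slot_bit(::Type{Bool})       = Mask(1) << 6
slot_bit(::Type{BitTag})     = Mask(1) << 7
slot_bit(::Type)             = Mask(0)   # fallback: not a fixed-slot type

# OR the bits of several types into a single mask.
mask_of(types...) = mapreduce(slot_bit, |, types; init = Mask(0))

mask_of(Float64, Int64)   # bits 0 and 2 set
slot_bit(UInt8)           # zero: a caller would set the separate "others" flag instead
```

Because every operation goes through the `Mask` alias, widening the mask in this sketch would only mean changing that one line.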

The decision function performs a single-instruction subset check:

@inline function _can_use_typed_path(pool, tracked_mask::UInt16)
    depth = pool._current_depth
    untracked_mask = @inbounds pool._untracked_fixed_masks[depth]
    has_others = @inbounds pool._untracked_has_others[depth]
    return (untracked_mask & ~tracked_mask) == UInt16(0) && !has_others
end
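
To make the subset test concrete, here is a small standalone sketch that does not use the pool at all. The constants and the helpers `is_subset` and `typed_ok` are illustrative names, and the bit positions simply follow the table above.

```julia
# Illustrative bit constants (bit 0 = Float64, bit 1 = Float32, bit 2 = Int64).
const F64 = UInt16(1) << 0
const F32 = UInt16(1) << 1
const I64 = UInt16(1) << 2

# untracked ⊆ tracked  ⇔  no untracked bit lies outside the tracked mask.
is_subset(untracked::UInt16, tracked::UInt16) = (untracked & ~tracked) == UInt16(0)

# Hypothetical stand-in for the decision: subset check plus the "others" flag.
typed_ok(untracked::UInt16, tracked::UInt16, has_others::Bool) =
    is_subset(untracked, tracked) && !has_others

tracked = F64 | I64                    # macro saw Float64 and Int64

typed_ok(UInt16(0), tracked, false)    # true:  nothing untracked
typed_ok(F64, tracked, false)          # true:  helper used an already-tracked type
typed_ok(F32, tracked, false)          # false: Float32 is outside the tracked set
typed_ok(F64, tracked, true)           # false: a non-fixed-slot type was acquired
```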

How the Typed Path Decision Works

The macro statically extracts types from acquire! calls it can see. At runtime, the bitmask tracks what happened outside its visibility:

| Situation | Checkpoint | Rewind | Why |
|-----------|------------|--------|-----|
| No untracked acquires | typed | typed | mask=0 → trivially subset |
| Untracked types ⊆ tracked types | typed | typed | e.g. helper acquires Float64, macro also tracks Float64 |
| Untracked types ⊄ tracked types | typed* | full | e.g. helper acquires Float32, macro only tracks Float64 |
| Non-fixed-slot type untracked | typed* | full | has_others=true → always forces full path |
| Macro extracted no types | full | full | use_typed=false at compile time, no bitmask check |

*Checkpoint runs before the helper, so untracked mask is still empty → typed. Rewind runs after, sees the actual mask → falls back to full if needed. This asymmetry is safe because _rewind_typed_pool! uses depth-based orphan cleanup to restore types that were not checkpointed.
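
The timing argument in the footnote can be replayed with a toy model. Everything below (`ToyPool`, `toy_checkpoint!`, `toy_mark!`, `toy_rewind!`) is a hypothetical simplification: it only records which path would be chosen and does no real saving or restoring of pool state. The exact ordering of the check relative to the depth change is also an assumption.

```julia
# Toy model: per-depth untracked state kept in parallel vectors, index 1 is the sentinel.
mutable struct ToyPool
    depth::Int
    masks::Vector{UInt16}
    others::Vector{Bool}
end
ToyPool() = ToyPool(1, [UInt16(0)], [false])

toy_can_use_typed(p::ToyPool, tracked::UInt16) =
    (p.masks[p.depth] & ~tracked) == UInt16(0) && !p.others[p.depth]

function toy_checkpoint!(p::ToyPool, tracked::UInt16)
    path = toy_can_use_typed(p, tracked) ? :typed : :full  # decided before any helper runs
    p.depth += 1
    push!(p.masks, UInt16(0))
    push!(p.others, false)
    return path
end

# What an untracked acquire! of a fixed-slot type would record at the current depth.
toy_mark!(p::ToyPool, bit::UInt16) = (p.masks[p.depth] |= bit; nothing)

function toy_rewind!(p::ToyPool, tracked::UInt16)
    path = toy_can_use_typed(p, tracked) ? :typed : :full  # sees what the helpers did
    pop!(p.masks)
    pop!(p.others)
    p.depth -= 1
    return path
end

bit_f64 = UInt16(1) << 0
bit_f32 = UInt16(1) << 1

p  = ToyPool()
cp = toy_checkpoint!(p, bit_f64)   # :typed, the mask is still empty
toy_mark!(p, bit_f32)              # helper acquires Float32, which the macro does not track
rw = toy_rewind!(p, bit_f64)       # :full, Float32 is outside the tracked set
```

Replacing `bit_f32` with `bit_f64` in the `toy_mark!` call makes both decisions `:typed`, which corresponds to the second row of the table.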

Benchmark Results (Before → After)

Hardware: Apple Silicon, Julia 1.12.5, AdaptiveArrayPools v0.1.2

False-Positive Scenarios (optimization targets)

| Scenario | Before | After | Speedup |
|----------|--------|-------|---------|
| Same type [FP] (Scenario 2 vs 1) | 213.8 ns | 35.5 ns | 6.0x |
| Multi-type covered [FP] (Scenario 5 vs 1) | 272.0 ns | 75.7 ns | 3.6x |
| Nested same type [FP] (Scenario 7 vs 8) | 934.5 ns | 586.1 ns | 1.6x |
| Hot loop N=1000 [FP] (Scenario 10 vs 9) | 204.0 μs | 31.5 μs | 6.5x |

True-Positive Scenarios (should be unchanged)

| Scenario | Before | After | Delta |
|----------|--------|-------|-------|
| Different fixed type [TP] | 255.1 ns | 256.3 ns | ~same |
| Non-fixed type (others) [TP] | 262.8 ns | 264.7 ns | ~same |
| Partial coverage [TP] | 257.0 ns | 265.0 ns | ~same |

Raw Checkpoint/Rewind Cost (unchanged)

| Operation | Before | After |
|-----------|--------|-------|
| Typed (1 type) | 7.6 ns | 8.5 ns |
| Typed (2 types) | 17.8 ns | 18.7 ns |
| Full (all 8 types) | 545.5 ns | 539.2 ns |

Allocations: zero across all scenarios (before and after).

Representative Test Scenario

@testset "Scenario A: typed rewind preserved when untracked ⊆ tracked" begin
    pool = get_task_local_pool()
    empty!(pool)

    function float64_helper!(p)
        acquire!(p, Float64, 50)  # Untracked — sets Float64 bit
    end

    @with_pool pool begin
        float64_helper!(pool)              # Untracked Float64
        v = acquire!(pool, Float64, 100)   # Tracked Float64
        v[1] = 42.0
        v[1]
    end

    # Bitmask check: Float64 ⊆ {Float64} → typed rewind used → correct state
    @test pool._current_depth == 1
    empty!(pool)
end

Phase 1 of typed-aware untracked tracking: add _untracked_fixed_masks
(Vector{UInt16}) and _untracked_has_others (Vector{Bool}) fields to
AdaptiveArrayPool. These parallel arrays follow the same 1-based sentinel
pattern as _untracked_flags. All lifecycle operations (checkpoint, rewind,
reset, empty) updated to push/pop/restore the new vectors.

No behavior change — existing _untracked_flags logic is untouched. The new
fields are populated with sentinel values but not yet read by any decision
logic. Prepares the data structure for Phase 2 (typed _mark_untracked!).

Phase 2 of typed-aware untracked tracking: replaces untyped
_mark_untracked!(pool) with typed _mark_untracked!(pool, ::Type{T})
across all 36 call sites (8 acquire.jl, 28 convenience.jl).

- Add _fixed_slot_bit dispatch mapping each fixed-slot type to UInt16 bit
- Rewrite _mark_untracked! to set per-type bitmask or has_others flag
- Bridge: legacy _untracked_flags still set for dual-track transition
- Add 9 test sets covering dispatch, marking, and public API propagation

Replace _untracked_flags boolean conditionals with _can_use_typed_path
bitmask subset check in @with_pool macro-generated code. Adds
_tracked_mask_for_types (@generated compile-time constant) and
_can_use_typed_path (@inline runtime check) to state.jl. Simplifies
5 generator functions by centralizing typed/full path decision into
_generate_typed_checkpoint_call and _generate_typed_rewind_call helpers,
removing 10 inline conditional blocks from macros.jl.
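
One way a compile-time tracked mask might be produced is sketched below. The names `type_bit` and `tracked_mask` are hypothetical, the tuple-type signature is an assumption, and only three slots are shown; the actual `_tracked_mask_for_types` may be written differently.

```julia
# Hypothetical per-type bits; only three slots shown for brevity.
type_bit(::Type{Float64}) = 0x0001
type_bit(::Type{Float32}) = 0x0002
type_bit(::Type{Int64})   = 0x0004
type_bit(::Type)          = 0x0000   # not a fixed-slot type

# The generator runs once per distinct tuple type and returns the finished
# mask, so each call site compiles down to a constant.
# `type_bit` must be defined before this function, as it is called in the generator.
@generated function tracked_mask(::Type{Ts}) where {Ts<:Tuple}
    mask = 0x0000
    for T in Ts.parameters
        mask |= type_bit(T)          # duplicates are harmless: OR is idempotent
    end
    return mask
end

tracked_mask(Tuple{Float64, Int64})    # bits 0 and 2 set
```

In this sketch a repeated type such as `Tuple{Float64, Float64}` yields the same mask as `Tuple{Float64}`, so no separate deduplication step is needed for the mask itself.
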

… tracking

Phase 4 of typed-aware untracked tracking: remove the boolean
_untracked_flags::Vector{Bool} field from AdaptiveArrayPool and
CuAdaptiveArrayPool, now fully replaced by the fine-grained
_untracked_fixed_masks::Vector{UInt16} + _untracked_has_others::Vector{Bool}
bitmask system introduced in Phases 1-3.

Removes all push!/pop!/empty! calls for _untracked_flags across
checkpoint!, rewind!, reset!, and empty! in both CPU and CUDA paths.

The CUDA extension was missing _untracked_fixed_masks and _untracked_has_others
fields that were added to AdaptiveArrayPool during the bitmask untracked tracking
feature (Phases 1-4). Without these fields, any acquire!() call inside a CUDA
@with_pool scope would throw a FieldError via _mark_untracked!(), and
_can_use_typed_path() would also fail.

Changes:
- Add _untracked_fixed_masks::Vector{UInt16} and _untracked_has_others::Vector{Bool}
  fields to CuAdaptiveArrayPool struct with sentinel initialization
- checkpoint!(full/typed-1/typed-N): push bitmask state on depth increment
- rewind!(full/typed-1/typed-N): pop bitmask state on depth decrement
- reset! and empty!: restore bitmask sentinel state ([UInt16(0)], [false])
- Multi-type checkpoint!/rewind! now deduplicate types at compile time (matching
  CPU behavior from src/state.jl)

codecov bot commented Feb 17, 2026

Codecov Report

❌ Patch coverage is 98.11321% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 97.08%. Comparing base (b9f17c0) to head (9922b17).
⚠️ Report is 8 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| src/macros.jl | 90.90% | 2 Missing ⚠️ |

@@            Coverage Diff             @@
##           master      #16      +/-   ##
==========================================
+ Coverage   96.76%   97.08%   +0.31%     
==========================================
  Files           9        9              
  Lines        1176     1200      +24     
==========================================
+ Hits         1138     1165      +27     
+ Misses         38       35       -3     

| Files with missing lines | Coverage Δ |
|--------------------------|------------|
| src/acquire.jl | 100.00% <100.00%> (ø) |
| src/convenience.jl | 100.00% <100.00%> (ø) |
| src/state.jl | 100.00% <100.00%> (ø) |
| src/types.jl | 100.00% <100.00%> (ø) |
| src/macros.jl | 92.70% <90.90%> (+0.28%) ⬆️ |

... and 1 file with indirect coverage changes

Copilot AI left a comment

Pull request overview

This PR replaces the boolean _untracked_flags system with fine-grained per-type bitmask tracking, enabling @with_pool to preserve the fast typed checkpoint/rewind path even when untracked acquire! calls occur in helper functions, as long as those types are covered by the macro's tracked set.

Changes:

  • Replaced single boolean flag with UInt16 bitmask for tracking which fixed-slot types had untracked acquires
  • Added _untracked_has_others flag for non-fixed-slot types
  • Implemented bitmask subset check _can_use_typed_path to decide between typed and full checkpoint/rewind paths
  • Updated all checkpoint/rewind/reset/empty! functions to maintain bitmask state
  • Modified macro code generation to emit conditional branches based on bitmask subset checks

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

| File | Description |
|------|-------------|
| src/types.jl | Added _fixed_slot_bit function mapping and new bitmask fields to AdaptiveArrayPool struct |
| src/state.jl | Updated checkpoint/rewind/reset/empty! to maintain bitmask state; added _tracked_mask_for_types and _can_use_typed_path helpers |
| src/macros.jl | Modified checkpoint/rewind call generation to use bitmask subset checks |
| src/acquire.jl | Updated _mark_untracked! to set type-specific bitmask bits instead of boolean flag |
| src/convenience.jl | Updated all convenience functions to pass type parameter to _mark_untracked! |
| ext/AdaptiveArrayPoolsCUDAExt/types.jl | Added bitmask fields to CuAdaptiveArrayPool struct |
| ext/AdaptiveArrayPoolsCUDAExt/state.jl | Updated CUDA pool state management with bitmask tracking; added duplicate type handling in @generated functions |
| test/test_state.jl | Added comprehensive tests for bitmask metadata lifecycle, type marking, subset checks, and end-to-end scenarios |
| test/test_macro_expansion.jl | Added tests verifying macro expansion uses new bitmask functions |
| docs/src/architecture/macro-internals.md | Updated documentation to explain bitmask-based tracking system |

@mgyoo86 mgyoo86 merged commit a2fb750 into master Feb 17, 2026
8 checks passed