feat: hoist invariant CI checks before CRC sweep for early rejection #34
This optimization splits case-insensitive (CI) character constraints into two categories: `ci_const` (checks on bytes that don't depend on the swept hash0 byte or the CRC) and `ci_var` (checks that touch bytes 2, 34, or 35). Moving the `ci_const` checks out of the hash0 sweep loop and running them first lets a candidate be rejected without entering the expensive 256-iteration CRC sweep at all. For end-pattern matching, the last ~6 characters span bytes 31-35, so roughly half the CI checks fall on bytes 31-33 (invariant across the sweep) and can filter out ~99.99% of candidates before the loop begins. Start patterns don't benefit: their CI characters overlap byte 2 (the swept byte), so all of their checks are `ci_var`.

A secondary optimization replaces the full 34-byte CRC recomputation with a precomputed delta table (`crc_base ^ delta[hash0]`), reducing the in-loop CRC cost from 34 operations to 1 XOR. The early-exit filtering, however, is the primary driver of the 8-18× speedup observed for end case-insensitive patterns.
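The two ideas above can be illustrated with a minimal Python sketch. It is not the code from this PR: `zlib.crc32` stands in for whatever CRC the project actually uses, and the names `build_delta_table`, `sweep`, `ci_const_ok`, and `ci_var_ok` are hypothetical. The delta table relies on the fact that a CRC over fixed-length messages is affine over GF(2), so changing one byte changes the checksum by a value that depends only on that byte's position and value.

```python
import zlib

MSG_LEN = 34  # bytes covered by the CRC, per the description
SWEPT = 2     # index of the swept hash0 byte, per the description


def build_delta_table():
    """delta[v] = crc(message that is all zero except byte SWEPT = v) ^ crc(all zeros)."""
    crc_zeros = zlib.crc32(bytes(MSG_LEN))
    table = []
    for v in range(256):
        unit = bytearray(MSG_LEN)
        unit[SWEPT] = v
        table.append(zlib.crc32(bytes(unit)) ^ crc_zeros)
    return table


DELTA = build_delta_table()


def sweep(candidate, ci_const_ok, ci_var_ok):
    """Return (hash0, crc) pairs that pass both check groups.

    ci_const_ok(candidate) looks only at bytes that never change during the
    sweep, so it runs once, before the loop (the hoisted early rejection).
    ci_var_ok(hash0, crc) covers checks that depend on the swept byte or CRC.
    Both callbacks are hypothetical stand-ins for the real CI checks.
    """
    if not ci_const_ok(candidate):
        return []  # rejected without entering the 256-iteration sweep

    base = bytearray(candidate)
    base[SWEPT] = 0
    crc_base = zlib.crc32(bytes(base))  # one full CRC per candidate

    hits = []
    for hash0 in range(256):
        crc = crc_base ^ DELTA[hash0]  # one XOR instead of a full recomputation
        if ci_var_ok(hash0, crc):
            hits.append((hash0, crc))
    return hits
```

In this sketch the table is built once and reused for every candidate, so the per-candidate cost is one full CRC plus at most 256 XORs, and candidates that fail `ci_const_ok` skip even that.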