freeqaz commented Feb 3, 2026

Summary

  • Adds tail block detection to XEX function analysis: when pdata reports a function end but disassembly reveals out-of-line code after it (small blocks <=64 bytes with backward branches + blr), these are merged back into the preceding function rather than treated as separate symbols
  • Relaxes the strict assert_eq on function end addresses to allow detected ends beyond pdata-reported ends (since tail blocks extend past pdata boundaries)
  • Skips merging when the candidate function has a global-scope symbol (from user symbols.txt, PDB, or map file), preserving intentionally defined functions that happen to look like tail blocks
  • Without the global-scope guard, write_symbols() would drop user-defined symbols on every re-split because merged tail blocks get NoWrite flags
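
The merge decision described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Candidate` struct, the `Scope` enum, and the functions `should_merge` and `end_is_acceptable` are hypothetical names, and the value 64 for `MAX_TAIL_BLOCK_BYTES` is assumed from the "<=64 bytes" figure in the summary.

```rust
/// Assumed from the "<=64 bytes" figure in the summary.
const MAX_TAIL_BLOCK_BYTES: u32 = 64;

/// Hypothetical stand-in for a symbol's scope.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Scope {
    Local,
    Global,
}

/// Hypothetical description of the code found right after a pdata-reported end.
#[derive(Clone, Copy, Debug)]
struct Candidate {
    start: u32,
    end: u32,
    /// The block branches backward into the preceding function.
    branches_back_into_parent: bool,
    /// The block contains an unconditional `blr`.
    ends_in_blr: bool,
    /// Scope of a symbol already defined at `start`, if any.
    scope: Option<Scope>,
}

/// Merge only small blocks that look like tail blocks and that
/// nobody has pinned with a global-scope symbol.
fn should_merge(c: &Candidate) -> bool {
    let small = c.end.saturating_sub(c.start) <= MAX_TAIL_BLOCK_BYTES;
    let looks_like_tail = small && c.branches_back_into_parent && c.ends_in_blr;
    looks_like_tail && c.scope != Some(Scope::Global)
}

/// Relaxed end check: a detected end may lie at or beyond the pdata end,
/// but never before it.
fn end_is_acceptable(pdata_end: u32, detected_end: u32) -> bool {
    detected_end >= pdata_end
}
```

Under these assumptions, a 28-byte block with no symbol is merged, while the same block with a `scope:global` symbol is left alone.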

Example: Curl_resolv_timeout

The MSVC compiler placed Curl_resolv_timeout (7 instructions, 28 bytes) immediately after Curl_resolv, ending in a tail call (b, not bl) back into it. Without the global-scope guard, dtk merges the two and Curl_resolv_timeout disappears from symbols.txt on every split, which prevents comparing it independently in objdiff.

With this change and scope:global in symbols.txt:

  • Curl_resolv: 94.5% → 99.9% (remaining 2 diffs are relocation encoding, unfixable)
  • Curl_resolv_timeout: 100% match

Test plan

  • Build with cargo build --release — compiles clean
  • Re-split a project with known tail blocks — verify merged tail blocks still get merged
  • Add a global-scope symbol at a tail block address — verify it survives re-split
  • Run full report and diff against previous — no regressions

freeqaz force-pushed the fix/preserve-global-tail-blocks branch from ad084e7 to a02abfb on February 3, 2026 07:03
Add tail block detection to XEX function analysis. When pdata reports
a function end but disassembly reveals out-of-line code after it
(small blocks with backward branches + blr), these are merged back
into the preceding function rather than treated as separate symbols.

Additionally, skip merging when the candidate has a global-scope
symbol (from user symbols.txt, PDB, or map file), since these
represent intentionally defined functions that should not be absorbed.
This prevents symbols.txt regeneration from dropping user-defined
functions that happen to look like tail blocks.

- Extract MAX_TAIL_BLOCK_BYTES constant and helper functions
  (is_unconditional_blr, branch_into_range)
- Split check_tail_block into three methods: dispatcher,
  check_tail_block_backward_branch (case 1), and
  check_tail_block_scan_block (case 2)
- Optimize merge_tail_blocks to collect only (addr, end) tuples
  instead of cloning full FunctionInfo with slices
- Replace unwrap() with expect() for better panic messages
- Add 18 unit tests in separate cfa_tests.rs covering: helper
  functions, check_tail_block cases, merge_tail_blocks
  merging/skipping, apply() symbol deletion and size extension,
  global-scope preservation
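
For readers unfamiliar with the PowerPC encodings involved, the two helpers named above might look something like this. It is a sketch under assumptions: the signatures are hypothetical (the real ones in this PR may take decoded instructions rather than raw words), and `branch_target` is an extra illustrative helper. The encodings themselves are standard PowerPC: `blr` is `0x4E800020`, and an unconditional relative `b` has primary opcode 18 with AA=0 and LK=0.

```rust
/// Raw encoding of an unconditional `blr` (bclr with BO=20, BI=0, LK=0).
const BLR: u32 = 0x4E80_0020;

/// Hypothetical signature: true only for a plain `blr`,
/// not `blrl` or conditional variants such as `beqlr`.
fn is_unconditional_blr(ins: u32) -> bool {
    ins == BLR
}

/// Illustrative helper: target of a relative, non-linking `b` at `addr`.
/// Returns None for anything else, including `bl` and `ba`.
fn branch_target(addr: u32, ins: u32) -> Option<u32> {
    // Primary opcode 18 in the top 6 bits; AA and LK in the low 2 bits.
    if ins >> 26 != 18 || ins & 0b11 != 0 {
        return None;
    }
    // The 26-bit displacement (LI || 0b00) is sign-extended.
    let disp = ((ins & 0x03FF_FFFC) as i32) << 6 >> 6;
    Some(addr.wrapping_add(disp as u32))
}

/// Hypothetical signature: does the `b` at `addr` land in `[start, end)`?
fn branch_into_range(addr: u32, ins: u32, start: u32, end: u32) -> bool {
    matches!(branch_target(addr, ins), Some(t) if t >= start && t < end)
}
```

Rejecting `bl` here mirrors the Curl_resolv_timeout example, where the distinguishing feature is a tail call made with `b` rather than `bl`.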