A program that uses a prefix-tree model to predict basic block boundaries in x86 binaries.
Compiling a high-level language into low-level machine code discards some high-level information, such as function structure, which makes the resulting binaries harder to analyze. In this study, we use a Weight Prefix Tree to analyze x86 binaries, predicting the start and end positions of High-Level Language (HLL) functions within them to assist binary optimization.
- OS: Ubuntu 14.04 32-bit
- Compiler: gcc 4.8.4 (target: i686-linux-gnu)
The training data are produced by compiling C source code with GCC at optimization levels -O0 through -O2. The correct function boundaries are extracted from header files and serve as the training targets.
Two Weight Prefix Tree structures are constructed:
- one for predicting the beginning of functions
- one built in reverse order for predicting the end of functions
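
The two-tree idea above can be sketched as follows. This is a minimal illustration, not the code in `pfx_tree.cpp`: the names `Node`, `WeightedPrefixTree`, `insert`, `score`, and `reversed` are hypothetical, and the weighting scheme (the fraction of training occurrences of a byte prefix that were true boundaries) is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

// Hypothetical node layout: each node counts how often the byte prefix
// ending here was seen at a true boundary versus seen at all.
struct Node {
    std::uint32_t boundary_hits = 0;  // prefix occurred at a real boundary
    std::uint32_t total_hits = 0;     // prefix occurred anywhere in training
    std::map<std::uint8_t, std::unique_ptr<Node>> children;
};

class WeightedPrefixTree {
public:
    explicit WeightedPrefixTree(std::size_t max_depth) : max_depth_(max_depth) {
        assert(max_depth > 0);
    }

    // Record one training sample: the bytes at a candidate offset, labelled
    // with whether that offset is a true boundary.
    void insert(const std::vector<std::uint8_t>& bytes, bool is_boundary) {
        Node* node = &root_;
        for (std::size_t i = 0; i < bytes.size() && i < max_depth_; ++i) {
            std::unique_ptr<Node>& child = node->children[bytes[i]];
            if (!child) child.reset(new Node);
            node = child.get();
            node->total_hits += 1;
            if (is_boundary) node->boundary_hits += 1;
        }
    }

    // Weight of the longest matching prefix (assumed scheme: boundary
    // occurrences divided by all occurrences). Returns 0.0 when not even
    // the first byte was seen during training.
    double score(const std::vector<std::uint8_t>& bytes) const {
        const Node* node = &root_;
        double weight = 0.0;
        for (std::size_t i = 0; i < bytes.size() && i < max_depth_; ++i) {
            auto it = node->children.find(bytes[i]);
            if (it == node->children.end()) break;
            node = it->second.get();
            weight = static_cast<double>(node->boundary_hits) / node->total_hits;
        }
        return weight;
    }

private:
    Node root_;
    std::size_t max_depth_;
};

// For the end-of-function tree, feed the bytes that precede the candidate
// offset in reverse order, so the byte nearest the boundary comes first.
std::vector<std::uint8_t> reversed(const std::vector<std::uint8_t>& bytes) {
    return std::vector<std::uint8_t>(bytes.rbegin(), bytes.rend());
}
```

Under these assumptions, one `WeightedPrefixTree` would be trained on the bytes that follow each candidate offset to score function starts, and a second on `reversed(...)` copies of the bytes that precede each candidate offset to score function ends. Limiting the depth keeps the tree small and means the score comes from the longest prefix that was actually seen in training.
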
- Move `megaca_o0.cfg`, `megaca_o1.cfg`, `megaca_o2.cfg`, and `buildSPEC.sh` to `/SPEC_CPU2006/config`.
- Modify the paths inside `buildSPEC.sh` as follows:

      DIR="/SPEC_CPU2006/benchspec/CPU2006/"
      DESDIR="/workspace/"

- Run the script:

      cd /SPEC_CPU2006/config
      bash buildSPEC.sh

g++ pfx_tree.cpp -o pfx_tree
./pfx_tree