AMD 29k Processor Plugin for IDA Pro
The tests subdirectory provides test executables compiled with a vintage version of gcc. The microprose3d subdirectory provides a script that extracts the ROMs of the DrMath unit of Microprose 3D arcade cabinets from MAME ROM sets.
The processor has delayed branches, which cause the instruction after the delayed branch to be executed in a delay slot, before the actual branch happens.
The plugin offers two modes for handling this behaviour. The first rewrites the control flow and explicitly reorders the control-flow graph so that instructions are shown in their real execution order. The second (recommended) relies on IDA Pro's own handling of delay slots: it does not rewrite the control flow, but marks the ends of basic blocks so that IDA can defer the branch until the end of a block is reached.
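The reordering idea behind the first mode can be illustrated on a flat instruction list. This is a minimal sketch, not the plugin's code: the `(address, mnemonic)` tuples and the `DELAYED_BRANCHES` set are hypothetical stand-ins, and the sketch assumes that a delay slot never holds another branch.

```python
from typing import List, Tuple

# Assumed subset of Am29000 branch mnemonics that have a delay slot (hypothetical list).
DELAYED_BRANCHES = {"jmp", "jmpt", "jmpf", "jmpi", "jmpti", "jmpfi", "call", "calli"}

Insn = Tuple[int, str]  # (address, mnemonic): hypothetical representation


def execution_order(insns: List[Insn]) -> List[Insn]:
    """Move each delay-slot instruction in front of its delayed branch."""
    out: List[Insn] = []
    i = 0
    while i < len(insns):
        _, mnem = insns[i]
        if mnem in DELAYED_BRANCHES and i + 1 < len(insns):
            out.append(insns[i + 1])  # the delay slot executes first
            out.append(insns[i])      # then the branch takes effect
            i += 2
        else:
            out.append(insns[i])
            i += 1
    return out


block = [(0x100, "add"), (0x104, "jmpt"), (0x108, "const"), (0x10C, "sub")]
print([hex(a) for a, _ in execution_order(block)])  # ['0x100', '0x108', '0x104', '0x10c']
```

The second mode avoids this rewriting entirely and leaves the ordering to IDA.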
The plugin supports autogenerated comments, built from AMD's instruction descriptions and filled in with the register operands of each decoded instruction.
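Filling a description with decoded operands amounts to template substitution. The sketch below assumes a hypothetical `DESCRIPTIONS` table with `{dest}`, `{srca}` and `{srcb}` placeholders; the plugin's real description format is not shown in this README.

```python
from typing import List, Optional

# Hypothetical description templates; the plugin derives its texts from
# the AMD 29K Family Data Book in a format not shown here.
DESCRIPTIONS = {
    "add": "{dest} <- {srca} + {srcb}",
    "sub": "{dest} <- {srca} - {srcb}",
}


def auto_comment(mnemonic: str, operands: List[str]) -> Optional[str]:
    """Fill the description template with the decoded operand names."""
    template = DESCRIPTIONS.get(mnemonic)
    if template is None:
        return None  # no description available for this instruction
    dest, srca, srcb = operands
    return template.format(dest=dest, srca=srca, srcb=srcb)


print(auto_comment("add", ["gr96", "gr97", "lr2"]))  # gr96 <- gr97 + lr2
```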
Switch idioms are handled by matching predefined templates on the partial control flow graph.
The idiom matcher uses patterns of instructions and operand placeholders.
Instructions are separated by ';' (as a separator only, so the last instruction has no trailing ';'). The opcode and its operands are separated by a ' ', which is mandatory even for instructions without operands (for example 'nop ;'). Operands are separated by ','. An operand is either '_' (blank) or a string consisting of any characters except ',', ' ', and ';'. Named strings either form def-use pairs, i.e. operands that must match each other, or capture operands to be analyzed once the idiom matches. Blank operands match anything, but may not redefine tracked def-use pairs.
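The syntax rules above can be checked with a small parser. This is a sketch written from the rules as stated, not the plugin's parser; the names `parse_pattern` and `placeholders` are hypothetical.

```python
from typing import List, Set, Tuple

PatternInsn = Tuple[str, List[str]]  # (opcode, operand strings)


def parse_pattern(text: str) -> List[PatternInsn]:
    """Split an idiom pattern into (opcode, operands) pairs.

    ';' separates instructions (no trailing ';'), a ' ' is mandatory after
    every opcode (hence 'nop ;'), and ',' separates operands.
    """
    result: List[PatternInsn] = []
    for insn in text.split(";"):
        opcode, sep, rest = insn.partition(" ")
        if not opcode or not sep:
            raise ValueError(f"missing ' ' after opcode in {insn!r}")
        operands = rest.split(",") if rest else []
        for op in operands:
            if not op or " " in op:
                raise ValueError(f"invalid operand {op!r} in {insn!r}")
        result.append((opcode, operands))
    return result


def placeholders(pattern: List[PatternInsn]) -> Set[str]:
    """All named (non-blank) operand strings used by the pattern."""
    return {op for _, ops in pattern for op in ops if op != "_"}


print(parse_pattern("const b,_;nop ;jmpi b"))
# [('const', ['b', '_']), ('nop', []), ('jmpi', ['b'])]
```

Here `b` appears twice and therefore links the `const` definition to the `jmpi` use.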
The matcher recursively walks backwards from a jump with unknown targets. It tracks the definitions and uses of the operands named in the pattern and stops if a tracked operand is redefined by an instruction that is not part of the pattern.
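The backward walk with def-use tracking can be sketched for a single path. This simplified version is greedy and follows only one predecessor path, whereas the plugin recurses over the partial control-flow graph. `DEST_INDEX`, the tuple representation, and the example pattern are hypothetical and not taken from the plugin.

```python
from typing import Dict, List, Optional, Set, Tuple

Insn = Tuple[str, List[str]]  # (opcode, operands); also used for pattern entries

# Hypothetical table: index of the operand each opcode writes (None = no write).
# For LOAD CE,CNTL,RA,RB the destination is RA, i.e. operand index 2.
DEST_INDEX: Dict[str, Optional[int]] = {
    "const": 0, "sll": 0, "add": 0, "sub": 0, "load": 2, "jmpi": None,
}


def dest_of(insn: Insn) -> Optional[str]:
    idx = DEST_INDEX.get(insn[0])
    return None if idx is None else insn[1][idx]


def match_one(pat: Insn, insn: Insn, binds: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Match one instruction against one pattern entry, extending the bindings."""
    if pat[0] != insn[0] or len(pat[1]) != len(insn[1]):
        return None
    new = dict(binds)
    for p, actual in zip(pat[1], insn[1]):
        if p != "_" and new.setdefault(p, actual) != actual:
            return None  # placeholder already bound to a different operand
    return new


def match_backwards(path: List[Insn], pattern: List[Insn]) -> Optional[Dict[str, str]]:
    """Walk one path backwards from the indirect jump (last element of path)."""
    if not path or match_one(pattern[-1], path[-1], {}) is None:
        return None
    binds: Dict[str, str] = {}
    pending: Set[str] = set()  # registers read later whose pattern definition is not yet seen
    p = len(pattern) - 1
    for insn in reversed(path):
        if p < 0:
            break
        new = match_one(pattern[p], insn, binds)
        dest = dest_of(insn)
        if new is None:
            if dest in pending:
                return None  # redefined by an instruction outside the pattern
            continue
        if dest in pending and dest_of(pattern[p]) == "_":
            return None  # a blank operand may not redefine a tracked pair
        binds = new
        pending.discard(dest)  # this instruction defines the register
        idx = DEST_INDEX.get(insn[0])
        pending.update(a for k, (pa, a) in enumerate(zip(pattern[p][1], insn[1]))
                       if pa != "_" and k != idx)
        p -= 1
    return binds if p < 0 else None


pattern = [("const", ["b", "_"]), ("sll", ["t", "i", "_"]), ("add", ["t", "t", "b"]),
           ("load", ["_", "_", "t", "t"]), ("jmpi", ["t"])]
path = [("const", ["gr98", "table"]), ("sll", ["gr97", "gr96", "2"]),
        ("sub", ["gr100", "gr100", "1"]), ("add", ["gr97", "gr97", "gr98"]),
        ("load", ["0", "0", "gr97", "gr97"]), ("jmpi", ["gr97"])]
print(match_backwards(path, pattern))  # {'t': 'gr97', 'b': 'gr98', 'i': 'gr96'}
```

The unrelated `sub` is skipped because it writes no tracked register. Replacing it with, say, `const gr98,0` would make the match fail, since `gr98` is pending at that point.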
For more detailed information see:
- Tobias Conradi. Matching of Control- and Data-Flow Constructs in Disassembled Code. Bachelor thesis, TU Hamburg-Harburg, September 2015. https://github.com/toco/IdiomMatcher/
For idioms with an unknown number of table entries, the code guesses the table length using a simple heuristic: an entry of the jump table is considered part of the switch if its value lies within a fixed numerical distance of the switch itself.
This is a simplified version of a heuristic described in:
- Arne Wichmann. Binary Analysis for Code Reconstruction of Control Software. Diplomarbeit, TU Hamburg-Harburg, October 2012.
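The distance heuristic for the table length can be sketched as follows. The sketch assumes 32-bit table entries, stops at the first out-of-range entry, and uses an illustrative `max_distance`; the plugin's actual threshold is not given here, and the function names are hypothetical.

```python
import struct
from typing import Iterable, List


def read_words(data: bytes, big_endian: bool = True) -> List[int]:
    """Decode raw table bytes into 32-bit entries (a trailing partial word is ignored)."""
    fmt = ">I" if big_endian else "<I"
    usable = len(data) - len(data) % 4
    return [w for (w,) in struct.iter_unpack(fmt, data[:usable])]


def guess_table_length(entries: Iterable[int], switch_addr: int, max_distance: int) -> int:
    """Count leading entries whose targets lie within max_distance of the switch."""
    count = 0
    for target in entries:
        if abs(target - switch_addr) > max_distance:
            break  # the first implausible target ends the table
        count += 1
    return count


raw = struct.pack(">5I", 0x1010, 0x1040, 0x0FF0, 0x80000, 0x1050)
print(guess_table_length(read_words(raw), switch_addr=0x1000, max_distance=0x400))  # 3
```

A fixed distance is cheap to evaluate, but it can overshoot when code directly after the table happens to look like nearby addresses.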
The classic AMD 29K toolchain, as well as gcc, stores executables as COFF files. The endianness of the files is platform specific. So far, only files with the magic number 0x017A are supported. The loader traverses all sections and loads them as segments. Debug symbols are traversed and their names are added to the IDA database.
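Reading the section table can be sketched with the standard COFF header layout (20-byte file header, 40-byte section headers). This version assumes, without confirmation from the plugin, that endianness is detected from the byte order of the 0x017A magic. It returns plain records instead of creating IDA segments, and `Section`, `detect_byte_order` and `read_sections` are hypothetical names.

```python
import struct
from typing import List, NamedTuple

AM29K_MAGIC = 0x017A


class Section(NamedTuple):
    name: str
    vaddr: int
    size: int
    file_offset: int


def detect_byte_order(data: bytes) -> str:
    """Return a struct byte-order prefix, assuming the magic reveals the endianness."""
    if struct.unpack_from(">H", data, 0)[0] == AM29K_MAGIC:
        return ">"
    if struct.unpack_from("<H", data, 0)[0] == AM29K_MAGIC:
        return "<"
    raise ValueError("magic 0x017A not found")


def read_sections(data: bytes) -> List[Section]:
    bo = detect_byte_order(data)
    # File header: magic, nscns, timdat, symptr, nsyms, opthdr, flags (20 bytes).
    _, nscns, _, _, _, opthdr, _ = struct.unpack_from(bo + "HHIIIHH", data, 0)
    offset = 20 + opthdr  # section headers follow the optional header
    sections = []
    for _ in range(nscns):
        # Section header: name, paddr, vaddr, size, scnptr, relptr, lnnoptr,
        # nreloc, nlnno, flags (40 bytes).
        name, _, vaddr, size, scnptr, _, _, _, _, _ = struct.unpack_from(
            bo + "8sIIIIIIHHI", data, offset)
        sections.append(Section(name.rstrip(b"\0").decode("ascii", "replace"),
                                vaddr, size, scnptr))
        offset += 40
    return sections


header = struct.pack(">HHIIIHH", AM29K_MAGIC, 1, 0, 0, 0, 0, 0)
text = struct.pack(">8sIIIIIIHHI", b".text", 0x1000, 0x1000, 0x20, 60, 0, 0, 0, 0, 0x20)
print(read_sections(header + text + bytes(0x20)))
```

In IDA, each record would then be turned into a segment and filled from `file_offset`, which this sketch leaves out.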
Instruction definitions are from the "AM29000 User's Manual" and "AM29050 User's Manual" by Advanced Micro Devices. Instruction descriptions are from the "29K Family. 1990 Data Book" by Advanced Micro Devices.