NAME
Disassemble::X86 - Disassemble Intel x86 binary code
SYNOPSIS
use Disassemble::X86;
$d = Disassemble::X86->new(text => $text_seg);
while (defined( $op = $d->disasm() )) {
    printf "%04x %s\n", $d->op_start(), $op;
}
DESCRIPTION
This module disassembles binary-coded Intel x86 machine instructions into a human- and machine-readable format.
OUTPUT
Output is in Intel assembler syntax, with a few minor exceptions. Certain conventions are used in order to make it easier for programs to process the output of the disassembler. All output is in lower case. If opcode prefixes are present (other than segment register overrides and address/operand size overrides), they precede the opcode mnemonic separated by single space characters. If the instruction has any operands, they appear after another space, separated by commas.
There is no whitespace between or within operands, so you can separate the parts of an instruction with split ' '. In order to make this possible, the word "PTR" is omitted from memory operands.
mov 0x42, WORD PTR [edx] becomes mov 0x42,word[edx]
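As a minimal sketch of the split ' ' convention, the following pure-Perl snippet breaks one line of output into whitespace-separated fields. It does not call the disassembler; the sample string is the one shown above, and split_fields is a hypothetical helper name, not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: break one line of disassembler output into fields.
# split ' ' splits on runs of whitespace and ignores leading whitespace.
sub split_fields {
    my ($line) = @_;
    return split ' ', $line;
}

my @fields = split_fields('mov 0x42,word[edx]');

# If the instruction has operands, they form the last field,
# which contains no whitespace.
print join('|', @fields), "\n";    # prints "mov|0x42,word[edx]"
```

Any prefixes would show up as extra fields ahead of the mnemonic, so code that needs the mnemonic itself has to decide how to recognize them.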
The memory operand size (byte, word, etc.) is usually included in the operand, even if it can be determined from context. That way, the size is not lost if later processing separates the operand from the rest of the instruction. (Some memory operands have no real "size", though.)
ADD eax,[0x1234] becomes add eax,dword[0x1234]
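Because the size travels with the operand, it can be recovered from the operand alone. The sketch below assumes a memory operand is an optional lower-case size keyword followed by a bracketed expression, as in the examples above; parse_mem_operand is a hypothetical helper, and any other decoration the module may emit is not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: split a memory operand such as "dword[0x1234]" into
# its size keyword and the expression inside the brackets.
# Returns an empty list if the operand does not look like a memory operand.
sub parse_mem_operand {
    my ($operand) = @_;
    # Assumed shape: optional lower-case size keyword, then [expression].
    if ($operand =~ /^([a-z]*)\[(.*)\]$/) {
        my ($size, $expr) = ($1, $2);
        $size = undef unless length $size;    # operand with no size keyword
        return ($size, $expr);
    }
    return;
}

my ($size, $expr) = parse_mem_operand('dword[0x1234]');
print "$size $expr\n";    # prints "dword 0x1234"
```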
Unlike AT&T assembler syntax, individual operands never contain embedded commas. This means that you can safely break up the operand list with split /,/.
lea 0x0(,%ebx,4),%edi becomes lea edi,[ebx*4+0x0]
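The two splitting steps can be combined. This sketch assumes the line carries no prefix words, so the first field is the mnemonic and the second, if present, is the operand list; operands_of is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: return the individual operands of one output line.
# Assumes no prefix words precede the mnemonic.
sub operands_of {
    my ($line) = @_;
    my ($mnemonic, $operand_list) = split ' ', $line;
    # Operands never contain commas, so a plain split on "," is enough.
    return defined $operand_list ? split(/,/, $operand_list) : ();
}

my @operands = operands_of('lea edi,[ebx*4+0x0]');
print scalar(@operands), ": @operands\n";    # prints "2: edi [ebx*4+0x0]"
```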
METHODS
new
$d = Disassemble::X86->new(
    text      => $text_seg,
    start     => $text_load_addr,
    pos       => $initial_eip,
    addr_size => 32,
    data_size => 32,
    size      => 32,
);
Creates a new disassembler object. There are a number of named parameters which can be given, all of which are optional.
- text
The so-called text segment, which consists of the binary data to be disassembled. It can be given either as a string or as a Disassemble::X86::MemRegion object.
- start
The address at which the text segment would be loaded to execute the program. This parameter is ignored if text is a MemRegion object, and defaults to 0 otherwise.
- pos
The address at which disassembly is to begin, unless changed by $d->pos(). Default value is the start of the text segment.
- addr_size
Gives the address size (16 or 32 bit) which will be used when disassembling the code. Default is 32 bits. See below.
- data_size
Gives the data operand size, similar to addr_size.
- size
Sets both addr_size and data_size.
disasm
$op = $d->disasm();
Disassembles a single machine instruction from the current position. Advances the current position to the next instruction. If no valid instruction is found at the current position, returns undef and leaves the current position unchanged. In that case, you can check $d->error() for more information.
addr_size
$d->addr_size(16);
Sets the address size for disassembled code. Valid values are 16 or "word" for 16-bit addressing, and 32, "dword", or "long" for 32-bit addressing; the values within each group are synonyms. With no argument, returns the current address size as "word" or "dword".
data_size
$d->data_size("long");
Similar to addr_size above, but sets the data operand size.
pos
$d->pos($new_pos);
Sets the current disassembly position. With no argument, returns the current position.
text
$text = $d->text();
Returns the text segment as a Disassemble::X86::MemRegion object.
at_end
until ( $d->at_end() ) {
    ...
}
Returns true if the current disassembly position has reached the end of the text segment.
contains
if ( $d->contains($addr) ) {
    ...
}
Returns true if $addr is within the memory region being disassembled.
next_byte
$byte = $d->next_byte();
Returns the next byte from the current disassembly position as an integer value, and advances the current position by one. This can be used to skip over invalid instructions that are encountered during disassembly. If the current position is not valid, returns 0, but still advances the current position. Attempting to read beyond the 15-byte opcode size limit will cause an error.
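One way to use next_byte for recovery is sketched below: whenever disasm fails, note the offending byte and step over it. The helper name disassemble_all and the "(bad byte ...)" output format are illustrative assumptions; the sketch relies only on the disasm, at_end, pos, next_byte, and op_start methods described in this document, and it takes the disassembler object as an argument rather than constructing one.

```perl
use strict;
use warnings;

# Hypothetical helper: disassemble everything in $d, skipping one byte
# whenever disasm() fails. $d is any object providing the disasm, at_end,
# pos, next_byte, and op_start methods.
sub disassemble_all {
    my ($d) = @_;
    my @lines;
    until ($d->at_end()) {
        my $op = $d->disasm();
        if (defined $op) {
            push @lines, sprintf("%04x %s", $d->op_start(), $op);
        }
        else {
            # disasm() left the position unchanged; step over one byte.
            my $addr = $d->pos();
            my $byte = $d->next_byte();
            push @lines, sprintf("%04x (bad byte 0x%02x)", $addr, $byte);
        }
    }
    return @lines;
}
```

With the module loaded, this could be called as print "$_\n" for disassemble_all(Disassemble::X86->new(text => $text_seg));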
op
This and the following functions return information about the previously disassembled machine instruction. $d->op() returns the instruction itself, which is the same as the value returned by disasm.
op_start
Returns the starting address of the instruction.
op_len
Returns the length of the instruction, in bytes.
op_proc
Returns the minimum processor model required. For instructions present in the original 8086 processor, the value 86 is returned. For instructions supported by the 8087 math coprocessor, the value is 87. Instructions initially introduced with the Pentium return 586, and so on. Note that setting the address or operand size to 32 bits requires at least a 386. Other possible return values are "mmx", "sse", "sse2", "3dnow", and "3dnow-e" (for extended 3DNow! instructions).
This information should be used carefully, because there may be subtle differences between different steppings of the same processor. In some cases, you must check the CPUID instruction to see exactly what your processor supports. When in doubt, consult the Intel Architecture Software Developer's Manual.
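To get a rough picture of what a piece of code requires, the op_proc values can simply be counted. The sketch below is an assumption-laden illustration, not part of the module: tally_op_proc is a hypothetical helper that takes any object providing disasm and op_proc, and it stops at the first instruction that fails to disassemble.

```perl
use strict;
use warnings;

# Hypothetical helper: count how many instructions report each op_proc
# value. Stops when disasm() returns undef (end of text or an invalid
# instruction).
sub tally_op_proc {
    my ($d) = @_;
    my %count;
    while (defined(my $op = $d->disasm())) {
        $count{ $d->op_proc() }++;
    }
    return %count;
}
```

The keys of the returned hash are whatever op_proc reported, so numeric models such as 86 or 586 appear alongside names such as "mmx" or "sse".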
op_error
Returns the error message encountered while trying to disassemble an instruction.
LIMITATIONS
Multiple discontinuous text segments are not supported. Use additional Disassemble::X86 objects if you need them.
Some of the more exotic instructions like cache control and MMX extensions have not been thoroughly tested. Please let me know if you find something that is broken.
SEE ALSO
AUTHOR
Bob Mathews <bobmathews@alumni.calpoly.edu>
COPYRIGHT
Copyright (c) 2002 Bob Mathews. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.