NAME

Disassemble::X86 - Disassemble Intel x86 binary code

SYNOPSIS

use Disassemble::X86;
$d = Disassemble::X86->new(text => $text_seg);
while (defined( $op = $d->disasm() )) {
  printf "%04x  %s\n", $d->op_start(), $op;
}

DESCRIPTION

This module disassembles binary-coded Intel x86 machine instructions into a human- and machine-readable format.

OUTPUT

Output is in Intel assembler syntax, with a few minor exceptions. Certain conventions are used in order to make it easier for programs to process the output of the disassembler. All output is in lower case. If opcode prefixes are present (other than segment register overrides and address/operand size overrides), they precede the opcode mnemonic separated by single space characters. If the instruction has any operands, they appear after another space, separated by commas.

There is no whitespace between or within operands, so you can separate the parts of an instruction with split ' '. In order to make this possible, the word "PTR" is omitted from memory operands.

mov WORD PTR [edx],0x42    becomes    mov word[edx],0x42

The memory operand size (byte, word, etc.) is usually included in the operand, even if it can be determined from context. That way, the size is not lost if later processing separates the operand from the rest of the instruction. (Some memory operands have no real "size", though.)

ADD eax,[0x1234]    becomes    add eax,dword[0x1234]

Unlike AT&T assembler syntax, individual operands never contain embedded commas. This means that you can safely break up the operand list with split /,/.

lea 0x0(,%ebx,4),%edi    becomes    lea edi,[ebx*4+0x0]
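The conventions above can be sketched as a small parser. This is a minimal illustration, not part of the module: parse_insn and the %IS_PREFIX table are hypothetical names, and the list of prefix words is an assumption that should be checked against the output you actually see.

```perl
use strict;
use warnings;

# Prefix words that may precede the mnemonic. This list is an assumption
# made for illustration; it is not taken from the module.
my %IS_PREFIX = map { $_ => 1 } qw(lock rep repe repne);

# Split one line of disassembler output into prefixes, mnemonic, operands.
sub parse_insn {
    my ($line) = @_;
    my @words = split ' ', $line;    # operands contain no whitespace
    my @prefixes;
    push @prefixes, shift @words while @words > 1 && $IS_PREFIX{ $words[0] };
    my $mnemonic = shift @words;
    # Whatever remains is the operand list; operands contain no commas.
    my @operands = @words ? split( /,/, $words[0] ) : ();
    return {
        prefixes => \@prefixes,
        mnemonic => $mnemonic,
        operands => \@operands,
    };
}

my $insn = parse_insn('add eax,dword[0x1234]');
print "$insn->{mnemonic}: @{ $insn->{operands} }\n";
```

Because the size keyword stays attached to the memory operand, each element of the operands list remains meaningful on its own.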

METHODS

new

$d = Disassemble::X86->new(
    text      => $text_seg,
    start     => $text_load_addr,
    pos       => $initial_eip,
    addr_size => 32,
    data_size => 32,
    size      => 32,
);

Creates a new disassembler object. There are a number of named parameters which can be given, all of which are optional.

text

The so-called text segment, which consists of the binary data to be disassembled. It can be given either as a string or as a Disassemble::X86::MemRegion object.

start

The address at which the text segment would be loaded to execute the program. This parameter is ignored if text is a MemRegion object, and defaults to 0 otherwise.

pos

The address at which disassembly is to begin, unless changed by $d->pos(). Default value is the start of the text segment.

addr_size

Gives the address size (16 or 32 bit) which will be used when disassembling the code. Default is 32 bits. See the addr_size method below.

data_size

Gives the data operand size, similar to addr_size.

size

Sets both addr_size and data_size.

disasm

$op = $d->disasm();

Disassembles a single machine instruction from the current position and advances the current position to the next instruction. If no valid instruction is found at the current position, returns undef and leaves the current position unchanged. In that case, you can check $d->op_error() for more information.

addr_size

$d->addr_size(16);

Sets the address size for disassembled code. Valid values are 16, "word", 32, "dword", and "long", but some of these are synonyms. With no argument, returns the current address size as "word" or "dword".

data_size

$d->data_size("long");

Similar to addr_size above, but sets the data operand size.
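Since the setters accept several spellings while the getters return only "word" or "dword", it can be convenient to normalize a size to a bit count. The helper below is a hypothetical sketch, not part of the module; it assumes that "long" is a synonym for "dword", which the text above implies but does not state outright.

```perl
use strict;
use warnings;

# Map every documented size value to a bit count. Treating "long" as a
# synonym for "dword" is an assumption based on the description above.
my %SIZE_BITS = ( 16 => 16, word => 16, 32 => 32, dword => 32, long => 32 );

# Hypothetical helper: normalize a size value to 16 or 32, or die.
sub size_bits {
    my ($size) = @_;
    defined $size && exists $SIZE_BITS{ lc $size }
        or die "unsupported size value\n";
    return $SIZE_BITS{ lc $size };
}

print size_bits('word'), "\n";    # prints 16
```

With a disassembler object in hand, a test such as size_bits( $d->addr_size() ) == 16 then works regardless of which spelling was used to set the size.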

pos

$d->pos($new_pos);

Sets the current disassembly position. With no argument, returns the current position.

text

$text = $d->text();

Returns the text segment as a Disassemble::X86::MemRegion object.

at_end

until ( $d->at_end() ) {
  ...
}

Returns true if the current disassembly position has reached the end of the text segment.

contains

if ( $d->contains($addr) ) {
  ...
}

Returns true if $addr is within the memory region being disassembled.

next_byte

$byte = $d->next_byte();

Returns the next byte from the current disassembly position as an integer value, and advances the current position by one. This can be used to skip over invalid instructions that are encountered during disassembly. If the current position is not valid, returns 0, but still advances the current position. Attempting to read beyond the 15-byte opcode size limit will cause an error.
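The skipping technique described above can be sketched as a loop that falls back to next_byte whenever disasm fails. The function name disasm_all and the "(bad)" output format are inventions for this example. How a long run of invalid bytes interacts with the 15-byte limit has not been verified here, so treat this as a starting point only.

```perl
use strict;
use warnings;

# Walk the whole text segment, stepping over bytes that do not decode.
# $d is expected to be a Disassemble::X86 object (or anything offering
# the same at_end, pos, disasm, op_start, and next_byte methods).
sub disasm_all {
    my ($d) = @_;
    my @lines;
    until ( $d->at_end() ) {
        my $addr = $d->pos();    # position before decoding
        my $op   = $d->disasm();
        if ( defined $op ) {
            push @lines, sprintf "%04x  %s", $d->op_start(), $op;
        }
        else {
            # disasm left the position unchanged, so consume one byte.
            my $byte = $d->next_byte();
            # "(bad) 0x.." is a placeholder format invented for this sketch.
            push @lines, sprintf "%04x  (bad) 0x%02x", $addr, $byte;
        }
    }
    return @lines;
}
```

A typical call would be print "$_\n" for disasm_all($d), with $d created by Disassemble::X86->new as shown earlier.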

op

This and the following methods return information about the most recently disassembled machine instruction. $d->op() returns the instruction itself, which is the same as the value returned by disasm.

op_start

Returns the starting address of the instruction.

op_len

Returns the length of the instruction, in bytes.
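Together, op_start and op_len identify the raw bytes of an instruction, which is handy for printing a hex dump next to the disassembly. The sketch below assumes the text segment was given as a plain string; insn_hex is a hypothetical helper, not a method of the module.

```perl
use strict;
use warnings;

# Return the raw bytes of one instruction as a hex string.
# $text is the text segment as a plain string and $start is its load
# address (the "start" parameter given to new, or 0).
sub insn_hex {
    my ( $text, $start, $op_start, $op_len ) = @_;
    my $bytes = substr $text, $op_start - $start, $op_len;
    return join ' ', map { sprintf '%02x', $_ } unpack 'C*', $bytes;
}

# Example: a 5-byte instruction at address 0x1000 in a segment loaded there.
print insn_hex( "\xb8\x01\x00\x00\x00\xc3", 0x1000, 0x1000, 5 ), "\n";
# prints: b8 01 00 00 00
```

After a successful disasm, the call would look like insn_hex( $text_seg, $text_load_addr, $d->op_start(), $d->op_len() ).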

op_proc

Returns the minimum processor model required. For instructions present in the original 8086 processor, the value 86 is returned. For instructions supported by the 8087 math coprocessor, the value is 87. Instructions initially introduced with the Pentium return 586, and so on. Note that setting the address or operand size to 32 bits requires at least a 386. Other possible return values are "mmx", "sse", "sse2", "3dnow", and "3dnow-e" (for extended 3DNow! instructions).

This information should be used carefully, because there may be subtle differences between different steppings of the same processor. In some cases, you must check the CPUID instruction to see exactly what your processor supports. When in doubt, consult the Intel Architecture Software Developer's Manual.
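One cautious way to use op_proc is to collect the distinct values seen across a program and review them by hand. The sketch below separates numeric processor models from named extensions; summarize_procs is a hypothetical helper. The numeric sort is for display only, since a value such as 87 denotes a coprocessor and not a step between CPU generations.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Split a list of op_proc values into numeric processor models
# (86, 87, 586, ...) and named extensions ("mmx", "sse", ...).
sub summarize_procs {
    my (@procs) = @_;
    my ( %models, %extensions );
    for my $proc (@procs) {
        if ( looks_like_number($proc) ) { $models{$proc}     = 1 }
        else                            { $extensions{$proc} = 1 }
    }
    return {
        models     => [ sort { $a <=> $b } keys %models ],
        extensions => [ sort keys %extensions ],
    };
}

my $summary = summarize_procs( 86, 586, 'mmx', 86, 'sse2' );
print "models: @{ $summary->{models} }\n";            # models: 86 586
print "extensions: @{ $summary->{extensions} }\n";    # extensions: mmx sse2
```

In practice the input list would be built by pushing $d->op_proc() after each successful call to disasm.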

op_error

Returns the error message encountered while trying to disassemble an instruction.

LIMITATIONS

Multiple discontinuous text segments are not supported. Use additional Disassemble::X86 objects if you need them.

Some of the more exotic instructions like cache control and MMX extensions have not been thoroughly tested. Please let me know if you find something that is broken.

SEE ALSO

Disassemble::X86::MemRegion

AUTHOR

Bob Mathews <bobmathews@alumni.calpoly.edu>

COPYRIGHT

Copyright (c) 2002 Bob Mathews. All rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.