NAME

CPU::Z80::Disassembler - Disassemble the flow of a Z80 program

SYNOPSIS

use CPU::Z80::Disassembler;
$dis = CPU::Z80::Disassembler->new;
$dis->memory->load_file($file_name, $addr, $opt_skip_bytes, $opt_length);
$dis->write_dump; $dis->write_dump($file);
$dis->analyse;
$dis->write_asm;  $dis->write_asm($file);

$dis->get_type($addr);
$dis->set_type_code($addr [,$count]);
$dis->set_type_byte($addr [,$count]);
$dis->set_type_word($addr [,$count]);

$dis->set_call($addr, 1);    # this may be called
$dis->set_call($addr, $sub); # @next_code = $sub->($self, $next_addr) will be called

$dis->code($addr [, $label]);
$dis->defb($addr [, $count][, $label]);
$dis->defw($addr [, $count][, $label]);
$dis->defm($addr, $size [, $label]);
$dis->defmz($addr [, $count][, $label]);
$dis->defm7($addr [, $count][, $label]);

$dis->block_comment($addr, $block_comment);
$dis->line_comments($addr, @line_comments);

$dis->relative_arg($addr, $label_name);
$dis->ix_base($addr);
$dis->iy_base($addr);

$dis->create_control_file($ctl_file, $bin_file, $addr, $arch);
$dis->load_control_file($ctl_file);

DESCRIPTION

Implements a Z80 disassembler. Loads a binary file into memory and dumps an unprocessed disassembly listing (see write_dump).

Alternatively, there are functions to tell the disassembler where the data bytes, code entry points and labels are. The disassembler then follows the code by simulating a Z80 processor, to find out where each code region ends.

As a call instruction may be followed by data, the disassembler tries to find out whether the called routine manipulates the return stack. If it does not, and it ends with a ret, the routine is considered safe and the disassembly continues after the call instruction. If the routine is not considered safe, a message is written at the end of the disassembled file asking the user to check the routine manually; the set_call method should then be used to tell the disassembler how to handle calls to that routine on the next iteration.

The analyse function can be called just before dumping the output to try to find higher level constructs in the assembly listing. For example, it transforms the sequence ld b,h:ld c,l into ld bc,hl.

The write_asm method dumps an assembly listing that can be re-assembled to obtain the starting binary file. All bytes in unknown regions are disassembled as defb instructions, and a map is shown at the end of the file with the code regions (C), byte regions (B), word regions (W) and unknown regions (-).
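The region map can be pictured with a minimal sketch. The layout below (address prefix, fixed number of symbols per line) and the helper name render_map are assumptions for illustration only; the text above specifies only the four symbols.

```perl
use strict;
use warnings;

# Hypothetical per-address type table; undef stands for "unknown".
my %SYMBOL = (code => 'C', byte => 'B', word => 'W');

# Render one map row per $per_line addresses, prefixed by the row address.
sub render_map {
    my ($types, $per_line) = @_;
    $per_line ||= 16;
    my @lines;
    for (my $start = 0; $start < @$types; $start += $per_line) {
        my $end = $start + $per_line - 1;
        $end = $#$types if $end > $#$types;
        my $row = join '',
            map { defined $_ ? $SYMBOL{$_} : '-' } @{$types}[$start .. $end];
        push @lines, sprintf('%04X %s', $start, $row);
    }
    return @lines;
}

# Three code bytes, two data bytes, one word (two bytes), one unknown byte.
my @types = (('code') x 3, ('byte') x 2, 'word', 'word', undef);
print "$_\n" for render_map(\@types, 4);
```

With four symbols per line this prints the rows 0000 CCCB and 0004 BWW-.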

FUNCTIONS

new

Creates the object.

memory

CPU::Z80::Disassembler::Memory object containing the memory being analysed.

instr

Reference to an array that contains all the disassembled instructions as CPU::Z80::Disassembler::Instruction objects, indexed by the address of the instruction. The entry is undef if there is no disassembled instruction at that address (either the address is not known, or it points to the second or a later byte of a multi-byte instruction).
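The sparse, address-indexed layout can be sketched as follows. Plain strings stand in for the instruction objects, and the helper name instruction_addresses is made up for this illustration.

```perl
use strict;
use warnings;

# Stand-in for the array described above: strings replace the instruction
# objects, and undef marks addresses that do not start an instruction.
my @instr;
$instr[0x0000] = 'out ($FD),a';   # 2 bytes: 0x0001 stays undef
$instr[0x0002] = 'ld bc,$7FFF';   # 3 bytes: 0x0003 and 0x0004 stay undef
$instr[0x0005] = 'jp $03CB';

# Return only the addresses at which an instruction starts.
sub instruction_addresses {
    my ($instr) = @_;
    return grep { defined $instr->[$_] } 0 .. $#$instr;
}

printf "%04X %s\n", $_, $instr[$_] for instruction_addresses(\@instr);
```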

labels

Returns the CPU::Z80::Disassembler::Labels object that contains all the defined labels.

header, footer

Attributes containing blocks of text to dump before and after the assembly listing. They are used by write_asm.

ix_base, iy_base

Base address for (IX+DIS) and (IY+DIS) instructions, if constant in all the code. Causes the disassembly to dump:

IY0 equ 0xHHHH                ; 0xHHHH is iy_base
    ...
    ld  a,(iy+0xHHHH-IY0)     ; 0xHHHH is the absolute address
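How the operand text relates to the base can be shown with a small sketch. The function name indexed_operand is hypothetical; the only facts relied on are the output format above and the Z80 rule that the displacement byte is a signed 8-bit value.

```perl
use strict;
use warnings;

# Format an indexed operand relative to a constant base register value.
# $disp_byte is the raw displacement byte from the opcode (0..255),
# interpreted as a signed 8-bit offset, as the Z80 does.
sub indexed_operand {
    my ($reg, $base, $disp_byte) = @_;
    my $disp = $disp_byte >= 0x80 ? $disp_byte - 0x100 : $disp_byte;
    my $abs  = ($base + $disp) & 0xFFFF;          # absolute address accessed
    return sprintf('(%s+0x%04X-%s0)', lc $reg, $abs, uc $reg);
}

printf "IY0 equ 0x%04X\n", 0x4000;
printf "    ld  a,%s\n", indexed_operand('iy', 0x4000, 0x09);
```

An assembler evaluates 0x4009-IY0 back to the displacement 9, so the listing still assembles to the original bytes.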

write_dump

Outputs a disassembly dump to the given file, or to standard output if no file is provided.

The disassembly dump shows the address and bytes of each instruction with the disassembled instruction.

analyse

Analyse the disassembled information looking for higher level constructs. For example, it replaces 'ld c,(hl):inc hl' by 'ldi c,(hl)'.

Should be called immediately before write_asm.
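The kind of rewrite performed can be sketched as a two-instruction peephole pass. This is an illustration over plain strings, not the module's implementation; the table holds only the two sequences named in this document, and a real pass would also have to leave a pair alone when a label targets its second instruction.

```perl
use strict;
use warnings;

# Illustrative rewrite table: two-instruction sequences and the
# composite instruction that replaces them.
my %COMPOSITE = (
    'ld b,h|ld c,l'    => 'ld bc,hl',
    'ld c,(hl)|inc hl' => 'ldi c,(hl)',
);

# Scan the instruction list once, merging each known adjacent pair.
sub merge_pairs {
    my @in = @_;
    my @out;
    while (@in) {
        my $first = shift @in;
        if (@in && exists $COMPOSITE{"$first|$in[0]"}) {
            push @out, $COMPOSITE{"$first|$in[0]"};
            shift @in;                 # consume the second instruction
        }
        else {
            push @out, $first;
        }
    }
    return @out;
}

print "$_\n" for merge_pairs('ld b,h', 'ld c,l', 'ld c,(hl)', 'inc hl', 'ret');
```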

write_asm

Outputs a disassembly listing to the given file, or to standard output if no file is provided.

The disassembly listing can be assembled to obtain the original binary file.

set_type_code, set_type_byte, set_type_word

Sets the type of the given address. An optional count sets the type of that many consecutive memory locations.

It is an error to set the type of an undefined memory location, or to redefine a type.

get_type

Gets the type at the given address, one of TYPE_UNKNOWN, TYPE_CODE, TYPE_BYTE or TYPE_WORD constants.

It is an error to get the type of an undefined memory location.

set_call

Declares a subroutine at the given address, either with no stack impact (if 1 is passed as argument) or with a stack impact to be computed by the given code reference. This function is called with $self and the address after the call instruction as arguments and should return the next address(es) where the code stream shall continue.
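A callback of this shape might look like the sketch below, for a hypothetical routine whose call is followed by one inline data byte. The address 0x0008 and the idea of calling defb from inside the callback are assumptions, not something this document states.

```perl
use strict;
use warnings;

# Callback for a hypothetical routine that consumes one inline data byte
# placed after the call instruction and then returns past it.
my $one_inline_byte = sub {
    my ($dis, $next_addr) = @_;
    $dis->defb($next_addr);        # assumption: mark the inline byte as data
    return $next_addr + 1;         # code continues after the byte
};

# With a real disassembler object this would be registered as:
#   $dis->set_call(0x0008, $one_inline_byte);   # 0x0008 is a made-up address
```

The callback receives the disassembler object as its first argument, so it can use any of the declaration methods before returning the continuation address.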

code

Declares the given address and all following instructions up to an unconditional jump as a block of code, with an optional label.

defb, defb2, defw, defm, defmz, defm7

Declares the given address as a def* instruction with an optional label.

block_comment

Creates a block comment to insert before the given address.

line_comments

Appends each of the given line comments to the instructions starting at the given address, one comment per instruction.

relative_arg

Shows the instruction argument (NN or N) relative to a given label name. Label name can be '$' for a value relative to the instruction pointer.

create_control_file

$dis->create_control_file($ctl_file, $bin_file, $addr, $arch);

Creates a new control file for the given input binary file, starting at the given address and for the given architecture.

The address defaults to zero, and the architecture to undefined. The architecture may be implemented in the future, for example to define system variable equates for the given architecture.

It is an error to overwrite a control file.

The control file is the input for each disassembly run in an interactive disassembly session, and the output is <bin_file>.asm. After each run, the user studies the output .asm file and adds new commands to the control file, to add information to the .asm file on the next run.
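One pass of this edit-and-rerun cycle could be driven by a sketch such as the following. The helper name run_pass and the file names are made up, and using a fresh object for each step is a cautious assumption rather than a documented requirement.

```perl
use strict;
use warnings;

# One pass of the cycle. $new_dis is a code reference that returns a
# fresh object providing create_control_file and load_control_file.
sub run_pass {
    my ($new_dis, $ctl_file, $bin_file, $addr) = @_;

    # Overwriting a control file is an error, so the template is
    # generated only when the control file does not exist yet.
    $new_dis->()->create_control_file($ctl_file, $bin_file, $addr)
        unless -e $ctl_file;

    $new_dis->()->load_control_file($ctl_file);
    return;
}

# Typical call, assuming the module is installed (file names are made up):
#   use CPU::Z80::Disassembler;
#   run_pass(sub { CPU::Z80::Disassembler->new }, 'zx81.ctl', 'zx81.rom', 0x0000);
```

Running the same call again after editing the control file skips the template step and only reloads the edited file.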

This function creates a template control file that contains just the hex dump of the binary file and the decoded assembly instruction at each address, e.g.

0000                         :F <bin_file>
0000 D3FD       out ($FD),a
0002 01FF7F     ld bc,$7FFF
0005 C3CB03     jp $03CB

The control file commands start with a ':' and refer to the hexadecimal address at the start of the line.

Some commands operate on a range of addresses and accept the inclusive range limits separated by a single '-'.

A line starting with a blank uses the same address as the previous command.

A semicolon starts a comment in the control file.

0000      :;        define next address as 0x0000
          :<cmd>  ; <cmd> at the same address 0x0000
0000-001F :B      ; define an address range of bytes

The dump between the address and the ':' is ignored; it is helpful as a guide while adding information to the control file.
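The line syntax described above can be sketched as a small parser. The function name parse_ctl_line and the returned hash layout are assumptions, and the sketch does not strip trailing ';' comments or handle #include lines.

```perl
use strict;
use warnings;

# Split one control-file line into address range, command and argument
# text. Returns nothing for lines without a ':' command (plain dump lines).
sub parse_ctl_line {
    my ($line, $prev_addr) = @_;
    my ($start, $end);

    if ($line =~ s/^([0-9A-Fa-f]{4})(?:-([0-9A-Fa-f]{4}))?//) {
        $start = hex $1;
        $end   = defined $2 ? hex $2 : $start;   # single address: empty range
    }
    elsif ($line =~ /^\s/) {
        $start = $end = $prev_addr;   # leading blank: reuse previous address
    }
    else {
        return;
    }

    # Everything before the first ':' is the ignored dump text.
    return unless $line =~ /:(\S+)\s*(.*)$/;
    return { start => $start, end => $end, cmd => $1, args => $2 };
}

my $cmd = parse_ctl_line('0000-001F :B label');
printf "%04X-%04X %s %s\n", @{$cmd}{qw(start end cmd args)};
```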

load_control_file

$dis->load_control_file($ctl_file);

Loads the control file created by create_control_file and subsequently edited by the user, and creates a new .asm disassembly file.

Control File commands

Include

Include another control file at the current location.

#include vars.ctl

File

Load a binary file at the given address.

0000 :F zx81.rom

Code

Define the start of a code routine, with an optional label. The code is not known to be stack-safe, i.e. not to have data bytes following the call instruction. The disassembler stops disassembly when it cannot determine if the bytes after a call instruction are data or code.

0000 :C START

Procedure

Define the start of a procedure with a possible list of arguments following the call instruction.

The signature is a list of {'B','W','C'}+, identifying each of the items that follow the call instruction (Byte, Word or Code). In the following example the call instruction is followed by one byte and one word, and the procedure returns to the address after the word.

0000 :P proc B,W,C

The signature defaults to a single 'C', meaning the procedure returns to the point after call.

A signature without a 'C' means that the call never returns.
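The effect of a signature on the bytes after a call can be sketched as follows. The function name walk_signature is hypothetical, and treating 'C' as the last meaningful item is an assumption; the sizes are one byte for 'B' and two for 'W'.

```perl
use strict;
use warnings;

my %SIZE = (B => 1, W => 2);

# Walk a signature such as 'B,W,C' from the address that follows the call
# instruction. Returns the inline data items and the list of addresses
# where code resumes (empty when the signature has no 'C').
sub walk_signature {
    my ($signature, $addr) = @_;
    $signature = 'C' unless defined $signature && length $signature;
    my (@data, @resume);
    for my $item (split /,/, $signature) {
        if ($item eq 'C') {
            push @resume, $addr;
            last;                      # assumption: nothing follows 'C'
        }
        die "unknown signature item '$item'\n" unless exists $SIZE{$item};
        push @data, [ $item, $addr ];
        $addr += $SIZE{$item};
    }
    return (\@data, \@resume);
}

# A 3-byte call at 0x0100 leaves its inline data starting at 0x0103.
my ($data, $resume) = walk_signature('B,W,C', 0x0103);
printf "%s at %04X\n", @$_ for @$data;
printf "code resumes at %04X\n", $_ for @$resume;
```

For this input the byte is at 0103, the word at 0104, and code resumes at 0106.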

Bytes and Words

Define data bytes and words in the given address range.

0000-0003 :B label
0000-0003 :B2[1] label	; one byte per line, binary data
0000-0003 :W label

Define a symbol

Define the name of a symbol.

4000 := ERR_NO  comment\nline 2 of comment

IX and IY base

Define base address for IX and IY indexed mode.

4000 :IX
4000 :IY

Header block

Define a text block to be output before the given address. The block is inserted verbatim, so include ';' if a comment is intended.

0000 :# ; header
     :# ; continuation
     :# abc EQU 23

Line comment

Define a line comment to show at the given address.

0000 :; comment

Header and footer

Define a text block to be output at the top and the bottom of the assembly file. The block is inserted verbatim, so include ';' if a comment is intended.

0000 :< ; header
     :< ; continuation
     :> ; footer

ACKNOWLEDGEMENTS

AUTHOR

Paulo Custodio, <pscust at cpan.org>

BUGS and FEEDBACK

Please report any bugs or feature requests through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=CPU-Z80-Disassembler.

LICENSE AND COPYRIGHT

Copyright 2010 Paulo Custodio.

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.

The Spectrum 48K ROM used in the test scripts is Copyright by Amstrad. Amstrad have kindly given their permission for the redistribution of their copyrighted material but retain that copyright (see http://www.worldofspectrum.org/permits/amstrad-roms.txt).