NAME

PHP::Decode::Parser

SYNOPSIS

  # Create an instance

  sub warn_msg {
	my ($action, $fmt) = (shift, shift);
	my $msg = sprintf $fmt, @_;
	print 'WARN: ', $action, ': ', $msg, "\n";
  }
  my %strmap;
  my $parser = PHP::Decode::Parser->new(strmap => \%strmap, filename => 'test', warn => \&warn_msg);

  # Parse php token list

  my $line = '<?php echo "test"; ?>';
  my $quote = $parser->tokenize_line($line);
  my $tokens = $parser->tokens();
  my $stmt = $parser->read_code($tokens);

  # Expand to code again

  my $code = $parser->format_stmt($stmt, {format => 1});
  print $code;

DESCRIPTION

The PHP::Decode::Parser Module tokenizes and parses php code strings. The parser does not depend on a special php version. It supports most php syntax of interpreters from php5 to php8.

The parser assumes that the input file is a valid php script, and does not enforce strict syntactic checks. Unrecognized tokens are simply passed through.

The parser converts the php code into a unified form when the resulting output is formatted (for example intermittent php-script-tags and variables from string interpolation are removed and rewritten to php echo statements).

METHODS

new

$parser = PHP::Decode::Parser->new(%args);

Create a PHP::Decode::Parser object. Arguments are passed in key => value pairs.

The only required argument is `strmap`.

The new constructor dies when arguments are invalid, or if required arguments are missing.

The accepted arguments are:

strmap: hashmap for parsed php statements
inscript: set to indicate that paser starts inside of script
filename: optional script filename (if not stdin or textstr)
max_strlen: max strlen for debug strings
warn: optional handler to log warning messages
log: optional handler to log info messages
debug: optional handler to log debug messages

tokenize_line

Tokenize a php code string

quote = $parser->tokenize_line($line);

See the description of the PHP::Decode::Tokenizer module for the token types.

read_code

Parse a token list

<tok> php token list

  $stmt = $parser->read_code($tokens);

The read_code method converts the token list to an internal representation of php statements. If a script contains more than one top level statement, the method returns block with a list of these statements.

The following types are used to represent php statements:

#null: null value
#num: unquoted integer or float
#str: quoted string
#const: unquoted symbol
$var: variable
#arr: ordered array (see: PHP::Decode::Array)
#blk: block of statements
#fun: function definition
#call: function call
#elem: indexed elem access
#expr: unary or binary expression
#ns: namespace prefix
#class: class definition
#scope: class property dereference
#inst: class instance
#obj: obj property dereference
#ref: reference to variable
#trait: class trait
#fh: file handle (limited to __FILE__)
#stmt: remaining php statements (like if, while, echo, global, ..)

Each of of these statements is uniquely numbered and stored in the strmap of the parser.

format_stmt

Format a php statement to a php code string.

$code = $parser->format_stmt($stmt, $fmt);

The accepted arguments are:

stmt: the toplevel php statement to format
fmt: optional format flags
$fmt->{indent}: output indented multiline code
$fmt->{unified}: unified #str/#num output
$fmt->{mask_eval}: mask eval in strings with pattern
$fmt->{escape_ctrl}: escape control characters in output strings
$fmt->{avoid_semicolon}: avoid semicolons after braces
$fmt->{max_strlen}: max length for strings in output

SEE ALSO

Requires the PHP::Decode::Tokenizer Module.

AUTHORS

Barnim Dzwillo @ Strato AG