PDF::Data - Manipulate PDF files and objects as data structures


version v1.0.0


  use PDF::Data;


This module can read and write PDF files, and represents PDF objects as data structures that can be readily manipulated.



  my $pdf = PDF::Data->new(-compress => 1, -minify => 1);

Constructor to create an empty PDF::Data object instance. Any arguments passed to the constructor are treated as key/value pairs, and included in the $pdf hash object returned from the constructor. When the PDF file data is generated, this hash is written to the PDF file as the trailer dictionary. However, hash keys starting with "-" are ignored when writing the PDF file, as they are considered to be flags or metadata.

For example, $pdf->{-compress} is a flag which controls whether or not streams will be compressed when generating PDF file data. This flag can be set in the constructor (as shown above), or set directly on the object.

The $pdf->{-minify} flag controls whether or not to save space in the generated PDF file data by removing comments and extra whitespace from content streams. This flag can be used along with $pdf->{-compress} to make the generated PDF file data even smaller, but this transformation is not reversible.
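Here is a minimal sketch of the "-" key convention. It is not taken from the module. A hypothetical trailer_keys() helper picks out the keys of a PDF::Data-style hash that would be eligible for the trailer dictionary. The sample keys are illustrative.

```perl
use strict;
use warnings;

# Hypothetical helper: return the keys of a PDF::Data-style hash that would
# be written to the trailer dictionary, skipping "-" flag/metadata keys.
sub trailer_keys {
    my ($pdf) = @_;
    my @keys = grep { !/^-/ } keys %$pdf;
    return sort @keys;
}

my %pdf = (-compress => 1, -minify => 1, Size => 3, Root => 'root object');
print join(', ', trailer_keys(\%pdf)), "\n";    # Root, Size
```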


  my $pdf_clone = $pdf->clone;

Deep copy the entire PDF::Data object itself.


  my $page = $pdf->new_page(8.5, 11);

Create a new page object with the specified width and height.


  my $copied_page = $pdf->copy_page($page);

Deep copy a single page object.


  $page = $pdf->append_page($page);

Append the specified page object to the end of the PDF page tree.


  my $pdf = PDF::Data->read_pdf($file, %args);

Read a PDF file and parse it with $pdf->parse_pdf(), returning a new object instance. Any streams compressed with the /FlateDecode filter will be decompressed automatically. Unless the $pdf->{-decompress} flag is set, those streams will be recompressed automatically when generating PDF file data.


  my $pdf = PDF::Data->parse_pdf($data, %args);

Used by $pdf->read_pdf() to parse raw PDF file data and create a new object instance. This method can also be called directly when the PDF file data comes from a source other than a regular file.


  $pdf->write_pdf($file, $time);

Generate and write a new PDF file from the current state of the PDF data.

The optional $time parameter specifies the modification timestamp to save in the PDF metadata and to set as the file modification timestamp of the output file. If $time is omitted or undefined, it defaults to the current time. If $time is defined but false (zero or an empty string), this method skips setting the modification time in the PDF metadata and skips setting the timestamp on the output file.


  my $pdf_file_data = $pdf->pdf_file_data($time);

Generate PDF file data from the current state of the PDF data structure, suitable for writing to an output PDF file. This method is used by the write_pdf() method to generate the raw string of bytes to be written to the output PDF file. This data can be directly used (e.g. as a MIME attachment) without the need to actually write a PDF file to disk.

The optional $time parameter may be used to specify the modification timestamp to save in the PDF metadata. If not specified, it defaults to the current time. If a false value is specified, this method will skip setting the modification time in the PDF metadata.



Dump the PDF internal structure and data for debugging.



Dump an outline of the PDF internal structure for debugging.


  my $stream = $pdf->merge_content_streams($array_of_streams);

Merge multiple content streams into a single content stream.
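This is a rough sketch of the idea. It assumes each stream's contents are available as a plain string. The real method takes an array of the module's stream objects, which this sketch does not model. The PDF specification treats a page's /Contents array as if its streams were concatenated in order, so joining them with a separator is the core of a merge. The helper name merge_content_strings() is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: merge raw content-stream strings into one.
# A newline separator keeps the last token of one stream from fusing
# with the first token of the next (e.g. "Q" followed by "q" becoming "Qq").
sub merge_content_strings {
    my (@streams) = @_;
    return join "\n", @streams;
}

# Graphics state set in the first part ("q ... cm") carries into the second.
my $merged = merge_content_strings("q 1 0 0 1 72 72 cm", "0 0 100 100 re f Q");
print "$merged\n";
```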



Find bounding box by analyzing a content stream. This is only partially implemented.


  $new_content = $pdf->new_bbox($content_stream);

Find bounding box by analyzing a content stream. This is only partially implemented.


  my $timestamp = $pdf->timestamp($time);
  my $now       = $pdf->timestamp;

Generate a timestamp string in the PDF date format. If $time is omitted, the current time is used.
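PDF date strings have the form D:YYYYMMDDHHmmSS, followed by an optional offset from UT. The sketch below makes two assumptions. It treats $time as seconds since the epoch, as returned by Perl's time(). It always emits UTC with the Z marker, whereas the module itself may use local time with a numeric offset. The name pdf_timestamp() is hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Sketch: format epoch seconds as a PDF date string in UTC.
# Falls back to the current time when no argument is given.
sub pdf_timestamp {
    my ($time) = @_;
    $time = time unless defined $time;
    return strftime('D:%Y%m%d%H%M%SZ', gmtime($time));
}

print pdf_timestamp(0), "\n";    # D:19700101000000Z
```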



  my @numbers = $pdf->round(@numbers);

Round numeric values to 12 significant digits to avoid floating-point rounding error and remove trailing zeroes.
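One way to get this behavior is Perl's %g format with 12 digits of precision. That format both rounds and drops trailing zeroes. The sketch below shows the technique under that assumption. It is not the module's implementation, and round_values() is a hypothetical name.

```perl
use strict;
use warnings;

# Sketch: round each value to 12 significant digits. "%g" formatting
# also drops trailing zeroes, so 1.50 becomes "1.5" and 2.0 becomes "2".
sub round_values {
    return map { sprintf '%.12g', $_ } @_;
}

my @r = round_values(0.1 + 0.2, 1.50, 2.0);
print "@r\n";    # 0.3 1.5 2
```

This sketch has one caveat. %g switches to exponent notation for very small or very large magnitudes; for example, 0.00001 becomes 1e-05. PDF number syntax does not allow exponents, so a production version needs to guard against that case.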


  my $matrix = $pdf->concat_matrix($transformation_matrix, $original_matrix);

Concatenate a transformation matrix with an original matrix, returning a new matrix. This is for arrays of 6 elements representing standard 3x3 transformation matrices as used by PostScript and PDF.
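Here is a sketch of the arithmetic. It assumes the PDF row-vector convention, in which [a b c d e f] stands for the 3x3 matrix with rows (a b 0), (c d 0) and (e f 1). It also assumes the first argument is applied first, as with the cm operator, where the new CTM is the operand matrix times the current CTM. Whether $pdf->concat_matrix() orders its arguments exactly this way is an assumption.

```perl
use strict;
use warnings;

# Sketch: multiply two PDF matrices, $m1 x $m2, where [a b c d e f] means
#   | a b 0 |
#   | c d 0 |
#   | e f 1 |
sub concat_matrix {
    my ($m1, $m2) = @_;
    my ($a1, $b1, $c1, $d1, $e1, $f1) = @$m1;
    my ($a2, $b2, $c2, $d2, $e2, $f2) = @$m2;
    return [
        $a1 * $a2 + $b1 * $c2,
        $a1 * $b2 + $b1 * $d2,
        $c1 * $a2 + $d1 * $c2,
        $c1 * $b2 + $d1 * $d2,
        $e1 * $a2 + $f1 * $c2 + $e2,
        $e1 * $b2 + $f1 * $d2 + $f2,
    ];
}

# Translate by (10, 20), then scale by 2: the origin ends up at (20, 40).
my $m = concat_matrix([1, 0, 0, 1, 10, 20], [2, 0, 0, 2, 0, 0]);
print "@$m\n";    # 2 0 0 2 20 40
```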


  my $inverse = $pdf->invert_matrix($matrix);

Calculate the inverse of a matrix, if possible. Returns undef if not invertible.
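A 6-element matrix has a closed-form inverse. The first step is to invert the 2x2 linear part using the determinant a*d - b*c. The second step maps the translation back through that inverse. The sketch below uses the same [a b c d e f] convention, and its variable names are illustrative.

```perl
use strict;
use warnings;

# Sketch: invert a 6-element PDF matrix [a b c d e f].
# Returns undef when the determinant is zero (the matrix is singular).
sub invert_matrix {
    my ($m) = @_;
    my ($m11, $m12, $m21, $m22, $dx, $dy) = @$m;
    my $det = $m11 * $m22 - $m12 * $m21;
    return undef if $det == 0;
    return [
        $m22 / $det,
        -$m12 / $det,
        -$m21 / $det,
        $m11 / $det,
        ($m21 * $dy - $m22 * $dx) / $det,
        ($m12 * $dx - $m11 * $dy) / $det,
    ];
}

# Inverting a translation by (10, 20) yields a translation by (-10, -20).
my $inv = invert_matrix([1, 0, 0, 1, 10, 20]);
```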


  my $matrix = $pdf->translate($x, $y);

Returns a 6-element transformation matrix representing translation of the origin to the specified coordinates.


  my $matrix = $pdf->scale($x, $y);

Returns a 6-element transformation matrix representing scaling of the coordinate space by the specified horizontal and vertical scaling factors.
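Both constructors fill the identity matrix [1 0 0 1 0 0] with the given values. The sketch below also includes a hypothetical transform_point() helper, which is not part of the documented API. It applies a matrix to a point using the PDF rule x' = a*x + c*y + e, y' = b*x + d*y + f.

```perl
use strict;
use warnings;

# Sketches of the two simplest matrix constructors.
sub translate { my ($x, $y) = @_; return [1, 0, 0, 1, $x, $y]; }
sub scale     { my ($x, $y) = @_; return [$x, 0, 0, $y, 0, 0]; }

# Hypothetical helper: map the point ($x, $y) through [a b c d e f].
sub transform_point {
    my ($m, $x, $y) = @_;
    my ($a1, $b1, $c1, $d1, $e1, $f1) = @$m;
    return ($a1 * $x + $c1 * $y + $e1, $b1 * $x + $d1 * $y + $f1);
}

my @p = transform_point(translate(72, 36), 10, 10);    # (82, 46)
my @q = transform_point(scale(2, 0.5), 10, 10);        # (20, 5)
```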


  my $matrix = $pdf->rotate($angle);

Returns a 6-element transformation matrix representing counterclockwise rotation of the coordinate system by the specified angle (in degrees).
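Here is a sketch of the rotation matrix [cos t, sin t, -sin t, cos t, 0, 0], the standard PDF form for counterclockwise rotation by angle t. Perl has no built-in pi, so the sketch converts degrees to radians using atan2. The function body is illustrative rather than the module's code.

```perl
use strict;
use warnings;

# Sketch: counterclockwise rotation by $angle degrees.
sub rotate {
    my ($angle) = @_;
    my $t = $angle * atan2(1, 1) / 45;    # atan2(1, 1) is pi/4, so this is degrees * pi/180
    my ($c, $s) = (cos($t), sin($t));
    return [$c, $s, -$s, $c, 0, 0];
}

my $r = rotate(90);    # approximately [0, 1, -1, 0, 0, 0]
```

Note that cos(90 degrees) evaluates to roughly 6.1e-17 rather than exactly 0. The round() method exists to clean up this kind of floating-point residue.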




Used by new(), parse_pdf() and write_pdf() to validate some parts of the PDF structure.


  $pdf->validate_key($hash, $key, $value, $label);

Used by validate() to validate specific hash key values.


  my $hash = $pdf->get_hash_node($path);

Used by validate_key() to get a hash node from the PDF structure by path.


  my @objects = $pdf->parse_objects($objects, $data, $offset);

Used by parse_pdf() to parse PDF objects into Perl representations.


  my @objects = $pdf->parse_data($data);

Uses parse_objects() to parse PDF objects from standalone PDF data.



Used by parse_objects() to inflate compressed streams.
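/FlateDecode data is zlib-format (RFC 1950) deflate data, so Perl's core Compress::Zlib module can inflate it. The minimal sketch below works on raw stream bytes. The helper name is hypothetical, and the module's real method works on its own stream objects. The sketch also ignores /DecodeParms predictors, which some real files use.

```perl
use strict;
use warnings;
use Compress::Zlib qw(compress uncompress);

# Sketch: inflate /FlateDecode stream bytes, failing loudly on bad data.
# uncompress() returns undef when its input is not valid zlib data.
sub inflate_stream_bytes {
    my ($bytes) = @_;
    my $plain = uncompress($bytes);
    die "FlateDecode: invalid zlib data\n" unless defined $plain;
    return $plain;
}

my $encoded = compress("BT /F1 12 Tf (Hello) Tj ET");    # stand-in for stream data
my $decoded = inflate_stream_bytes($encoded);
```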


  $new_stream = $pdf->compress_stream($stream);

Used by write_object() to compress streams if enabled. This is controlled by the $pdf->{-compress} flag, which is set automatically when reading a PDF file with compressed streams, but must be set manually for PDF files created from scratch, either in the constructor arguments or after the fact.
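Here is a minimal sketch of the flag check, using hypothetical names. When a -compress style flag is true, it compresses the raw bytes with zlib and reports the /Filter value the stream dictionary would then need. The sketch does not handle streams that already carry other filters.

```perl
use strict;
use warnings;
use Compress::Zlib qw(compress uncompress);

# Sketch: compress stream bytes only when the flag is set.
# Returns the bytes to write and the /Filter value to record (undef if none).
sub maybe_compress {
    my ($bytes, $compress_flag) = @_;
    return ($bytes, undef) unless $compress_flag;
    return (compress($bytes), '/FlateDecode');
}

my ($out, $filter) = maybe_compress("0 0 m 100 100 l S", 1);
```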


  $object = $pdf->resolve_references($objects, $object);

Used by parse_pdf() to replace parsed indirect object references with direct references to the objects in question.


  my $xrefs = $pdf->write_indirect_objects($pdf_file_data, $objects, $seen);

Used by write_pdf() to write all indirect objects to a string of new PDF file data.



Used by write_indirect_objects() to identify which objects in the PDF data structure need to be indirect objects.


  $pdf->enumerate_shared_objects($objects, $seen, $ancestors, $object);

Used by enumerate_indirect_objects() to find objects which are already shared (referenced from multiple objects in the PDF data structure).
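One common way to detect sharing in a Perl data structure is to count how often each reference address is reached during a traversal. This sketch uses refaddr() and reftype() from the core Scalar::Util module. The name find_shared() is hypothetical, and the sketch does not model the module's own bookkeeping (the $seen and $ancestors arguments). Each node is expanded only on its first visit. This also stops the walk from looping on cycles, such as /Parent back-links in the page tree.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr reftype);

# Sketch: return each hash or array reference reachable more than once
# from $root. Nodes are expanded only on their first visit, so cycles end.
sub find_shared {
    my ($root) = @_;
    my (%visits, @shared);
    my @stack = ($root);
    while (@stack) {
        my $node = pop @stack;
        next unless ref $node;
        my $addr = refaddr $node;
        if ($visits{$addr}++) {
            # The second visit marks the node as shared; later visits add nothing.
            push @shared, $node if $visits{$addr} == 2;
            next;
        }
        my $type = reftype $node;
        if    ($type eq 'HASH')  { push @stack, values %$node }
        elsif ($type eq 'ARRAY') { push @stack, @$node }
    }
    return @shared;
}

my $font = { Type => '/Font', Subtype => '/Type1' };
my $doc  = { Page1 => { F1 => $font }, Page2 => { F1 => $font } };
my @shared = find_shared($doc);    # ($font)
```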


  $pdf->add_indirect_objects($objects, @objects);

Used by enumerate_indirect_objects() and enumerate_shared_objects() to add objects to the list of indirect objects to be written out.


  $pdf->write_object($pdf_file_data, $objects, $seen, $object, $indent);

Used by write_indirect_objects(), and called by itself recursively, to write direct objects out to the string of new PDF file data.
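Here is a simplified, hypothetical sketch of recursive direct-object serialization. It rests on two assumptions. Dictionary keys are stored without the leading slash, as the trailer keys in $pdf are. Scalar values already hold PDF tokens such as /Page or numbers. It also skips "-" keys, following the convention described for the constructor. It does not handle streams, strings that need escaping, or indirect references, all of which the real write_object() must deal with.

```perl
use strict;
use warnings;

# Hypothetical serializer for direct objects:
#   hash ref  -> << /Key value ... >>  (keys sorted for stable output)
#   array ref -> [value ...]
#   undef     -> null
#   scalar    -> written verbatim (assumed to already be a PDF token)
sub to_pdf {
    my ($obj) = @_;
    return 'null' unless defined $obj;
    if (ref $obj eq 'HASH') {
        my @keys  = grep { !/^-/ } keys %$obj;
        my @pairs = map { "/$_ " . to_pdf($obj->{$_}) } sort @keys;
        return '<< ' . join(' ', @pairs) . ' >>';
    }
    if (ref $obj eq 'ARRAY') {
        return '[' . join(' ', map { to_pdf($_) } @$obj) . ']';
    }
    return "$obj";
}

my $page = { Type => '/Page', MediaBox => [0, 0, 612, 792] };
print to_pdf($page), "\n";    # << /MediaBox [0 0 612 792] /Type /Page >>
```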


  my $output = $pdf->dump_object($object, $label, $seen, $indent, $mode);

Used by dump_pdf(), and called by itself recursively, to dump/outline the specified PDF object.