NAME
PDF::Extract - Extracting sub PDF documents from a multi page PDF document
SYNOPSIS
use PDF::Extract;
$pdf=new PDF::Extract;
$pdf->servePDFExtract( PDFDoc=>"c:/Docs/my.pdf", PDFPages=>"1-3 31-36" );
or
use PDF::Extract;
$pdf = new PDF::Extract( PDFDoc=>'C:/my.pdf' );
$pdf->getPDFExtract( PDFPages=>"@PDFPages" );
print "Content-Type: text/plain\n\n", $pdf->getVars("PDFExtract");
print $pdf->getVars("PDFError");
DESCRIPTION
PDF::Extract is a group of methods that let the user quickly extract pages from an existing PDF document as a new PDF document.
With PDF::Extract a new PDF document can be:
assigned to a scalar variable with getPDFExtract.
saved to disk with savePDFExtract.
printed to STDOUT as a PDF web document with servePDFExtract.
cached and served for a faster PDF web document service with fastServePDFExtract.
These four main methods can be called with or without arguments, but they will not work unless they know the location of the original PDF document (PDFDoc) and the pages to extract (PDFPages), either from the current call or from an earlier one. There are no default values.
There are four other methods that deal with setting and getting the public variables.
getPDFExtractVariables can return an array of variables.
getVars is an alias of getPDFExtractVariables.
setPDFExtractVariables can set the public variables.
setVars is an alias of setPDFExtractVariables.
METHODS
new PDF::Extract
Creates a new PDF::Extract object with empty state, ready to process input and produce output. new can also be called with a hash of arguments.
new PDF::Extract( PDFDoc=>"c:/Docs/my.pdf", PDFPages=>"1-3 31-36" )
PDF::Extract->new() simply calls getPDFExtract() when it is given arguments, so this generates a new PDF document unless there is an error.
getPDFExtract
This method is the main workhorse of the package. It does all the PDF processing and sets PDFError if it is unable to create a new PDF document. PDFDoc and PDFPages must be set, either in this call or beforehand. It returns the new PDF document as a string, or "0" if there is an error.
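The rule that PDFDoc and PDFPages may be set "in this call or beforehand" can be pictured as call arguments being merged into stored state before the required settings are checked. The sketch below assumes a plain hash stands in for the object's state; resolve_args is a hypothetical name used only for illustration, not part of PDF::Extract.

```perl
use strict;
use warnings;

# Hypothetical sketch, not PDF::Extract's code: arguments given in a call
# are stored in the object's state, then the required settings are checked.
sub resolve_args {
    my ( $state, %args ) = @_;

    # Values passed in this call override, and are remembered in, the state.
    @{$state}{ keys %args } = values %args;

    # There are no defaults: PDFDoc and PDFPages must both be known by now.
    for my $required (qw(PDFDoc PDFPages)) {
        unless ( defined $state->{$required} && length $state->{$required} ) {
            $state->{PDFError} = "$required has not been set";
            return 0;
        }
    }
    $state->{PDFError} = '';
    return 1;
}

my %state = ( PDFDoc => 'C:/my.pdf' );    # e.g. set earlier via new()
for my $call ( [], [ PDFPages => '1-3' ] ) {
    my $ok = resolve_args( \%state, @$call );
    print( $ok ? "ok\n" : "error: $state{PDFError}\n" );
}
# prints "error: PDFPages has not been set", then "ok"
```

Because the first call stores nothing new and PDFPages is still unset, it fails; the second call supplies PDFPages and succeeds while reusing the remembered PDFDoc.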
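To create an array of PDF documents, each consisting of a single page, from a multipage PDF document:
my $pdf = PDF::Extract->new( PDFDoc=>'C:/my.pdf' );
my ( @pdfs, $page );
# getPDFExtract returns "0" once $page is past the last page, ending the loop
while ( my $doc = $pdf->getPDFExtract( PDFPages=>++$page ) ) { push @pdfs, $doc }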
The lowest valid page number for PDFPages is 1. A value of 0 will produce no output and raise an error. An error will be raised if the PDFPages value does not correspond to any pages.
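The page-list syntax used throughout this document ("1-3 31-36") can be expanded into individual page numbers as sketched below. expand_pages is a hypothetical helper written for illustration, assuming entries are whitespace-separated single pages or "first-last" ranges; PDF::Extract's own parser may accept more or less than this. It rejects page 0, matching the rule that the lowest valid page is 1.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PDF::Extract): expand a PDFPages string
# such as "1-3 31-36" into individual page numbers.
sub expand_pages {
    my ($spec) = @_;
    my @pages;
    for my $entry ( split ' ', $spec ) {
        if ( $entry =~ /^(\d+)-(\d+)$/ ) {
            # A range must start at page 1 or later and must not run backwards.
            die "Invalid range '$entry'\n" if $1 < 1 || $2 < $1;
            push @pages, $1 .. $2;
        }
        elsif ( $entry =~ /^(\d+)$/ ) {
            die "Page numbers start at 1, got '$entry'\n" if $1 < 1;
            push @pages, $1;
        }
        else {
            die "Cannot parse page entry '$entry'\n";
        }
    }
    return @pages;
}

print join( ' ', expand_pages('1-3 31-36') ), "\n";
# prints: 1 2 3 31 32 33 34 35 36
```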
savePDFExtract
This method saves its output in the directory given by PDFCache. The new PDF's filename is built from the original filename without its .pdf suffix, followed by the requested page numbers (page entries are separated with an underscore "_", and the hyphen of a page range becomes "..") and then the .pdf suffix.
$pdf->savePDFExtract(PDFPages=>"1 3-5", PDFDoc=>'C:/my.pdf', PDFCache=>"C:/myCache" );
If there is an error, an error page is served and savePDFExtract returns "0". Otherwise savePDFExtract returns "1", and the saved PDF's location and filename will be "C:/myCache/my1_3..5.pdf".
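The naming rule above can be sketched with the core File::Basename module. cache_file_path is a hypothetical helper that assumes the original filename ends in .pdf (matched case-insensitively); it reproduces the documented example but is not the module's own code.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: original file name without ".pdf", then the page
# list with "_" between entries and ".." in place of "-", then ".pdf".
sub cache_file_path {
    my ( $cache_dir, $pdf_doc, $pdf_pages ) = @_;
    my ($base) = fileparse( $pdf_doc, qr/\.pdf/i );   # "C:/my.pdf" -> "my"
    my $pages = join '_', split ' ', $pdf_pages;      # "1 3-5" -> "1_3-5"
    $pages =~ s/-/../g;                               # "1_3-5" -> "1_3..5"
    return "$cache_dir/$base$pages.pdf";
}

my $path = cache_file_path( 'C:/myCache', 'C:/my.pdf', '1 3-5' );
print "$path\n";    # prints: C:/myCache/my1_3..5.pdf
```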
servePDFExtract
This method serves its output to STDOUT with the correct header for a PDF document served on the web.
$pdf = PDF::Extract->new(
PDFDoc=>'C:/my.pdf',
PDFErrorPage=>"C:/myErrorPage.html" );
$pdf->servePDFExtract( PDFPages=>1);
If there is an error, an error page is served and servePDFExtract returns "0". Otherwise servePDFExtract returns "1".
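The "correct header for a PDF document" most likely means a Content-Type of application/pdf; the sketch below assumes that, and adds a Content-Length header as an illustration. serve_pdf is a hypothetical helper, not part of PDF::Extract, and the module's actual header output may differ.

```perl
use strict;
use warnings;

# Hypothetical sketch: write a CGI-style response carrying PDF bytes.
sub serve_pdf {
    my ( $out, $pdf_bytes ) = @_;
    binmode $out;    # PDF data is binary: avoid newline/encoding translation
    print {$out} "Content-Type: application/pdf\n",
        "Content-Length: ", length($pdf_bytes), "\n\n",
        $pdf_bytes;
    return 1;
}
```

In a CGI script it could be called as serve_pdf( \*STDOUT, $pdf->getVars("PDFExtract") ) after a successful getPDFExtract call; taking the filehandle as an argument rather than writing to STDOUT directly keeps it easy to test.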
fastServePDFExtract
This method serves its output to STDOUT with the correct header for a PDF document served on the web.
This method first checks whether the requested PDF document is already in the cache folder set with PDFCache. If the file exists, the cached file is served instead of processing a new PDF document. If there is an error, an error page is served and fastServePDFExtract returns "0"; on success it returns "1".
$pdf->setVars(
PDFDoc=>'C:/my.pdf',
PDFCache=>"C:/myCache",
PDFErrorPage=>"C:/myErrorPage.html",
PDFPages=>1);
unless ($pdf->fastServePDFExtract ) {
# there was an error
$error=$pdf->getVars("PDFError") ;
}
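The cache lookup behind fastServePDFExtract amounts to "use the saved file if it exists, otherwise build it and save it". A minimal sketch of that logic, assuming the cache path is computed elsewhere (for example with the naming rule described under savePDFExtract); cached_pdf and its $build callback are hypothetical names, and this is not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical sketch of a file cache: return the cached PDF at $path if it
# exists; otherwise call $build, save its result to $path and return it.
# $build returns the PDF bytes, or undef on failure (nothing is cached then).
sub cached_pdf {
    my ( $path, $build ) = @_;
    if ( -e $path ) {
        open my $in, '<:raw', $path or die "Cannot read $path: $!";
        local $/;    # slurp the whole file in one read
        return scalar <$in>;
    }
    my $pdf = $build->();
    return unless defined $pdf;
    open my $out, '>:raw', $path or die "Cannot write $path: $!";
    print {$out} $pdf;
    close $out or die "Cannot close $path: $!";
    return $pdf;
}
```

Passing the build step as a code reference keeps the cache logic independent of how the PDF is produced; a callback wrapping getPDFExtract would need to turn its "0" error value into undef.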
getPDFExtractVariables
Gets any of the public variables, given a list of the variable names.
($error, @found)=$pdf->getPDFExtractVariables( "PDFError", "PDFPagesFound");
This method returns an array of values corresponding to the variable names passed in as arguments. If a variable is undefined, its returned value will be undefined.
getVars
This method is an alias for getPDFExtractVariables. It gets any of the public variables, given a list of the variable names.
@vars=$pdf->getVars( @varNames );
This method returns an array of values corresponding to the variable names passed in as arguments. If a variable is undefined, its returned value will be undefined.
setPDFExtractVariables
Set any of the public variables using a hash of the variables and their values.
($doc,$pages)=$pdf->setPDFExtractVariables(PDFDoc=>'C:/my.pdf', PDFPages=>1);
This method sets the variables specified in the argument hash and returns an array of the new values.
setVars
This method is an alias for setPDFExtractVariables. It sets any of the public variables, given a hash of the variable names and their values.
@vars=$pdf->setVars( %vars );
This method sets the variables specified in the argument hash and returns an array of the new values.
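One way to picture the get/set pair and its short aliases is a hash of public variables with hash-slice access and typeglob aliasing. My::ExtractVars is a hypothetical class used only for illustration; PDF::Extract's internals are not documented here. Unset names come back as undef, and setVars returns the new values in argument order, as in the setPDFExtractVariables example above.

```perl
use strict;
use warnings;

package My::ExtractVars;    # hypothetical class, not PDF::Extract itself

sub new { return bless { vars => {} }, shift }

# Return the values of the named variables in the order requested;
# names that were never set come back as undef.
sub getPDFExtractVariables {
    my ( $self, @names ) = @_;
    return @{ $self->{vars} }{@names};
}

# Set variables from a name => value list and return the new values
# in the order they were given.
sub setPDFExtractVariables {
    my ( $self, @pairs ) = @_;
    my @values;
    while ( my ( $name, $value ) = splice @pairs, 0, 2 ) {
        $self->{vars}{$name} = $value;
        push @values, $value;
    }
    return @values;
}

# Short aliases: each name points at the same subroutine as its long form.
{
    no warnings 'once';
    *getVars = \&getPDFExtractVariables;
    *setVars = \&setPDFExtractVariables;
}

package main;

my $vars = My::ExtractVars->new;
$vars->setVars( PDFDoc => 'C:/my.pdf', PDFPages => 1 );
my ( $doc, $missing ) = $vars->getVars( 'PDFDoc', 'PDFError' );
print "$doc\n";    # prints C:/my.pdf; $missing is undef
```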
VARIABLES
PDFDoc (set and get)
$file=$pdf->getVars("PDFDoc");
This variable contains the path to the last original PDF document accessed by getPDFExtract, savePDFExtract, servePDFExtract and fastServePDFExtract. PDFDoc will be an empty string if there was an error.
PDFPages (set and get)
$pages=$pdf->setVars("PDFPages"=>"1 18-23");
or
$pages=$pdf->getVars("PDFPages");
This variable contains a list of pages to extract from the original PDF document accessed by getPDFExtract, savePDFExtract, servePDFExtract and fastServePDFExtract.
PDFCache (set and get)
$cachePath=$pdf->setVars("PDFCache"=>"C:/myCache");
or
$cachePath=$pdf->getVars("PDFCache");
This variable contains the path to the PDF document cache. This value is required by savePDFExtract and fastServePDFExtract method calls. PDFCache will be an empty string if there was an error in setting the value.
PDFErrorPage (set and get)
$errorPagePath=$pdf->setVars("PDFErrorPage"=>"C:/myError.html");
or
$errorPagePath=$pdf->getVars("PDFErrorPage");
PDFErrorPage is the path to a text file used as a template for the error page. If the template contains [PDFError] (the word PDFError in square brackets), the error description replaces it. Alternatively, the template can give a generic error description and describe remedial actions for the viewer to take.
If this variable is not set then a default error page will be used. The default page has a message in red at the top, "There is system problem in processing your PDF Pages request.", and then a description of the actual error follows underneath in black.
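The [PDFError] substitution can be sketched as a single global replacement. render_error_page is hypothetical: it takes the template text rather than the PDFErrorPage path, and it assumes the default page is simple HTML, with only the red headline text taken from the description above. A production page should also HTML-escape the error text.

```perl
use strict;
use warnings;

# Hypothetical sketch: fill an error-page template, or fall back to a
# simple default page when no template was configured.
sub render_error_page {
    my ( $template, $error ) = @_;
    if ( defined $template ) {
        # Replace every occurrence of the literal text [PDFError].
        ( my $page = $template ) =~ s/\[PDFError\]/$error/g;
        return $page;
    }
    # Default: red headline, then the error description in black.
    return qq{<html><body>\n}
      . qq{<p style="color:red">There is system problem in processing your PDF Pages request.</p>\n}
      . qq{<p style="color:black">$error</p>\n}
      . qq{</body></html>\n};
}

my $page = render_error_page( "<p>Sorry: [PDFError]</p>\n", "no pages found" );
print $page;    # prints <p>Sorry: no pages found</p>
```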
PDFExtract (get only)
$out=$pdf->getVars("PDFExtract");
This variable contains the last PDF document processed by getPDFExtract, savePDFExtract, servePDFExtract and fastServePDFExtract. PDFExtract will be an empty string if there was an error.
PDFPagesFound (get only)
@pagesFound=$pdf->getVars("PDFPagesFound");
or
$pageCount=$pdf->getVars("PDFPagesFound");
This variable contains an array of the page numbers that were selected and found within the original PDF document. PDFPagesFound will be undefined if there was an error finding any pages.
Note: PDFPagesFound must be the last name in a list of variables to get, because it returns an array whose elements would be mixed up with the values of any variables that follow it.
($doc, @pagesFound)=$pdf->getVars("PDFDoc", "PDFPagesFound");
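The ordering rule follows from Perl's list flattening: an array in a returned list expands into its elements, and an array on the left of a list assignment takes everything that remains. The sketch uses a hypothetical stand-in, get_vars, rather than the real getVars.

```perl
use strict;
use warnings;

# Stand-in for getVars (hypothetical): returns scalar values for most names,
# but expands PDFPagesFound into one element per page.
my %vars = ( PDFDoc => 'C:/my.pdf', PDFPagesFound => [ 1, 18, 19, 20 ] );

sub get_vars {
    return map { ref $vars{$_} ? @{ $vars{$_} } : $vars{$_} } @_;
}

# Correct: the array is last, so it collects exactly the page numbers.
my ( $doc, @pagesFound ) = get_vars( 'PDFDoc', 'PDFPagesFound' );
print "$doc / @pagesFound\n";    # prints C:/my.pdf / 1 18 19 20

# Wrong: with PDFPagesFound first, its elements shift everything after it,
# so $doc2 receives the second page number instead of the document path.
my ( $first, $doc2 ) = get_vars( 'PDFPagesFound', 'PDFDoc' );
print "$first / $doc2\n";        # prints 1 / 18
```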
PDFPageCount (get only)
$pageCount=$pdf->getVars("PDFPageCount");
This variable contains the number of pages that were selected and found within the original PDF document. PDFPageCount will be an empty string if there was an error finding any pages.
PDFError (get only)
$error=$pdf->getVars("PDFError");
This variable contains a string describing any errors encountered while processing the original PDF file. PDFError is guaranteed to be set if getPDFExtract, savePDFExtract, servePDFExtract or fastServePDFExtract fails and returns "0". PDFError will be an empty string if there was no error.
AUTHOR
Noel Sharrock <mailto:nsharrok@lgmedia.com.au>
PDF::Extract's home page http://www.lgmedia.com.au/PDF/Extract.asp
SUPPORT
Much thanks to Lyman Byrd for his welcome programming suggestions and editorial comments on the POD.
COPYRIGHT
Copyright (c) 2003 by Noel Sharrock. All rights reserved.
LICENSE
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, i.e., under the terms of the ``Artistic License'' or the ``GNU General Public License''.
The C library at the core of this Perl module can additionally be redistributed and/or modified under the terms of the ``GNU Library General Public License''.
DISCLAIMER
This package is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the ``GNU General Public License'' for more details.