NAME

CAM::PDFTaxforms - CAM::PDF wrapper to also allow editing of checkboxes (ie. for IRS Tax forms).

AUTHOR

Jim Turner <https://metacpan.org/author/TURNERJW>.

This module is a wrapper around and a drop-in replacement for CAM::PDF, by Chris Dolan.

ACKNOWLEDGMENTS

Thanks to Chris Dolan and everyone involved in developing and supporting CAM::PDF, on which this module is based and depends.

LICENSE AND COPYRIGHT

Copyright (c) 2010-2019 Jim Turner <mailto:turnerjw784@yahoo.com>

This library is free software; you can redistribute it and/or modify it under the same terms as CAM::PDF and Perl itself.

CAM::PDF:

Copyright (c) 2002-2006 Clotho Advanced Media, Inc., http://www.clotho.com/

Copyright (c) 2007-2008 Chris Dolan

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SYNOPSIS

#!/usr/bin/perl -w

use strict;
use CAM::PDFTaxforms;
my $pdf = CAM::PDFTaxforms->new('f1040.pdf') or die "Could not open PDF ($!)!";    
my $page1 = $pdf->getPageContent(1);

#DISPLAY THE LIST OF NAMES OF EDITABLE FIELDS:
my @fieldnames = $pdf->getFormFieldList();
print "--fields=".join('|',@fieldnames)."=\n";

#UPDATE THE VALUES OF A COUPLE OF THE FIELDS (TEXT FIELDS AND/OR CHECKBOXES):
$pdf->fillFormFields('fieldname1' => 'value1', 'fieldname2' => 'value2');

#WRITE THE UPDATED PDF FORM TO A NEW FILE NAME:
$pdf->cleanoutput('f1040_completed.pdf');

Many example programs are included in this distribution to do useful tasks. See the bin subdirectory.

DESCRIPTION

This package is a wrapper for and creates a CAM::PDF object. The difference is that some method functions are overridden to fix some issues and add some new features, namely to better handle IRS tax forms, many of which have checkboxes, in addition to numeric and text fields. Several other patches have also been applied, particularly those provided by CAM::PDF bugs #58144, #122890 and #125299. Otherwise, it should work well as a full drop-in replacement for CAM::PDF in the API.

CAM::PDF description:

This package reads and writes any document that conforms to the PDF specification generously provided by Adobe at http://partners.adobe.com/public/developer/pdf/index_reference.html (link last checked Oct 2005).

The file format through PDF 1.5 is well-supported, with the exception of the "linearized" or "optimized" output format, which this module can read but not write. Many specific aspects of the document model are not manipulable with this package (like fonts), but if the input document is correctly written, then this module will preserve the model integrity.

The PDF writing feature saves as PDF 1.4-compatible. That means that we cannot write compressed object streams. The consequence is that reading and then writing a PDF 1.5+ document may enlarge the resulting file by a fair margin.

This library grants you some power over the PDF security model. Note that applications editing PDF documents via this library MUST respect the security preferences of the document. Any violation of this respect is contrary to Adobe's intellectual property position, as stated in the reference manual at the above URL.

Technical detail regarding corrupt PDFs: This library adheres strictly to the PDF specification. Adobe's Acrobat Reader is more lenient, allowing some corrupted PDFs to be viewable. Therefore, it is possible that some PDFs may be readable by Acrobat but unreadable by this library. In particular, files which have had line endings converted to or from DOS/Windows style (i.e. CR-LF) may be rendered unusable even though Acrobat does not complain. Future library versions may relax the parser, but not yet.

This version is HACKED by Jim Turner 09/2010 to enable the fillFormFields() function to also modify checkboxes (primarily on IRS Tax forms).

EXAMPLES

See the example/ subdirectory in the source tree. It contains a blank official 2018 IRS Schedule B tax form and two programs. dof1040sb.pl fills in the form using the sample input data text file f1040sb_inputs.txt and writes the completed form to f1040sb_out.pdf. test1040sb.pl reads the data from the completed form created by dof1040sb.pl and displays it as output.

To run the programs, switch to the example/ subdirectory in the source tree and run them without arguments (e.g. ./dof1040sb.pl).

To see the names of the fields and their current values in a PDF form, such as the aforementioned tax form, run the included listpdffields2.pl program, e.g.: listpdffields2.pl -d f1040sb_out.pdf.

API

Functions intended to be used externally

$self = CAM::PDFTaxforms->new(content | filename | '-')
$self->toPDF()
$self->needsSave()
$self->save()
$self->cleansave()
$self->output(filename | '-')
$self->cleanoutput(filename | '-')
$self->previousRevision()
$self->allRevisions()
$self->preserveOrder()
$self->appendObject(olddoc, oldnum, [follow=(1|0)])
$self->replaceObject(newnum, olddoc, oldnum, [follow=(1|0)])
   (olddoc can be undef in the above for adding new objects)
$self->numPages()
$self->getPageText(pagenum)
$self->getPageDimensions(pagenum)
$self->getPageContent(pagenum)
$self->setPageContent(pagenum, content)
$self->appendPageContent(pagenum, content)
$self->deletePage(pagenum)
$self->deletePages(pagenum, pagenum, ...)
$self->extractPages(pagenum, pagenum, ...)
$self->appendPDF(CAM::PDF object)
$self->prependPDF(CAM::PDF object)
$self->wrapString(string, width, fontsize, page, fontlabel)
$self->getFontNames(pagenum)
$self->addFont(page, fontname, fontlabel, [fontmetrics])
$self->deEmbedFont(page, fontname, [newfontname])
$self->deEmbedFontByBaseName(page, basename, [newfont])
$self->getPrefs()
$self->setPrefs()
$self->canPrint()
$self->canModify()
$self->canCopy()
$self->canAdd()
$self->getFormFieldList()
$self->fillFormFields(fieldname, value, [fieldname, value, ...])
  or $self->fillFormFields(%values)
$self->clearFormFieldTriggers(fieldname, fieldname, ...)

Note: 'clean' as in cleansave() and cleanoutput() means write a fresh PDF document. The alternative (e.g. save()) reuses the existing doc and just appends to it. Also note that 'clean' functions sort the objects numerically. If you prefer that the new PDF docs more closely resemble the old ones, call preserveOrder() before cleansave() or cleanoutput().
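
As a minimal sketch of the note above, the helper below writes a fresh copy of a document and optionally asks for the original object order to be kept. The helper name write_clean_copy and the command-line handling are assumptions for illustration; only new(), preserveOrder() and cleanoutput() come from the API list.

```perl
use strict;
use warnings;

# Hypothetical helper: write a fresh ("clean") PDF to $outfile.
# If $keep_order is true, request the original object order first.
sub write_clean_copy {
    my ($pdf, $outfile, $keep_order) = @_;
    $pdf->preserveOrder() if $keep_order;   # must come before the clean write
    $pdf->cleanoutput($outfile);
    return $outfile;
}

# Runs only when given an input and an output path on the command line.
if (@ARGV == 2) {
    require CAM::PDFTaxforms;
    my $pdf = CAM::PDFTaxforms->new($ARGV[0])
        or die "Could not open $ARGV[0]\n";
    write_clean_copy($pdf, $ARGV[1], 1);
}
```

Calling save() instead would append to the existing document rather than rewrite it, so the clean variant is the one to reach for when a compact, freshly written file is wanted.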

For additional methods and functions, see the CAM::PDF documentation.

METHODS

$doc = CAM::PDFTaxforms->new($content)
$doc = CAM::PDFTaxforms->new($content, $ownerpass, $userpass)
$doc = CAM::PDFTaxforms->new($content, $ownerpass, $userpass, $prompt)
$doc = CAM::PDFTaxforms->new($content, $ownerpass, $userpass, $options)

Instantiate a new CAM::PDFTaxforms object. $content can be a document in a string, a filename, or '-'. The latter indicates that the document should be read from standard input. If the document is password protected, the passwords should be passed as additional arguments. If they are not known, a boolean $prompt argument allows the programmer to suggest that the constructor prompt the user for a password. This is rudimentary prompting: passwords are in the clear on the console.

This constructor takes an optional final argument which is a hash reference. This hash can contain any of the following optional parameters:

prompt_for_password => $boolean

This is the same as the $prompt argument described above.

fault_tolerant => $boolean

This flag causes the instance to be more lenient when reading the input PDF. Currently, this only affects PDFs which cannot be successfully decrypted.
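
A minimal sketch of passing the options hash as the final constructor argument follows. The helper name open_form and the command-line handling are assumptions for illustration; the two option keys are the ones described above.

```perl
use strict;
use warnings;

# Hypothetical helper: open a possibly encrypted form without prompting,
# asking the parser to be lenient when reading the input.
sub open_form {
    my ($class, $file, $ownerpass, $userpass) = @_;
    my %options = (
        prompt_for_password => 0,   # never ask on the console
        fault_tolerant      => 1,   # be lenient when reading the input
    );
    return $class->new($file, $ownerpass, $userpass, \%options);
}

# Runs only when a file name (and optionally two passwords) is given.
if (@ARGV) {
    require CAM::PDFTaxforms;
    my ($file, $ownerpass, $userpass) = @ARGV;
    my $doc = open_form('CAM::PDFTaxforms', $file, $ownerpass, $userpass)
        or die "Could not open $file\n";
    print "Opened $file: ", $doc->numPages(), " page(s)\n";
}
```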

$hashref = $doc->getFieldValue('fieldname1' [, fieldname2, ... fieldnameN ])

(CAM::PDFTaxforms only, not available in CAM::PDF)

Fetches the corresponding current values for each field name in the argument list. Returns a reference to a hash containing the field names as keys and the corresponding values. If a field does not exist or does not contain a value, an empty string is returned in the hash as its value. If called in array / hash context, then a list of field names and values in the order (fieldname1, value1, fieldname2, value2, ... fieldnameN, valueN) is returned.
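
The two calling contexts described above can be sketched as follows. The helper name read_fields and the command-line handling are assumptions for illustration; the field names would come from your own form.

```perl
use strict;
use warnings;

# Hypothetical helper: fetch the same fields in both calling contexts.
sub read_fields {
    my ($pdf, @names) = @_;
    my $by_name = $pdf->getFieldValue(@names);   # scalar context: hash reference
    my @pairs   = $pdf->getFieldValue(@names);   # list context: name, value, ...
    return ($by_name, \@pairs);
}

# Runs only when given a PDF file followed by one or more field names.
if (@ARGV) {
    require CAM::PDFTaxforms;
    my ($file, @names) = @ARGV;
    my $pdf = CAM::PDFTaxforms->new($file) or die "Could not open $file\n";
    my ($by_name) = read_fields($pdf, @names);
    printf "%s = '%s'\n", $_, $by_name->{$_} // '' for @names;
}
```

The list form is convenient for assigning straight into a hash (my %values = $doc->getFieldValue(...)), while the scalar form avoids copying when only a lookup is needed.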

$doc->fillFormFields($name => $value, ...)
$doc->fillFormFields($opts_hash, $name => $value, ...)

Set the default values of PDF form fields. The name should be the full hierarchical name of the field as output by the getFormFieldList() function. The argument list can be a hash if you like. A simple way to use this function is something like this:

my %fields = (fname => 'John', lname => 'Smith', state => 'WI');
$fields{zip} = 53703;
$doc->fillFormFields(%fields);

NOTE: For checkbox fields, specify any value that is false in Perl (i.e. 0, '', or undef), or any of the strings 'Off', 'No', or 'Unchecked' (case insensitive), to un-check a checkbox, or any other value that is true in Perl to check it. Checkbox fields are only supported by CAM::PDFTaxforms and were the original reason for creating it.
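
To make the checkbox rule concrete, here is a small stand-alone sketch that mirrors the rule as documented above. The function would_check is a hypothetical illustration only, not the module's internal code, and it does not touch any PDF.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the documented rule would check the box,
# 0 if it would un-check it.
sub would_check {
    my ($value) = @_;
    return 0 unless $value;                              # undef, 0, '' are false in Perl
    return 0 if $value =~ /\A(?:off|no|unchecked)\z/i;   # documented "off" strings
    return 1;                                            # any other true value
}

for my $v (1, 'Yes', 'X', 0, '', 'off', 'No', 'UNCHECKED') {
    printf "%-12s => %s\n", "'$v'", would_check($v) ? 'check' : 'un-check';
}
```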

If the first argument is a hash reference, it is interpreted as options for how to render the filled data:

background_color => 'none' | $gray | [$r, $g, $b]

Specify the background color for the text field.
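
A minimal sketch of passing rendering options together with field values follows. The field names (form1.name, form1.zip, form1.single) and the helper name fill_sample are hypothetical placeholders, and it is an assumption here that 'none' means no background is drawn behind the filled text.

```perl
use strict;
use warnings;

# Hypothetical helper: fill two text fields and one checkbox, passing the
# options hash reference as the first argument.
sub fill_sample {
    my ($pdf) = @_;
    my %values = (
        'form1.name'   => 'John Smith',   # text field
        'form1.zip'    => 53703,          # numeric field
        'form1.single' => 1,              # checkbox: any true value checks it
    );
    return $pdf->fillFormFields({ background_color => 'none' }, %values);
}

# Runs only when given an input and an output path on the command line.
if (@ARGV == 2) {
    require CAM::PDFTaxforms;
    my $pdf = CAM::PDFTaxforms->new($ARGV[0])
        or die "Could not open $ARGV[0]\n";
    fill_sample($pdf);
    $pdf->cleanoutput($ARGV[1]);
}
```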

$doc->getFormFieldList()

Return an array of the names of all of the PDF form fields. The names are the full hierarchical names constructed as explained in the PDF reference manual. These names are useful for the fillFormFields() function.
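
Since fillFormFields() expects these full hierarchical names, one possible use of the list is to check a set of values for misspelled names before filling. The helper name unknown_fields is an assumption for illustration, and a name it reports is not necessarily invalid if the form also accepts short names.

```perl
use strict;
use warnings;

# Hypothetical helper: return the keys of %values that are not among the
# form's full hierarchical field names, sorted alphabetically.
sub unknown_fields {
    my ($pdf, %values) = @_;
    my %known = map { $_ => 1 } $pdf->getFormFieldList();
    my @unknown = sort grep { !$known{$_} } keys %values;
    return @unknown;
}

# Runs only when given a PDF file followed by field names to check.
if (@ARGV) {
    require CAM::PDFTaxforms;
    my ($file, @names) = @ARGV;
    my $pdf = CAM::PDFTaxforms->new($file) or die "Could not open $file\n";
    my @unknown = unknown_fields($pdf, map { $_ => '' } @names);
    print @unknown ? "Not in form: @unknown\n" : "All names found\n";
}
```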

$doc->getFormField($name)

For INTERNAL use

Return the object containing the form field definition for the specified field name. $name can be either the full name or the "short/alternate" name.

$doc->writeAny($node)

Returns the serialization of the specified node. This handles all Node types, including object Nodes.

SCRIPTS

CAM::PDF includes a number of handy utility scripts, installed in the user's local/bin path. This distribution adds listpdffields2.pl, a modified version of CAM::PDF's listpdffields.pl utility with an added -d (--data) option that displays the names of all the fields found in a PDF form along with their corresponding current values (if any).

listpdffields2.pl [-dhsvV] pdfformfile.pdf

The typical usage is:

listpdffields2.pl -d pdfformfile.pdf

COMPATIBILITY

This library was primarily developed against the 3rd edition of the reference (PDF v1.4) with several important updates from 4th edition (PDF v1.5). This library focuses most deeply on PDF v1.2 features. Nonetheless, it should be forward and backward compatible in the majority of cases.

PERFORMANCE

This module is written with good speed and flexibility in mind, often at the expense of memory consumption. Entire PDF documents are typically slurped into RAM. As an example, simply calling new('PDFReference15_v15.pdf') (the 13.5 MB Adobe PDF Reference V1.5 document) pushes Perl to consume 89 MB of RAM on my development machine.

DEPENDS

CAM::PDF, Text::PDF, Crypt::RC4, Digest::MD5

KEYWORDS

pdf taxforms

KNOWN BUGS / TODO

1) Checkboxes / radio buttons set programmatically to "CHECKED" by CAM::PDFTaxforms ARE checked, and are shown as such in the form, but evince, and perhaps the Acrobat(tm) form editor, don't seem to consider them checked the first time a user clicks on them to uncheck them, requiring a second click. This can be especially disconcerting to the user for radio buttons, as it is possible to click a second button in the group, checking it, while the originally-checked button is NOT automatically unchecked. I need to somehow FIX this, but have so far been unable to do so (as of v1.1 - sorry!), so please don't file a bug on this UNLESS you have a PATCH for either me OR CAM::PDF itself!

2) CAM::PDF is used under the hood for most of the actual work, and has many open bugs / issues (see: https://rt.cpan.org/Public/Dist/Display.html?Name=CAM-PDF), so, except for the patched ones mentioned in the DESCRIPTION section above, those issues remain unfixed here as well! Therefore, check whether your issue also occurs with standard CAM::PDF before filing a new bug here (unless it involves a specific CAM::PDFTaxforms feature, or you have a patch, in which case you're likely to get it merged here sooner!).

SEE ALSO

CAM::PDF (Obviously) as this module is a wrapper around it (and requires it as a prerequisite). Also see the docs there for all the other methods and features available to CAM::PDFTaxforms (it's NOT just for IRS tax forms)!

There are several other PDF modules on CPAN. Below is a brief description of a few of them. If these comments are out of date, please inform me.

PDF::API2

As of v0.46.003, LGPL license.

This is the leading PDF library, in my opinion.

Excellent text and font support. This is the highest level library of the bunch, and is the most complete implementation of the Adobe PDF spec. The author is amazingly responsive and patient.

Text::PDF

As of v0.25, Artistic license.

Excellent compression support (CAM::PDF cribs off this Text::PDF feature). This has not been developed since 2003.

PDF::Reuse

As of v0.32, Artistic/GPL license, like Perl itself.

This library is not object oriented, so it can only process one PDF at a time, while storing all data in global variables. I'm not fond of it, but it's quite popular, so don't take my word for it!

Additionally, PDFLib is a commercial package not on CPAN (www.pdflib.com). It is a C-based library with a Perl interface. It is designed for PDF creation, not for reuse.