NAME
Fuse::PDF - Filesystem embedded in a PDF document
SYNOPSIS
use Fuse::PDF;
my $fs = Fuse::PDF->new('my_doc.pdf');
$fs->mount('/mnt/pdf');
# blocks until the filesystem is unmounted
See also the mount_pdf front-end.
LICENSE
Copyright 2007-2008 Chris Dolan, cdolan@cpan.org
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
DESCRIPTION
The Adobe Portable Document Format is an arbitrary collection of nodes which support a tree structure. Most of that data is oriented toward document rendering, but it's legal to add arbitrarily complex data virtually anywhere in the document structure. Adobe Illustrator does this to embed lots of metadata in its "PDF-compatible" Illustrator document format.
By deciding on a convention for representing filesystem data and leveraging the FUSE (Filesystem in Userspace) library, we map filesystem calls to PDF edits.
More info: http://www.chrisdolan.net/madmongers/par-fuse-pdf.html
BUGS AND CAVEATS
PDF-in-PDF: If you copy another PDF into the PDF-based filesystem, it may corrupt the outer document. This should be solved when I switch to saving file contents in PDF streams instead of in PDF strings.
Saving: No data is saved until you unmount the filesystem! Hopefully I can fix this in future releases. The saving is not yet atomic. That is, if you have a failure, the old PDF may be deleted before the new one is saved.
Resources: The entire PDF is loaded into RAM in new(). If your filesystem grows too large, this will lead to obvious problems!
Hangs: While FUSE is quite mature, I found it fairly easy to hang the filesystem back around version 0.01. I only needed to actually reboot once, but if that causes you concern you may wish to avoid FUSE in general. This has not been a problem since the earliest releases.
Operating systems: I've only tested this software with the Google build of MacFUSE 1.1.0 (PowerPC, 10.4, http://code.google.com/p/macfuse/). Notably, I have not tried the Linux implementation of FUSE. If you have other experiences to add, email me or post comments to http://annocpan.org/.
Fuse.pm: As of this writing, the Fuse module (v0.09_01) fails all tests on Mac. The module actually works great, but the Makefile.PL and the tests are very Linux-centric. Hopefully that will improve as MacFUSE matures.
PDF versions: This package relies on CAM::PDF to read and write PDFs. While that module supports all of the core PDF syntax, it's stricter than many other PDF implementations and may fail to open PDFs that, say, Acrobat or Preview.app can open. In particular, "Print to PDF" on Mac OS X 10.4 often generates bad PDFs.
Threading: I've explicitly set the FUSE default to single-threaded mode, so performance may be terrible in some scenarios. I hope to add support for threaded Perl in a future release. Patches welcome (remove the threaded => 0 line from this file and add locking to Fuse::PDF::FS).
Unsupported: special files (named pipes, etc.), following symlinks out of the filesystem, permission enforcement, chown, flush, reading from unlinked filehandles.
Hard links: I have not yet implemented hard links. I'll implement compressed streams at the same time.
METHODS
- $pkg->new($pdf_filename)
- $pkg->new($pdf_filename, $hash_of_options)
-
Create a new filesystem instance. This method opens and parses the PDF document. If there is an error opening or parsing the PDF document, this will return undef.
The options hash supports the following extra arguments:
- pdf_constructor
-
An arrayref of extra arguments to pass to the CAM::PDF constructor. In particular, the first of these arguments are the owner and user passwords, which can be used to open encrypted PDFs.
- save_filename
-
The string representing the path where filesystem changes should be saved. By default this is the $pdf_filename passed to new().
- compact
-
A boolean indicating whether to discard old filesystem data saved via the versioning infrastructure described in the PDF specification. Defaults to false. If left false, then the PDF will grow with every mount, but only by as much as you changed it. See rewritepdf.pl --cleanse from the CAM::PDF distribution to perform the compaction manually. See also revertpdf.pl to roll back to those older versions.
- fs_name
-
Fuse::PDF can embed multiple filesystems in a single PDF distinguished by name. This string specifies which filesystem to use. It uses the Fuse::PDF::FS default if a name is not explicitly provided.
- revision
-
A version number indicating which filesystem version to roll back to before mounting. Use fs() and the Fuse::PDF::FS API to learn what revisions are available in a PDF filesystem.
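As a rough illustration of the options hash described above, the following sketch builds one and passes it to new(). The option keys come from the list above; the helper names example_options and open_pdf_fs, the passwords, and the fs_name value 'scratch' are made up for the example and are not part of Fuse::PDF.

```perl
use strict;
use warnings;

# Hypothetical helper: build an options hashref for Fuse::PDF->new().
# Only the keys are documented; every value here is illustrative.
sub example_options {
    my ($save_as) = @_;
    return {
        # Extra CAM::PDF constructor arguments: owner and user passwords
        pdf_constructor => ['owner-password', 'user-password'],
        # Write changes to a separate file instead of the original PDF
        save_filename   => $save_as,
        # Discard older filesystem revisions when saving
        compact         => 1,
        # Assumed name for one of several filesystems in the same PDF
        fs_name         => 'scratch',
    };
}

# Hypothetical helper: open the PDF and turn the documented undef
# return value of new() into an exception.
sub open_pdf_fs {
    my ($pdf_filename, $options) = @_;
    require Fuse::PDF;    # loaded here so the helpers above compile without it
    my $fs = Fuse::PDF->new($pdf_filename, $options);
    die "Could not open or parse $pdf_filename\n" if !defined $fs;
    return $fs;
}
```

A caller might then write something like open_pdf_fs('my_doc.pdf', example_options('my_doc.fs.pdf'))->mount('/mnt/pdf'), keeping in mind that mount() blocks until the filesystem is unmounted.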
- $self->mount($mount_path)
- $self->mount($mount_path, $hash_of_fuse_options)
-
Calls into Fuse to mount the filesystem to the specified mount point. On unmount, a new PDF will be saved with any filesystem changes.
If the mount point does not exist, this package will try to create it as a directory via a simple mkdir(). If the mkdir() fails, or if the mount point exists but is not a directory, mount() will croak().
If the PDF has an existing filesystem which is incompatible with this version of the software, mount() will croak().
If the mount is successful, we establish callbacks and hand control to the FUSE library. FUSE blocks until the filesystem is unmounted. If this blocking is a problem for you, consider daemonizing the process like so:
    use Fuse::PDF;
    use Net::Server::Daemonize qw();
    my ($pdffilename, $mountdir) = @ARGV;
    my $fs = Fuse::PDF->new($pdffilename);
    if (Net::Server::Daemonize::safe_fork()) {
        exit 0; # parent process or failure
    }
    Net::Server::Daemonize::daemonize('www', 'www');
    $fs->mount($mountdir);
The mount() method cleans up after itself sufficiently that you may call it again immediately after unmounting.
The options hash is passed directly to Fuse::main(). See the Fuse documentation for the allowed keys. A simple example is:
    $fs->mount($mountdir, {debug => 1});
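The mount-point handling described for mount() above (create the directory if it is missing, croak if creation fails or if the path exists but is not a directory) can be sketched in plain Perl as follows. This is only an illustration of that behavior, not the module's actual code; ensure_mount_point is a hypothetical name.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper, not part of the Fuse::PDF API: make sure $path
# is usable as a mount point, creating it with a simple mkdir if needed.
sub ensure_mount_point {
    my ($path) = @_;
    if (-e $path) {
        # The path exists, so it must already be a directory
        croak "Mount point '$path' exists but is not a directory"
            if !-d $path;
        return $path;
    }
    # The path does not exist: try a single-level mkdir
    mkdir $path
        or croak "Could not create mount point '$path': $!";
    return $path;
}
```

Because this uses a single mkdir, a missing parent directory causes a failure rather than being created, which matches the "simple mkdir()" wording above.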
- $self->fs()
-
Return a fresh copy of the Fuse::PDF::FS data structure representing this PDF. You should not try to manipulate this object while the filesystem is mounted. This module is not yet thread-safe!
SEE ALSO
mount_pdf
AUTHOR
Chris Dolan, cdolan@cpan.org
CREDITS
Thanks to the Madison Perl Mongers for thinking the idea was stupid enough that I was inspired to implement it!