NAME

Fuse::PDF::ContentFS - Represent actual PDF document properties as files

SYNOPSIS

use CAM::PDF;
use Fuse::PDF::ContentFS;
my $fs = Fuse::PDF::ContentFS->new({pdf => CAM::PDF->new('my_doc.pdf')});
$fs->fs_getdir('/');  # list the top-level entries

or

% mount_pdf --all my_doc.pdf /Volumes/my_doc_pdf
% cd /Volumes/my_doc_pdf
% ls
filesystems  metadata  pages  revisions
% ls metadata/
CreationDate  Creator  ID  ModDate  Producer
% cat metadata/Producer
Adobe PDF library 5.00
% ls pages
1
% ls pages/1
fonts  images  layout.txt  text
% ls pages/1/text
formatted_text.txt  plain_text.txt      
% cat pages/1/text/plain_text.txt 
F u s e : : P D F  -  E m b e d  a  f i l e s y s t e m  i n  a  P D F  d o c u 
m e n t
C h r i s  D o l a n  < c d o l a n @ c p a n . o r g >
T o  g e t  s o f t w a r e  t h a t  c a n  i n t e r a c t  w i t h  t h i s  
f i l e s y s t e m ,  s e e
h t t p : / / s e a r c h . c p a n . o r g / d i s t / F u s e - P D F /
% cat pages/1/fonts/TT0/BaseFont 
HISDQN+Helvetica
% ls pages/1/images/
1.pdf  2.pdf  3.pdf  4.pdf
% open pages/1/images/1.pdf
% cd /
% umount /Volumes/my_doc_pdf

LICENSE

Copyright 2007-2008 Chris Dolan, cdolan@cpan.org

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

DESCRIPTION

This is a read-only filesystem that exposes the properties of a PDF document (metadata, pages, fonts, images and revisions) as files and folders. Only the properties that I've explicitly coded for are available. Much more is possible.
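
Because the properties appear as ordinary files, any program can read them once the PDF is mounted. The following is a minimal sketch, not part of this module: read_metadata is a hypothetical helper, and the mount point passed to it is whatever path you gave to mount_pdf. It assumes the metadata folder contains one plain file per key, as in the SYNOPSIS.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by Fuse::PDF::ContentFS).
# Read every regular file under "$mount/metadata" into a hash keyed by
# file name. $mount is the directory where the PDF was mounted.
sub read_metadata {
    my ($mount) = @_;
    my $dir = "$mount/metadata";
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    my %meta;
    for my $name (grep { -f "$dir/$_" } readdir $dh) {
        open my $fh, '<', "$dir/$name" or die "Cannot read $dir/$name: $!";
        local $/;    # slurp mode: read the whole file at once
        my $value = <$fh>;
        close $fh;
        $meta{$name} = defined $value ? $value : q{};
    }
    closedir $dh;
    return \%meta;
}
```

For the mount shown in the SYNOPSIS, read_metadata('/Volumes/my_doc_pdf')->{Producer} would be expected to hold the same text that cat metadata/Producer prints.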

FILESYSTEM STRUCTURE

/pages/<num>                      - one folder per page of the document; count from 1
/pages/<num>/fonts/<ID>           - one folder per referenced font, e.g. 'TT0'
/pages/<num>/fonts/<ID>/Type      - always 'Font'
/pages/<num>/fonts/<ID>/Subtype   - e.g. 'TrueType'
/pages/<num>/fonts/<ID>/BaseFont  - name of the font, e.g. 'Helvetica'
/pages/<num>/fonts/<ID>/FirstChar - ordinal of the first available glyph
/pages/<num>/fonts/<ID>/LastChar  - ordinal of the last available glyph
/pages/<num>/layout.txt           - raw PDF markup for a page
/pages/<num>/text/plain_text.txt  - strings extracted from the page (rough!)
/pages/<num>/text/formatted_text.txt - very rough text rendering of the page
/pages/<num>/images/<num>.pdf     - images used in the page, wrapped in a minimal PDF
/metadata/                        - one file for every metadata key/value in the root dict
/metadata/ID                      - hexadecimal ID, hopefully unique
/metadata/Author                  - usually the author's username; depends on authoring tool
/metadata/Creator                 - name of the application that authored the original document
/metadata/Producer                - name of the application or library that generated the PDF
/metadata/CreationDate            - e.g. D:20080104091746-06'00'
/metadata/ModDate                 - date last modified (usually the same as the CreationDate)
/filesystems/<name>/              - any embedded filesystems created by Fuse::PDF
/revisions/<num>                  - look at older versions of annotated PDFs
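
The CreationDate and ModDate files hold dates in the PDF date format shown above. As a rough sketch, assuming the full form with every field down to the seconds and an optional Z or +HH'mm' style offset, such a string could be converted to epoch seconds like this. parse_pdf_date is a hypothetical helper, not something this module provides, and it does not handle the shorter date forms that PDF also allows.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper (not provided by Fuse::PDF::ContentFS).
# Parse a PDF date such as "D:20080104091746-06'00'" into epoch seconds (UTC).
# Returns undef if the string does not match the pattern handled here.
sub parse_pdf_date {
    my ($str) = @_;
    my ($y, $mon, $d, $h, $min, $s, $sign, $oh, $om) = $str =~ m{
        \A D: (\d{4}) (\d\d) (\d\d) (\d\d) (\d\d) (\d\d)
        (?: Z | ([+-]) (\d\d) ' (\d\d) '? )? \s* \z
    }xms or return;

    # Treat the fields as UTC first, then undo the local offset.
    my $epoch = timegm($s, $min, $h, $d, $mon - 1, $y);
    if (defined $sign) {
        my $offset = ($oh * 60 + $om) * 60;
        $epoch -= $sign eq '+' ? $offset : -$offset;
    }
    return $epoch;
}
```

The result can be passed to gmtime or localtime for display.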

METHODS

$pkg->new($hash_of_options)

Create a new filesystem instance. The only required option is the pdf key, like so:

my $fs = Fuse::PDF::ContentFS->new({pdf => CAM::PDF->new('file.pdf')});

All other options are currently unused, although they are passed to the Fuse::PDF::FS instances created for the /filesystems folder.

$self->all_revisions()

Return a list containing one instance per revision of the PDF. The first item is this instance (the newest) and the last item is the first revision of the PDF (the oldest). Unedited PDFs (the most common case) return a one-element list.

$self->previous_revision()

If there is an older version of the PDF, extract it and return a new Fuse::PDF::ContentFS instance that applies to that revision. Multiple versions are a feature supported by the PDF specification, so this action is consistent with other PDF revision editing tools.

If there are no previous revisions, this will return undef.
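
Since previous_revision() returns undef at the oldest revision, the whole history can be walked with a simple loop. This is a minimal sketch: revision_chain is a hypothetical helper, and it only assumes that its argument has a previous_revision() method that behaves as described above.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by Fuse::PDF::ContentFS).
# Walk from the newest revision back to the oldest by following
# previous_revision() until it returns undef.
sub revision_chain {
    my ($fs) = @_;
    my @chain;
    while (defined $fs) {
        push @chain, $fs;
        $fs = $fs->previous_revision();
    }
    return @chain;
}
```

The list is ordered newest first, which is the same order documented for all_revisions().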

$self->statistics()

Return a hashref with some global information about the filesystem.

$self->to_string()

Return a human-readable representation of the statistics for each revision of the filesystem.

FUSE-COMPATIBLE METHODS

The following methods are independent of Fuse, but use almost exactly the same API expected by that package (except for fs_setxattr), so they can easily be converted to a FUSE implementation.

$self->fs_getattr($file)
$self->fs_readlink($file)
$self->fs_getdir($file)
$self->fs_mknod($file, $modes, $dev)
$self->fs_mkdir($file, $perms)
$self->fs_unlink($file)
$self->fs_rmdir($file)
$self->fs_symlink($link, $file)
$self->fs_rename($oldfile, $file)
$self->fs_link($srcfile, $file)
$self->fs_chmod($file, $perms)
$self->fs_chown($file, $uid, $gid)
$self->fs_truncate($file, $length)
$self->fs_utime($file, $atime, $utime)
$self->fs_open($file, $mode)
$self->fs_read($file, $size, $offset)
$self->fs_write($file, $str, $offset)
$self->fs_statfs()
$self->fs_flush($file)
$self->fs_release($file, $mode)
$self->fs_fsync($file, $flags)
$self->fs_setxattr($file, $key, $value, \%flags)
$self->fs_getxattr($file, $key)
$self->fs_listxattr($file)
$self->fs_removexattr($file, $key)
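
One way to do that conversion is to wrap each method in a closure named after the corresponding Fuse callback. The sketch below is an illustration, not how this distribution necessarily does it: fuse_callbacks and @CALLBACKS are hypothetical names, and setxattr is deliberately left out because fs_setxattr takes a flags hashref rather than the argument Fuse passes.

```perl
use strict;
use warnings;

# Fuse callback names that map one-to-one onto the fs_* methods above.
# setxattr is omitted because fs_setxattr has a different signature.
my @CALLBACKS = qw(
    getattr readlink getdir mknod mkdir unlink rmdir symlink rename link
    chmod chown truncate utime open read write statfs flush release fsync
    getxattr listxattr removexattr
);

# Hypothetical helper (not provided by Fuse::PDF::ContentFS).
# Build a (name => coderef) list that forwards each callback to $fs.
sub fuse_callbacks {
    my ($fs) = @_;
    return map {
        my $method = "fs_$_";
        ($_ => sub { return $fs->$method(@_) });
    } @CALLBACKS;
}
```

Assuming the Fuse module's Fuse::main() accepts code references for its callbacks, the returned list could then be passed along with a mountpoint argument, e.g. Fuse::main(mountpoint => $dir, fuse_callbacks($fs)).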

PASS-THROUGH METHODS

These methods exist only to pass parameters through to Fuse::PDF::FS via the /filesystems/* sub-filesystems. See the methods of the same name in that module.

$self->autosave_filename()
$self->autosave_filename($filename)
$self->compact()
$self->compact($boolean)
$self->backup()
$self->backup($boolean)

SEE ALSO

Fuse::PDF

CAM::PDF

AUTHOR

Chris Dolan, cdolan@cpan.org