NAME
pdf-collage - PDF manipulation with scissors and glue
VERSION
The version can be retrieved with option --version:
$ pdf-collage --version
USAGE
pdf-collage [--help] [--man] [--usage] [--version]
pdf-collage [--data|--data-from|-d path [...]]
[--list|--list-selectors|-l]
[--output|-o output]
[--selector|-S string]
[--source|-s source [...]]
EXAMPLES
# expand a plain JSON file with some data, redirect output PDF
pdf-collage --source plain.json foo=bar baz=12 > test.pdf
# use a "proper" bundle instead, containing a single template inside
pdf-collage -s bundle.pdfc foo=bar baz=12 > test.pdf
# output can be controlled with --output|-o
pdf-collage -s bundle.pdfc -o test.pdf foo=bar baz=12
# input stuff can be JSON too
pdf-collage -s bundle.pdfc -o test.pdf '{"foo":"bar","baz":12}'
# there can be more
pdf-collage -s bundle.pdfc -o test.pdf '{"foo":"bar"}' '{"baz":12}'
# data can be loaded from files. "Free" arguments will win though
pdf-collage -s bundle.pdfc -o test.pdf -d data.json foo=bar
# the output filename can be a Template::Perlish template, expanded with each record's data
pdf-collage -s bundle.pdfc -o '[% name %].pdf' -d record.json
# if the source contains multiple templates, it's possible to list them
pdf-collage -s several.pdfc --list
# in this case a selector is needed
pdf-collage -s several.pdfc -S my-template foo=bar baz=12 > test.pdf
# complicated things are... doable, like handling the generation of
# multiple PDF files each with its own name generated on the fly,
# starting from a common base and picking specific customizations
pdf-collage -s bundle.pdfc -s other-bundle.pdfc \
-o '[% name %]-[% id %].pdf' \
-d common-data.json foo=bar baz=galook \
'[{"name": "you", "id": 1}, {"name": "me", "id": 2}]'
DESCRIPTION
Generate PDFs much like the mail merge function that is common to at least two big office automation suites.
It proceeds from two input types: one or more sources of templates, and one or more records of data. They are merged to generate one PDF file for each record.
In case the source (or sources) contains multiple templates inside, it's possible to list them with command-line option --list|-l, then use one of the printed selector strings with command-line option --selector|-S.
It's possible to use Template::Perlish templates in many places, both inside the sources and elsewhere, e.g. when selecting the right source to use or naming the output file.
Collage Sources Gathering
There are two main types of collage sources: plain templates or collections. While both are supported, chances are that a collection is a better choice for anything but simple one-off needs, as it allows packing together several different artifacts and provides a more portable solution.
Sources can be provided with the --source command-line option or its aliases. They can represent JSON data, directories, or file names:
- if a source starts with optional spaces followed by a first non-space character that is either [ or {, then it is considered JSON data. A file that is actually named like that can still be used by providing its full or relative path (e.g. prefixed with ./);
- otherwise, if it's a directory, it is treated as a directory;
- otherwise, it must be a plain file. If the first non-space character within the initial 10 bytes is either [ or {, then it is considered JSON data; otherwise it is considered a TAR archive.
A source representing JSON data is considered a single template and treated as such; see "Single template". If such a source is present, it must be the only one: any additional source is considered an error.
A source that is either a directory or a TAR archive is considered a collection of templates. It's possible to have several collections, which will be considered collectively. When the tool looks for artifacts inside them while rendering PDF files, the ones appearing first on the command line take precedence over the following ones.
Record(s) Data Collection
Collecting data for doing the merge can be done in multiple ways.
On one side, every command-line argument that is not part of the available options is considered a source of record data, in one of the following forms:
- if the first non-space character is a {, then it's considered a JSON object, parsed as a hash and merged into a common hash of values;
- if the first non-space character is a [, then it's considered a JSON array, parsed as an array and its elements added to a list of records;
- otherwise, it's considered a key/value pair, separated by the first occurrence of a separator. Again, different alternatives are supported:
  - if the separator is #= or ::, the value part on the right is considered encoded with Base64, so it's decoded accordingly;
  - otherwise, if the separator is = or one single :, then the value is taken verbatim.
In both cases the key part is considered a trail of segments to navigate through the common data and set the value. As an example, a key foo.bar.baz would set $common{foo}{bar}{baz}; the rules are the same as in Template::Perlish's traverse.
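A minimal Python sketch of how a key/value argument might be split and stored follows. It assumes that, at the first position where a separator occurs, the two-character separators #= and :: are preferred over = and :. It also simplifies traversal to plain dot-separated hash keys, whereas Template::Perlish's traverse supports richer paths. The names parse_pair and set_path are hypothetical.

```python
import base64
import re

# Assumed precedence: at each position, "#=" and "::" are tried before "=" and ":".
_PAIR = re.compile(r"\A(.*?)(#=|::|=|:)(.*)\Z", re.S)


def parse_pair(arg):
    """Hypothetical helper: split a free argument into (key, value)."""
    match = _PAIR.match(arg)
    if match is None:
        raise ValueError(f"not a key/value pair: {arg!r}")
    key, separator, value = match.groups()
    if separator in ("#=", "::"):
        # Base64-encoded value: decode it (assumed to be UTF-8 text).
        value = base64.b64decode(value).decode("utf-8")
    return key, value


def set_path(data, key, value):
    """Set value following dot-separated segments, creating hashes as needed."""
    *parents, last = key.split(".")
    node = data
    for segment in parents:
        node = node.setdefault(segment, {})
    node[last] = value
```

For example, foo.bar.baz=12 ends up as {"foo": {"bar": {"baz": "12"}}}. The sketch keeps the value as a string.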
It's also possible to feed data files with the --data command-line option or its aliases. These files are always considered JSON files, with either objects (hashes) or arrays inside, and are handled as explained above. Files are always scanned first, so their values or records are handled before other data.
After the collection is complete, records are assembled. If no record was provided (i.e. no non-empty array appeared during collection), then the common data is considered a lone record and returned as such.
Otherwise, each collected record is merged with the common data and returned. This allows using the common data as providing defaults for all records, while still being able to set record-specific data or override some defaults.
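The assembly step can be sketched in Python as shown below. It is a simplified illustration, not the tool's code. The name assemble_records is hypothetical, and the merge is a plain shallow one that ignores the key prefixes described in the rules that follow.

```python
def assemble_records(common, records):
    """Hypothetical helper: turn collected data into the records to render.

    The merge is a plain shallow one: the "-" and "=" key prefixes
    described in the manual are deliberately ignored here.
    """
    if not records:
        # No non-empty array was collected: common data is the lone record.
        return [dict(common)]
    # Common data provides defaults; each record can override them.
    return [{**common, **record} for record in records]
```

With common data {"a": 1, "b": 2} and records [{"b": 3}, {"c": 4}], two PDF files would be generated, from {"a": 1, "b": 3} and {"a": 1, "b": 2, "c": 4}.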
Merging of hashes is performed onto a base one (the previously collected data, or the common data) based on the additional data (the new data or the specific record's data), with the following rules for handling each key/value pair:
- If the very first character in the key is -, then the rest of the key is considered the real key, and the pair is added onto the base only if the base does not already contain a value for it. This allows for setting a late default.
- If the very first character in the key is =, then it's stripped from the key and the key/value pair is set in the base hash. This allows supporting keys that need to begin with a literal - character, without incurring the behaviour of the previous bullet (e.g. target key -foo would be provided as =-foo).
- Otherwise, the key/value pair is set in the base hash, overriding any existing value.
There is no attempt at doing a deep merge of hash values: only top-level keys are handled, so a nested hash in the additional data entirely replaces the corresponding one in the base.
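The merge rules can be sketched in Python as follows. This is a minimal illustration, not the tool's implementation. The name merge_into is hypothetical. The sketch returns a new hash instead of modifying the base in place, and it assumes that "does not already contain a value" means the key is missing or null.

```python
def merge_into(base, extra):
    """Hypothetical helper: shallow-merge extra onto a copy of base."""
    result = dict(base)
    for key, value in extra.items():
        if key.startswith("-"):
            real_key = key[1:]
            # Late default: only fill in keys without a value
            # (assumed to mean "missing or null").
            if result.get(real_key) is None:
                result[real_key] = value
        elif key.startswith("="):
            # Escaped key: strip "=" and set unconditionally.
            result[key[1:]] = value
        else:
            # Plain key: set, overriding any existing value.
            result[key] = value
    return result
```

For example, merging {"-a": 2, "-b": 3, "=-c": 4} onto {"a": 1} gives {"a": 1, "b": 3, "-c": 4}.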
Templates Resolution
If a simple/single template is provided as JSON data, there's no resolution to be done and it's used directly.
Otherwise, if the source is a collection (or several collections), it makes sense to select one of the included templates.
First of all, it's possible to use option --list|-l to get a list of all available templates inside; the strings printed on standard output are the selectors that can later be used to point to the specific template that is needed.
If there is only one selector, it's not necessary to pass it when invoking the program, as it will be used automatically. Otherwise, it is necessary to use command-line option --selector|-S to pass the selector string.
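The selection logic can be sketched in Python as follows. The helper pick_selector is hypothetical. It also assumes that an unknown selector is an error, which the text above does not state explicitly.

```python
def pick_selector(available, requested=None):
    """Hypothetical helper: choose which template to render."""
    if requested is not None:
        # Assumed: an explicit selector must match one of the listed ones.
        if requested not in available:
            raise LookupError(f"unknown selector: {requested!r}")
        return requested
    if len(available) == 1:
        return available[0]  # a lone template is used automatically
    raise LookupError("no unique template: use --selector|-S")
```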
Writing Templates
At the basic level, a template is a list of commands inside a properly-formatted JSON file.
Many times, though, these commands will refer to specific artifacts, like one or more input PDF files from which pages should be taken. This is where a templates collection source is better: it allows packing the JSON file with the commands together with all artifacts (including fonts, if needed) inside a directory or a TAR archive (for best portability).
Single template
The JSON template is a string (or a file) containing the instructions for rendering a PDF file. It can have two forms: an object or an array.
In the former case, the object MUST contain a key commands whose corresponding value is an array with the list of commands; in the latter, the array itself is the container for the list of commands.
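Both forms can be normalized into a plain list of commands, as in this minimal Python sketch. The name commands_of is hypothetical, and this is not the tool's own parser.

```python
import json


def commands_of(template_text):
    """Hypothetical helper: extract the list of commands from a template."""
    data = json.loads(template_text)
    if isinstance(data, dict):
        commands = data.get("commands")  # object form: "commands" is mandatory
    else:
        commands = data                  # array form: the array is the list
    if not isinstance(commands, list):
        raise ValueError("expected an array, or an object with a 'commands' array")
    return commands
```

Both '[{"op": "add-page"}]' and '{"commands": [{"op": "add-page"}]}' yield the same single-command list.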
The following commands are supported:
- add-image

    { "op": "add-image", "page": 1, "path": "/path/to/image.png", "x": 10, "y": 30, "width": 10, "height": 10 }

  Add an image. See PDF::Builder for the supported formats.
- add-page

    { "op": "add-page" }

  Add an empty page at the end.

    { "op": "add-page", "page": 1 }

  Add an empty page as page number 1.

    { "op": "add-page", "page": 2, "from-path": "/some/file.pdf", "from-page": 3 }

  Get page 3 from file /some/file.pdf and add it as page 2 in the PDF that is built. Key from-path can also be abbreviated as from.

- add-text
    { "op": "add-text", "page": 1, "font": "DejaVuSans.ttf", "font-size": 12, "text": "whatever", "x": 10, "y": 20 }

  Place a text label on the PDF. The font key can be replaced with font-family. There are three ways of defining the text:

  - text: this text is taken verbatim and has precedence over the other alternatives;
  - text-template: this text is expanded using Template::Perlish. It takes precedence over "text-variable";
  - text-variable: this is meant to be a variable that is expanded using Template::Perlish on the data provided.
- log

    { "op": "log", "level": "info", "message": "whatever!" }

  Print a log message. If Log::Any is available, it will be used; otherwise, warn is used.

- set-defaults
    { "op": "set-default", "font": "DejaVuSans.ttf", "font-size": 12, "level": "info" }

  Set some defaults that will be used in the following commands. This allows e.g. setting the font, or the font size, once and for all for all "add-text" commands.
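The following Python sketch shows how an add-text command might combine its own keys with earlier set-defaults values and pick its text source by precedence. It is an illustration only. The names prepare_add_text and _font are hypothetical, and expand_template and expand_variable are caller-supplied stand-ins for Template::Perlish expansion. The sketch also assumes that "font" wins over "font-family" and that a command's own keys win over defaults.

```python
def _font(options):
    # "font" and its alternative spelling "font-family" (assumed: "font" wins).
    return options.get("font", options.get("font-family"))


def prepare_add_text(defaults, command, data, expand_template, expand_variable):
    """Hypothetical helper: resolve an add-text command.

    expand_template and expand_variable are caller-supplied stand-ins for
    Template::Perlish expansion.
    """
    if "text" in command:
        text = command["text"]  # verbatim, highest precedence
    elif "text-template" in command:
        text = expand_template(command["text-template"], data)
    elif "text-variable" in command:
        text = expand_variable(command["text-variable"], data)
    else:
        raise ValueError("add-text needs text, text-template or text-variable")
    return {
        "page": command.get("page"),
        "font": _font(command) or _font(defaults),  # defaults fill gaps
        "font-size": command.get("font-size", defaults.get("font-size")),
        "text": text,
        "x": command["x"],
        "y": command["y"],
    }
```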
Templates collection
A templates collection is a bundle that allows packing together multiple templates, as well as artifacts that can be referred from these templates.
In its basic form, it is a directory with the structure that is detailed below. This directory can also be packed as a TAR archive, which can be used as a collection too for maximum portability.
JSON templates MUST be files with extension .json put inside a sub-directory named definitions. Other artifacts can be placed anywhere.
It's possible to refer to the artifacts bundled in the collection using function as_file(), which is injected in the Template::Perlish namespace and can thus be used in Template::Perlish templates. As an example, if the bundle includes a font file in location assets/fonts/shiny.ttf, it's possible to use it in an add-text command like this:
{
"op": "add-text",
"page": 1,
"font": "[%= as_file('assets/fonts/shiny.ttf') %]",
"font-size": 12,
"text": "whatever",
"x": 10,
"y": 20
}
Similarly, for taking a page from a bundled PDF file in location assets/pdf/models.pdf inside the directory:
{
"op": "add-page",
"page": 2,
"from-path": "[%= as_file('assets/pdf/models.pdf') %]",
"from-page": 3
}
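as_file() is provided by the tool. The Python function below only mimics the lookup precedence described in "Collage Sources Gathering", where collections appearing first on the command line win. It is not the real implementation, and the signature is hypothetical. It handles directory collections only; TAR archives are not covered.

```python
import os


def as_file(relative_path, collections):
    """Hypothetical stand-in for as_file(): locate a bundled artifact.

    Only directory collections are handled; collections listed first
    (i.e. earlier on the command line) take precedence.
    """
    for root in collections:
        candidate = os.path.join(root, relative_path)
        if os.path.isfile(candidate):
            return candidate
    raise FileNotFoundError(relative_path)
```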
OPTIONS
- --data|--data-from|-d path

  load some data from the file at the specific path, assuming it's JSON.
  JSON objects (hashes) contribute to a common set of data.
  JSON arrays (of hashes) add records.
- --help

  print out some help and exit.

- --list|--list-selectors|-l

  print out the list of available selectors from provided sources.

- --man

  show the manual page for pdf-collage.
- --output|-o output-spec

  the output filename, defaulting to -, which means standard output. It is treated as a template string expanded with each record's data.
- --selector|-S string

  a selector string for templates with multiple definitions inside.
  It is treated as a template string expanded with each record's data.
- --source|-s specification

  a suitable input for taking instructions for building the PDF. It can be JSON data (inline or in a file), in which case it is treated as a simple template; otherwise it's considered a collection of templates bundled with artifacts, which usually implies that a selector will be needed (unless the bundle contains one single definition only).

- --usage

  show usage instructions.

- --version

  show version.
BUGS AND LIMITATIONS
Please report any bugs or feature requests through the repository at https://codeberg.org/polettix/PDF-Collage.
AUTHOR
Flavio Poletti
LICENSE AND COPYRIGHT
Copyright 2023 by Flavio Poletti (flavio@polettix.it).
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
or look for file LICENSE in this project's root directory.
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.