NAME
ODF::lpOD - An OpenDocument management interface
SYNOPSIS
use ODF::lpOD;
my $document = odf_get_document("report.odt");
my $meta = $document->get_part(META);
$meta->set_title("The best document format");
my $content = $document->get_part(CONTENT);
my $context = $content->get_body;
my $paragraph = $context->get_paragraph(
content => "I look for it"
);
$paragraph->set_text("I found it");
$paragraph->set_style("Standout");
my $new_paragraph = odf_create_paragraph (
style => "Standard",
text => "A new content"
);
$context->append_element($new_paragraph);
my $table = odf_create_table (
"Main Figures", height => 20, width => 16
);
$context->insert_element($table, before => $paragraph);
my $cell = $table->get_cell("B4");
$cell->set_text("Here B4");
$document->save;
exit;
The code example above loads a document from an existing "report.odt" file, updates various data in the document, then saves the changes. The following actions are done in the document:
1) The title is set to "The best document format";
2) The first paragraph containing "I look for it" is retrieved (this paragraph is assumed to exist; otherwise get_paragraph would return undef);
3) The content of the retrieved paragraph is replaced by "I found it", and its style is set to "Standout" (this style is assumed to exist or to be defined later);
4) A new paragraph, whose text is "A new content" and style is "Standard", is created then appended to the document body;
5) A new table whose name is "Main Figures" and size is 20x16 is created then inserted just before the first retrieved paragraph;
6) The "B4" cell (i.e. the cell belonging to the 4th row and the 2nd column, whatever the document type) is retrieved, and its content is set to "Here B4" (the cell data type is automatically set to 'string').
DESCRIPTION
This module is an Open Document management interface. It allows the users to create or transform office documents, or to extract data from them. It can handle documents which comply with the Open Document Format international standard (ODF). It may handle text documents (ODT), spreadsheet documents (ODS), as well as presentation (ODP) or drawing documents (ODG).
ABOUT lpOD
This is the Perl implementation of the lpOD project.
lpOD is a Free Software project that offers, for high level use cases, an application programming interface dedicated to document processing with the Python, Perl and Ruby languages. It complies with the OASIS Open Document Format (ODF), i.e. the ISO/IEC 26300 international standard.
lpOD is designed according to a top-down approach. The API is bound to the document functional structure and the user's point of view. As a consequence, it may be used without full knowledge of the ODF specification, and allows the application developer to be focused on the business needs instead of the low level storage concerns.
The lpOD API is object oriented.
Basic document access principles
General access to documents goes through the odf_document class. Before processing a document, an odf_document instance must be created using one of the allowed constructors. While an odf_document object encapsulates the physical resource access logic, the real data must be handled through document parts, each of which represents a specialized aspect of the document.
Each part contains a set of odf_element objects; odf_element is the common base class for any kind of document element, simple or complex (an odf_element may be a visible object, such as a paragraph or a table, as well as a piece of data that specifies the layout or the behavior of other objects, such as a text style or a page layout). Each part contains a root element, that is, a special odf_element containing all the elements of the part. A part may contain a body element, which is a more restricted but in some cases more interesting context than the root.
lpOD is a read-write API. However, the changes made by applications aren't automatically persistent. The API provides methods that insert, delete, or update elements in memory. In order to make the changes persistent, explicit odf_part and odf_document methods must be used.
Global document initialization
A few specialized constructors may be used in order to create odf_document objects. All these constructors return an odf_document object in case of success, a FALSE value otherwise.
Once an odf_document is created, its content may be written back to persistent storage using its save method.
odf_get_document(source)
Instantiates an odf_document object which is a read-write interface to an existing ODF package corresponding to the given source. The package should be an ODF-compliant zip file (odt, ods, odp, and so on). Example:
my $document = odf_get_document('C:\Path\Doc.odt');
The source argument must be provided either as a regular file path or as an IO::File object.
odf_new_document(document_type)
Returns a new odf_document corresponding to the given ODF document type. Allowed document types are presently 'text', 'spreadsheet', 'presentation', and 'drawing'. Example:
my $document = odf_new_document('spreadsheet');
Technically, the new document is generated as a clone of an existing template document, provided with the lpOD distribution. It operates in the same way as odf_new_document_from_template, but the user doesn't need to provide the template document.
odf_new_document_from_template(source)
Returns a new odf_document instantiated from an existing ODF template package. Same as odf_get_document, but the source package is read-only.
save([destination])
This function is a method. It must be called from an odf_document instance.
Without argument, it attempts to write its content back to the resource that was used to create it. A warning is issued and nothing is done if the document has been created without source file or from a read-only template (i.e. through odf_new_document or odf_new_document_from_template).
This method produces a file whose basic format is the same as the format of the source document or template (whatever the target file name, if any).
If the optional parameter target is provided, it's regarded as the storage destination. Its value may be a regular file path or an IO::File object. This parameter is mandatory if the odf_document instance has been created through odf_new_document_from_template or odf_new_document.
Example:
$document->save(target => "/myfiles/target.odt");
Document part initialization and handling
A regular ODF document contains various parts, some of them mandatory. The interesting parts in the lpOD scope are 'content', 'styles', 'meta', 'settings', and 'manifest'.
The odf_document class provides a get_part() method, which must be used with an argument that specifies the needed part. Example:
my $content = $document->get_part(CONTENT);
my $meta = $document->get_part(META);
The sequence above gives access to the content and meta parts of a previously created odf_document instance.
Beware: if get_part() is called twice or more from the same odf_document instance and with the same part designation, it returns the same object. As a consequence, after the sequence below, $p1 and $p2 will refer to the same object:
my $p1 = $document->get_part(CONTENT);
my $p2 = $document->get_part(CONTENT);
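The shared-object behavior can be checked with a short script. This is a minimal sketch, assuming that lpOD is installed and that a new text document is an acceptable test bed; it uses the META part and its title accessors only because they give a simple way to observe a change through both variables.

```perl
use strict;
use warnings;
use ODF::lpOD;
use Scalar::Util qw(refaddr);

# A new text document, created here only to have parts to work on.
my $document = odf_new_document('text')
    or die "Document creation failed\n";

my $p1 = $document->get_part(META);
my $p2 = $document->get_part(META);

# Both variables are expected to hold the same object reference.
print "Same part object\n" if refaddr($p1) == refaddr($p2);

# Consequently, a change made through $p1 should be visible through $p2.
$p1->set_title("Shared title");
print $p2->get_title, "\n";
```

Because both variables designate one object, there is no way to keep an "untouched" copy of a part by calling get_part() a second time.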
The serialize() method of a part returns an XML export of the whole part (the application is then responsible for the fate of this export). An optional pretty argument, if set to TRUE, specifies that the XML output must be human-readable. Example:
my $content = $document->get_part(CONTENT);
# here some content processing
my $xml = $content->serialize(pretty => TRUE);
Basic ODF element handling
Every odf_part object provides a low level get_element method whose first argument is an XPath expression and the second one a numeric position. The numeric argument specifies the order number of the required element among the set of elements matching the XPath. If the order number is negative, the position is regarded as counted backward from the end. The position is zero-based (i.e. a zero value means the first matching element). As an example, the instruction below returns the last paragraph of the document.
my $document = odf_get_document($source);
my $content = $document->get_part(CONTENT);
my $p = $content->get_element("//text:p", -1);
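The position convention (zero-based, negative values counted backward from the end) is the same one Perl uses for array subscripts. The sketch below illustrates it in plain Perl, without lpOD; pick_by_position is a hypothetical helper introduced for this illustration only, and the strings stand in for the elements matching an XPath expression. It is not how lpOD itself selects elements.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of lpOD: it only illustrates the
# position convention (zero-based, negative = counted from the end).
sub pick_by_position {
    my ($matches, $position) = @_;
    # Perl array subscripts follow the same convention; in this sketch
    # an out-of-range position simply yields undef.
    return $matches->[$position];
}

my @matches = ('first', 'second', 'third');   # stand-ins for matching elements

print pick_by_position(\@matches,  0), "\n";  # prints "first"
print pick_by_position(\@matches, -1), "\n";  # prints "third"
print pick_by_position(\@matches, -2), "\n";  # prints "second"
```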
However, this XPath-based approach is not the most convenient one because it requires knowledge of the ODF schema (and some XPath skills for more complicated cases).
lpOD provides more user-friendly, XPath-free methods for the most used elements in the CONTENT part of a document. These methods are provided through the odf_element class. Any individual element in a part is an odf_element object. There is a shortcut to get the top (or root) element of any part: the get_root() method. Once selected, the top element provides all the context methods of the lpOD API.
A context method is a method owned by an element (the context) and whose effect is related to the children and descendants of this element. So, the get_xxx method of a given element is a retrieval method intended to select something below the current element. Thanks to the get_paragraph method provided by the odf_element class, the last example could be written as shown below:
my $document = odf_get_document($source);
my $context = $document->get_part(CONTENT)->get_root;
my $p = $context->get_paragraph(-1);
In most cases (including the previous example), get_root may be replaced by get_body, which returns a context containing all the visible elements (including the paragraphs).
There is a generic context-based get_element that differs from the part-based one. It allows the user to select an element according to its text content, one of its attributes, and/or its sequential position in the context. As an example, the sequence below displays the name of the last page that uses the draw page style "dp1" (assuming we are using a presentation or drawing document):
my $context = $document->get_part(CONTENT)->get_body;
my $page = $context->get_element(
'draw:page',
attribute => 'style name',
value => 'dp1',
position => -1
);
say $page->get_attribute('name');   # "say" requires: use feature 'say';
lpOD provides special name-based retrieval methods for some elements that own unique names. For example the instruction below selects the table whose name is "T1" (if any):
$table = $context->get_table_by_name("T1");
The meta document part, unlike others such as the content one, provides direct get and set accessors for the content of the usual metadata, so there is no need for a context element. The following example displays the title of a document:
my $document = odf_get_document($source);
my $meta = $document->get_part(META);
say $meta->get_title;   # "say" requires: use feature 'say';
The title (like any other metadata value) may be updated or created with the corresponding set accessor:
$meta->set_title("The new title");
All the properties of a previously selected element are stored in one or more attributes and in a text. So, for any odf_element, lpOD provides corresponding get and set accessors.
get_text returns the current text, while set_text replaces the current content by a new text (possibly empty). Without argument, get_text returns the text directly contained in the calling element, but with the optional named parameter recursive set to TRUE, it returns the concatenated texts of all the descendants of the calling element. On the other hand, set_text deletes any previous content (i.e. direct text content and embedded elements such as bookmarks, variable fields, text segments with special styles, and so on).
The get_attribute and set_attribute methods require the name of the target attribute. This name may be the technical name according to the OpenDocument specification, or a simpler and more meaningful name. For example, assuming $item is a list item, and knowing that such an object may own a so-called text:restart-numbering attribute telling that the list numbering must be restarted at this point from a given value, the following instruction sets this value to 6:
$item->set_attribute('restart numbering' => 6);
set_attribute deletes an existing attribute as soon as the given value is undef; so the instruction below cancels the restart numbering feature:
$item->set_attribute('restart numbering' => undef);
Note that set_attribute, provided with a defined value, automatically creates the attribute if it doesn't exist; there is no need to separately check an attribute for existence and create it before setting a value.
It's possible to get or set more than one attribute in a single call using get_attributes or set_attributes. The first one returns the attributes as a hash reference (with the real ODF names), while the second one requires a hash reference as argument.
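The two multi-attribute accessors can be tried on a free paragraph. This is a minimal sketch, assuming that lpOD is installed; it uses text:style-name, the ODF attribute that carries a paragraph style name, under its technical name. The exact set of attributes returned depends on the element, so no particular output is claimed here.

```perl
use strict;
use warnings;
use ODF::lpOD;

# A free paragraph (not attached to any document) is enough
# to experiment with attributes.
my $paragraph = odf_create_paragraph(
    style => "Standard", text => "Attribute demo"
);

# Set attributes in a single call; the argument is a hash reference.
$paragraph->set_attributes({ 'text:style-name' => "Standout" });

# Read all attributes back as a hash reference keyed by ODF names.
my $attributes = $paragraph->get_attributes;
for my $name (sort keys %$attributes) {
    print "$name = $attributes->{$name}\n";
}
```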
An element may be removed (with all its descendants) using its delete method. (Beware: the deletion of a high level element may destroy a lot of content!) It's possible to delete the whole content of an element without removing the element itself by issuing a set_text with an empty string.
The user is allowed to create a new element using the odf_create_element constructor, which requires an appropriate ODF tag (corresponding to the type of element) or a valid XML string. Fortunately, lpOD provides a set of specialized constructors (such as odf_create_paragraph, odf_create_table, and so on) that may be used without knowledge of the underlying XML. Once created through such a constructor, the new element is not automatically included in a document. To do so, lpOD provides the insert_element and append_element methods, both context-based, i.e. called from an existing element that will become the parent of the new element. As an example, the sequence below creates a new paragraph (with given style and content), then appends it to a selected section:
my $document = odf_get_document($source);
my $context = $document->get_part(CONTENT)->get_body;
my $section = $context->get_section("Prologue");
my $paragraph = odf_create_paragraph(
style => "Standard", text => "The End of the Beginning"
);
$section->append_element($paragraph);
Elements may be created by replication of existing elements, thanks to the clone method. The result of the instruction below is a copy of an existing section (with all its content); this copy is a "free" element (i.e. it's not included in any document, and it has no link with its prototype element), so it may be inserted elsewhere in the same document or in another document:
my $section = $context->get_section("Reusable");
my $free_section = $section->clone;
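The independence of a clone from its prototype can be observed with a free paragraph instead of a section. This is a minimal sketch, assuming that lpOD is installed; the paragraph texts are arbitrary.

```perl
use strict;
use warnings;
use ODF::lpOD;

my $original = odf_create_paragraph(
    style => "Standard", text => "Original text"
);

# The clone is a free element with no link to its prototype.
my $copy = $original->clone;

# Changing the copy should leave the prototype untouched.
$copy->set_text("Modified copy");

print $original->get_text, "\n";
print $copy->get_text, "\n";
```

The copy still has to be attached somewhere with insert_element or append_element before it appears in a document.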
Getting started
The "Hello World" example
Unsurprisingly, we suggest that you test your lpOD installation and your knowledge of the big picture through this simple program:
use ODF::lpOD;
my $doc = odf_new_document('text');
my $content = $doc->get_part(CONTENT);
my $context = $content->get_body;
$context->append_element(
odf_create_paragraph(
style => "Standard",
text => "Hello World !"
)
);
$doc->save(target => "helloworld.odt");
exit;
If this script runs without warning, open the "helloworld.odt" file using your favorite ODF-compliant text processor, and look at the text content. You may then introduce more sophistication using the metadata part of the document. To do so, you can (for example) insert the lines below somewhere before the save instruction (and after the odf_new_document one).
my $meta = $doc->get_part(META);
$meta->set_title("Hello World Test");
$meta->set_creator("Me");
$meta->set_creation_date(iso_date);
$meta->set_modification_date(iso_date);
$meta->store;
After execution of the extended version, check the author's name and the creation & modification dates through the File/Properties dialog of your text editor.
Using the documentation
The ODF::lpOD::Tutorial is a recommended first reading that may help to quickly gain a basic understanding and get started with lpOD. The reference documentation is split into the following manual chapters:
ODF::lpOD::Document: General document packaging and metadata handling.
ODF::lpOD::Element: Common features, available with any element.
ODF::lpOD::TextElement: Text containers (paragraphs, headings), and various elements that may take place in paragraphs (bookmarks, index marks, bibliography marks, text variables and fields).
ODF::lpOD::Table: Access to tables and their content.
ODF::lpOD::StructuredContainer: High-level structures such as sections, lists, draw pages, shapes, image or text frames, tables of contents.
ODF::lpOD::Style: Style retrieval, update, or creation
ODF::lpOD::Common: Common utility functions
COPYRIGHT & LICENSE
Copyright (c) 2010 Ars Aperta, Itaapy, Pierlis, Talend.
This work was sponsored by the Agence Nationale de la Recherche (http://www.agence-nationale-recherche.fr).
lpOD is free software; you can redistribute it and/or modify it under the terms of either:
a) the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. lpOD is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with lpOD. If not, see http://www.gnu.org/licenses/.
b) the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0