NAME
ODF::lpOD::Document - General ODF package handling and metadata
DESCRIPTION
This manual page describes the odf_document, the common features of any odf_part of an odf_document, and the particular features of the odf_meta part (which handles the global document metadata).
Every odf_document is associated with an odf_container that encapsulates all the physical access logic. On the other hand, every odf_document is made of several components called parts. The lpOD API mainly focuses on the parts that describe the global metadata, the text content, the layout and the structure of the document, and that are physically stored according to an XML schema. The common lpOD class for these parts is odf_xmlpart (whose Perl implementation is the ODF::lpOD::XMLPart package).
lpOD provides specialized classes for the simplest parts, such as odf_meta, which provides methods dedicated to getting or setting the document metadata (other odf_xmlpart subclasses will come later in the final version).
The most complex odf_part holds the content of the document; however, since any piece of content belongs to an odf_element, this complexity is handled through the odf_element class (introduced in ODF::lpOD::Element) and its various subclasses.
Document & part initialization
Any access to a document requires a valid odf_document instance, which may be created from an existing document or from scratch, using one of the constructors introduced below. Once created, this instance gives access to individual parts through get_xxx methods, each one dedicated to a particular part.
odf_get_document(uri)
This function creates a read-write document instance. The returned object is associated with an existing physical ODF resource, which may be updated. The required argument is the URI of the resource.
Note: in the present implementation, the URI argument must be either a file path or an IO::Handle corresponding to an open file or socket. The physical resource must be a well-formed compressed ODF file, such as those natively produced by OpenOffice.org or compatible office software suites.
Example:
my $doc = odf_get_document('C:\MyDocuments\test.odt');
If the save method of odf_document is later used without an explicit target, the document is written back to the same resource.
odf_new_document_from_template(uri)
Same as odf_get_document, but the ODF resource is used in read-only mode, i.e. it's used as a template in order to generate other physical ODF documents.
odf_new_document_from_type(doc_type)
Unlike other constructors, this one generates an odf_document instance from scratch. Technically, it's a variant of odf_new_document_from_template, but the default template (provided with the lpOD library) is used. The required argument specifies the document type, which must be 'text', 'spreadsheet', 'presentation', or 'drawing'. The new document instance is not persistent; no file is created before an explicit use of the save method.
The following example creates a spreadsheet document instance:
my $doc = odf_new_document_from_type('spreadsheet');
The real content of the instance depends on the default template.
A set of valid template ODF files (created using OpenOffice.org) is transparently installed with the standard lpOD distribution. Advanced users may use their own template files. To do so, they have to replace the ODF files present in the templates subdirectory of the lpOD installation; the path to the lpOD installation may be retrieved through the lpod->installation_path common function. The user-provided template files must have the same names.
get_mimetype
Returns the MIME type of the document (i.e. the full string that identifies the document type). An example of regular ODF MIME type is:
application/vnd.oasis.opendocument.text
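As a minimal sketch, assuming the installed lpOD version exports the constructors described on this page, the MIME type of a freshly created text document could be read as follows (the exact string depends on the default template):

```perl
use strict;
use warnings;
use ODF::lpOD;

# Create a non-persistent text document from the default template.
my $doc = odf_new_document_from_type('text');

# Read the full MIME type string of the document.
my $mimetype = $doc->get_mimetype;
print "MIME type: $mimetype\n";
```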
set_mimetype(new_mimetype)
Allows the user to force a new arbitrary MIME type (not to be used in ordinary lpOD applications!).
get_part(name)
Generic odf_document method allowing access to any part of a previously created document instance, including parts that are not handled by lpOD. The lpOD library provides symbolic constants that represent the most usual part names: CONTENT, STYLES, META, MANIFEST, SETTINGS, and MIMETYPE.
The use of this low level method is not encouraged unless the user wants to work closely with the physical ODF file.
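For comparison, the sketch below fetches the metadata part both through the generic get_part method and through the dedicated get_meta accessor. It assumes that the symbolic constants listed above are exported by ODF::lpOD and that the document is created from the default text template:

```perl
use strict;
use warnings;
use ODF::lpOD;

my $doc = odf_new_document_from_type('text');

# Generic access, using the symbolic part name.
my $meta_part = $doc->get_part(META);

# Dedicated accessor for the same part (the recommended way).
my $meta = $doc->get_meta;

print "get_part(META) returned: ", ref($meta_part), "\n";
print "get_meta returned:       ", ref($meta), "\n";
```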
get_content
Returns a handler to the document part that contains the text and structure information, and possibly some automatic style definitions. This is generally the most important, the most used and the most complex part. The structure of the content part depends on the document type. For example, the content of a spreadsheet document is a sequence of tables, while the content of a presentation document is a sequence of draw pages. With text documents, the content may be made of a huge variety of elements.
my $workspace = $doc->get_content;
The returned part handler may be used either to execute part-related operations or to get specific elements in order to use them for context-based operations.
Most of the content-oriented lpOD features are provided through element-based methods (see ODF::lpOD::Element).
get_manifest
Returns a handler for the ODF document manifest (i.e. the XML catalog of all the parts included in the document instance). There is no specific feature in the present development version for this handler, which may be used and/or updated using the generic part-based and element-based methods.
get_meta
Returns a handler for the global document metadata. See the Global document metadata section for details about its features.
get_settings
Returns a handler for the ODF document settings. There is no specific feature in the present development version for this handler, which may be used and/or updated using the generic part-based and element-based methods.
get_styles
Returns a handler for the ODF document styles part. There is no specific feature in the present development version for this handler, which may be used and/or updated using the generic part-based and element-based methods.
Accessing data inside a part
Everything in a part is stored as a set of odf_element instances. So, for complex parts (such as content) or parts that are not explicitly covered in the present documentation, applications need access to an "entry point", i.e. a particular element. The most used entry points are the root and the body. Every part handler provides a get_root and a get_body method, each one returning an odf_element instance that provides all the element-based features (including the creation, insertion or retrieval of other elements that may in turn become working contexts).
For those who know the ODF XML schema, two part-based methods allow the selection of elements according to XPath expressions, namely get_element and get_element_list. The first one requires an XPath expression and a positional number; it returns the element corresponding to the given position in the result set of the XPath expression (if any). The second one returns the full result set (i.e. a list of odf_element instances). For example, the instructions below return respectively the first paragraph and all the paragraphs of a part (assuming $part is a previously selected document part):
my $paragraph = $part->get_element('text:p', 0);
my @paragraphs = $part->get_element_list('text:p');
Note that the position argument of get_element is zero-based, and that it may be a negative value (if so, it specifies a position counted backward from the last matching element, -1 being the position of the last one).
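The positional rules can be illustrated with a short sketch. It assumes a text document created from the default template, whose content may or may not contain paragraphs, so the result is checked before use:

```perl
use strict;
use warnings;
use ODF::lpOD;

my $doc     = odf_new_document_from_type('text');
my $content = $doc->get_content;

# Full result set of the expression.
my @paragraphs = $content->get_element_list('text:p');

# Position 0 is the first match, position -1 the last one.
my $first = $content->get_element('text:p', 0);
my $last  = $content->get_element('text:p', -1);

printf "%d paragraph(s) found\n", scalar @paragraphs;
print "No paragraph in this part\n" unless defined $first;
```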
So a large part of the lpOD functionality is described in ODF::lpOD::Element.
Global document metadata
From the handler provided by the get_meta document method, several metadata items of the document may be directly read or set.
Simple metadata accessors
Most metadata are just text strings. The user may read or write each one using a get_xxx or set_xxx accessor, where "xxx" is the lpOD name of a particular property. The presently supported simple properties are:
creation_date: the date of the initial version of the document, expressed in ISO-8601 date format
creator: the name of the user who created the current version of the document
description: the long description of the document
editing_cycles: the number of edit sessions (may be regarded as a version number)
editing_duration: the total editing time through interactive software, expressed as a time delta in ISO-8601 format
generator: the signature of the application that created the document
initial_creator: the name of the user who created the first version of the document
language: the ISO code of the main language used in the document
modification_date: the date of the last modification (i.e. of the current version)
subject: the subject (or short description) of the document
title: the title of the document
Both set_creation_date and set_modification_date allow the user to provide the date in the ODF-compliant (ISO-8601) format, or in numeric format (like the Perl time format). In the second case, the provided time is automatically converted into the required format. The corresponding get_ accessors always return the dates in their storage format. However, the lpOD library provides a numeric_date function that translates a regular ISO date into a Perl numeric time value (a symmetric iso_date global function translates a Perl time into an ISO date).
Examples of use:
$meta->set_title("The lpOD Cookbook");
$meta->set_creator("The lpOD Project team");
$meta->set_modification_date(time);
my $old_version = $meta->get_editing_cycles;
$meta->set_editing_cycles($old_version + 1);
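The two date conversion functions mentioned above can be tried on their own. This is only a sketch, assuming that iso_date and numeric_date are exported by ODF::lpOD and each take a single argument:

```perl
use strict;
use warnings;
use ODF::lpOD;

# Perl numeric time to ISO-8601 date string.
my $now = time;
my $iso = iso_date($now);

# ISO-8601 date string back to a numeric time value.
my $numeric = numeric_date($iso);

print "ISO-8601 date: $iso\n";
print "Numeric time:  $numeric\n";
```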
Document statistics
The global document statistics (as defined in §3.1.18 of the ODF 1.1 specification) may be read or set using the get_statistics and set_statistics accessors. The first one returns the statistic properties as a hash reference. The second one takes a hash reference with the same structure, containing the attribute names and values. The following example displays the page count of the document (assuming it's a text document):
my $meta = $document->get_meta;
my $stat = $meta->get_statistics;
say $stat->{'meta:page-count'};
Note that nothing prevents applications from using set_statistics to set arbitrary figures.
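A possible read-modify-write sequence is sketched below. It assumes a document created from the default text template; the value 12 is arbitrary, and the fallback to an empty hash is a precaution for the case where no statistics are available:

```perl
use strict;
use warnings;
use ODF::lpOD;

my $doc  = odf_new_document_from_type('text');
my $meta = $doc->get_meta;

# Start from the current statistics, or from an empty hash.
my $stat = $meta->get_statistics || {};

# Change one figure and write the whole structure back.
$stat->{'meta:page-count'} = 12;
$meta->set_statistics($stat);

my $check = $meta->get_statistics;
print "Page count: $check->{'meta:page-count'}\n";
```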
Keywords
The document metadata include a list of keywords (possibly empty). This list may be used or changed.
get_keywords
Knowing that a document may be "tagged" by one or more keywords, odf_meta provides a get_keywords method that returns the list of the current keywords as a comma-separated string.
set_keywords(string_of_keywords)
set_keywords allows the user to set a full list of keywords, provided as a single comma-separated string. The provided list replaces any previously existing keywords. Used without argument or with an empty string, this method just removes all the keywords. Example:
$meta->set_keywords("ODF, OpenDocument, Python, Perl, Ruby, XML");
The spaces after the commas are ignored, and it's not possible to set a keyword that contains comma(s) through set_keywords.
set_keyword(keyword)
set_keyword appends a new, given keyword to the list; it has no effect if the given keyword is already present; it allows commas in the given keyword (but we don't recommend such a practice).
check_keyword(keyword)
check_keyword returns TRUE if its argument (which may be a regular expression) matches an existing keyword, or FALSE if the keyword is not present.
remove_keyword(expression)
remove_keyword deletes any keyword that matches the argument (which may be a regular expression).
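The keyword methods described in this section can be combined as in the following sketch, which assumes a document created from the default text template and uses arbitrary keywords:

```perl
use strict;
use warnings;
use ODF::lpOD;

my $doc  = odf_new_document_from_type('text');
my $meta = $doc->get_meta;

# Replace the whole keyword list.
$meta->set_keywords("ODF, OpenDocument, Perl");

# Append one keyword; the second call targets a keyword already present.
$meta->set_keyword("XML");
$meta->set_keyword("Perl");

# Remove every keyword matching the expression.
$meta->remove_keyword("OpenDocument");

my $has_perl         = $meta->check_keyword("Perl");
my $has_opendocument = $meta->check_keyword("OpenDocument");

print "Keywords: ", scalar($meta->get_keywords), "\n";
```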
User-defined metadata
Each user-defined metadata element has a unique name (or key), a value and a datatype.
get_user_field(name)
Retrieves a user-defined field according to its name (which should be unique for the document). In scalar context, returns the value of the field. In list context, returns the value and the data type.
The regular ODF datatypes are float, date, time, boolean, and string.
get_user_fields
The odf_meta API provides a get_user_fields method that returns a list in which each element is a hash ref whose (self-documented) keys are name, value, and type.
As an example, the following loop displays the name, the value and the type of each user field in the metadata part of a document:
my $doc = odf_get_document($source);
my $meta = $doc->get_meta;
foreach my $uf ($meta->get_user_fields) {
    say "Name "   . $uf->{name}  .
        " Value " . $uf->{value} .
        " Type "  . $uf->{type};
}
set_user_field(name, value, type)
Creates or changes a user field. The first argument is the name (identifier). The last argument is the data type, which must be ODF-compliant (see get_user_field). If the type is not specified, its default value is 'string'. If the type is date, the value is automatically converted into ISO-8601 format if provided as a numeric time value.
Examples:
$meta->set_user_field("Development status", "Working draft");
$meta->set_user_field("Security status", "Classified");
$meta->set_user_field("Ready for release", FALSE, "boolean");
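A field written with set_user_field can then be read back in either context. The sketch below assumes a document created from the default text template; the field name and value are arbitrary:

```perl
use strict;
use warnings;
use ODF::lpOD;

my $doc  = odf_new_document_from_type('text');
my $meta = $doc->get_meta;

# No type given, so the documented default ('string') applies.
$meta->set_user_field("Development status", "Working draft");

# Scalar context: the value only.
my $value = $meta->get_user_field("Development status");

# List context: the value and the data type.
my ($same_value, $type) = $meta->get_user_field("Development status");

print "Value: $value\n";
print "Type:  $type\n";
```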
How to commit the document updates
Every part may be updated using specific methods that create, change or remove elements, but these methods don't produce any persistent effect.
The updates done in a given part may be either exported as an XML string, or returned to the odf_document instance on which the part depends. With the first option, the user is responsible for the management of the exported XML (which can't be used as is through a typical office application), and the original document is not persistently changed. The second option instructs the odf_document that the part has been changed and that this change should be reflected as soon as the physical resource is written back. However, a part-based method can't directly update the resource. The changes may be made persistent through an odf_document method.
serialize
This part-based method returns a full XML export of the part. The returned XML string may be stored somewhere and used later in order to create or replace a part in another document, or to feed another application.
A pretty named option may be provided. If set to TRUE, this option specifies that the XML export should be as human-readable as possible.
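As a sketch, and assuming that named options are passed as key/value pairs, the metadata part of a new document could be exported like this (the title is arbitrary):

```perl
use strict;
use warnings;
use ODF::lpOD;

my $doc  = odf_new_document_from_type('text');
my $meta = $doc->get_meta;
$meta->set_title("Serialization test");

# Export the part as an indented, human-readable XML string.
my $xml = $meta->serialize(pretty => TRUE);
print $xml;
```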
store
This part-based method stores the present state (possibly changed) of the part in a temporary, non-persistent space, waiting for the next call of the document-based save method.
Like serialize, store allows the pretty option.
save
This method is provided by the odf_document. If the document instance is associated with a regular ODF resource available for update (meaning that it has been created using odf_get_document and that the user has write access to the resource), the resource is written back, reflecting all the changes previously committed by one or more document parts using their respective store methods.
As an example, the sequence below updates an ODF file according to changes made in the meta and content parts:
my $doc = odf_get_document("/home/users/jmg/report.odt");
my $meta = $doc->get_meta;
my $content = $doc->get_content;
# meta updates are made here
$meta->store;
# content updates are made here
$content->store;
$doc->save;
An optional target parameter may be provided to save. If set, this parameter specifies an alternative destination for the file (it produces the same effect as the "File/Save As" feature of typical office software). The target option is always allowed, but it's mandatory with odf_document instances created using an odf_new_document_from... constructor.
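The following sketch creates a document from scratch and saves it to a new file. It assumes that target is passed as a named parameter; the temporary directory and the file name are only illustrative:

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use ODF::lpOD;

# Illustrative destination in a temporary directory (removed at exit).
my $dir    = tempdir(CLEANUP => 1);
my $target = File::Spec->catfile($dir, 'new_document.odt');

my $doc  = odf_new_document_from_type('text');
my $meta = $doc->get_meta;

# Update a part, commit it to the document, then write the file.
$meta->set_title("Created from scratch");
$meta->store;
$doc->save(target => $target);

print "Saved to $target\n" if -e $target;
```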
COPYRIGHT & LICENSE
Copyright (c) 2010 Ars Aperta, Itaapy, Pierlis, Talend.
This work was sponsored by the Agence Nationale de la Recherche (http://www.agence-nationale-recherche.fr).
lpOD is free software; you can redistribute it and/or modify it under the terms of either:
a) the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. lpOD is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with lpOD. If not, see http://www.gnu.org/licenses/.
b) the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0