=head1 NAME
OpenOffice::OODoc::XPath - Low-level XML navigation in the documents
=head1 DESCRIPTION
This module is a low-level class which uses OODoc::File (without
inheriting anything from it) along with the classes defined in the
XML::Twig module. It is a common basis for the other, more
user-friendly, document-oriented modules.
The OpenOffice::OODoc::XPath class should not be used explicitly in
applications, because all its features are available in more user-oriented
classes such as OODoc::Text, OODoc::Styles, OODoc::Image, OODoc::Document
and OODoc::Meta. The present manual page is provided to describe the
common methods and properties that are available with all these classes.
This chapter can be skipped by programmers who are only interested
in document types handled by the specialist classes which follow.
Understanding these classes is easier and using them requires less
Perl and XML expertise. However, calling OODoc::XPath remains a good
rescue option as it allows all kinds of operations on all types of
XML members contained in any OpenOffice.org document.
This class forms the basis of OODoc::Meta, OODoc::Text,
OODoc::Styles and OODoc::Image. It contains the lowest layer of
navigation services for XML documents and handles the link with
OODoc::File for file access. Its primary role is as an interface
with the XML::Twig API.
OODoc::XPath is based on the XML::Twig module (see CPAN
documentation). In the following chapters, you will see elements
often mentioned. When it says that a module expects a parameter or
returns an element (either singly or as a list), it is referring to
an XML element. More precisely, it is referring to an object of the
XML::Twig::Elt class (unless otherwise stated) and all available
methods this object provides. Generally speaking, it is not
necessary to call these low-level methods directly, because
OODoc::XPath and its descendants offer simpler forms. It is
however important to distinguish elements from their content
(elements being simply references to XML data structures). To read
or modify the content of an element such as its text or XML
attributes, use the accessors also available within OODoc::XPath.
In most cases where XPath methods require a reference to an element
as an argument, there are two ways of proceeding:
- reference the element directly (obtained previously)
- or give the XML::XPath path and position, being a string and
an integer respectively [2]
Some methods accept both forms which means that if the first
parameter is recognised as an element reference, the position does
not need to be given. Therefore the number of arguments for certain
OODoc::XPath methods can vary.
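The variable argument list described above can be pictured with a
minimal sketch in plain Perl. The helper name split_target_args and the
returned hash layout are hypothetical and are not part of
OpenOffice::OODoc; the sketch only assumes that an element is a
reference while a path is a plain string.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of OpenOffice::OODoc): shows how a method
# can accept either (element, ...) or (path, position, ...).
sub split_target_args {
    my @args = @_;
    if (ref $args[0]) {
        # First argument is already an element reference: no position follows.
        my $element = shift @args;
        return ({ element => $element }, @args);
    }
    # Otherwise the first two arguments are a path string and a position.
    my ($path, $position) = splice(@args, 0, 2);
    return ({ path => $path, position => $position }, @args);
}

my $fake_element = {};    # stand-in for an XML::Twig::Elt object
my ($by_ref,  @rest1) = split_target_args($fake_element, 'text:style-name');
my ($by_path, @rest2) = split_target_args('//text:p', 15, 'text:style-name');
```

In both calls the remaining arguments (here the attribute name) end up
in the same place, which is why the two calling forms can share one
method body.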
For those who really want to access all areas there are also
OODoc::XPath methods which allow unrestricted access to every
element or XML attribute via an access path in XPath syntax. If you
are into this kind of thing, we recommend you obtain good syntax
reference manuals for XPath and OpenOffice.org and a supply of
aspirin.
Methods which may return several lines of text (e.g. getTextList) do
so either in the form of a single character string containing "\n"
separators or in list form.
Unless otherwise stated, the word document in this chapter only
refers to XML documents contained within OODoc::XPath objects and
not OpenOffice.org documents (as an end user would use).
Amongst the different methods which return elements, attributes or
text, some are called getXxx, others selectXxx or findXxx. Read
methods whose names start with "get" generally return an
unfiltered object or list, whereas the others return an object or list
filtered according to a parameter value. In this latter case the
search parameter is treated as a regular expression and not as an
exact value. This means that if the search criterion is "xyz", all
text containing "xyz" will be considered a match. To restrict the
search to text exactly equal to "xyz", use "^xyz$" as the search
criterion (following Perl regular expression syntax).
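The difference between a loose and an anchored filter can be checked
with plain Perl, independently of the module. The sample strings below
are invented for the illustration.

```perl
use strict;
use warnings;

my @texts = ('xyz', 'abc xyz def', 'xyzzy', 'nothing here');

# A plain filter matches any text that contains the string.
my @loose = grep { /xyz/ } @texts;

# Anchoring the expression restricts the match to the exact value.
my @exact = grep { /^xyz$/ } @texts;

print scalar(@loose), " loose, ", scalar(@exact), " exact\n";
```

With this sample data the script prints "3 loose, 1 exact".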
Several methods allow you to place copies of or references to
elements (from other documents or from other positions in the same
document) in any position in the current document. This offers
powerful manoeuvrability but only if these placements conform with
the destination position's context [3] .
For advanced users familiar with the XML::Twig API, it might be
interesting to know that all the objects called "elements" in the
following chapters are objects of the XML::Twig::Elt
class, and that all methods associated with this class are directly
applicable to them, on top of the functionality described in this
manual. However, this should not normally be needed.
Important note: We recommend using OODoc::Meta and OODoc::Document
(which are both OODoc::XPath derivatives) to manipulate metadata
(for all document types) and content (for text documents)
respectively. These two objects provide higher-level methods which
are neater and more productive. Explicit use of XPath methods (which
sometimes require large numbers of parameters) should only be
considered as a last resort in unexpected circumstances for access
to any element or XML attribute not handled by more "friendly"
methods.
=head2 Methods
=head3 Constructor : OpenOffice::OODoc::XPath->new(<parameters>); [4]
Short Form: ooXPath(<parameters>)
Returns a new instance of XPath, containing a well-formed XML
document given directly or indirectly as a parameter.
Example:
my $doc = ooXPath
(
file => "myfile.sxc",
member => "content"
);
# ... lot of processing ...
$doc->save;
Returns a new document object. In the example above, the object
is loaded from a regular OOo file, which is the most common
option, but there are other possibilities. It's possible to use
flat XML (available as a string in memory, or loaded from a file).
In addition, this constructor is able to create a new document
from scratch.
Parameters are named (hash key => value). The constructor must get
at least one parameter giving a means of obtaining the XML document
that it will represent. Three options are available:
my $doc = OpenOffice::OODoc::XPath->new(xml => $xml_string);
my $doc = OpenOffice::OODoc::XPath->new
(archive => $oofile, member => 'meta');
my $doc = OpenOffice::OODoc::XPath->new
(file => 'source.sxw', member => 'content');
(Remember you can replace "OpenOffice::OODoc::XPath" by "ooXPath"
in the instructions above, provided that you have loaded the
main OpenOffice::OODoc module, that defines this shortcut, and not
only and explicitly OpenOffice::OODoc::XPath.)
The first method supplies an XML string directly (obtained or created
previously by the program).
The second method links OODoc::XPath to an existing OODoc::File
object (so-called "archive" because it's a zip archive used through
an object-oriented API) and indicates which XML member it is to
extract (metadata, content, styles, etc). The OODoc::File is an
abstraction of an already open OOo file.
The third method is the easiest, because the user just provides
a filename and a member, and the whole file interface runs silently
(i.e. an invisible OODoc::File object is automatically created and
used to get the content). It's probably the most used approach; it's
recommended when the user doesn't need to get more than one member
in the same file.
The 'member' option is a selector that tells what component is
needed (the content of the document, the styles, the metadata, ...)
knowing that an OODoc::XPath object can handle only one component.
If the application needs to process, say, the content and the styles
in the same session, it must create two, or more, OODoc::XPath objects
possibly associated with the same OOo file.
The explicit instantiation of an OODoc::File object is required
when the user needs a read/write access to two or more components
in the same OOo file. For example, a program which must access a
spreadsheet's content and page layout simultaneously could do it
like this:
my $archive = ooFile("invoice.sxc");
my $content = ooXPath(archive => $archive, member => 'content');
my $styles = ooXPath(archive => $archive, member => 'styles');
Caution: being associated with an archive via OODoc::File, none of
these OODoc::XPath objects should be deleted before the final save
call for this archive. When save is called, the File object "calls
up" all the XPath objects which were "connected" to it in order to
"ask" each of them for the changes which were made to the XML
(content, styles, meta, etc.). The results are unpredictable if any
of them is absent when called. In short, an application should never
delete (undef) OODoc::XPath objects; their number should be kept to
an absolute minimum and their lifespan should be the same as that of
the program itself.
If the provided filename has a ".xml" or ".XML" suffix, or, whatever
the name, if the 'flat_xml' option is set to 1, the file is processed
as flat XML and not as a regular OOo file. No OODoc::File object is
created, so the save() method is not available. To save the changes
made in the document, the application can either export the document
as flat XML to a text file (see getXMLContent() below) or select some
elements of the document to insert them into another OODoc::XPath
object.
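The flat XML rule stated above can be summarised by the following
sketch. The function is_flat_xml is a hypothetical helper written for
this illustration; it restates the documented rule and is not the
module's own code.

```perl
use strict;
use warnings;

# Hypothetical helper restating the documented rule: a file is treated
# as flat XML if 'flat_xml' is set, or if its name ends in .xml or .XML.
sub is_flat_xml {
    my ($filename, %opt) = @_;
    return 1 if $opt{flat_xml};                      # explicit option wins
    return $filename =~ /\.(?:xml|XML)$/ ? 1 : 0;    # otherwise check the suffix
}

my $from_suffix = is_flat_xml('content.xml');
my $from_option = is_flat_xml('dump.dat', flat_xml => 1);
my $regular     = is_flat_xml('source.sxw');
```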
You can pass the optional parameter 'element' in any case where the
constructor is called without the 'xml' parameter. Bearing in mind
that an OODoc::XPath object will not necessarily handle an entire
XML document, this extra parameter indicates the name of the XML
element to be loaded and handled. If the 'element' parameter is not
given for an OpenOffice.org document, a default element will be
chosen according to the following table:
'meta' => 'office:document-meta'
'content' => 'office:document-content'
'styles' => 'office:document-styles'
'settings' => 'office:document-settings'
'manifest' => 'manifest:manifest'
Conversely, the 'element' parameter becomes mandatory if the chosen
XML element is not listed above. Through OODoc::File, OODoc::XPath
can actually access archives which are not necessarily in
OpenOffice.org format and may be, for example, "banks" of
presentation and content templates.
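As an illustration of the default element selection, the table above
can be written as a lookup with an explicit override. The helper
element_for, the member name 'templates' and the element name
'bank:templates' are hypothetical and serve only as examples.

```perl
use strict;
use warnings;

# Default root elements, copied from the table above.
my %default_element = (
    meta     => 'office:document-meta',
    content  => 'office:document-content',
    styles   => 'office:document-styles',
    settings => 'office:document-settings',
    manifest => 'manifest:manifest',
);

# Hypothetical helper: pick the element to load for a given member.
sub element_for {
    my (%opt) = @_;
    return $opt{element} if defined $opt{element};    # explicit choice wins
    my $default = $default_element{ $opt{member} };
    die "'element' is mandatory for member '$opt{member}'\n"
        unless defined $default;
    return $default;
}

my $root   = element_for(member => 'styles');
my $custom = element_for(member => 'templates', element => 'bank:templates');
```

A member that is not in the table and comes without an 'element' value
makes the helper die, which mirrors the "mandatory" rule.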
If the application needs to create a new document, and not process
an existing one, an additional option must be passed:
create => "class"
where "class" must be one of the following list: "text",
"spreadsheet", "presentation" or "drawing", according to the needed
content class. And, for very special needs, the user can pass an
additional "template_path" to select an ad hoc directory of XML
templates instead of the default one. This user-provided directory
must have the same kind of structure and content as the "templates"
subdirectory of the OpenOffice::OODoc installation.
An additional 'opendocument' option, set to '1' (or 'true') should be
provided if the new document must comply with the OASIS Open Document
specification, knowing that the default format is OpenOffice.org v1.
Be careful: the 'opendocument' option should not be set against
previously existing documents.
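The options for a new document can be assembled as shown in this
sketch. The helper new_document_params is hypothetical, and passing
member => 'content' together with 'create' is an assumption made for
the example; only the option names and the list of classes come from
the text above.

```perl
use strict;
use warnings;

# Content classes listed in the text for the 'create' option.
my %known_class = map { $_ => 1 } qw(text spreadsheet presentation drawing);

# Hypothetical helper: assemble constructor parameters for a new document.
sub new_document_params {
    my ($file, $class, %extra) = @_;
    die "Unknown content class '$class'\n" unless $known_class{$class};
    # member => 'content' is an assumption for this example.
    return (file => $file, member => 'content', create => $class, %extra);
}

my %params = new_document_params('report.sxw', 'text');

# With the main module loaded (use OpenOffice::OODoc;), the call would be:
#   my $doc = ooXPath(%params);
```

Adding opendocument => 1 to the extra options would request the OASIS
Open Document format instead of the default one.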
OODoc::XPath can process OOo documents provided through XML flat
files as well as in the compressed (zip) format. The given file is
automatically processed as flat XML if either its name ends with ".xml"
or the 'flat_xml' option is set to '1'. When processing a flat XML
file, OODoc::XPath doesn't load the OODoc::File zip interface. So,
a subsequent call of the save() method can only export the document
as flat XML.
An optional 'readable_XML' parameter can be passed. If this option is provided
and set to 'on', the resulting XML will be smartly indented (and, of
course, more space-consuming). This feature is intended for debugging
purposes and should not be used in production.
The 'local_encoding' option can be set with the appropriate value
when a particular character set (and not the default one) must be
used for a document.
Other optional parameters can also be passed to the constructor (see
Properties below).
=head3 appendElement(path, position, name/xml, [options]);
=head3 appendElement(element, name/xml, [options]);
Adds a new element or existing element to the list of child elements
of an existing parent element given first (by [path, position] or by
reference).
The argument after the position argument can be an XML element name.
Example:
$content->appendElement
(
'//office:body', 0, 'text:p',
text => "New text"
);
adds a paragraph containing the phrase "New text" to the end of the
body of the document [6] .
If the 'text' option is omitted, an empty element is created (in the
above example it would be an empty paragraph or line feed).
You can pass the 'attribute' option, which is a hash reference whose
keys are the XML attribute names and whose values are the XML
attribute values. Use of these options depends on the type of
document and the type of element and requires knowledge of
OpenOffice.org conventions.
Example:
my $my_style =
{
'style:name' => 'P1',
'style:family' => 'paragraph'
};
$content->appendElement
(
'//office:automatic-styles', 0, 'style:style',
attribute => $my_style
);
creates a new paragraph style called 'P1' in the list of "automatic
styles" [7] .
This method lets you add any kind of element into a document, even
exotic ones. With the most common OpenOffice.org objects (e.g.
paragraphs), though, it is easier to use the specialist methods
contained in other modules.
The 'name' argument can be replaced by an existing element in the
same OODoc::XPath object or in another. In which case no element is
created but the existing element is simply referenced with a new
position even though it remains in its old position. Caution: any
modification of an element which is referenced several times in one
or more documents is made to all references. If you want to add a
similar but separate element, you must use replicateElement which
produces a new element from the content of an existing one.
The 'name' argument can also be replaced by an XML string. This
string must correspond to the correct XML description of a UTF-8
encoded [8] OpenOffice.org element. For example, it could be a
string which had been previously exported using the exportXMLElement
method of OODoc::XPath, or extracted from an OpenOffice.org file by
some other application [9] .
The following piece of code produces the same result as the first
example:
$xml = '<text:p text:style-name="Standard">' .
'New text' .
'</text:p>';
$content->appendElement
(
'//office:body', 0, $xml
);
Using this method, after one or more element creations by direct
importation of XML strings, it might be useful to call the
reorganize method (but not absolutely necessary).
=head3 cloneContent(oodoc_xpath_object)
Discards the entire document contents of the current instance and
replaces it with a reference to the contents of another OODoc::XPath
object.
Example:
$doc1 = OpenOffice::OODoc::XPath->new
(
file => 'template.sxc',
member => 'styles'
);
$doc2 = OpenOffice::OODoc::XPath->new
(
file => 'sheet.sxc',
member => 'styles'
);
$doc2->cloneContent($doc1);
$doc2->save;
This sequence replaces the styles and page layout of 'sheet.sxc'
with those of 'template.sxc'.
The above example could easily have been written without even using
OODoc::XPath by acting directly on the files. For example, extract
the 'styles.xml' member from 'template.sxc' and insert it into
'sheet.sxc'. The use of OODoc::XPath and the cloneContent method
guarantees that the transferred content corresponds to an
OpenOffice.org document and allows reads/writes to it on the fly.
Caution: the "cloned" content is not physically copied. Calling this
method references one single physical content in two documents. Any
modifications made to the content of either of these two documents
applies equally to the other and vice-versa.
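The sharing described in this caution behaves like ordinary Perl
references. The sketch below uses plain hashes as stand-ins for
documents and does not call the OODoc API.

```perl
use strict;
use warnings;

# Plain Perl data standing in for document content (not the OODoc API).
my $content = { title => 'Template styles' };

my %doc1 = (content => $content);
my %doc2 = (content => $content);    # same reference, no copy is made

# A change made through one "document" is visible through the other.
$doc2{content}{title} = 'Changed via doc2';
print $doc1{content}{title}, "\n";

# An independent copy has to be made explicitly.
my %doc3 = (content => { %$content });
$doc3{content}{title} = 'Private copy';
```

The last assignment leaves the shared content untouched, because %doc3
holds its own shallow copy.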
=head3 contentClass([class name])
Accessor to get or set the class of the document content. If the
current member is a document content, returns its class according
to the OpenOffice.org terminology, i.e. one of the following values:
"text", "spreadsheet", "presentation", or "drawing".
Returns an empty string if the current member is not a document
content (if it's, for example, the "meta" or "styles" member).
This accessor is read-only.
=head3 createElement(name, text)
=head3 createElement(xml)
Creates a new element without attributes which is not inserted in a
document.
Example:
my $element =
$doc->createElement
('my_element', 'its content');
creates a new XML element without attributes and returns its
reference.
Instead of a name, the first argument can be the full XML
description of the element. Example:
my $element = $doc->createElement
('<text:p>My text</text:p>');
This new element is temporary: it is not linked to any document. It
is destined to be used later by another method.
The name can contain a namespace prefix which would look like this:
'namespace:name'.
In its second form, a well-formed XML string can be supplied as a
single argument. The recognition criterion is the presence of the "<"
character at the beginning of the argument. See appendElement for
comments on the direct insertion of XML.
Explicit calls to createElement should be rare. This method is
normally called silently by higher-level methods which are capable
of creating an element, inserting it in a document's XML tree and
giving it attributes (see appendElement and insertElement).
=head3 decode_text(utf8_string)
Caution: this method is a non-exported class method. It must be used
like this:
OpenOffice::OODoc::XPath::decode_text($utf8_string);
and not from an OODoc::XPath instance.
Decodes a UTF8 [10] string and returns an 8 bit character [11]
translation of it in the user's character set, as defined by the
following variable:
$OpenOffice::OODoc::XPath::LOCAL_CHARSET
for which the default value is 'ISO-8859-1' [12] .
Explicit calls to this method should be rare. It is used internally
by methods which return text extracted from document content (e.g.
getText).
Warning to contributors: any method which returns text extracted
from OpenOffice.org documents is based on decode_text; so any
modification or improvement of the decoding logic should be made
there.
=head3 encode_text(editable_string)
Class method.
Encodes "local" character strings (for writing to OpenOffice.org
documents).
Example:
$string = OpenOffice::OODoc::encode_text($local_string);
The local character string is defined by the following global
variable:
$OpenOffice::OODoc::XPath::LOCAL_CHARSET
for which the default value is 'ISO-8859-1'.
Explicit calls to this method should generally be avoided. It is
used internally by methods which insert text or attribute values
into documents (e.g. setText).
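The kind of conversion performed by decode_text and encode_text can be
illustrated with the core Encode module. This is only a sketch of the
character set round trip under the default 'ISO-8859-1' setting; it is
not the module's internal implementation.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $local_charset = 'ISO-8859-1';    # the documented default

# "cafe" with an acute accent, held as ISO-8859-1 bytes.
my $local_bytes = "caf\xE9";

# Local bytes to UTF-8 bytes (the direction described for encode_text).
my $utf8_bytes = encode('UTF-8', decode($local_charset, $local_bytes));

# UTF-8 bytes back to local bytes (the direction described for decode_text).
my $round_trip = encode($local_charset, decode('UTF-8', $utf8_bytes));

printf "%d local bytes, %d UTF-8 bytes\n",
    length($local_bytes), length($utf8_bytes);
```

The accented character takes one byte in ISO-8859-1 and two in UTF-8,
so the script prints "4 local bytes, 5 UTF-8 bytes".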
=head3 dispose()
Deletes the calling document object. Recommended as soon as the
object is no longer needed by the application, and sometimes
mandatory to avoid memory leaks.
=head3 exportXMLBody()
Returns the XML string representing the body of a document, without
UTF8 decoding, for use by another application.
=head3 exportXMLContent()
See getXMLContent()
=head3 exportXMLElement(path, position)
=head3 exportXMLElement(element)
Returns the XML string which represents a particular document
element (style definition, paragraph, table cell, object, etc.) for
use by another application without UTF8 decoding.
This method is principally designed to allow remote exchanges of
elements between programs using any XML storage or transfer method.
It acts as "sender" whilst the "receiver" can use appendElement or
insertElement (for example) to insert any exported elements into a
document. Example:
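# sender programme
# ...
open(my $export, '>', 'transfer.xml') or die "Cannot write: $!";
print $export $doc->exportXMLElement('//text:p', 15);
close $export;
# receiver programme
# ...
open(my $import, '<', 'transfer.xml') or die "Cannot read: $!";
my $xml = do { local $/; <$import> };  # read the whole file as one string
close $import;
$doc->appendElement('//office:body', 0, $xml);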
In this example, a paragraph is transferred but it could just as
easily be any content, presentation or metadata element.
Conversely, this method is not needed when transferring an element
from one document to another in the same program (or from one
document position to another). An element can be copied directly
from within the same program by reference or replication without
going via its XML (see appendElement, insertElement and
replicateElement).
=head3 extendText(path, position, text)
=head3 extendText(element, text)
Appends the given text to the previous content of the given
element.
Example:
$doc->setText($p, "Initial content");
$doc->extendText($p, " extended");
Assuming $p is a regular text element (e.g. a paragraph), its
content becomes "Initial content extended".
(See also setText()).
=head3 findElementList(element, filter [, replacement])
Returns a list of child elements (from a document's tree) of the
element given as an argument whose content agrees with the 'filter'
parameter. The filter can be an exact string match or a regular
expression. If the filter is omitted or contains a wildcard
expression like '.*', the returned list will contain all child
elements without condition.
If the third argument ('replacement') is given, every string which
matches the filter in each child element will be replaced by this
'replacement' value. This 'replacement' argument can be a character
string or a function reference. (See replaceText method below.)
Filtering and possible replacement only affects an element's content
and not its attributes [13] .
This method is mostly for internal use. We recommend using other
methods for the selective extraction of elements.
=head3 getAttribute(path, position, name)
=head3 getAttribute(element, name)
Returns the value of the 'name' attribute of the chosen element (or
undef if the attribute is not defined or if the element does not exist).
Example:
my $style =
$doc->getAttribute('//text:p', 15, 'text:style-name');
returns the style for paragraph 15.
=head3 getAttributes(path, position)
=head3 getAttributes(element)
Returns a list of the element's attributes in the form of a hash
whose keys are the attributes' XML names.
=head3 getElement(path, position)
Returns an element's reference from an XPath path and a position (or
undef if the given path does not indicate an existing element).
Position indicators start at 0 just like in Perl arrays (and in
many other programming languages).
Example:
my $p = $doc->getElement('//table:table', 0);
indicates an element containing the first table of a text document
or first sheet of a spreadsheet.
Positions can also be counted backwards from the end by giving
negative values, i.e. position -1 being the last element. Thus:
my $h = $doc->getElement('//text:h', -2);
indicates the second-to-last heading of a text document.
Caution: the position indicators used here are not the same as in
XPath. In XPath indicators start at 1 and negative values are not
allowed. So, the first element "text:p" would be shown as
"//text:p[1]" if using the getNodeByXPath method (see below),
whereas if using getElement it would be at position 0. An XPath
expression such as "//text:p[-1]" would return nothing.
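The two numbering conventions can be compared with a plain Perl array
standing in for the list of matching elements. The helper xpath_index
is hypothetical and the heading titles are invented for the example.

```perl
use strict;
use warnings;

# Stand-in for the elements matched by a path such as '//text:h'.
my @headings = ('Introduction', 'Methods', 'Results', 'Conclusion');

my $first       = $headings[0];     # position 0, as with getElement(path, 0)
my $second_last = $headings[-2];    # position -2 counts from the end

# Hypothetical helper: convert a 0-based (possibly negative) position
# into the 1-based index used inside an XPath predicate.
sub xpath_index {
    my ($position, $count) = @_;
    $position += $count if $position < 0;
    return $position + 1;
}

my $expr = sprintf '//text:h[%d]', xpath_index(-2, scalar @headings);
print "$second_last is selected by $expr\n";
```

With four headings, position -2 corresponds to the XPath expression
'//text:h[3]'.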
When successful, this method ensures that the returned object is
indeed an element and not another type of node (e.g. attribute,
text, comment, etc.)
=head3 getElementList(path)
Returns a list of all elements at a specified path.
Example:
my @ref_summary = $doc->getElementList('//text:h');
The above example returns an array containing all heading elements of
a text document.
The path can of course be a more complex XPath expression
stipulating, for example, a selection of attribute values. In most
cases, you should avoid complicating things unnecessarily
(especially in Text, Image and Styles modules), as there are methods
for searching by element type, attribute and content which are much
easier to use and avoid the need to supply XPath expressions.
Note: the returned list contains elements in the sense of getElement
and not a list of element contents.
=head3 getNodeByXPath(xpath_expression)
=head3 getNodeByXPath(xpath_expression, context)
=head3 getNodeByXPath(context, xpath_expression)
A low-level method which returns the node corresponding to the given
XPath expression, if it exists in the document. This method (which
gives unrestricted access to the entire content of a document) is
designed for use with the unexpected. You will obviously need to be
familiar with XPath syntax (not documented here) as well as
OpenOffice.org document structure. See also selectNodesByXPath.
=head3 getText(path, position)
=head3 getText(element)
Returns text in the local character set, possibly UTF-8 decoded,
contained in the element given as an argument (by path/position or
by reference).
Two equivalent examples:
# version 1
my $element = $doc->getElement('//text:p', 4);
my $text = $doc->getText($element);
# version 2
my $text = $doc->getText('//text:p', 4);
Version 2 is better if the only aim is to get the text from
paragraph 4. Version 1 is better, however, if during the course of
the program you want to perform other operations on the same
paragraph. Passing an element's reference spares the element
handling methods from recomputing that reference from the XPath
path on every call.
=head3 getTextList(path)
Returns text from all elements in the specified path.
Example:
my $summary = $doc->getTextList('//text:h');
my $report = $doc->getTextList('//text:span');
The $summary variable contains a concatenation of all headers.
$report contains all the words or character strings that "stand out"
which the user has designated by their context, e.g. words in
italics in a non-italic paragraph.
In a list context, the returned data is a list, each of whose
items contains the text of an XML element. In a scalar context
(as in our two examples), the returned value is a single string of
editable text and each element's content is separated from that of
the following element by a line feed.
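The context-dependent return value can be reproduced in plain Perl with
wantarray. The function text_list below is a hypothetical stand-in used
only to show the behaviour; it does not read any document.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a method with a context-dependent return:
# a list in list context, one newline-separated string in scalar context.
sub text_list {
    my @texts = @_;
    return wantarray ? @texts : join("\n", @texts);
}

my @lines  = text_list('Chapter 1', 'Chapter 2');    # list context
my $joined = text_list('Chapter 1', 'Chapter 2');    # scalar context
```

Here @lines holds two items, while $joined holds both titles separated
by a line feed.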
=head3 getXMLContent([filehandle])
Without argument, returns a document's entire XML content.
Exports the entire XML content to a flat file, if a file handle is
provided.
Note: the exported data are UTF8-encoded.
Example:
open my $fh, ">:utf8", "myfile.xml";
$doc->getXMLContent($fh);
close $fh;
Synonym: exportXMLContent()
=head3 getXPathValue(xpath_expression)
=head3 getXPathValue(context, xpath_expression)
=head3 getXPathValue(xpath_expression, context)
A low-level method which allows direct access to the value
corresponding to the given XPath expression in a document. Character
decoding is handled in the same way as with getText.
Example:
$expression = '//office:automatic-styles' .
'/style:style' .
'[@style:name="P1"]' .
'/@style:parent-style-name';
print $doc->getXPathValue($expression);
This sequence displays the name of the parent style of automatic
style "P1" (if it exists within the document). Remember that
simpler methods in the Text and/or Styles modules would produce
the same result.
The optional element reference "context" can be given as an argument
either in first or second place. In this case, the search is limited
to the section of the document tree below this given element. The
default search area is the entire document.
Just as with other methods which require XPath paths, this one is
primarily for internal use. It should not be used by the majority of
applications.
=head3 insertElement(path, position, name/xml [, options])
=head3 insertElement(element, name/xml [, options])
Inserts a new element before or after the element specified by
[path, position] or by reference.
If the "name" argument is a literal, a new element with the name
given is created and then inserted. If the same argument is a
reference to an existing element, this element is then simply
inserted at the position indicated. This method is useful either for
adding new elements or for copying elements from one document to
another or from one position to another within the same document.
Options are passed as [name => value] i.e.:
position => 'before'
or
position => 'after'
allowing you to choose if the insertion should be done before or
after the given element. The default position is before.
text => "text of element"
attribute => $attributes
The "attribute" option is itself a hash reference containing one or
more attributes in the form [name => value] as in appendElement.
When successful, this method returns the inserted element's
reference (else undef).
Example:
my $attributes =
{
'text:style-name' => 'Header 2',
'text:level' => '2'
};
$doc->insertElement
(
'//text:p', 4, 'text:h',
position => 'after',
text => 'New section',
attribute => $attributes
);
This sequence (in an OpenOffice.org Writer document) inserts a level
2 header 'New section' immediately after paragraph 4.
The $name argument can be replaced by an existing element. In this
case a new reference to the existing element is inserted, without
creating a whole new element. In this way you can display an element
at several locations or in several documents which is held in memory
only once. See the appendElement section for the consequences of
having multiple references to the same physical element. It is better
to use replicateElement to insert separate copies of an element.
In the same conditions as in appendElement, the 'name' argument can
be replaced by an XML string which describes the element [14] .
Note: to add an element to the end of a document, it would obviously
be better to use appendElement.
=head3 isOpenDocument()
Returns 1 (true) if the current document is an OASIS Open Document.
To be used every time the application needs to know the format of
the document, knowing that some differences between the two formats
can't be completely hidden by the API.
=head3 makeXPath(expression)
=head3 makeXPath(context, expression)
Low-level method allowing the creation or direct modification
without restriction (almost) of any document element. It allows
"query" expressions in a language similar to XPath. If the given
XPath expression crosses several levels of hierarchy, intermediate
nodes can be created or modified "on the fly" by creating the
necessary path which in turn creates the final node.
Example:
$doc->makeXPath
(
'//office:body/text:p[4 @text:style-name="Text body"]'
);
This "query" applies the "Text body" style to paragraph 4 in the
body of the document. (In reality you will probably never use it
because the setStyle method of the Text module would do the same
thing much more simply.)
If, as in the above example, a node is accompanied by a position
indicator, it cannot be created but must simply act as a mandatory
"passage". This method cannot therefore be used to create, for
example, an Nth paragraph if there is already an N-1.
The only restrictions apply to namespaces which are given as
prefixes to element and attribute names. They must be defined in the
document i.e. conform to OpenOffice.org specifications. For the
rest, this method allows the creation of almost anything anywhere
within a document. Its use is reserved for OpenOffice XML
specialists.
In its second form, a context node can be given as the first
argument. If present, the path is sought (and if necessary created)
starting from its position. By default, the path begins from the
root.
The returned value is the final node's reference (found or created).
The full "query language" syntax used in this method is not
documented here. makeXPath is designed to act more as a base for
other OpenOffice::OODoc methods than to be used in applications.
=head3 raw_import(member, source)
Physically imports an external file into an OpenOffice.org archive
associated with an XPath object, if it exists i.e. if the object was
created using file or archive parameters. This method only transmits
the command to the OODoc::File's raw_import method. Caution: it must
not be used with an "active" element i.e. an XML member to which the
current XPath object or another XPath object is already associated.
Remember too that the import is not actually carried out by
OODoc::File until a save and the imported data is therefore not
immediately available.
=head3 raw_export(member, target)
Physically exports a member from the OpenOffice.org archive
associated with an XPath object, if such an archive exists, i.e. if the
object was created using file or archive parameters. This method only
passes the command on to the raw_export method of OODoc::File.
=head3 removeAttribute(path, position, attribute)
=head3 removeAttribute(element, attribute)
Deletes the named attribute (if found) of the element given by
[path, position] or by reference and returns "true". Has no
effect and returns undef if the attribute is not defined or if
the element does not exist.
=head3 removeElement(path, position)
=head3 removeElement(element)
Deletes the element given by [path, position] or by reference (if
found) and returns "true". Returns undef if the element does not
exist.
=head3 reorganize
Technical method for maintaining the structure of the current
document, for use only in exceptional circumstances where the
application's operations risk destabilising the internal addressing
of elements. This will primarily happen when inserting new XML
elements in the form of XML strings after the document has been
loaded i.e. when the XML parser is again launched to include an
"addition" to an already parsed document.
Being costly in runtime, this method must not be called immediately
after each XML import or other address destabilising operation. A
single reorganize after each series of destabilising operations is
enough and even then perhaps only before you need to access an
element by [path, position]. Address destabilising operations are
not an issue if all elements are selected by reference, attribute or
content filter. Moreover it is absolutely unnecessary to call
reorganize just before calling a save.
=head3 replaceElement(path, position, replacement [, options])
=head3 replaceElement(old_element, new_element [, options])
Deletes the given element by [path, position] or by reference and
inserts another element in its place, either from another location
in the same document or from another document.
A new element can be supplied under the same conditions as for
insertElement.
By default or by using the mode => 'copy' option, it is a copy of
the new element which is inserted. With the mode => 'reference'
option, it is only a reference which is inserted. See the section on
appendElement for comments on the subject of multiple references to
a single physical element.
=head3 replaceText(path, position, filter, replacement)
=head3 replaceText(element, filter, replacement)
Replaces all sub-strings which match "filter" with "replacement" in
the text of an element indicated by [path, position] or by reference
and returns the modified text. The "filter" string can be an "exact"
literal or a regular expression.
Example:
    $doc->replaceText($p, "C(LIENT|USTOMER)", $contact);
replaces each occurrence of "CLIENT" or "CUSTOMER" with the content
of the $contact variable in the paragraph $p of document $doc.
The "replacement" argument can be a function reference. In that
case, the function is called each time the filter matches, and
the value returned by the function is used as the replacement value.
    sub action {
        my $arg  = shift;
        my $text = shift;
        print "$arg : $text\n";
        return "OK";
    }
    $doc->replaceText($p, $expression, \&action, "Found");
displays "Found : <text>" (where <text> is the text matched) each
time a string matches $expression and replaces this string with
"OK". If $expression is an "exact" string, the text displayed is
obviously always that same string. If it is a regular expression,
the text displayed is whatever was actually matched.
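The callback convention described above can be imitated in plain Perl on
an ordinary string. The following is only a sketch: replace_with_callback
is a hypothetical helper written for this illustration, not part of
OpenOffice::OODoc, and it makes no claim about how replaceText is
implemented internally. It assumes, as the example above suggests, that
the extra arguments are passed before the matched string.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the callback form of replaceText.
# It works on a plain string, not on a document element.
sub replace_with_callback {
    my ($text, $filter, $replacement, @args) = @_;
    if (ref $replacement eq 'CODE') {
        # Assumption: extra arguments first, matched string last.
        # The outer parentheses make $1 the whole match.
        $text =~ s/($filter)/$replacement->(@args, $1)/ge;
    }
    else {
        # Plain string replacement.
        $text =~ s/$filter/$replacement/g;
    }
    return $text;
}

my @log;    # collects what the callback saw

sub action {
    my ($arg, $text) = @_;
    push @log, "$arg : $text";
    return "OK";
}

my $result = replace_with_callback(
    "Dear CLIENT, dear CUSTOMER",
    "C(LIENT|USTOMER)",
    \&action,
    "Found"
);

print "$result\n";
print "$_\n" for @log;
```

The callback is invoked once per match, so its return value can differ
from one match to the next; a constant string is returned here only to
keep the sketch short.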
Generally speaking, if the replacement value is a function
reference, the called function receives the arguments which follow
it in the call, in the order given, and then, as in the example
above, the matched string.
=head3 replicateElement(element, position [, options])
Makes a copy of the given element and inserts it into the current
document according to 'position' and, where indicated, according to
a hash of options.
If 'position' is another existing element then the new element is
inserted after the children of the existing element, unless
position=>'after' or position=>'before' is specified in the list of
options. In that case, the insertion is made at the same hierarchical
level as the positional element, according to the same logic as for
insertElement [15].
If the 'position' argument is given as 'end', then the new element
is added at the last child position of the root element.
If the 'position' argument is given as 'body', then the new element
is added at the end of the list of child elements of the element
which corresponds to the getBody value (requires an OODoc::XPath
type object, by default 'office:body').
If the 'position' argument is an existing element, then the new
element is inserted immediately before the given element by default.
If the pair position=>'after' is in the options list, the element
is inserted immediately after, as with insertElement.
Example:
    my $template = $doc_source->selectElementByAttribute
        (
        '//style:style',
        'style:name',
        'Text body'
        );
    my $position = $doc_target->getElement
        ('//office:styles', 0);
    $doc_target->replicateElement($template, $position);
This sequence adds a 'Text body' style to the styles collection of
$doc_target, as an exact copy of the style of the same name in
$doc_source. Obviously, the most laborious part of the code is the
search for the element to copy and for its position.
This method physically creates a new element which is an exact copy
of the given element, but which is physically separate from it.
This method is slower than simply modifying an existing element or
inserting an element reference.
Note: If the user needs only a "free" copy of the element (out of the
document structure, to be later attached), the XML::Twig::Elt copy()
method should be preferred.
=head3 replicateNode(count [, position])
This method is an element method, not a document method. It allows
the calling object to be replicated one or more times. It's particularly
useful for inserting rows in tables, but it can be used to replicate any
kind of element (paragraphs, sheets in Calc documents, styles, etc).
Example:
    $doc->getTableRow('MyTable', 5)->replicateNode(3);
This line of code replicates row 5 of the table "MyTable" three times
(the 3 new rows are inserted immediately after the calling row).
Without any argument, the calling element is replicated once. The
second argument must be 'before' or 'after' (default is 'after');
it controls the position of the copies relative to the original
element (but it generally doesn't matter).
=head3 save([filename])
Calls the 'save' method of an OODoc::File object to which the
current object is connected, passing the filename argument to it (if
provided). Only works if an OODoc::File object is indeed connected
(this generally means that the current OODoc::XPath object was
created with the constructor parameter 'file'). If not, an error is
produced.
If the document is not associated with a regular OpenOffice.org
compressed file (used through an OODoc::File object), it's saved
as "flat XML" to the given file. In such a situation, if the file name
is not provided, the source XML file (if any) is used as the target.
Note: if you need to save a document as flat XML while it's associated
with an OpenOffice.org file, you should use exportXMLContent() with an
application-provided file handle.
=head3 selectChildElementByName(path, position [, filter])
=head3 selectChildElementByName(element [, filter])
Returns the first (or only) element whose name matches "filter" from
within the child elements of the given element indicated by [path,
position] or by reference.
"filter" is taken to be a regular expression. If several values
match the filter, the first of these is returned (in the XML's
physical order which is not necessarily the logical order of the
document). See the comments about selectElementsByAttribute if you
want to select an exact name.
Returns undef if no elements match the condition.
Returns the first (or only) child, whatever its name, if no filter
is given or if the filter is a catch-all wildcard (".*").
=head3 selectChildElementsByName(path, position [, filter])
=head3 selectChildElementsByName(element [, filter])
Like selectChildElementByName, but returns a list of all elements
which match the condition.
Example:
    my @search_words =
        $doc->selectChildElementsByName
            ('//text:p', 4, 'text:span');
returns the list of text:span elements in paragraph 4, i.e. the runs
of text whose attributes (colour, font, etc.) distinguish them from
the rest of the paragraph.
=head3 selectElements([context,] path, filter)
=head3 selectElements([context,] path, filter, replacement)
=head3 selectElements([context,] path, filter, action [, arg1, ...])
Returns a list of elements corresponding to a given XPath path and
whose text matches the filter (regular expression). The "context"
argument, if given, is an element reference which limits the search
to its own child elements. The search is carried out in the entire
document by default.
An element is selected if the search string is found in its own text
or in the text of any element descended from it. E.g. An image
element (draw:image) can be selected from the value of its attached
"description" field.
You can replace all strings matching the search criteria with the
'replacement' string, on the fly, if the latter is given as an
argument after the filter.
Lastly, instead of a replacement string, you can pass a subroutine
reference which will be run (in callback mode) each time the search
string is matched. If this subroutine returns a defined value, this
value is used as the replacement string. The subroutine automatically
receives the remaining arguments (arg1, ...) in the order given,
followed by the matching element, as the image example in this
section shows.
If, as is generally the case, you are working exclusively with text
elements (paragraphs, headers, etc.), you would be better to use
selectElementsByContent of the Text module which is easier to use
and does not require an XPath expression.
Here is an example which returns the list of images whose
descriptors contain the word "landscape" and displays the name of
each selected image:
    sub printMessage
        {
        my $doc     = shift;
        my $element = shift;
        my $image   = $element->parentNode;
        print "Name: " . $image->find('@draw:name') . "\n";
        }
    my @list = $doc->selectElements
        (
        '//draw:image/svg:desc',
        'landscape',
        \&printMessage,
        $doc
        );
Do not use this code in a real application: it is purely for
demonstration and unnecessarily complex. You can perform the same
operation much more simply using the OODoc::Image module.
=head3 selectElementsByAttribute(path, attribute, filter)
In a list context, returns a list of elements at the given path with
the given attribute which contain a value matching the filter's
regular expression.
In a scalar context, returns the first (or only) element which
matches the same condition.
Returns undef if no elements match the condition.
Example:
    my @paragraph_styles =
        $doc->selectElementsByAttribute
            ('style:style', 'style:family', 'paragraph');
returns the list of elements which describe the paragraph styles of
document $doc.
Caution: the filter is treated as a regular expression and not as a
literal string. This means that the above piece of code returns not
only the elements whose "style:family" attribute equals "paragraph",
but also all those in which the same attribute merely contains the
word "paragraph". You must therefore use the appropriate regular
expression syntax if you want to select an exact value, which in
this case would be "^paragraph$".
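The difference between the two filters can be checked in plain Perl,
without any document. In this minimal sketch the list of attribute
values is invented for the illustration; only the two regular
expressions come from the text above.

```perl
use strict;
use warnings;

# Hypothetical attribute values, for illustration only.
my @families = ('paragraph', 'paragraph-special', 'text', 'my paragraph');

# Unanchored filter: matches any value that contains "paragraph".
my @loose = grep { /paragraph/ } @families;

# Anchored filter: matches only the exact value "paragraph".
my @exact = grep { /^paragraph$/ } @families;

print "loose: ", join(', ', @loose), "\n";
print "exact: ", join(', ', @exact), "\n";
```

The unanchored filter keeps three of the four values, while the
anchored one keeps only the exact match.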
=head3 selectElementByAttribute(path, attribute, value)
Like selectElementsByAttribute in a scalar context. Returns the
first (or only) element at the given path which has the given
attribute containing the given value.
Returns undef if no elements match the condition.
=head3 selectNodesByXPath(xpath_expression)
This low-level method returns a list of nodes (which are not
necessarily elements) which match the given XPath expression. See
getNodeByXPath for options and comments.
=head3 setAttributes(path, position, attributes_table)
=head3 setAttributes(element, attributes_table)
Modifies or adds one or more attributes to an element.
The element is indicated by reference or by [path, position].
The list of attributes is given in the form of a hash name => value.
Example:
    my $h = $doc->getElement('//text:h', 12);
    my %attributes =
        (
        'text:style-name' => 'My Header',
        'text:level'      => '3'
        );
    $doc->setAttributes($h, %attributes);
This sequence gives the 'My Header' style and level 3 to the 13th
"header" element in the document.
=head3 setText(path, position, text)
=head3 setText(element, text)
Uses the given text as the content of the given element.
Any previous content is replaced.
(See also extendText())
=head2 Properties
No class variables are exported; applications which need them must
access them using their full name ($OpenOffice::OODoc::XPath::XXX).
The following names should therefore be prefixed explicitly with
"$OpenOffice::OODoc::XPath::"
CHARS_TO_ESCAPE
contains the list of reserved characters which, in XML, should be
replaced by escape sequences.
OO_CHARSET
indicates the character set used for OpenOffice.org document
encoding and whose default value is 'utf8' (it should not be changed).
LOCAL_CHARSET
indicates the user's character set, by default 'iso-8859-1'. It must
be set according to the user's real needs. Warning: there is no
automatic adaptation to the user's locale, so the application must
explicitly load the right value into this variable. This should be
done using the localEncoding() accessor (see the OpenOffice::OODoc(3)
man page and, for the list of supported character sets, the Encode
module's documentation).
The content of these three variables should not normally be directly
modified by the applications.
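To illustrate why LOCAL_CHARSET matters, the following sketch uses the
core Encode module to convert a string from ISO-8859-1 to UTF-8 by hand.
It only shows the kind of conversion involved between the user's
character set and the document's; it does not reproduce the internal
code of OpenOffice::OODoc, and the sample string is an assumption made
for the example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "cafe" with an acute accent, as ISO-8859-1 bytes:
# the accented letter is the single byte 0xE9.
my $local_bytes = "caf\xE9";

# Decode from the local charset into Perl characters,
# then encode those characters as UTF-8 bytes.
my $chars      = decode('iso-8859-1', $local_bytes);
my $utf8_bytes = encode('UTF-8', $chars);

printf "local: %d bytes, utf8: %d bytes\n",
    length($local_bytes), length($utf8_bytes);
```

The accented character takes one byte in ISO-8859-1 but two in UTF-8,
which is why a wrong local charset setting produces garbled text rather
than an error.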
Instance hash variables are:
    'archive'        => <oodoc_file_object>
    'file'           => <OpenOffice.org file>
    'member'         => <file member>
    'readable_XML'   => <'on' or not>
    'local_encoding' => <user's output encoding>
    'xml'            => <XML string>
    'element'        => <name of loaded XML element>
    'xpath'          => <XML::Twig object>
    'opendocument'   => <true if OASIS Open Document>
However, the 'xml' variable is cleared almost immediately after a
successful constructor call, in order to save memory. As soon as the
corresponding XPath object has been created, the XML source is no
longer required.
The 'xpath' variable of an OODoc::XPath object contains a reference
to the document structure as it's made available through XML::Twig
(see CPAN documentation). This object encompasses the entire current
XML tree. Each access to XML using OODoc::XPath objects is done via
XML::Twig. So, after having run the following command:
    my $xp = $doc->{'xpath'};
the experienced programmer will be able to use $xp to access all the
functionality of the XML::Twig API, bearing in mind that all
operations using this interface will have a direct effect on the
content of the $doc object.
The 'opendocument' property, if true, means that the document is
declared as an OASIS Open Document. If this property is false or
undef, the document format is OpenOffice.org version 1. This property
should not be changed (as long as OpenOffice::OODoc can't change the
format of an existing document).
=head1 NOTES
See OpenOffice::OODoc::Notes(3) for the footnote citations ([n])
included in this page.
=head1 AUTHOR/COPYRIGHT
Copyright 2004-2005 by Genicorp, S.A. (http://www.genicorp.com)
Initial developer: Jean-Marie Gouarne (http://jean.marie.gouarne.online.fr)
Initial English version of the reference manual
by Graeme A. Hunter (graeme.hunter@zen.co.uk)
License:
- Genicorp General Public Licence v1.0
- GNU Lesser General Public License v2.1
Contact: oodoc@genicorp.com
=cut