=head1	NAME

OpenOffice::OODoc::XPath - Low-level XML navigation in the documents

=head1	DESCRIPTION


This module is a low-level class which uses OODoc::File (without
inheriting anything from it) along with the classes defined in the
XML::XPath module. It's a common basis for the other, more user-
friendly, document-oriented modules.

This chapter can be skipped by programmers who are only interested
in document types handled by the specialist classes which follow.
Understanding these classes is easier and using them requires less
Perl and XML expertise. However, calling OODoc::XPath remains a good
rescue option as it allows all kinds of operations on all types of
XML members contained in any OpenOffice.org document.

This class forms the basis of OODoc::Meta, OODoc::Text,
OODoc::Styles and OODoc::Image. It contains the lowest layer of
navigation services for XML documents and handles the link with
OODoc::File for file access. Its primary role is as an interface
with the XML::XPath API.

OODoc::XPath is based on the XML::XPath module (see CPAN
documentation). In the following chapters, you will see elements
often mentioned. When it says that a module expects a parameter or
returns an element (either singly or as a list), it is referring to
an XML element. More precisely, it is referring to an object of the
XML::XPath::Node::Element class (unless otherwise stated), with all
the methods available to such an object. Generally speaking, simple
uses of OODoc::XPath and its descendants do not require calling these
low-level methods directly. It is, however, important to distinguish
elements from their content (elements being simply references to XML
data structures). To read or modify the content of an element, such
as its text or XML attributes, use the accessors available within
OODoc::XPath.

In most cases where XPath methods require a reference to an element
as an argument, there are two ways of proceeding:

	- reference the element directly (obtained previously)
	- or give the XML::XPath path and position, being a string and
	an integer respectively [2]

Some methods accept both forms which means that if the first
parameter is recognised as an element reference, the position does
not need to be given. Therefore the number of arguments for certain
OODoc::XPath methods can vary.
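
The variable argument list described above can be pictured with a small
sketch. The helper name resolve_element and the stand-in lookup callback
are hypothetical and are not part of OODoc::XPath; the sketch only shows
how a first argument that is an object reference can be told apart from
a path string followed by a position.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical helper: accepts either (element, rest...) or
# (path, position, rest...) and returns the element plus the
# remaining arguments.
sub resolve_element {
    my ($lookup, @args) = @_;
    my $first = shift @args;
    if (blessed($first)) {
        # Already an element reference: no position is expected.
        return ($first, @args);
    }
    # Otherwise treat the first two arguments as path and position.
    my $position = shift @args;
    return ($lookup->($first, $position), @args);
}

# Stand-in for a real lookup such as getElement(path, position).
my $fake_element = bless { name => 'text:p' }, 'My::Element';
my $lookup       = sub { return $fake_element };

my ($e1, @rest1) = resolve_element($lookup, $fake_element, 'text:style-name');
my ($e2, @rest2) = resolve_element($lookup, '//text:p', 4, 'text:style-name');

print "both forms resolve to the same element\n"
    if $e1 == $e2 and "@rest1" eq "@rest2";
```

Either way, the code after the dispatch sees one element followed by the
same remaining arguments, which is why the position can be dropped when
an element reference is given.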

For those who really want to access all areas, there are also
OODoc::XPath methods which allow unrestricted access to every
element or XML attribute via an access path in XPath syntax. If you
are into this kind of thing, we recommend you obtain good syntax
reference manuals for XPath and OpenOffice.org.
Methods which may return several lines of text (e.g. getTextList) do
so either as a single character string containing "\n" separators or
as a list.

Unless otherwise stated, the word document in this chapter only
refers to XML documents contained within OODoc::XPath objects and
not OpenOffice.org documents (as an end user would use).

Among the methods which return elements, attributes or text, some
are called getXxx, others selectXxx or findXxx. Read methods whose
names start with "get" generally return an unfiltered object or
list, whereas the others return an object or list filtered according
to a parameter value. In this latter case the search parameter is
treated as a regular expression and not an exact value. This means
that if the search criterion is "xyz", all text containing "xyz"
will be considered a match. To restrict the search to text exactly
equal to "xyz", use "^xyz$" as the search criterion (following Perl
regular expression syntax).
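
The difference between an unanchored and an anchored search criterion
can be checked in plain Perl, independently of OODoc::XPath. The sample
strings below are made up for the illustration.

```perl
use strict;
use warnings;

my @texts = ('xyz', 'abc xyz def', 'xyzzy', 'other');

# Unanchored criterion: any text containing "xyz" matches.
my $loose      = 'xyz';
my @containing = grep { /$loose/ } @texts;

# Anchored criterion: only text exactly equal to "xyz" matches.
my $strict = '^xyz$';
my @exact  = grep { /$strict/ } @texts;

print scalar(@containing), " text(s) contain xyz\n";
print scalar(@exact), " text(s) equal xyz\n";
```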

Several methods allow you to place copies of or references to
elements (from other documents or from other positions in the same
document) in any position in the current document. This offers
powerful manoeuvrability but only if these placements conform with
the destination position's context [3] .

For advanced users familiar with the XML::XPath API, it might be
interesting to know that all the objects called "elements" in the
following chapters are objects of the XML::XPath::Node::Element
class, and that all methods associated with this class are directly
applicable to them, on top of the functionality described in this
manual. However, this should not normally be needed.

Important note: We recommend using OODoc::Meta and OODoc::Document
(which are both OODoc::XPath derivatives) to manipulate metadata
(for all document types) and content (for text documents)
respectively. These two objects provide higher-level methods which
are neater and more productive. Explicit use of XPath methods (which
sometimes require large numbers of parameters) should only be
considered as a last resort in unexpected circumstances for access
to any element or XML attribute not handled by more "friendly"
methods.
=head2	Methods

=head3	Constructor : OpenOffice::OODoc::XPath->new(<parameters>); [4] 

        Short Form: ooXPath(<parameters>)

        Returns a new instance of XPath, containing a well-formed XML
        document given directly or indirectly as a parameter.

        Parameters are named (hash key => value). The constructor must get
        at least one parameter giving a means of obtaining the XML document
        that it will represent. Three options are available:

            my $doc = OpenOffice::OODoc::XPath->new(xml => $xml_string);

            my $doc = OpenOffice::OODoc::XPath->new
            	(archive => $oofile, member => 'meta');

            my $doc = OpenOffice::OODoc::XPath->new
            	(file => 'source.sxw', member => 'content');

	(Remember that you can replace "OpenOffice::OODoc::XPath" with
	"ooXPath" in the instructions above, provided that you have loaded
	the main OpenOffice::OODoc module, which defines this shortcut, and
	not just OpenOffice::OODoc::XPath explicitly.)

        The first form supplies an XML string directly (obtained or created
        previously by the program).

        The second form links OODoc::XPath to an existing OODoc::File
        object and indicates which XML member it is to extract (metadata,
        content, styles, etc).

        The third form supplies an OpenOffice.org file name and a member to
        extract, but this time there is no pre-existing OODoc::File. In this
        case, the XPath constructor will instantiate a "private" File object
        and connect to it for its own use.

        This third form is generally considered to be the easiest since
        the use of OODoc::File becomes invisible. It does not, however,
        allow sharing of the same OODoc::File between several OODoc::XPath
        objects. It should therefore only be used if the program accesses a
        single component of the OpenOffice.org file.

        A program which must access a spreadsheet's content and page layout
        simultaneously could do it like this:

            my $archive	= ooFile("invoice.sxc");
            my $content	= ooXPath(archive => $archive, member => 'content');
            my $styles	= ooXPath(archive => $archive, member => 'styles');

        Caution: because these OODoc::XPath objects are associated with an
        archive via OODoc::File, none of them should be deleted before the
        final save call for this archive. When save is called, the File
        object "calls up" all the XPath objects which were "connected" to it
        in order to "ask" each of them for the changes which were made to
        the XML (content, styles, meta, etc.). The results are unpredictable
        if any of them is absent when called. In short, an application
        should never delete (undef) OODoc::XPath objects; their number
        should be kept to an absolute minimum and their lifespan should be
        the same as that of the program itself.

        You can pass the optional parameter 'element' in any case where the
        constructor is called without the 'xml' parameter. Bearing in mind
        that an OODoc::XPath object will not necessarily handle an entire
        XML document, this extra parameter indicates the name of the XML
        element to be loaded and handled. If the 'element' parameter is not
        given for an OpenOffice.org document, a default element will be
        chosen according to the following table:

            'meta'	=> 'office:document-meta'
            'content'	=> 'office:document-content'
            'styles'	=> 'office:document-styles'
            'settings'	=> 'office:document-settings'

        Conversely, the 'element' parameter becomes mandatory if the chosen
        XML element is not listed above. Through OODoc::File, OODoc::XPath
        can actually access archives which are not necessarily in
        OpenOffice.org format and may be, for example, "banks" of
        presentation and content templates.

        The parser parameter can be added if the program already has an
        XML::XPath::XMLParser object [5] . (The same XML parser can be
        shared between several OODoc::XPath objects.) If this parameter is
        absent and the application has a global variable called $XML_PARSER
        which points to an XML::XPath::XMLParser, that object is used as
        the parser. If not, a "private" XML parser is automatically created,
        but it is still possible to share it later using getXMLParser. If
        OODoc::XPath is loaded indirectly via the main OpenOffice::OODoc
        module (which is the normal case), a global $XML_PARSER is
        automatically created without the application having to do
        anything.

	If the application needs to create a new document, and not process
	an existing one, an additional option must be passed:

		create		=> "class"

	where "class" must be one of the following list: "text",
	"spreadsheet", "presentation" or "drawing", according to the needed
	content class. And, for very special needs, the user can pass an
	additional "template_path" to select an ad hoc directory of XML
	templates instead of the default one. This user-provided directory
	must have the same kind of structure and content as the "templates"
	subdirectory of the OpenOffice::OODoc installation.

	An optional 'readable_XML' parameter can be passed. If this option
	is provided and set to 'on', each XML element created by the
	application is followed by a line break. Be careful: this option
	significantly increases the processing time, so it should be set
	for debugging only.

        Other optional parameters can also be passed to the constructor (see
        Properties below).

=head3	appendElement(path, position, name/xml, [options]);

=head3	appendElement(element, name/xml, [options]);

        Adds a new element or existing element to the list of child elements
        of an existing parent element given first (by [path, position] or by
        reference).

        The argument after the position argument can be an XML element name.
        Example:

            $doc->appendElement
            	(
            	'//office:body', 0, 'text:p',
            	text => "New text"
            	);

        adds a paragraph containing the phrase "New text" to the end of the
        body of the document [6] .

        If the 'text' option is omitted, an empty element is created (in the
        above example it would be an empty paragraph or line feed).

        You can pass the 'attribute' option, whose value is a hash reference
        whose keys are the XML attribute names and whose values are the XML
        attribute values. Use of this option depends on the type of
        document and the type of element and requires knowledge of
        OpenOffice.org conventions. Example:

            $my_style	=
            	{
            	'style:name'	=> 'P1',
            	'style:family'	=> 'paragraph'
            	};
            $doc->appendElement
            	(
            	'//office:automatic-styles', 0, 'style:style',
            	attribute	=> $my_style
            	);

        creates a new paragraph style called 'P1' in the list of "automatic
        styles" [7] .

        This method lets you add any kind of element into a document, even
        exotic ones. With the most common OpenOffice.org objects (e.g.
        paragraphs), though, it is easier to use the specialist methods
        contained in other modules.

        The 'name' argument can be replaced by an existing element in the
        same OODoc::XPath object or in another. In that case, no element is
        created; the existing element is simply referenced at a new
        position while also remaining at its old position. Caution: any
        modification of an element which is referenced several times in one
        or more documents is made to all references. If you want to add a
        similar but separate element, you must use replicateElement which
        produces a new element from the content of an existing one.

        The 'name' argument can also be replaced by an XML string. This
        string must correspond to the correct XML description of a UTF-8
        encoded [8]  OpenOffice.org element. For example, it could be a
        string which had been previously exported using the exportXMLElement
        method of OODoc::XPath, or extracted from an OpenOffice.org file by
        some other application [9] .

        The following piece of code produces the same result as the first
        example:

            $xml =	'<text:p text:style-name="Standard">'	.
            	'New text'				.
            	'</text:p>';
            $doc->appendElement('//office:body', 0, $xml);

        Using this method, after one or more element creations by direct
        importation of XML strings, it might be useful to call the
        reorganize method (but not absolutely necessary).

=head3	cloneContent(oodoc_xpath_object)

        Discards the entire document content of the current instance and
        replaces it with a reference to the content of another OODoc::XPath
        object. Example:

            $doc1	= OpenOffice::OODoc::XPath->new
            		(
            		file	=> 'template.sxc',
            		member	=> 'styles'
            		);
            $doc2	= OpenOffice::OODoc::XPath->new
            		(
            		file	=> 'sheet.sxc',
            		member	=> 'styles'
            		);
            $doc2->cloneContent($doc1);

        This sequence replaces the styles and page layout of 'sheet.sxc'
        with those of 'template.sxc'.

        The above example could easily have been written without even using
        OODoc::XPath by acting directly on the files. For example, extract
        the 'styles.xml' member from 'template.sxc' and insert it into
        'sheet.sxc'. The use of OODoc::XPath and the cloneContent method
        guarantees that the transferred content corresponds to an
        OpenOffice.org document and allows reads/writes to it on the fly.

        Caution: the "cloned" content is not physically copied. Calling this
        method references one single physical content in two documents. Any
        modifications made to the content of either of these two documents
        applies equally to the other and vice-versa.
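
        The difference between sharing one physical content and making a
        separate copy can be seen with ordinary Perl data structures. This
        is only an analogy: the hash below stands in for a document tree,
        and dclone (from the core Storable module) stands in for a true
        copy; neither is how OODoc::XPath stores or copies content.

```perl
use strict;
use warnings;
use Storable qw(dclone);

# A nested structure standing in for a document's content tree.
my $content = { styles => [ 'P1', 'P2' ] };

my $shared = $content;            # second reference to the same data
my $copy   = dclone($content);    # independent deep copy

# A change made through one reference...
push @{ $shared->{styles} }, 'P3';

# ...is visible through the other reference, but not in the copy.
my $seen_by_original = scalar(@{ $content->{styles} });
my $seen_by_copy     = scalar(@{ $copy->{styles} });
print "original sees $seen_by_original styles, copy sees $seen_by_copy\n";
```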

=head3	contentClass([class name])

	Accessor to get or set the class of the document content. If the
	current member is a document content, returns its class according
	to the OpenOffice.org terminology, i.e. one of the following values:
	"text", "spreadsheet", "presentation", or "drawing".

	Returns an empty string if the current member is not a document
	content (if it's, for example, the "meta" or "styles" member).

	With a given class name as an optional argument, this method can
	change or set the document class; but this risky operation is, of
	course, for advanced users who know exactly what they are doing.
	Changing the content class allows, for example, turning a text
	document containing some tables into a spreadsheet. But some things
	must be changed elsewhere in order to produce a consistent
	OpenOffice.org document.
	For example, the MIME type of the document, stored in its manifest,
	must be changed accordingly (see OpenOffice::OODoc::Manifest). And,
	of course, the real content must be consistent with the declared
	content class.

=head3	createElement(name, text)

=head3	createElement(xml)

        Creates a new element without attributes which is not inserted in a
        document. Example:

            my $element = $doc->createElement
            		('my_element', 'its content');

        creates a new XML element without attributes and returns its
        reference.
        Instead of a name, the first argument can be the full XML
        description of the element. Example:

            my $element = $doc->createElement
            		('<text:p>My text</text:p>');

        This new element is temporary: it is not linked to any document. It
        is destined to be used later by another method.

        The name can contain a namespace prefix, as in 'text:p'.

        In its second form, a well-formed XML string can be supplied as a
        single argument. The recognition criterion is the presence of the "<"
        character at the beginning of the argument. See appendElement for
        comments on the direct insertion of XML.

        Explicit calls to createElement should be rare. This method is
        normally called silently by higher-level methods which are capable
        of creating an element, inserting it in a document's XML tree and
        giving it attributes (see appendElement and insertElement).

=head3	decode_text(utf8_string)

        Caution: this method is a non-exported class method. It must be used
        like this:


        and not from an OODoc::XPath instance.

        Decodes a UTF8 [10]  string and returns an 8-bit character [11]
        translation of it in the user's character set, as defined by the
        following variable:


        for which the default value is 'ISO-8859-1' [12] .

        Explicit calls to this method should be rare. It is used internally
        by methods which return text extracted from document content (e.g.
        getText).

        Warning to contributors: any method which returns text extracted
        from OpenOffice.org documents is based on decode_text; so any
        modification or improvement of the decoding logic should be made
        in this method.

=head3	encode_text(editable_string)

        Class method.

        Encodes "local" character strings in UTF-8 (for writing to
        OpenOffice.org documents). Example:


            $string = OpenOffice::OODoc::encode_text($local_string);

        The local character set is defined by the following global
        variable:


        for which the default value is 'ISO-8859-1'.

        Explicit calls to this method should generally be avoided. It is
        used internally by methods which insert text or attribute values
        into documents (e.g. setText).
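
        The two conversion directions can be illustrated with Perl's core
        Encode module. This is a sketch of the principle only, assuming the
        documented default local character set 'ISO-8859-1'; the function
        names local_to_utf8 and utf8_to_local are hypothetical and the
        sketch does not claim to reproduce how encode_text and decode_text
        are implemented.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $local_charset = 'ISO-8859-1';   # the documented default value

# Local 8-bit bytes -> UTF-8 bytes (the direction of encode_text).
sub local_to_utf8 {
    my ($bytes) = @_;
    return encode('UTF-8', decode($local_charset, $bytes));
}

# UTF-8 bytes -> local 8-bit bytes (the direction of decode_text).
sub utf8_to_local {
    my ($bytes) = @_;
    return encode($local_charset, decode('UTF-8', $bytes));
}

my $local = "caf\xE9";                # "cafe" with e-acute, in ISO-8859-1
my $utf8  = local_to_utf8($local);    # e-acute becomes the bytes C3 A9

print "round trip ok\n" if utf8_to_local($utf8) eq $local;
```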

=head3	exportXMLBody()

        Returns the XML string representing the body of a document, without
        UTF8 decoding, for use by another application.

=head3	exportXMLElement(path, position)

=head3	exportXMLElement(element)

        Returns the XML string which represents a particular document
        element (style definition, paragraph, table cell, object, etc.) for
        use by another application without UTF8 decoding.

        This method is principally designed to allow remote exchanges of
        elements between programs using any XML storage or transfer method.
        It acts as "sender" whilst the "receiver" can use appendElement or
        insertElement (for example) to insert any exported elements into a
        document. Example:

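            # sender programme
            # ...
            open(my $export, '>', 'transfer.xml')
            	or die "Cannot write transfer.xml: $!";
            print $export $doc->exportXMLElement('//text:p', 15);
            close $export;

            # receiver programme
            # ...
            open(my $import, '<', 'transfer.xml')
            	or die "Cannot read transfer.xml: $!";
            # read the whole file as one string, not as a list of lines
            my $xml = do { local $/; <$import> };
            close $import;
            $doc->appendElement('//office:body', 0, $xml);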

        In this example, a paragraph is transferred but it could just as
        easily be any content, presentation or metadata element.

        Conversely, this method is not needed when transferring an element
        from one document to another in the same program (or from one
        document position to another). An element can be copied directly
        from within the same program by reference or replication without
        going via its XML (see appendElement, insertElement and
        replicateElement).

=head3	extendText(path, position, text)

=head3	extendText(element, text)

	Appends the given text to the previous content of the given
	element. Example:

		$doc->setText($p, "Initial content");
		$doc->extendText($p, " extended");

	Assuming $p is a regular text element (e.g. a paragraph), its
	content becomes "Initial content extended".

	(See also setText()).

=head3	findElementList(element, filter [, replacement])

        Returns a list of child elements (from a document's tree) of the
        element given as an argument whose content agrees with the 'filter'
        parameter. The filter can be an exact string match or a regular
        expression. If the filter is omitted or contains a wildcard
        expression like '.*', the returned list will contain all child
        elements without condition.

        If the third argument ('replacement') is given, every string which
        matches the filter in each child element will be replaced by this
        'replacement' value. This 'replacement' argument can be a character
        string or a function reference. (See replaceText method below.)
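
        A replacement that is either a string or a function reference can
        be sketched in plain Perl as follows. The helper replace_matches is
        hypothetical, and the convention that the function receives the
        matched string as its argument is an assumption made for this
        illustration, not a statement about replaceText.

```perl
use strict;
use warnings;

# Hypothetical helper: replaces every match of $filter in $text with
# $replacement, which may be a plain string or a code reference
# (assumed here to be called with the matched string).
sub replace_matches {
    my ($text, $filter, $replacement) = @_;
    if (ref($replacement) eq 'CODE') {
        $text =~ s/($filter)/$replacement->($1)/ge;
    } else {
        $text =~ s/$filter/$replacement/g;
    }
    return $text;
}

my $with_string = replace_matches('Total: 12 items', '\d+', 'N');
my $with_code   = replace_matches('Total: 12 items', '\d+',
                                  sub { $_[0] * 2 });

print "$with_string\n";
print "$with_code\n";
```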

        Filtering and possible replacement only affects an element's content
        and not its attributes [13] .

        This method is mostly for internal use. We recommend using other
        methods for the selective extraction of elements.

=head3	getAttribute(path, position, name)

=head3	getAttribute(element, name)

        Returns the value of the attribute 'name' of the chosen element (or
        undef if this attribute is not defined or if the element does not
        exist). Example:


            my $style	=
             $doc->getAttribute('//text:p', 15, 'text:style-name');

        returns the style for paragraph 15.

=head3	getAttributes(path, position)

=head3	getAttributes(element)

        Returns a list of the element's attributes in the form of a hash
        whose keys are the attributes' XML names.

=head3	getElement(path, position)

        Returns an element's reference from an XPath path and a position (or
        undef if the given path does not indicate an existing element).

        Position indicators start at 0 just like in Perl arrays (and in
        many other programming languages). Example:

            my $p = $doc->getElement('//table:table', 0);

        indicates an element containing the first table of a text document
        or first sheet of a spreadsheet.

        Positions can also be counted backwards from the end by giving
        negative values, i.e. position -1 being the last element. Thus:

            my $h = $doc->getElement('//text:h', -2);

        indicates the second-last header of a text document.

        Caution: the position indicators used here are not the same as in
        XPath. In XPath, indicators start at 1 and negative values are not
        allowed. So, the first element "text:p" would be shown as
        "//text:p[1]" if using the getNodeByXPath method (see below),
        whereas if using getElement it would be at position 0. An XPath
        expression such as "//text:p[-1]" would return nothing.

        When successful, this method ensures that the returned object is
        indeed an element and not another type of node (e.g. attribute,
        text, comment, etc.)
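
        The relationship between the two numbering conventions can be
        written down as a small conversion. The helper xpath_index is
        hypothetical (it is not an OODoc::XPath method) and needs to be
        told how many elements match, since a negative position is counted
        from the end.

```perl
use strict;
use warnings;

# Hypothetical helper: converts a getElement-style position (0-based,
# negative values counted from the end) into a 1-based index as used
# by XPath, given the number of matching elements. Returns undef when
# the position is out of range.
sub xpath_index {
    my ($position, $count) = @_;
    my $index = $position < 0 ? $count + $position + 1 : $position + 1;
    return ($index >= 1 && $index <= $count) ? $index : undef;
}

# With 10 matching elements:
my $first       = xpath_index(0, 10);     # first element
my $second_last = xpath_index(-2, 10);    # second-last element
my $missing     = xpath_index(10, 10);    # out of range

printf "position 0 is index %d, position -2 is index %d\n",
    $first, $second_last;
```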

=head3	getElementList(path)

        Returns a list of all elements at a specified path.


            my @ref_summary = $doc->getElementList('//text:h');

        The above example returns a list containing all header elements of
        a text document.

        The path can of course be a more complex XPath expression
        stipulating, for example, a selection of attribute values. In most
        cases, you should avoid complicating things unnecessarily
        (especially in Text, Image and Styles modules), as there are methods
        for searching by element type, attribute and content which are much
        easier to use and avoid the need to supply XPath expressions.

        Note: the returned list contains elements in the sense of getElement
        and not a list of element contents.

=head3	getNodeByXPath(xpath_expression)

=head3	getNodeByXPath(xpath_expression, context)

=head3	getNodeByXPath(context, xpath_expression)

        A low-level method which returns the node corresponding to the given
        XPath expression, if it exists in the document. This method (which
        gives unrestricted access to the entire content of a document) is
        designed for use with the unexpected. You will obviously need to be
        familiar with XPath syntax (not documented here) as well as
        OpenOffice.org document structure. See also selectNodesByXPath.

=head3	getText(path, position)

=head3	getText(element)

        Returns text in the local character set, possibly UTF-8 decoded,
        contained in the element given as an argument (by path/position or
        by reference).

        Two equivalent examples:

        # version 1

        my $element	= $doc->getElement('//text:p', 4);

        my $text	= $doc->getText($element);

        # version 2

        my $text	= $doc->getText('//text:p', 4);

        Version 2 is better if the only aim is to get the text from
        paragraph 4. Version 1 is better, however, if during the course of
        the program you want to perform other operations on the same
        paragraph. Giving an element's reference spares the element
        handling methods from recalculating a reference from the XPath
        path each time.

=head3	getTextList(path)

        Returns text from all elements in the specified path.


            my $summary = $doc->getTextList('//text:h');

            my $report = $doc->getTextList('//text:span');

        The $summary variable contains a concatenation of all headers.
        $report contains all the words or character strings that "stand out"
        which the user has designated by their context, e.g. words in
        italics in a non-italic paragraph.

        In a list context, the returned data is a list, each of whose
        items contains the text of an XML element. In a scalar context
        (as in our two examples), the returned value is a single string
        in which each element's content is separated from that of the
        following element by a line feed.
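
        Context-sensitive return values of this kind are usually built on
        Perl's wantarray. The function text_list below is a hypothetical
        stand-in with hard-coded sample texts; it only mimics the behaviour
        described above and is not the module's code.

```perl
use strict;
use warnings;

# Hypothetical function: returns a list in list context and one
# newline-joined string in scalar context.
sub text_list {
    my @texts = ('Introduction', 'Method', 'Results');
    return wantarray ? @texts : join("\n", @texts);
}

my @headings = text_list();   # list context: one item per element
my $summary  = text_list();   # scalar context: a single string

print scalar(@headings), " headings\n";
print "$summary\n";
```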

=head3	getXMLContent

        Returns a document's entire XML content.

=head3	getXMLParser

        Technical method which returns the reference of the XML parser being
        used by the current OODoc::XPath object, for later re-use elsewhere.
        The returned object is of type XML::XPath::XMLParser.

=head3	getXPathValue(xpath_expression)

=head3	getXPathValue(context, xpath_expression)

=head3	getXPathValue(xpath_expression, context)

        A low-level method which allows direct access to the value
        corresponding to the given XPath expression in a document. Character
        decoding is handled in the same way as with getText.


            $expression =	'//office:automatic-styles'	.
            		'/style:style'			.
            		'[@style:name="P1"]'		.
            		'/@style:parent-style-name';

            print $doc->getXPathValue($expression);

        This sequence displays the name of the parent style of automatic
        style "P1" (if it exists within the document). Remember that
        simpler methods in the Text and/or Styles modules would produce
        the same result.

        The optional element reference "context" can be given as an argument
        either in first or second place. In this case, the search is limited
        to the section of the document tree below this given element. The
        default search area is the entire document.

        Just as with other methods which require XPath paths, this one is
        primarily for internal use. It should not be used by the majority of
        applications.

=head3	insertElement(path, position, name/xml [, options])

=head3	insertElement(element, name/xml [, options])

        Inserts a new element before or after the element specified by
        [path, position] or by reference.

        If the "name" argument is a literal, a new element with the name
        given is created and then inserted. If the same argument is a
        reference to an existing element, this element is then simply
        inserted at the position indicated. This method is useful either for
        adding new elements or for copying elements from one document to
        another or from one position to another within the same document.

        Options are passed as [name => value], i.e.:

            position	=> 'before'

        or

            position	=> 'after'

        allowing you to choose if the insertion should be done before or
        after the given element. The default position is before.

            text	=> "text of element"

            attribute	=> $attributes

        The "attribute" option is itself a hash reference containing one or
	more attributes in the form [name => value] as in appendElement.

        When successful, this method returns the inserted element's
        reference (else undef).


            my $attributes	=
            	{
            	'text:style-name'	=> 'Header 2',
            	'text:level'		=> '2'
            	};
            $doc->insertElement
            	(
            	'//text:p', 4, 'text:h',
            	position	=> 'after',
            	text		=> 'New section',
            	attribute	=> $attributes
            	);

        This sequence (in an OpenOffice.org Writer document) inserts a level
        2 header 'New section' immediately after paragraph 4.

        The 'name' argument can be replaced by an existing element. In this
        case a new reference to the existing element is inserted, without
        creating a whole new element. In this way an element which is held
        in memory only once can appear at several locations or in several
        documents. See the appendElement section for the consequences of
        having multiple references to the same physical element. It is
        better to use replicateElement to insert separate copies of an
        element.

        In the same conditions as in appendElement, the 'name' argument can
        be replaced by an XML string which describes the element [14] .

        Note: to add an element to the end of a document, it would obviously
        be better to use appendElement.

=head3	makeXPath(expression)

=head3	makeXPath(context, expression)

        Low-level method allowing the creation or direct modification
        without restriction (almost) of any document element. It allows
        "query" expressions in a language similar to XPath. If the given
        XPath expression crosses several levels of hierarchy, intermediate
        nodes can be created or modified "on the fly" by creating the
        necessary path which in turn creates the final node.


            $doc->makeXPath
             ('//office:body/text:p[4 @text:style-name="Text body"]');

        This "query" applies the "Text body" style to paragraph 4 in the
        body of the document. (In reality you will probably never use it
        because the setStyle method of the Text module would do the same
        thing much more simply.)

        If, as in the above example, a node is accompanied by a position
        indicator, it cannot be created but must simply act as a mandatory
        "passage". This method cannot therefore be used to create, for
        example, an Nth paragraph, even if paragraph N-1 already exists.

        The only restrictions apply to namespaces which are given as
        prefixes to element and attribute names. They must be defined in the
        document i.e. conform to OpenOffice.org specifications. For the
        rest, this method allows the creation of almost anything anywhere
        within a document. Its use is reserved for OpenOffice XML
        specialists.

        In its second form, a context node can be given as the first
        argument. If present, the path is sought (and if necessary created)
        starting from its position. By default, the path begins from the
        document root.

        The returned value is the final node's reference (found or created).

        The full "query language" syntax used in this method is not
        documented here. makeXPath is designed to act more as a base for
        other OpenOffice::OODoc methods than to be used in applications.

=head3	raw_import(member, source)

        Physically imports an external file into an OpenOffice.org archive
        associated with an XPath object, if it exists i.e. if the object was
        created using file or archive parameters. This method only transmits
        the command to the OODoc::File's raw_import method. Caution: it must
        not be used with an "active" element i.e. an XML member to which the
        current XPath object or another XPath object is already associated.
        Remember too that the import is not actually carried out by
        OODoc::File until a save and the imported data is therefore not
        immediately available.

=head3	raw_export(member, target)

        Physically exports a member from an OpenOffice.org archive
        associated with an XPath object, if it exists i.e. if the object was
        created using file or archive parameters. This method only transmits
        the command to the OODoc::File's raw_export method.

=head3	removeAttribute(path, position, attribute)

=head3	removeAttribute(element, attribute)

        Deletes the "attribute" attribute (if found) of the given element by
        [path, position] or by reference and returns "true". Has no physical
        effect and returns undef if the attribute has not been defined or if
        the element does not exist.

=head3	removeElement(path, position)

=head3	removeElement(element)

        Deletes the given element (if found) by [path, position] or by
        reference and returns "true". Returns undef if the element does not
        exist.

=head3	reorganize

        Technical method for maintaining the structure of the current
        document, for use only in exceptional circumstances where the
        application's operations risk destabilising the internal addressing
        of elements. This will primarily happen when inserting new XML
        elements in the form of XML strings after the document has been
        loaded i.e. when the XML parser is again launched to include an
        "addition" to an already parsed document.

        Being costly in runtime, this method must not be called immediately
        after each XML import or other address destabilising operation. A
        single reorganize after each series of destabilising operations is
        enough and even then perhaps only before you need to access an
        element by [path, position]. Address destabilising operations are
        not an issue if all elements are selected by reference, attribute or
        content filter. Moreover it is absolutely unnecessary to call
        reorganize just before calling a save.

=head3	replaceElement(path, position, replacement [, options])

=head3	replaceElement(old_element, new_element [, options])

        Deletes the given element by [path, position] or by reference and
        inserts another element in its place, either from another location
        in the same document or from another document.

        A new element can be supplied under the same conditions as for

        By default or by using the mode => 'copy' option, it is a copy of
        the new element which is inserted. With the mode => 'reference'
        option, it is only a reference which is inserted. See the section on
        appendElement for comments on the subject of multiple references to
        a single physical element.
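
        The difference between the two modes is the usual one between a
        deep copy and a shared reference. The sketch below illustrates it
        with plain Perl hashes standing in for elements. It does not use
        the OODoc API, and the data structure is an assumption made only
        for this demonstration.

```perl
use strict;
use warnings;
use Storable qw(dclone);

# A plain hash standing in for an element (illustration only).
my $original = { name => 'text:p', text => 'Hello' };

my $copy      = dclone($original);   # comparable to mode => 'copy'
my $reference = $original;           # comparable to mode => 'reference'

# Changing the original affects the shared reference but not the copy.
$original->{text} = 'Changed';

print "copy:      $copy->{text}\n";
print "reference: $reference->{text}\n";
```

        A copy can be modified freely without side effects, whereas a
        change made through a reference is visible wherever the same
        physical element is referenced.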

=head3	replaceText(path, position, filter, replacement)

=head3	replaceText(element, filter, replacement)

        Replaces all sub-strings which match "filter" with "replacement" in
        the text of an element indicated by [path, position] or by reference
        and returns the modified text. The "filter" string can be an "exact"
        literal or a regular expression.


            $doc->replaceText($p, "C(LIENT|USTOMER)", $contact);

        replaces each occurrence of "CLIENT" and "CUSTOMER" with the content
        of the $contact variable in the paragraph $p of document $doc.

        The "replacement" argument can be a function reference. In which
        case, the function is called each time the string is matched, and
        the value returned by the function is used as the replacement value.

        sub action	{
        		my $arg = shift;
        		my $text = shift;
        		print "$arg : $text\n";
        		return "OK";
        		}

            $doc->replaceText($p, $expression, \&action, "Found");

        displays "Found : <text>" (where <text> is the text retrieved) each
        time a string matches $expression and replaces this string with
        "OK". If $expression contains an "exact" string, then clearly the
        text displayed will always be the same string. However, if it
        happens to be a regular expression, it is in effect the text
        retrieved which will be displayed.
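
        The callback behaviour described above can be modelled in plain
        Perl. The following is only a sketch, not the module's
        implementation: replace_with_callback is a hypothetical helper
        which works on an ordinary string rather than on a document
        element, and the sample text is invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: mimics the documented behaviour on a plain string.
# $replacement may be a string or a code reference; any extra arguments are
# passed to the callback first, followed by the matched text.
sub replace_with_callback {
    my ($text, $filter, $replacement, @args) = @_;
    if (ref $replacement eq 'CODE') {
        $text =~ s/($filter)/$replacement->(@args, $1)/ge;
    }
    else {
        $text =~ s/$filter/$replacement/g;
    }
    return $text;
}

my @seen;
sub action {
    my ($arg, $text) = @_;
    push @seen, "$arg : $text";   # recorded here instead of printed
    return "OK";
}

my $result = replace_with_callback(
    "Dear CLIENT, our CUSTOMER service", "C(LIENT|USTOMER)", \&action, "Found"
);
print "$result\n";
print "$_\n" for @seen;
```

        With a regular expression as the filter, each call receives the
        text actually matched, which is why the recorded messages differ
        from one match to the next.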

        Generally speaking, if the replacement value is a function
        reference, the called function receives any arguments which
        follow it in the call, followed by the matched text, as in the
        example above.

=head3	replicateElement(element, position [, options])

        Makes a copy of the given element and inserts it into the current
        document according to 'position' and, where indicated, according to
        a hash of options.

        If 'position' is another existing element then the new element is
        inserted after the children of the existing element, except where
        either pairs position=>'after' or position=>'before' are specified
        in the list of options. In this case, the insertion is made at the
        same hierarchical level as the positional element according to the
        same logic as for insertElement [15] .

        If the 'position' argument is given as 'end', then the new element
        is added at the last child position of the root element.

        If the 'position' argument is given as 'body', then the new element
        is added at the end of the list of child elements of the element
        which corresponds to the getBody value (requires an OODoc::XPath
        type object, by default 'office:body').

        If the 'position' argument is an existing element, then the new
        element is inserted immediately before the given element by default.
        If the pair position=>'after' are in the options list, the element
        is inserted immediately after, as with insertElement.


            my $template = $doc_source->selectElementByAttribute
            		('//style:style', 'style:name', 'Body of text');
            my $position = $doc_target->getElement
            		('//office:styles', 0);
            $doc_target->replicateElement($template, $position);

        This sequence adds a style 'Body of text' to the styles collection
        of $doc_target which copies exactly the style of the same name in
        $doc_source. Obviously, the section of code dealing with the search
        for the element to copy and its position is the most laborious.

        This method physically creates a new element which is an exact copy
        of the given element, but which is physically separate from it.

        This method is much slower than simply modifying an existing element
        or inserting an element reference and heavy use is not recommended.

=head3	save

=head3	save(filename)

        Calls the 'save' method of an OODoc::File object to which the
        current object is connected, passing the filename argument to it (if
        provided). Only works if an OODoc::File object is indeed connected
        (this generally means that the current OODoc::XPath object was
        created with the constructor parameter 'file'). If not, an error is

=head3	selectChildElementByName(path, position [, filter])

=head3	selectChildElementByName(element [, filter])

        Returns the first (or only) element whose name matches "filter" from
        within the child elements of the given element indicated by [path,
        position] or by reference.

        "filter" is taken to be a regular expression. If several values
        match the filter, the first of these is returned (in the XML's
        physical order which is not necessarily the logical order of the
        document). See the comments about selectElementByAttribute if
        wanting to select an exact name.

        Returns undef if no elements match the condition.

        If no filter is given, or if the filter is a wildcard (".*"),
        simply returns the first (or only) child element.

=head3	selectChildElementsByName(path, position [, filter])

=head3	selectChildElementsByName(element [, filter])

        Like selectChildElementByName, but returns a list of all elements
        which match the condition.


            my @search_words = $doc->selectChildElementsByName
            		('//text:p', 4, 'text:span');

        returns a list of elements from paragraph 4 which correspond to text
        which has particular attributes which distinguish it from the rest
        of the paragraph (colour, font, etc.)

=head3	selectElements([context,] path, filter)

=head3	selectElements([context,] path, filter, replacement)

=head3	selectElements([context,] path, filter, action [, arg1, ...])

        Returns a list of elements corresponding to a given XPath path and
        whose text matches the filter (regular expression). The "context"
        argument, if given, is an element reference which limits the search
        to its own child elements. The search is carried out in the entire
        document by default.

        An element is selected if the search string is found in its own text
        or in the text of any element descended from it. E.g. An image
        element (draw:image) can be selected from the value of its attached
        "description" field.

        You can replace all strings matching the search criteria with the
        'replacement' string, on the fly, if the latter is given as an
        argument after the filter.

        Lastly, instead of a replacement string, you can pass a subroutine's
        reference which will run (in call back mode) each time the search
        string is matched. If this subroutine returns a defined value, this
        value is used as the replacement string. The subroutine will
        automatically receive the rest of the arguments, in this order:

        If, as is generally the case, you are working exclusively with text
        elements (paragraphs, headers, etc.), you would be better to use
        selectElementsByContent of the Text module which is easier to use
        and does not require an XPath expression.

        Here is an example which returns the list of images whose
        descriptors contain the word "landscape" and displays the name of
        each selected image:

        sub	printMessage
        	{
        	my $doc		= shift;
        	my $element	= shift;
        	my $image = $element->parentNode;
        	print "Name: " . $image->find('@draw:name') . "\n";
        	}

            my @list = $doc->selectElements

        Never use this example of code in a real application as it is both
        purely for demonstration and unnecessarily complex. You can perform
        the same operation much more simply using the OODoc::Image module.

=head3	selectElementsByAttribute(path, attribute, filter)

        In a list context, returns a list of elements at the given path with
        the given attribute which contain a value matching the filter's
        regular expression.

        In a scalar context, returns the first (or only) element which
        matches the same condition.

        Returns undef if no elements match the condition.


            my @paragraph_styles = $doc->selectElementsByAttribute
            	('style:style', 'style:family', 'paragraph');

        returns the list of elements which describe the paragraph styles of
        document $doc.
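
        The list/scalar convention described above is the standard Perl
        idiom based on wantarray. The sketch below models it with a
        hypothetical helper, select_matching, which filters plain strings
        instead of XML elements. It is an illustration of the calling
        convention only, not the module's code, and the sample values are
        invented.

```perl
use strict;
use warnings;

# Hypothetical helper modelling the documented return convention:
# all matches in list context, the first match (or undef) in scalar context.
sub select_matching {
    my ($filter, @values) = @_;
    my @found = grep { /$filter/ } @values;
    return wantarray ? @found : $found[0];
}

my @all   = select_matching('^para', 'paragraph', 'text', 'paragraph-2');
my $first = select_matching('^para', 'paragraph', 'text', 'paragraph-2');
my $none  = select_matching('^table', 'paragraph', 'text');

print "all:   @all\n";
print "first: $first\n";
print "none:  ", (defined $none ? $none : 'undef'), "\n";
```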

        Caution: the filter is treated as a regular expression and not as a
        classic string. This means that the above piece of code might not
        only return the elements whose "style:family" attribute equals
        "paragraph", but also all those in which the same attribute contains
        the word "paragraph". You must therefore use the appropriate syntax
        (in regex language) if you want to select an exact value, which in
        this case would be "^paragraph$".
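
        The effect of anchoring can be checked with ordinary Perl pattern
        matching. In this sketch the attribute values are invented and
        stand in for "style:family" attributes; no document is involved.

```perl
use strict;
use warnings;

# Invented attribute values, standing in for "style:family" attributes.
my @families = ('paragraph', 'text', 'paragraph-extra', 'graphics');

# Unanchored filter: matches any value which contains "paragraph".
my @loose = grep { /paragraph/ } @families;

# Anchored filter: matches only the exact value "paragraph".
my @exact = grep { /^paragraph$/ } @families;

print "loose: @loose\n";
print "exact: @exact\n";
```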

=head3	selectElementByAttribute(path, attribute, value)

        Like selectElementsByAttribute in a scalar context. Returns the
        first (or only) element at the given path which has the given
        attribute containing the given value.

        Returns undef if no elements match the condition.

=head3	selectNodesByXPath(xpath_expression)

        This low-level method returns a list of nodes (which are not
        necessarily elements) which match the given XPath expression. See
        getNodeByXPath for options and comments.

=head3	setAttributes(path, position, attributes_table)

=head3	setAttributes(element, attributes_table)

        Modifies or adds one or more attributes to an element.

        The element is indicated by reference or by [path, position].

        The list of attributes is given in the form of a hash name => value.


            my $h = $doc->getElement('//text:h', 12);
            my %attributes =
            	(
            	'text:style-name'	=> 'My Header',
            	'text:level'		=> '3'
            	);
            $doc->setAttributes($h, %attributes);

        This sequence gives the 'My Header' style and level 3 to the 13th
        "header" element in the document.

=head3	setText(path, position, text)

=head3	setText(element, text)

	Use the given text as the content of the given element.

	Any previous content is replaced by the given one.

	(See also extendText())

=head2	Properties

        No class variables are exported; the applications, if needed,
        must access them using their full name
        ($OpenOffice::OODoc::XPath::XXX).

        The following names should be prefixed explicitly with
        $OpenOffice::OODoc::XPath::


        contains the list of reserved characters which, in XML, should be
        replaced by escape sequences.


        indicates the character set used for OpenOffice.org document
        encoding and whose default value is 'utf8' (it should not be changed).


        indicates the user's character set, by default 'iso-8859-1'; it must
	be changed according to the real user's needs (warning: there is no
	kind of automatic adaptation to the user's locales, so the application
	must explicitly load the right value in this variable); it should be
	done using the localEncoding() accessor (see the OpenOffice::OODoc(3)
	man page and, for the list of supported character sets, the Encode
	module's documentation).

        The content of these three variables should not normally be directly
        modified by the applications.

        Instance hash variables are :

            'archive'		=> <oodoc_file_object>
            'file'		=> <OpenOffice.org file>
            'member'		=> <file member>
	    'readable_XML'	=> <'on' or not>
            'xml'		=> <XML string>
            'element'		=> <name of loaded XML element>
	    'xpath'		=> <XML::XPath object>
            'parser'		=> <XML parser>

        However, the 'xml' variable is cleared almost immediately after a
        successful constructor call, in order to save memory. As soon as the
        corresponding XPath object has been created, the XML source is no
        longer required.

        The 'xpath' variable of an OODoc::XPath object contains a reference
        to an XML::XPath class object (see CPAN documentation). Remember
        that OODoc::XPath is based on XML::XPath, not derived from it. This
        object encompasses the entire current XML tree. Each access to XML
        using OODoc::XPath objects is done via XML::XPath. So, after having
        run the following command:

            my $xp = $doc->{'xpath'};

        the experienced programmer will be able to use $xp to access all the
        functionality of the XML::XPath API, bearing in mind that all
        operations using this interface will have a direct effect on the
        content of the $doc object.

        The 'parser' variable (which most will never need) is aimed at
        XML::XPath::XMLParser [16]  type objects which have been used to
        construct OODoc::XPath objects out of an XML document and can be
        reused (within very specific applications) to create new elements
        from imported XML strings. For example:

            $xml =	'<style:properties '	.
            	'draw:luminance="2%"/>';
            $p = $doc->{'parser'}->parse($xml);

        creates a new element which describes the properties applicable to a
        graphics object and which could later be incorporated into a
        document. Calling the parser is not enough to insert the element
        into the document.

        The same parser can be used again later in several XPath objects
        which may be interesting in case you wish to perform simultaneous
        handling of a large number of documents. When calling the
        constructor of a new XPath object, you need simply pass a parser
        parameter equal to the parser parameter of an existing XPath object
        [17] . We have, however, no way of measuring the amount of memory
        and processing time saved by reusing it.

        Bearing in mind that OODoc::XPath does not necessarily handle the
        entire loaded XML document but only the element represented by the
        'element' parameter, there are also two variables, 'begin' and 'end'
        which contain, respectively, those parts of the document before and
        after the extracted element. These two variables are stored in the
        instance to allow reconstruction of the entire document by
        concatenation, however the content of 'begin' and 'end' is not
        verified, analysed or modified.
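
        The reconstruction by concatenation can be pictured with a plain
        string. This is only a sketch under stated assumptions: the XML
        source is invented, and the regular expression used to split it
        is a simplification, not the way the module extracts the element.

```perl
use strict;
use warnings;

# Invented XML source; 'office:body' plays the role of the loaded element.
my $source  = '<doc><office:meta/><office:body><text:p>Hi</text:p></office:body></doc>';
my $element = 'office:body';

# Split the source into the part before, the element itself, and the part after.
my ($begin, $body, $end) =
    $source =~ m{^(.*?)(<\Q$element\E\b.*</\Q$element\E>)(.*)$}s
    or die "element not found";

# Only the extracted element is modified; $begin and $end are left untouched.
$body =~ s/Hi/Hello/;

# The whole document is rebuilt by simple concatenation.
my $rebuilt = $begin . $body . $end;
print "$rebuilt\n";
```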

=head1	NOTES

See OpenOffice::OODoc::Notes(3) for the footnote citations ([n])
included in this page.


Copyright 2005 by Genicorp, S.A. (http://www.genicorp.com)

Initial developer: Jean-Marie Gouarne (http://jean.marie.gouarne.online.fr)

Initial English version of the reference manual
by Graeme A. Hunter (graeme.hunter@zen.co.uk)

Licensing conditions:

	- Genicorp General Public Licence v1.0
	- GNU Lesser General Public License v2.1

Contact: oodoc@genicorp.com