OpenOffice::OODoc::XPath - Low-level XML navigation in the documents
This module is a low-level class which uses OODoc::File (without inheriting anything from it) along with the classes defined in the XML::XPath module. It is a common basis for the other, more user-friendly, document-oriented modules.
This chapter can be skipped by programmers who are only interested in document types handled by the specialist classes which follow. Understanding these classes is easier and using them requires less Perl and XML expertise. However, calling OODoc::XPath remains a good rescue option as it allows all kinds of operations on all types of XML members contained in any OpenOffice.org document.
This class forms the basis of OODoc::Meta, OODoc::Text, OODoc::Styles and OODoc::Image. It contains the lowest layer of navigation services for XML documents and handles the link with OODoc::File for file access. Its primary role is as an interface with the XML::XPath API.
OODoc::XPath is based on the XML::XPath module (see CPAN documentation). In the following chapters, you will see elements often mentioned. When it says that a method expects or returns an element (either singly or as a list), it is referring to an XML element. More precisely, it is referring to an object of the XML::XPath::Node::Element class (unless otherwise stated), with all the methods available to such an object. Generally speaking, it is not necessary to call these low-level methods, because OODoc::XPath and its descendants offer simpler equivalents. It is however important to distinguish elements from their content (elements being simply references to XML data structures). To read or modify the content of an element such as its text or XML attributes, use the accessors also available within OODoc::XPath.
In most cases where XPath methods require a reference to an element as an argument, there are two ways of proceeding:
- reference the element directly (obtained previously)
- or give the XML::XPath path and position, being a string and an integer respectively [2]
Some methods accept both forms which means that if the first parameter is recognised as an element reference, the position does not need to be given. Therefore the number of arguments for certain OODoc::XPath methods can vary.
For those who really want to access all areas there are also OODoc::XPath methods which allow unrestricted access to every element or XML attribute via an access path in XPath syntax. If you are into this kind of thing, we recommend you obtain good syntax reference manuals for XPath and OpenOffice.org and a supply of aspirin.
Methods which may return several lines of text (e.g. getTextList) do so either as a single character string containing "\n" separators or as a list.
Unless otherwise stated, the word document in this chapter only refers to XML documents contained within OODoc::XPath objects and not OpenOffice.org documents (as an end user would use).
Amongst the different methods which return elements, attributes or text, some are called getXxx, others selectXxx or findXxx. Read methods whose names start with "get" generally refer to an unfiltered object or list, whereas the others return an object or list filtered according to a parameter value. In this latter case the search parameter is treated as a regular expression and not as an exact value. This means that if the search criterion is "xyz", all text containing "xyz" will be considered a match. To restrict the search to text exactly equal to "xyz", use "^xyz$" as the search criterion (following Perl regular expression syntax).
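The difference between a containing match and an anchored match can be checked with plain Perl regular expressions. The following is only an illustrative sketch: the sample strings are hypothetical and no OODoc::XPath method is called.

```perl
use strict;
use warnings;

# Hypothetical sample texts standing in for element contents.
my @texts = ('xyz', 'abc xyz def', 'wxyz', 'xy');

# Unanchored filter: any text containing "xyz" matches.
my @loose = grep { /xyz/ } @texts;

# Anchored filter: only text exactly equal to "xyz" matches.
my @exact = grep { /^xyz$/ } @texts;

print "loose: ", join(' | ', @loose), "\n";
print "exact: ", join(' | ', @exact), "\n";
```

The same anchoring applies to any filter argument that this chapter describes as a regular expression.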
Several methods allow you to place copies of or references to elements (from other documents or from other positions in the same document) in any position in the current document. This offers powerful manoeuvrability but only if these placements conform with the destination position's context [3] .
For advanced users familiar with the XML::XPath API, it might be interesting to know that all the objects called "elements" in the following chapters are objects of the XML::XPath::Node::Element class, and that all methods associated with this class are directly applicable to them, on top of the functionality described in this manual. However, this should not normally be needed.
Important note: We recommend using OODoc::Meta and OODoc::Document (which are both OODoc::XPath derivatives) to manipulate metadata (for all document types) and content (for text documents) respectively. These two objects provide highest-level methods which are neater and more productive. Explicit use of XPath methods (which sometimes require large numbers of parameters) should only be considered as a last resort in unexpected circumstances for access to any element or XML attribute not handled by more "friendly" methods.
Short Form: ooXPath(<parameters>)

Returns a new instance of XPath, containing a well-formed XML document given directly or indirectly as a parameter. Parameters are named (hash key => value).

The constructor must get at least one parameter giving a means of obtaining the XML document that it will represent. Three options are available:

    my $doc = OpenOffice::OODoc::XPath->new(xml => $xml_string);

    my $doc = OpenOffice::OODoc::XPath->new
            (archive => $oofile, member => 'meta');

    my $doc = OpenOffice::OODoc::XPath->new
            (file => 'source.sxw', member => 'content');

(Remember you can replace "OpenOffice::OODoc::XPath->new" by "ooXPath" in the instructions above, provided that you have loaded the main OpenOffice::OODoc module, that defines this shortcut, and not only and explicitly OpenOffice::OODoc::XPath.)

The first form supplies an XML string directly (obtained or created previously by the program). The second form links OODoc::XPath to an existing OODoc::File object and indicates which XML member it is to extract (metadata, content, styles, etc). The third form supplies an OpenOffice.org file and a member to extract, but this time there is no pre-existing OODoc::File. In this case, the XPath constructor will instantiate a "private" File object and connect to it for its own use. This third form is generally considered to be the easiest since the use of OODoc::File becomes invisible. It does not, however, allow the same OODoc::File to be shared between several OODoc::XPath objects. It must therefore only be used if the program accesses one single component of the OpenOffice.org file.
A program which must access a spreadsheet's content and page layout simultaneously could do it like this:

    my $archive = ooFile("invoice.sxc");
    my $content = ooXPath(archive => $archive, member => 'content');
    my $styles = ooXPath(archive => $archive, member => 'styles');

Caution: being associated with an archive via OODoc::File, none of these OODoc::XPath objects should be deleted before the final save call for this archive. When save is called, the File object "calls up" all the XPath objects which were "connected" to it in order to "ask" each of them for the changes which were made to the XML (content, styles, meta, etc.). The results are unpredictable if any of them is absent when called. In short, an application should never delete (undef) OODoc::XPath objects; their number should be kept to an absolute minimum and their lifespan should be the same as that of the program itself.

You can pass the optional parameter 'element' in any case where the constructor is called without the 'xml' parameter. Bearing in mind that an OODoc::XPath object will not necessarily handle an entire XML document, this extra parameter indicates the name of the XML element to be loaded and handled. If the 'element' parameter is not given for an OpenOffice.org document, a default element will be chosen according to the following table:

    'meta'      => 'office:document-meta'
    'content'   => 'office:document-content'
    'styles'    => 'office:document-styles'
    'settings'  => 'office:document-settings'

Conversely, the 'element' parameter becomes mandatory if the chosen XML member is not listed above. Through OODoc::File, OODoc::XPath can actually access archives which are not necessarily in OpenOffice.org format and may be, for example, "banks" of presentation and content templates.

The 'parser' parameter can be added if the program already has an XML::XPath::XMLParser object [5]. (The same XML parser can be shared between several OODoc::XPath objects.)
If the 'parser' parameter is absent, the constructor uses the global variable $XML_PARSER as the parser, provided that the application has such a variable and that it actually points to an XML::XPath::XMLParser. If not, a "private" XML parser is automatically created, but it is still possible to share it later using getXMLParser. Note that if OODoc::XPath is loaded indirectly via the main OpenOffice::OODoc module (which is normal), a global $XML_PARSER is automatically created without the application having to do anything.

If the application needs to create a new document, and not process an existing one, an additional option must be passed:

    create => "class"

where "class" must be one of the following: "text", "spreadsheet", "presentation" or "drawing", according to the needed content class. And, for very special needs, the user can pass an additional "template_path" to select an ad hoc directory of XML templates instead of the default one. This user-provided directory must have the same kind of structure and content as the "templates" subdirectory of the OpenOffice::OODoc installation.

An optional 'readable_XML' parameter can be passed. If this option is provided and set to 'on', each XML element created by the application is followed by a line break. Be careful: this option significantly increases the processing time, so it should be set for debugging only.

Other optional parameters can also be passed to the constructor (see Properties below).
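The default 'element' rule described for the constructor can be pictured as a simple lookup. This sketch is not the module's code: the helper name root_element_for is hypothetical, and only the four member/element pairs are taken from the constructor description.

```perl
use strict;
use warnings;

# Default root elements, as listed in the constructor description.
my %DEFAULT_ELEMENT = (
    meta     => 'office:document-meta',
    content  => 'office:document-content',
    styles   => 'office:document-styles',
    settings => 'office:document-settings',
);

# Hypothetical helper: picks the element to load for a given member.
# An explicit 'element' always wins; otherwise the default is used,
# and an unknown member without 'element' is an error.
sub root_element_for {
    my (%opt) = @_;
    return $opt{element} if defined $opt{element};
    my $member  = $opt{member} // '';
    my $default = $DEFAULT_ELEMENT{$member};
    die "'element' is mandatory for member '$member'\n"
        unless defined $default;
    return $default;
}

print root_element_for(member => 'styles'), "\n";
print root_element_for(member => 'bank.xml', element => 'my:root'), "\n";
```

The member name 'bank.xml' and the element name 'my:root' are made up for the illustration.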
Adds a new element or existing element to the list of child elements of an existing parent element given first (by [path, position] or by reference). The argument after the position argument can be an XML element name. Example:

    $content->appendElement
        (
        '//office:body', 0, 'text:p',
        text => "New text"
        );

adds a paragraph containing the phrase "New text" to the end of the body of the document [6]. If the 'text' option is omitted, an empty element is created (in the above example it would be an empty paragraph or line feed).

You can pass the 'attribute' option, which is a reference to a hash whose keys are the XML attribute names and whose values are the XML attribute values. Use of these options depends on the type of document and the type of element and requires knowledge of OpenOffice.org conventions. Example:

    my $my_style =
        {
        'style:name'   => 'P1',
        'style:family' => 'paragraph'
        };
    $content->appendElement
        (
        '//office:automatic-styles', 0, 'style:style',
        attribute => $my_style
        );

creates a new paragraph style called 'P1' in the list of "automatic styles" [7].

This method lets you add any kind of element into a document, even exotic ones. With the most common OpenOffice.org objects (e.g. paragraphs), though, it is easier to use the specialist methods contained in other modules.

The 'name' argument can be replaced by an existing element in the same OODoc::XPath object or in another. In this case no element is created; the existing element is simply referenced at a new position while also remaining at its old position. Caution: any modification of an element which is referenced several times in one or more documents is made to all references. If you want to add a similar but separate element, you must use replicateElement, which produces a new element from the content of an existing one.

The 'name' argument can also be replaced by an XML string. This string must correspond to the correct XML description of a UTF-8 encoded [8] OpenOffice.org element.
For example, it could be a string which had been previously exported using the exportXMLElement method of OODoc::XPath, or extracted from an OpenOffice.org file by some other application [9]. The following piece of code produces the same result as the first example:

    $xml = '<text:p text:style-name="Standard">' .
           'New text' .
           '</text:p>';
    $content->appendElement
        (
        '//office:body', 0, $xml
        );

After one or more elements have been created by direct import of XML strings in this way, it might be useful (but is not absolutely necessary) to call the reorganize method.
Cancels the entire document contents of the current instance and replaces it with a reference to the contents of another OODoc::XPath object. Example: $doc1 = OpenOffice::OODoc::XPath->new ( file => 'template.sxc', member => 'styles' ); $doc2 = OpenOffice::OODoc::XPath->new ( file => 'sheet.sxc', member => 'styles' ); $doc2->cloneContent($doc1); $doc2->save; This sequence replaces the styles and page layout of 'sheet.sxc' with those of 'template.sxc'. The above example could easily have been written without even using OODoc::XPath by acting directly on the files. For example, extract the 'styles.xml' member from 'template.sxc' and insert it into 'sheet.sxc'. The use of OODoc::XPath and the cloneContent method guarantees that the transferred content corresponds to an OpenOffice.org document and allows reads/writes to it on the fly. Caution: the "cloned" content is not physically copied. Calling this method references one single physical content in two documents. Any modifications made to the content of either of these two documents applies equally to the other and vice-versa.
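The caution about shared content follows ordinary Perl reference semantics. The sketch below models an element as a plain hash, which is an assumption made purely for illustration (real elements are XML::XPath::Node::Element objects); it only shows why a change made through one reference is visible through the other, while a separate copy is unaffected.

```perl
use strict;
use warnings;

# A hypothetical "element", modelled here as a plain hash.
my $original = { name => 'text:p', text => 'Initial' };

my $reference = $original;         # second reference to the same data
my $copy      = { %$original };    # separate (shallow) copy

# Modify the data through the second reference.
$reference->{text} = 'Changed';

print "original: $original->{text}\n";
print "copy:     $copy->{text}\n";
```

In the document's terms, $reference behaves like cloned or referenced content and $copy like the result of replicateElement.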
Accessor to get or set the class of the document content. If the current member is a document content, returns its class according to the OpenOffice.org terminology, i.e. one of the following values: "text", "spreadsheet", "presentation", or "drawing". Returns an empty string if the current member is not a document content (if it is, for example, the "meta" or "styles" member). With a given class name as an optional argument, this method can change or set the document class; but this risky operation is, of course, for advanced users who know exactly what they are doing. Changing the content class makes it possible, for example, to turn a text document containing some tables into a spreadsheet. But some things must be changed elsewhere in order to produce a consistent OpenOffice.org document. For example, the MIME type of the document, stored in its manifest, must be changed accordingly (see OpenOffice::OODoc::Manifest). And, of course, the real content must be consistent with the declared content class.
Creates a new element without attributes which is not inserted in a document. Example:

    my $element = $doc->createElement
                    ('my_element', 'its content');

creates a new XML element without attributes and returns its reference. Instead of a name, the first argument can be the full XML description of the element. Example:

    my $element = $doc->createElement
                    ('<text:p>My text</text:p>');

This new element is temporary: it is not linked to any document. It is destined to be used later by another method. The name can contain a namespace prefix which would look like this: 'namespace:name'.

In its second form, a well-formed XML string can be supplied as a single argument. The recognition criterion is the presence of the "<" character at the beginning of the argument. See appendElement for comments on the direct insertion of XML.

Explicit calls to createElement should be rare. This method is normally called silently by higher-level methods which are capable of creating an element, inserting it in a document's XML tree and giving it attributes (see appendElement and insertElement).
Caution: this method is a non-exported class method. It must be used like this: OpenOffice::OODoc::XPath::decode_text($utf8_string); and not from an OODoc::XPath instance. Decodes a UTF8 [10] string and returns an 8 bit character [11] translation of it out of the user's character set, as defined by the following variable: $OpenOffice::OODoc::XPath::LOCAL_CHARSET for which the default value is 'ISO-8859-1' [12] . Explicit calls to this method should be rare. It is used internally by methods which return text extracted from document content (e.g. getText). Warning to contributors: any method which returns text extracted from OpenOffice.org documents is based on decode_text; so any modification or improvement of the decoding logic should be made there.
Class method. Encodes "local" character strings (for writing to OpenOffice.org documents). Example:

    $string = OpenOffice::OODoc::encode_text($local_string);

The local character set is defined by the following global variable:

    $OpenOffice::OODoc::XPath::LOCAL_CHARSET

for which the default value is 'ISO-8859-1'. Explicit calls to this method should generally be avoided. It is used internally by methods which insert text or attribute values into documents (e.g. setText).
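To make the two conversion directions concrete, here is a minimal sketch using Perl's core Encode module. It is an assumption for illustration only and does not claim to reproduce how decode_text and encode_text are implemented; the variable $local_charset merely stands in for $OpenOffice::OODoc::XPath::LOCAL_CHARSET.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Stand-in for $OpenOffice::OODoc::XPath::LOCAL_CHARSET (default value).
my $local_charset = 'ISO-8859-1';

# "cafe" with an acute accent on the e, as UTF-8 bytes (how it would
# be stored in the XML).
my $utf8_bytes = "caf\xC3\xA9";

# UTF-8 to local charset: the direction decode_text works in.
my $local = encode($local_charset, decode('UTF-8', $utf8_bytes));

# Local charset to UTF-8: the direction encode_text works in.
my $round_trip = encode('UTF-8', decode($local_charset, $local));

printf "local: %d bytes, UTF-8: %d bytes\n",
    length($local), length($round_trip);
```

The accented character takes one byte in ISO-8859-1 and two bytes in UTF-8, which is why the two lengths differ.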
Returns the XML string for use by another application representing the body of a document, without UTF8 decoding.
Returns the XML string which represents a particular document element (style definition, paragraph, table cell, object, etc.) for use by another application without UTF8 decoding. This method is principally designed to allow remote exchanges of elements between programs using any XML storage or transfer method. It acts as "sender" whilst the "receiver" can use appendElement or insertElement (for example) to insert any exported elements into a document. Example:

    # sender programme
    # ...
    open(my $export, '>', 'transfer.xml')
        or die "Cannot write transfer.xml: $!";
    print $export $doc->exportXMLElement('//text:p', 15);
    close $export;

    # receiver programme
    # ...
    open(my $import, '<', 'transfer.xml')
        or die "Cannot read transfer.xml: $!";
    # read the whole file as one string, even if it spans several lines
    my $xml = do { local $/; <$import> };
    close $import;
    $doc->appendElement('//office:body', 0, $xml);

In this example, a paragraph is transferred but it could just as easily be any content, presentation or metadata element.

Conversely, this method is not needed when transferring an element from one document to another in the same program (or from one document position to another). An element can be copied directly from within the same program by reference or replication without going via its XML (see appendElement, insertElement and replicateElement).
Appends the given text to the previous content of the given element. Example: $doc->setText($p, "Initial content"); $doc->extendText($p, " extended"); Assuming $p is a regular text element (ex: a paragraph), its content becomes "Initial content extended". (See also setText()).
Returns a list of child elements (from a document's tree) of the element given as an argument whose content agrees with the 'filter' parameter. The filter can be an exact string match or a regular expression. If the filter is omitted or contains a wildcard expression like '.*', the returned list will contain all child elements without condition. If the third argument ('replacement') is given, every string which matches the filter in each child element will be replaced by this 'replacement' value. This 'replacement' argument can be a character string or a function reference. (See replaceText method below.) Filtering and possible replacement only affects an element's content and not its attributes [13] . This method is mostly for internal use. We recommend using other methods for the selective extraction of elements.
Returns the 'name' value of the chosen element (or undef if name is not defined or if the element does not exist). Example: my $style = $doc->getAttribute('//text:p', 15, 'text:style-name'); returns the style for paragraph 15.
Returns a list of the element's attributes in the form of a hash whose keys are the attributes' XML names.
Returns an element's reference from an XPath path and a position (or undef if the given path does not indicate an existing element). Position indicators start at 0, just as in Perl arrays (and in other programming languages). Example:

    my $p = $doc->getElement('//table:table', 0);

indicates an element containing the first table of a text document or first sheet of a spreadsheet. Positions can also be counted backwards from the end by giving negative values, i.e. position -1 being the last element. Thus:

    my $h = $doc->getElement('//text:h', -2);

indicates the second-last header of a text document.

Caution: the position indicators used here are not the same as in XPath. In XPath, indicators start at 1 and negative values are not allowed. So, the first element "text:p" would be shown as "//text:p[1]" if using the getNodeByXPath method (see below), whereas if using getElement it would be at position 0. An XPath expression such as "//text:p[-1]" would return nothing.

When successful, this method ensures that the returned object is indeed an element and not another type of node (e.g. attribute, text, comment, etc.)
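The relationship between the two numbering schemes can be sketched as a small conversion. The helper xpath_index below is hypothetical (it is not an OODoc::XPath method) and assumes the caller already knows how many elements match the path.

```perl
use strict;
use warnings;

# Hypothetical helper: converts a getElement-style position (0-based,
# negative values counted from the end) into a 1-based XPath index.
# Returns nothing if the position falls outside the list.
sub xpath_index {
    my ($position, $count) = @_;
    my $index = $position < 0 ? $count + $position : $position;
    return if $index < 0 || $index >= $count;
    return $index + 1;
}

# First paragraph out of ten: position 0 corresponds to [1].
print '//text:p[', xpath_index(0, 10), "]\n";

# Second-last header out of five: position -2 corresponds to [4].
print '//text:h[', xpath_index(-2, 5), "]\n";
```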
Returns a list of all elements at a specified path. Example:

    my @ref_summary = $doc->getElementList('//text:h');

The above example returns a list containing all header elements of a text document. The path can of course be a more complex XPath expression stipulating, for example, a selection of attribute values. In most cases, you should avoid complicating things unnecessarily (especially in Text, Image and Styles modules), as there are methods for searching by element type, attribute and content which are much easier to use and avoid the need to supply XPath expressions.

Note: the returned list contains elements in the sense of getElement and not a list of element contents.
A low-level method which returns the node corresponding to the given XPath expression, if it exists in the document. This method (which gives unrestricted access to the entire content of a document) is designed for use with the unexpected. You will obviously need to be familiar with XPath syntax (not documented here) as well as OpenOffice.org document structure. See also selectNodesByXPath.
Returns text in the local character set, possibly UTF-8 decoded, contained in the element given as an argument (by path/position or by reference). Two equivalent examples: # version 1 my $element = $doc->getElement('//text:p', 4); my $text = $doc->getText($element); # version 2 my $text = $doc->getText('//text:p', 4); Version 2 is better if the only aim is to get the text from paragraph 4. Version 1 is better, however, if during the course of the program you want to perform other operations on the same paragraph. Giving an element's reference will mean avoiding element handling methods having to recalculate a reference from the XPath path.
Returns text from all elements in the specified path. Example:

    my $summary = $doc->getTextList('//text:h');
    my $report = $doc->getTextList('//text:span');

The $summary variable contains a concatenation of all headers. $report contains all the words or character strings that "stand out" which the user has designated by their context, e.g. words in italics in a non-italic paragraph.

In a list context, the returned data is a list, each of whose items contains the text of an XML element. In a scalar context (as in our two examples), the returned value is a single piece of editable text in which each element's content is separated from that of the following element by a line feed.
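The list/scalar distinction relies on Perl's calling context. The sketch below shows the general mechanism with wantarray; the function text_list is a hypothetical stand-in and is not taken from the module's source.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a method such as getTextList: returns a list
# in list context, or one newline-separated string in scalar context.
sub text_list {
    my @texts = @_;
    return wantarray ? @texts : join("\n", @texts);
}

my @headers = text_list('Introduction', 'Method', 'Results');  # list context
my $summary = text_list('Introduction', 'Method', 'Results');  # scalar context

print scalar(@headers), " items\n";
print "$summary\n";
```

Assigning the result to an array or to a scalar is therefore enough to choose between the two return forms.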
Returns a document's entire XML content.
Technical method which returns the reference of the XML parser being used by the current OODoc::XPath object, for later re-use elsewhere. The returned object is of type XML::XPath::XMLParser.
A low-level method which allows direct access to the value corresponding to the given XPath expression in a document. Character decoding is handled in the same way as with getText. Example:

    $expression = '//office:automatic-styles' .
                  '/style:style' .
                  '[@style:name="P1"]' .
                  '/@style:parent-style-name';
    print $doc->getXPathValue($expression);

This sequence displays the name of the parent style of automatic style "P1" (if it exists within the document). Remember that simpler methods in the Text and/or Styles modules would produce the same result.

The optional element reference "context" can be given as an argument either in first or second place. In this case, the search is limited to the section of the document tree below this given element. The default search area is the entire document.

Just as with other methods which require XPath paths, this one is primarily for internal use. It should not be used by the majority of applications.
Inserts a new element before or after the element specified by [path, position] or by reference. If the "name" argument is a literal, a new element with the name given is created and then inserted. If the same argument is a reference to an existing element, this element is then simply inserted at the position indicated. This method is useful either for adding new elements or for copying elements from one document to another or from one position to another within the same document. Options are passed as [name => value] i.e.: position => before or position => after allowing you to choose if the insertion should be done before or after the given element. The default position is before. text => "text of element" attribute => $attributes The "attribute" option is itself a hash reference containing one or more attributes in the form [name => value] as in appendElement. When successful, this method returns the inserted element's reference (else undef). Example: my $attributes = { 'text:style-name' => 'Header 2', 'text:level' => '2' }; $doc->insertElement ( '//text:p', 4, 'text:h', position => 'after', text => 'New section', attribute => $attributes ); This sequence (in an OpenOffice.org Writer document) inserts a level 2 header 'New section' immediately after paragraph 4. The $name argument can be replaced by an existing element. In this case a new reference to the existing element is inserted, without creating a whole new element. In this way you can display an element at several locations or in several documents which is held in memory only once. See the appendElement section for the consequences of having multiple references to the same physical element. Better to use replicateElement to insert separate copies of an element. In the same conditions as in appendElement, the 'name' argument can be replaced by an XML string which describes the element [14] . Note: to add an element to the end of a document, it would obviously be better to use appendElement.
Low-level method allowing the creation or direct modification without restriction (almost) of any document element. It allows "query" expressions in a language similar to XPath. If the given XPath expression crosses several levels of hierarchy, intermediate nodes can be created or modified "on the fly" by creating the necessary path which in turn creates the final node. Example: $doc->makeXPath ( '//office:body/text:p[4 @text:style-name="Text body"]' ); This "query" applies the "Text body" style to paragraph 4 in the body of the document. (In reality you will probably never use it because the setStyle method of the Text module would do the same thing much more simply.) If, as in the above example, a node is accompanied by a position indicator, it cannot be created but must simply act as a mandatory "passage". This method cannot therefore be used to create, for example, an Nth paragraph if there is already an N-1. The only restrictions apply to namespaces which are given as prefixes to element and attribute names. They must be defined in the document i.e. conform to OpenOffice.org specifications. For the rest, this method allows the creation of almost anything anywhere within a document. Its use is reserved for OpenOffice XML specialists. In its second form, a context node can be given as the first argument. If present, the path is sought (and if necessary created) starting from its position. By default, the path begins from the root. The returned value is the final node's reference (found or created). The full "query language" syntax used in this method is not documented here. makeXPath is designed to act more as a base for other OpenOffice::OODoc methods than to be used in applications.
Physically imports an external file into an OpenOffice.org archive associated with an XPath object, if it exists i.e. if the object was created using file or archive parameters. This method only transmits the command to the OODoc::File's raw_import method. Caution: it must not be used with an "active" element i.e. an XML member to which the current XPath object or another XPath object is already associated. Remember too that the import is not actually carried out by OODoc::File until a save and the imported data is therefore not immediately available.
Physically exports a member from an OpenOffice.org archive associated with an XPath object, if it exists i.e. if the object was created using file or archive parameters. This method only transmits the command to the OODoc::File's raw_export method.
Deletes the "attribute" attribute (if found) of the given element by [path, position] or by reference and returns "true". Has no physical effect and returns undef if the attribute has not been defined or if the element does not exist.
Deletes the given element (if found) by [path, position] or by reference and returns "true". Returns undef if the element does not exist.
Technical method for maintaining the structure of the current document, for use only in exceptional circumstances where the application's operations risk destabilising the internal addressing of elements. This will primarily happen when inserting new XML elements in the form of XML strings after the document has been loaded i.e. when the XML parser is again launched to include an "addition" to an already parsed document. Being costly in runtime, this method must not be called immediately after each XML import or other address destabilising operation. A single reorganize after each series of destabilising operations is enough and even then perhaps only before you need to access an element by [path, position]. Address destabilising operations are not an issue if all elements are selected by reference, attribute or content filter. Moreover it is absolutely unnecessary to call reorganize just before calling a save.
Deletes the given element by [path, position] or by reference and inserts another element in its place, either from another location in the same document or from another document. A new element can be supplied under the same conditions as for insertElement. By default or by using the mode => 'copy' option, it is a copy of the new element which is inserted. With the mode => 'reference' option, it is only a reference which is inserted. See the section on appendElement for comments on the subject of multiple references to a single physical element.
Replaces all sub-strings which match "filter" with "replacement" in the text of an element indicated by [path, position] or by reference and returns the modified text. The "filter" string can be an "exact" literal or a regular expression. Example:

    $doc->replaceText($p, "C(LIENT|USTOMER)", $contact);

replaces each occurrence of "CLIENT" and "CUSTOMER" with the content of the $contact variable in the paragraph $p of document $doc.

The "replacement" argument can be a function reference. In this case, the function is called each time the string is matched, and the value returned by the function is used as the replacement value.

    sub action
        {
        my $arg = shift;
        my $text = shift;
        print "$arg : $text\n";
        return "OK";
        }

    $doc->replaceText($p, $expression, \&action, "Found");

displays "Found : <text>" (where <text> is the text retrieved) each time a string matches $expression and replaces this string with "OK". If $expression contains an "exact" string, then clearly the text displayed will always be the same string. However, if it happens to be a regular expression, it is the text actually matched which will be displayed.

Generally speaking, if the replacement value is a function reference, the called function receives, in this order: first the remainder of the arguments which follow the function reference in the call (here "Found"), then the matched string, as the action example above shows.
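Judging from the action example, the callback receives any extra arguments first and the matched text last. That calling convention can be reproduced on a plain string as follows. This is a sketch under that assumption: replace_with_callback is a hypothetical function working on ordinary strings, not the replaceText implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: replaces every match of $filter in $text with
# the value returned by $callback, which is called with the extra
# arguments first and the matched string last.
sub replace_with_callback {
    my ($text, $filter, $callback, @args) = @_;
    # The outer parentheses make $1 the whole match, even when the
    # filter contains capture groups of its own.
    $text =~ s/($filter)/$callback->(@args, $1)/ge;
    return $text;
}

sub action {
    my ($arg, $matched) = @_;
    print "$arg : $matched\n";
    return "OK";
}

my $result = replace_with_callback(
    'Dear CLIENT, dear CUSTOMER', 'C(LIENT|USTOMER)', \&action, 'Found'
);
print "$result\n";
```

With a literal filter the callback always sees the same string; with a regular expression it sees whatever text was actually matched.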
Makes a copy of the given element and inserts it into the current document according to 'position' and, where indicated, according to a hash of options. If 'position' is another existing element then the new element is inserted after the children of the existing element, except where either of the pairs position=>'after' or position=>'before' is specified in the list of options. In this case, the insertion is made at the same hierarchical level as the positional element according to the same logic as for insertElement [15]. If the 'position' argument is given as 'end', then the new element is added at the last child position of the root element. If the 'position' argument is given as 'body', then the new element is added at the end of the list of child elements of the element which corresponds to the getBody value (requires an OODoc::XPath type object, by default 'office:body'). If the 'position' argument is an existing element, then the new element is inserted immediately before the given element by default. If the pair position=>'after' is in the options list, the element is inserted immediately after, as with insertElement. Example:

    my $template = $doc_source->selectElementByAttribute
            (
            '//style:style',
            'style:name',
            'Body of text'
            );
    my $position = $doc_target->getElement
            ('//office:styles', 0);
    $doc_target->replicateElement($template, $position);

This sequence adds a style 'Body of text' to the styles collection of $doc_target which copies exactly the style of the same name in $doc_source. Obviously, the section of code dealing with the search for the element to copy and its position is the most laborious.

This method physically creates a new element which is an exact copy of the given element, but which is physically separate from it. This method is much slower than simply modifying an existing element or inserting an element reference and heavy use is not recommended.
Calls the 'save' method of an OODoc::File object to which the current object is connected, passing the filename argument to it (if provided). Only works if an OODoc::File object is indeed connected (this generally means that the current OODoc::XPath object was created with the constructor parameter 'file'). If not, an error is produced.
Returns the first (or only) element whose name matches "filter" from within the child elements of the given element indicated by [path, position] or by reference. "filter" is taken to be a regular expression. If several elements match the filter, the first of these is returned (in the XML's physical order, which is not necessarily the logical order of the document). See the comments about selectElementsByAttribute if you want to select an exact name. Returns undef if no elements match the condition. If no filter is given, or if the filter is the wildcard ".*", the method simply returns the first (or only) child.
Like selectChildElementByName, but returns a list of all elements which match the condition.

Example:

    my @search_words = $doc->selectChildElementsByName
        ('//text:p', 4, 'text:span');

returns a list of elements from paragraph 4 which correspond to text which has particular attributes which distinguish it from the rest of the paragraph (colour, font, etc.).
Returns a list of elements corresponding to a given XPath path and whose text matches the filter (regular expression).

The "context" argument, if given, is an element reference which limits the search to its own child elements. By default, the search is carried out in the entire document.

An element is selected if the search string is found in its own text or in the text of any element descended from it. For example, an image element (draw:image) can be selected from the value of its attached "description" field.

If a 'replacement' string is given as an argument after the filter, all strings matching the search criteria are replaced with it on the fly.

Lastly, instead of a replacement string, you can pass a reference to a subroutine, which is run (in callback mode) each time the search string is matched. If this subroutine returns a defined value, this value is used as the replacement string. The subroutine automatically receives the rest of the arguments; as the example below shows, the arguments that follow the subroutine reference in the call come first, followed by the matching element.

If, as is generally the case, you are working exclusively with text elements (paragraphs, headers, etc.), you would do better to use selectElementsByContent of the Text module, which is easier to use and does not require an XPath expression.

Here is an example which returns the list of images whose descriptors contain the word "landscape" and displays the name of each selected image:

    sub printMessage
        {
        my $doc     = shift;
        my $element = shift;
        my $image   = $element->parentNode;
        print "Name: " . $image->find('@draw:name') . "\n";
        }

    my @list = $doc->selectElements
        (
        '//draw:image/svg:desc',
        'landscape',
        \&printMessage,
        $doc
        );

Never use this example in a real application: it is purely for demonstration and unnecessarily complex. You can perform the same operation much more simply using the OODoc::Image module.
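The "replacement string or subroutine reference" convention shared by replaceText and selectElements can be pictured with a small standalone sketch in plain Perl. It is not the module's implementation and does not touch any document: replace_in_string is a hypothetical helper working on an ordinary string, and keeping the matched text when the callback returns undef is an assumption made for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of OpenOffice::OODoc: applies the
# "string or code reference" replacement convention to a plain string.
sub replace_in_string
    {
    my ($text, $filter, $replacement, @extra) = @_;

    if (ref($replacement) eq 'CODE')
        {
        # Called once per match: extra arguments first, then the
        # matched text. Assumption: an undefined return value leaves
        # the matched text unchanged.
        my $apply = sub
            {
            my $found = shift;
            my $value = $replacement->(@extra, $found);
            return defined $value ? $value : $found;
            };
        $text =~ s/($filter)/$apply->($1)/ge;
        }
    else
        {
        # Plain string replacement.
        $text =~ s/$filter/$replacement/g;
        }
    return $text;
    }

my @seen;
my $result = replace_in_string
    (
    "Dear CLIENT, thank you, CUSTOMER.",
    "C(LIENT|USTOMER)",
    sub { my ($label, $found) = @_; push @seen, "$label : $found"; return "OK"; },
    "Found"
    );
print "$result\n";    # prints "Dear OK, thank you, OK."
```

After the call, @seen holds one entry per match ("Found : CLIENT" and "Found : CUSTOMER"), which mirrors what the callback in the replaceText example would display.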
In a list context, returns a list of elements at the given path with the given attribute which contain a value matching the filter's regular expression. In a scalar context, returns the first (or only) element which matches the same condition. Returns undef if no elements match the condition.

Example:

    my @paragraph_styles = $doc->selectElementsByAttribute
        ('style:style', 'style:family', 'paragraph');

returns the list of elements which describe the paragraph styles of document $doc.

Caution: the filter is treated as a regular expression and not as a plain string. This means that the above piece of code might return not only the elements whose "style:family" attribute equals "paragraph", but also all those in which the same attribute contains the word "paragraph". You must therefore use the appropriate regular expression syntax if you want to select an exact value, which in this case would be "^paragraph$".
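The difference between a loose and an anchored filter can be checked in plain Perl, without any document. In the sketch below the attribute values are made up for illustration ('paragraph-extra' is not a real style family); only the regular expression behaviour is the point.

```perl
use strict;
use warnings;

# Illustrative attribute values only; not read from a real document.
my @families = ('paragraph', 'paragraph-extra', 'text', 'graphics');

# Unanchored filter: matches any value CONTAINING "paragraph".
my @loose = grep { /paragraph/ } @families;

# Anchored filter: matches only the exact value "paragraph".
my @exact = grep { /^paragraph$/ } @families;

print "loose: @loose\n";    # loose: paragraph paragraph-extra
print "exact: @exact\n";    # exact: paragraph
```

The same anchoring applies to any method in this class whose filter is documented as a regular expression.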
Like selectElementsByAttribute in a scalar context. Returns the first (or only) element at the given path which has the given attribute containing the given value. Returns undef if no elements match the condition.
This low-level method returns a list of nodes (which are not necessarily elements) which match the given XPath expression. See getNodeByXPath for options and comments.
Modifies or adds one or more attributes to an element. The element is indicated by reference or by [path, position]. The list of attributes is given in the form of a hash name => value.

Example:

    my $h = $doc->getElement('//text:h', 12);
    my %attributes =
        (
        'text:style-name' => 'My Header',
        'text:level'      => '3'
        );
    $doc->setAttributes($h, %attributes);

This sequence gives the 'My Header' style and level 3 to the 13th "header" element in the document.
Uses the given text as the content of the given element. Any previous content is replaced by the new text. (See also extendText().)
No class variables are exported; applications which need them must access them using their full name ($OpenOffice::OODoc::XPath::XXX). The following names should be prefixed explicitly with "$OpenOffice::OODoc::XPath::".

CHARS_TO_ESCAPE contains the list of reserved characters which, in XML, should be replaced by escape sequences.

OO_CHARSET indicates the character set used for OpenOffice.org document encoding, whose default value is 'utf8' (it should not be changed).

LOCAL_CHARSET indicates the user's character set, by default 'iso-8859-1'. It must be changed according to the user's real needs (warning: there is no automatic adaptation to the user's locale, so the application must explicitly load the right value in this variable). This should be done using the localEncoding() accessor (see the OpenOffice::OODoc(3) man page and, for the list of supported character sets, the Encode module's documentation).

The content of these three variables should not normally be directly modified by applications.

Instance hash variables are:

    'archive'       => <oodoc_file_object>
    'file'          => <OpenOffice.org file>
    'member'        => <file member>
    'readable_XML'  => <'on' or not>
    'xml'           => <XML string>
    'element'       => <name of loaded XML element>
    'xpath'         => <XML::XPath object>
    'parser'        => <XML parser>

However, the 'xml' variable is cleared almost immediately after a successful constructor call, in order to save memory. As soon as the corresponding XPath object has been created, the XML source is no longer required.

The 'xpath' variable of an OODoc::XPath object contains a reference to an XML::XPath class object (see CPAN documentation). Remember that OODoc::XPath is based on XML::XPath, not derived from it. This object encompasses the entire current XML tree. Each access to XML using OODoc::XPath objects is done via XML::XPath. So, after having run the following command:

    my $xp = $doc->{'xpath'};

the experienced programmer will be able to use $xp to access all the functionality of the XML::XPath API, bearing in mind that all operations using this interface will have a direct effect on the content of the $doc object.

The 'parser' variable (which most users will never need) refers to the XML::XPath::XMLParser [16] object which was used to construct the OODoc::XPath object out of an XML document. It can be reused (within very specific applications) to create new elements from imported XML strings. For example:

    $xml = '<style:properties ' .
           'draw:luminance="2%" ' .
           'draw:contrast="3%"/>';
    $p = $doc->{'parser'}->parse($xml);

creates a new element which describes the properties applicable to a graphics object and which could later be incorporated into a document. Calling the parser is not enough to insert the element into the document.

The same parser can be shared by several XPath objects, which may be of interest if you wish to handle a large number of documents simultaneously. When calling the constructor of a new XPath object, you simply pass a parser parameter equal to the parser parameter of an existing XPath object [17]. We have, however, no measurement of the amount of memory and processing time saved by reusing it.

Bearing in mind that OODoc::XPath does not necessarily handle the entire loaded XML document but only the element represented by the 'element' parameter, there are also two variables, 'begin' and 'end', which contain, respectively, those parts of the document before and after the extracted element. These two variables are stored in the instance to allow reconstruction of the entire document by concatenation; the content of 'begin' and 'end' is not verified, analysed or modified.
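The idea of keeping the untouched parts of a document in 'begin' and 'end' and rebuilding the whole by concatenation can be sketched in plain Perl. This is only an illustration under stated assumptions: the XML string is made up, the element is located with a simple regular expression, and nothing here reflects how the module itself extracts the element.

```perl
use strict;
use warnings;

# Made-up XML source standing in for a document member.
my $xml = '<?xml version="1.0"?><office:document><office:meta/>'
        . '<office:body><text:p>Hello</text:p></office:body>'
        . '</office:document>';

# Name of the element to work on (compare the 'element' variable).
my $name = 'office:body';

# Split the source into the part before the element, the element
# itself, and the part after it. The outer parts are kept verbatim.
my ($begin, $element, $end) =
    $xml =~ m{\A(.*?)(<\Q$name\E\b.*?</\Q$name\E>)(.*)\z}s
    or die "element $name not found\n";

# Only the extracted element is modified.
$element =~ s/Hello/Goodbye/;

# The whole document is rebuilt by simple concatenation.
my $rebuilt = $begin . $element . $end;
print "$rebuilt\n";
```

Because the outer parts are never parsed or checked in this sketch, any error they contain is carried over unchanged into the rebuilt string, which matches the remark above that 'begin' and 'end' are not verified.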
See OpenOffice::OODoc::Notes(3) for the footnote citations ([n]) included in this page.
Copyright 2005 by Genicorp, S.A. (http://www.genicorp.com)
Initial developer: Jean-Marie Gouarne (http://jean.marie.gouarne.online.fr)
Initial English version of the reference manual by Graeme A. Hunter (graeme.hunter@zen.co.uk)
Licensing conditions:
- Genicorp General Public Licence v1.0
- GNU Lesser General Public License v2.1
Contact: oodoc@genicorp.com