<?xml version="1.0" encoding="UTF-8"?>
<!-- -*- nxml -*- -->
<book lang="en-us">
<title>XML::LibXML</title>

<bookinfo>
<authorgroup>
<author>
<firstname>Matt</firstname>
<surname>Sergeant</surname>
</author>

<author>
<firstname>Christian</firstname>
<surname>Glahn</surname>
</author>

<author>
<firstname>Petr</firstname>
<surname>Pajas</surname>
</author>
</authorgroup>

<edition>2.0207</edition>
<copyright>
<year>2001-2007</year>
<holder>AxKit.com Ltd</holder>
</copyright>
<copyright>
<year>2002-2006</year>
<holder>Christian Glahn</holder>
</copyright>
<copyright>
<year>2006-2009</year>
<holder>Petr Pajas</holder>
</copyright>
</bookinfo>

<chapter id="README">
<title>Introduction</title>

<titleabbrev>README</titleabbrev>

<para>This module implements a Perl interface to the GNOME
libxml2 library, which provides
interfaces for parsing and manipulating XML files. This
module allows Perl programmers to make use of its highly
capable validating XML parser and its high-performance DOM
implementation.</para>

<sect1>
<title>Important Notes</title>

<para>XML::LibXML was almost entirely reimplemented between version 1.40 and version 1.49. This may cause problems on some production machines. With
version 1.50 a lot of compatibility fixes were applied, so programs written for XML::LibXML 1.40 or earlier should run with version 1.50 again.</para>
<para>In 1.59, a new callback API was introduced. This new API is not compatible with the previous one.
See the XML::LibXML::InputCallback manual page for details.</para>
<para>In 1.61 the XML::LibXML::XPathContext module, previously distributed separately, was merged in.</para>
<para>The experimental support for Perl threads introduced in 1.66 was replaced in 1.67.</para>
</sect1>

<sect1>
<title>Dependencies</title>

<para>Prior to installation you MUST have installed the libxml2 library. You can get the latest libxml2 version from</para>

<para>http://xmlsoft.org/</para>

<para>Without libxml2 installed this module will neither build nor run.</para>

<para>XML::LibXML also requires the following packages:</para>

<itemizedlist>
<listitem>
<para>XML::SAX - base class for SAX parsers</para>
</listitem>

<listitem>
<para>XML::NamespaceSupport - namespace support for SAX parsers</para>
</listitem>

</itemizedlist>

<para>These packages are required. If one is missing, some tests will fail.</para>
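<para>As a quick pre-flight check, the following sketch tries to load the two prerequisite modules and reports any that are missing. It is only an illustration and not part of the distribution; the variable names are arbitrary, and the module list simply repeats the two names given above.</para>

<programlisting><![CDATA[use strict;
use warnings;

# The two prerequisite modules named in the list above.
my @prerequisites = ('XML::SAX', 'XML::NamespaceSupport');

my @missing;
for my $module (@prerequisites) {
    # Turn Foo::Bar into Foo/Bar.pm so that require() accepts a run-time name.
    (my $file = "$module.pm") =~ s{::}{/}g;
    push @missing, $module unless eval { require $file; 1 };
}

print @missing
    ? "Missing prerequisites: @missing\n"
    : "All prerequisites are installed.\n";
]]></programlisting>

<para>Running this before <literal>perl Makefile.PL</literal> gives an early hint about which packages still have to be installed from CPAN.</para>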
<para>Again, libxml2 is required to make XML::LibXML work. The library is not only required to build XML::LibXML; it has to be accessible at
run time as well. Because of this you need to make sure libxml2 is installed properly. To test this, run the xmllint program on your system. xmllint
is shipped with libxml2 and therefore should be available.
For building the module you will also need the header files for libxml2, which in binary
distributions (.rpm, .deb, etc.) usually live in a package named libxml2-devel or similar.</para>
</sect1>

<sect1>
<title>Installation</title>
<para>(These instructions are for UNIX and GNU/Linux systems. For MSWin32,
see Notes for Microsoft Windows below.)</para>
<para>To install XML::LibXML just follow the standard installation routine for Perl modules:</para>

<orderedlist>
<listitem>
<para>perl Makefile.PL</para>
</listitem>

<listitem>
<para>make</para>
</listitem>

<listitem>
<para>make test</para>
</listitem>

<listitem>
<para>make install # as superuser</para>
</listitem>
</orderedlist>

<para>Note that XML::LibXML is an XS-based Perl extension and you need a C compiler
to build it.</para>
<para>Note also that you should rebuild XML::LibXML if you upgrade libxml2
in order to avoid problems with possible binary incompatibilities between releases of the library.</para>

<sect2>
<title>Notes on libxml2 versions</title>

<para>XML::LibXML requires at least
libxml2 2.6.16 to compile and pass all tests, and
at least 2.6.21 is required for XML::LibXML::Reader.
For some older OS versions this means that an
update of the pre-built packages is required.</para>

<para>Although libxml2 claims binary compatibility between
its patch levels, it is a good idea to recompile XML::LibXML
and run its tests after an upgrade of libxml2.
</para>

<para>If your libxml2 installation is not within your $PATH,
you can pass the XMLPREFIX=$YOURLIBXMLPREFIX parameter to Makefile.PL
so that it can determine the correct libxml2 version in use. For example,
</para>

<programlisting> perl Makefile.PL XMLPREFIX=/usr/brand-new </programlisting>

<para>will ask '/usr/brand-new/bin/xml2-config' about your real libxml2 configuration.</para>

<para>Try to avoid setting INC and LIBS directly on the
command line: if they are set, Makefile.PL does not check
the libxml2 version for compatibility with XML::LibXML.</para>
</sect2>

<sect2>
<title>Which version of libxml2 should be used?</title>

<para>XML::LibXML is tested against several versions of
libxml2 before it is released, and some versions
of libxml2 are known not to work properly with
XML::LibXML. The Makefile.PL keeps a blacklist of
the incompatible libxml2 versions using Alien::Libxml2.
The blacklist itself is kept inside its "alienfile"
file.</para>

<para>If Makefile.PL detects one of the incompatible versions,
it notifies the user. It may still happen that
XML::LibXML builds and passes its tests with such
a version, but that does not mean everything
is OK. There will be no support at all for blacklisted versions!</para>

<para>As of XML::LibXML 1.61, only versions 2.6.16 and higher are supported.
XML::LibXML will probably not compile with libxml2 versions earlier than
2.5.6. Versions prior to 2.6.8 are known to be broken for various reasons;
versions prior to 2.6.16 exhibit problems with namespaced attributes
and therefore do not pass the XML::LibXML regression tests.
</para>

<para>It may happen that an unsupported version of libxml2
passes all tests under certain conditions. This is no
reason to assume that it will work without problems.
If Makefile.PL marks a version of libxml2 as incompatible or broken,
it does so for a good reason.</para>

<para>Full linking information for libxml2 can be obtained
by invoking "xml2-config --libs".</para>
</sect2>

<sect2>
<title>Notes for Microsoft Windows</title>

<para>Thanks to Randy Kobes there is a pre-compiled PPM package available on</para>
<para>http://theoryx5.uwinnipeg.ca/ppmpackages/</para>

<para>Usually it takes a little time to build the package for the latest release.</para>
<para>If you want to build XML::LibXML on Windows from source, you can use
the following instructions contributed by Christopher J. Madsen:</para>

<para>These instructions assume that you already have your system set up to
compile modules that use C components.
</para>
<para>
First, get the libxml2 binaries from http://xmlsoft.org/sources/win32/
(currently also available at http://www.zlatkovic.com/pub/libxml/).
</para>
<para>
You need:
</para>
<programlisting> iconv-VERSION.win32.zip
 libxml2-VERSION.win32.zip
 zlib-VERSION.win32.zip</programlisting>
<para>Download the latest version of each. (Each package will probably have
a different version.) When you extract them, you'll get directories
named iconv-VERSION.win32, libxml2-VERSION.win32, and
zlib-VERSION.win32, each containing bin, lib, and include directories.</para>
<para>Combine all the bin, include, and lib directories under c:\Prog\LibXML.
(You can use any directory you prefer; just adjust the instructions
accordingly.)</para>
<para>Get the latest version of XML-LibXML from CPAN.
Extract it.</para>
<para>Issue these commands in the XML-LibXML-VERSION directory:</para>
<programlisting> perl Makefile.PL INC=-Ic:\Prog\LibXML\include LIBS=-Lc:\Prog\LibXML\lib
 nmake
 copy c:\Prog\LibXML\bin\*.dll blib\arch\auto\XML\LibXML
 nmake test
 nmake install</programlisting>
<para>(Note: Some systems use dmake instead of nmake.)</para>
<para>By copying the libxml2 DLLs to the arch directory, you help avoid
conflicts with other programs you may have installed that use other
(possibly incompatible) versions of those DLLs.</para>
</sect2>
<sect2>
<title>Notes for Mac OS X</title>

<para>
Due to a refactoring of the module, XML::LibXML will
not run with some earlier versions of Mac OS X. It
appears that this is related to special linker options
for that OS prior to version 10.2.2. Since the
developers do not have full access to this OS, help and
patches from OS X gurus are highly appreciated.
</para>

<para>It is confirmed that XML::LibXML builds and runs
without problems on Mac OS X 10.2.6 and later.</para>
</sect2>

<sect2>
<title>Notes for HPUX</title>

<para>XML::LibXML requires libxml2 2.6.16 or
later. There may not exist a usable binary
libxml2 package for HPUX and XML::LibXML. If
HPUX cc does not compile libxml2
correctly, you will be forced to recompile perl with
gcc (unless you have already done that).</para>

<para>Additionally, I received the following note from Rozi Kovesdi:</para>

<programlisting>Here is my report if someone else runs into the same problem:

Finally I am done with installing all the libraries and XML Perl
modules

The combination that worked best for me was:
gcc
GNU make

Most importantly - before trying to install Perl modules that depend on
libxml2:

must set SHLIB_PATH to include the path to libxml2 shared library

assuming that you used the default:

export SHLIB_PATH=/usr/local/lib

also, make sure that the config files have execute permission:

/usr/local/bin/xml2-config
/usr/local/bin/xslt-config

they did not have +x after they were installed by 'make install'
and it took me a while to realize that this was my problem

or one can use:

perl Makefile.PL LIBS='-L/path/to/lib' INC='-I/path/to/include'</programlisting>
</sect2>
</sect1>

<sect1>
<title>Contact</title>

<para>For bug reports, please use the issue tracker at
https://github.com/shlomif/perl-XML-LibXML/issues .</para>

<para>
For suggestions etc. you may contact the maintainer directly at
https://www.shlomifish.org/me/contact-me/,
but in general it is recommended to use the mailing
list given below.
</para>

<para>For suggestions and other issues
related to XML::LibXML you may use the perl XML mailing list
(<email>perl-xml@listserv.ActiveState.com</email>),
where most XML-related Perl modules are discussed.
In case of problems you should check the archives of that
list first, since many problems have already been discussed there. You
can find the list's archives and subscription options at
http://aspn.activestate.com/ASPN/Mail/Browse/Threaded/perl-xml</para>
</sect1>

<sect1>
<title>Package History</title>

<para>Versions &lt; 0.98 were maintained by Matt Sergeant</para>

<para>Versions &gt;= 0.98 and &lt; 1.49 were maintained by Matt Sergeant and Christian Glahn</para>

<para>Versions &gt;= 1.49 are maintained by Christian Glahn</para>

<para>Versions &gt; 1.56 are co-maintained by Petr Pajas</para>

<para>Versions &gt;= 1.59 are provisionally maintained by Petr Pajas</para>
</sect1>

<sect1>
<title>Patches and Developer Version</title>

<para>As XML::LibXML is open source software, help and
patches are appreciated. If you find a bug in the current
release, make sure this bug still exists in the developer
version of XML::LibXML. This version can be cloned
from its Git repository. For more information about that,
see:</para>

<para>https://github.com/shlomif/perl-XML-LibXML</para>

<para>Please consider all regression tests as correct. If
any test fails, it is almost certainly caused by a
bug.</para>

<para>If you find documentation bugs, please fix them in
the libxml.dbk file, stored in the docs directory.</para>
</sect1>

<sect1>
<title>Known Issues</title>

<para>The push-parser implementation causes memory leaks.</para>
</sect1>
</chapter>

<chapter id="LICENSE">
<title>License</title>

<titleabbrev>LICENSE</titleabbrev>

<para>This is free software; you may use it and distribute it under the same terms as Perl itself.</para>

<para>Copyright 2001-2003 AxKit.com Ltd., 2002-2006 Christian Glahn, 2006-2009 Petr Pajas</para>

<sect1>
<title>Disclaimer</title>

<para>THIS PROGRAM IS DISTRIBUTED IN THE HOPE THAT IT WILL
BE USEFUL, BUT WITHOUT ANY WARRANTY; WITHOUT EVEN THE
IMPLIED WARRANTY OF MERCHANTABILITY OR FITNESS FOR A
PARTICULAR PURPOSE.</para>
</sect1>
</chapter>

<chapter id="XML-LibXML">
<title>Perl Binding for libxml2</title>

<titleabbrev>XML::LibXML</titleabbrev>

<sect1>
<title>Synopsis</title>

<programlisting>use XML::LibXML;
my $dom = XML::LibXML->load_xml(string => &lt;&lt;'EOT');
&lt;some-xml/>
EOT</programlisting>
</sect1>

<sect1>
<title>Description</title>

<para>This module is an interface to libxml2, providing
XML and HTML parsers with DOM, SAX and XMLReader interfaces,
a large subset of the DOM Level 3 interface and
an XML::XPath-like interface to the XPath API of libxml2.
The module is split into several packages which are not described in this section;
unless stated otherwise, you only need to <literal>use XML::LibXML;</literal>
in your programs.</para>

<para>Check out <ulink url="http://grantm.github.io/perl-libxml-by-example/">XML::LibXML by Example</ulink>
for a tutorial.</para>

<para>For further information, please check the following documentation:</para>

<variablelist>
<varlistentry>
<term><xref linkend="XML-LibXML-Parser"/></term>

<listitem>
<para>Parsing XML files with XML::LibXML</para>
</listitem>
</varlistentry>

<varlistentry>
<term><xref linkend="XML-LibXML-DOM"/></term>

<listitem>
<para>XML::LibXML Document Object Model (DOM) Implementation</para>
</listitem>
</varlistentry>

<varlistentry>
<term><xref linkend="XML-LibXML-SAX"/></term>

<listitem>
<para>XML::LibXML direct SAX parser</para>
</listitem>
</varlistentry>

<varlistentry>
<term><xref linkend="XML-LibXML-Reader"/></term>

<listitem>
<para>Reading XML with a pull-parser</para>
</listitem>
</varlistentry>

<varlistentry>
<term><xref linkend="XML-LibXML-Dtd"/></term>

<listitem>
<para>XML::LibXML frontend for DTD validation</para>
</listitem>
</varlistentry>

<varlistentry>
<term><xref linkend="XML-LibXML-RelaxNG"/></term>

<listitem>
<para>XML::LibXML frontend for RelaxNG schema validation</para>
</listitem>
</varlistentry>

<varlistentry>
<term><xref linkend="XML-LibXML-Schema"/></term>

<listitem>
<para>XML::LibXML frontend for W3C XML Schema validation</para>
</listitem>
</varlistentry>

<varlistentry>
<term><xref linkend="XML-LibXML-XPathContext"/></term>
<listitem>
<para>API for evaluating XPath expressions with enhanced support
for the evaluation context</para>
</listitem>
</varlistentry>

<varlistentry>
<term><xref linkend="XML-LibXML-InputCallback"/></term>

<listitem>
<para>Implementing custom URI resolvers and input callbacks</para>
</listitem>
</varlistentry>

<varlistentry>
<term><xref linkend="XML-LibXML-Common"/></term>
<listitem>
<para>Common functions for XML::LibXML-related classes</para>
</listitem>
</varlistentry>
</variablelist>
<para>The nodes in the Document Object Model (DOM) are represented by the following classes
(most of which "inherit" from <xref linkend="XML-LibXML-Node"/>):</para>
<variablelist>
<varlistentry>
<term><xref linkend="XML-LibXML-Document"/></term>

<listitem>
<para>XML::LibXML class for DOM document nodes</para>
</listitem>
</varlistentry>

<varlistentry>
<term><xref linkend="XML-LibXML-Node"/></term>

<listitem>
<para>Abstract base class for XML::LibXML DOM nodes</para>
</listitem>
</varlistentry>

<varlistentry>
<term><xref linkend="XML-LibXML-Element"/></term>

<listitem>
<para>XML::LibXML class for DOM element nodes</para>
</listitem>
</varlistentry>

<varlistentry>
<term><xref linkend="XML-LibXML-Text"/></term>

<listitem>
<para>XML::LibXML class for DOM text nodes</para>
</listitem>
</varlistentry>

<varlistentry>
<term><xref linkend="XML-LibXML-Comment"/></term>

<listitem>
<para>XML::LibXML class for comment DOM nodes</para>
</listitem>
</varlistentry>

<varlistentry>
<term><xref linkend="XML-LibXML-CDATASection"/></term>

<listitem>
<para>XML::LibXML class for DOM CDATA sections</para>
</listitem>
</varlistentry>

<varlistentry>
<term><xref linkend="XML-LibXML-Attr"/></term>

<listitem>
<para>XML::LibXML DOM attribute class</para>
</listitem>
</varlistentry>

<varlistentry>
<term><xref linkend="XML-LibXML-DocumentFragment"/></term>

<listitem>
<para>XML::LibXML's DOM L2 Document Fragment implementation</para>
</listitem>
</varlistentry>

<varlistentry>
<term><xref linkend="XML-LibXML-Namespace"/></term>

<listitem>
<para>XML::LibXML DOM namespace nodes</para>
</listitem>
</varlistentry>

<varlistentry>
<term><xref linkend="XML-LibXML-PI"/></term>

<listitem>
<para>XML::LibXML DOM processing instruction nodes</para>
</listitem>
</varlistentry>

</variablelist>
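<para>To make the mapping between node types and classes more concrete, the following sketch parses a tiny document and prints the class and node name of a few of its nodes. The sample XML and the variable names are made up for illustration.</para>

<programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML;

# A made-up sample document with an attribute, a comment and some text.
my $doc = XML::LibXML->load_xml(string => <<'EOT');
<note lang="en"><!-- a comment -->Hello</note>
EOT

my $root = $doc->documentElement;

# Collect the document, its root element, the root's attributes and
# the root's child nodes (list context returns plain Perl lists).
my @nodes = ($doc, $root, $root->attributes, $root->childNodes);

# Each DOM node is blessed into the class that matches its node type.
for my $node (@nodes) {
    printf "%-25s %s\n", ref $node, $node->nodeName;
}
]]></programlisting>

<para>Because all of these classes share the methods of <xref linkend="XML-LibXML-Node"/>, the loop can call <function>nodeName</function> on every node without checking its type first.</para>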
</sect1>
<sect1>
<title>Encodings support in XML::LibXML</title>
<para>Recall that since version 5.6.1, Perl distinguishes between
character strings (internally encoded in UTF-8) and
so-called binary data and, accordingly, applies either
character or byte semantics to them. A scalar
representing a character string is distinguished from
a byte string by a special flag (UTF8). Please refer to <emphasis>perlunicode</emphasis> for details.
</para>
<para>
XML::LibXML's API is designed to deal with many
encodings of XML documents completely transparently, so
that the application using XML::LibXML can be completely
ignorant about the encoding of the XML documents it works with.
On the other hand, functions like <function>XML::LibXML::Document->setEncoding</function>
give the user control over the document encoding.
</para>
<para>
To ensure the aforementioned transparency and
uniformity, most functions of XML::LibXML that work with
in-memory trees accept and return data as character
strings (i.e. UTF-8 encoded with the UTF8 flag on)
regardless of the original document encoding; however,
the functions related to I/O operations (i.e. parsing
and saving) operate with binary data (in the original
document encoding) obeying the encoding declaration of
the XML documents.</para>
<para>Below we summarize basic rules and principles
regarding encoding:
</para>
<orderedlist>
<listitem><para>Do NOT apply any encoding-related PerlIO layers
(<literal>:utf8</literal> or <literal>:encoding(...)</literal>)
to file handles that are an input for the parsers
or an output for a serializer of (full) XML documents.
This is because the conversion of the data to/from the internal character representation
is provided by libxml2 itself, which must be able to enforce the encoding
specified by the <literal>&lt;?xml version="1.0" encoding="..."?&gt;</literal>
declaration. Here is an example to follow:
<programlisting>use XML::LibXML;
# load
open my $fh, '&lt;', 'file.xml' or die "Cannot open file.xml: $!";
binmode $fh; # drop all PerlIO layers possibly created by a <literal>use open</literal> pragma
my $doc = XML::LibXML->load_xml(IO => $fh);

# save
open my $out, '>', 'out.xml' or die "Cannot open out.xml: $!";
binmode $out; # as above
$doc->toFH($out);
# or
print {$out} $doc->toString();</programlisting>
</para>
</listitem>
<listitem>
<para>All functions working with DOM accept and return
character strings (UTF-8 encoded with UTF8 flag on). E.g.
<programlisting><![CDATA[
my $doc = XML::LibXML::Document->new('1.0',$some_encoding);
my $element = $doc->createElement($name);
$element->appendText($text);
$xml_fragment = $element->toString(); # returns a character string
$xml_document = $doc->toString(); # returns a byte string
]]>
</programlisting>
where
<literal>$some_encoding</literal> is the document encoding
that will be used when saving the document,
and <literal>$name</literal> and <literal>$text</literal>
contain character strings (UTF-8 encoded with UTF8 flag on).
Note that the method <function>toString</function>
returns XML as a character string if applied to
a node other than the Document node, and
a byte string containing the appropriate
<programlisting>&lt;?xml version="1.0" encoding="..."?&gt;</programlisting>
declaration if applied to a <xref linkend="XML-LibXML-Document"/>.
</para>
</listitem>
<listitem>
<para>DOM methods also accept binary strings in the original encoding of the
document to which the node belongs (UTF-8 is assumed if the node is not
attached to any document). Exploiting this feature is NOT RECOMMENDED
since it is considered bad practice.
</para>
<programlisting><![CDATA[
my $doc = XML::LibXML::Document->new('1.0','iso-8859-2');
my $text = $doc->createTextNode($some_latin2_encoded_byte_string);
# WORKS, BUT NOT RECOMMENDED!
]]>
</programlisting>
</listitem>
</orderedlist>
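<para>The difference between character strings and byte strings described in the rules above can be observed with a short experiment. The following is only a sketch: the element name and the sample character are arbitrary, and the core <literal>Encode</literal> module is used here merely to turn the serialized bytes back into characters.</para>

<programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML;
use Encode qw(decode);

my $doc  = XML::LibXML::Document->new('1.0', 'UTF-8');
my $root = $doc->createElement('word');
$doc->setDocumentElement($root);

# "\x{159}" (r with caron) lies above U+00FF, so Perl stores this
# literal as a character string with the UTF8 flag on.
$root->appendText("\x{159}");

my $text  = $root->textContent;   # DOM method: returns a character string
my $bytes = $doc->toString();     # whole document: returns a byte string

printf "text length:  %d character(s)\n", length $text;
printf "UTF8 flag on serialized document: %s\n",
    utf8::is_utf8($bytes) ? 'on' : 'off';

# To treat the serialized document as text again, decode it explicitly
# using the encoding the document was created with.
my $chars = decode('UTF-8', $bytes);
print "decoded document contains the original text\n"
    if index($chars, $text) >= 0;
]]></programlisting>

<para>Writing <literal>$bytes</literal> to a file handle without any encoding layer, as recommended in the first rule, stores the document exactly as libxml2 serialized it.</para>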
<para><emphasis>NOTE:</emphasis> libxml2 support for many
encodings is based on the iconv library. The actual list
of supported encodings may vary from platform to
platform. To test if your platform works correctly with
your language encoding, build a simple document in the
particular encoding and try to parse it with XML::LibXML
to see if the parser produces any errors. Occasional
crashes were reported on rare platforms that ship with a broken
version of iconv.</para>
</sect1>
<sect1>
<title>Thread Support</title>
<para>
XML::LibXML since 1.67 partially supports Perl threads
in Perl &gt;= 5.8.8. XML::LibXML can be used with threads
in two ways:
</para>
<para>
By default, all
XML::LibXML classes use the CLONE_SKIP class method
to prevent Perl from copying XML::LibXML::* objects
when a new thread is spawned.
In this mode, all XML::LibXML::* objects are thread specific.
This is the safest way
to work with XML::LibXML in threads.
</para>
<para>
Alternatively, one may use
</para>
<programlisting>use threads;
use XML::LibXML qw(:threads_shared);</programlisting>
<para>
to indicate that
all XML::LibXML node and parser objects
should be shared between the main thread
and any thread spawned from there.
For example, in
</para>
<programlisting>my $doc = XML::LibXML->load_xml(location => $filename);
my $thr = threads->new(sub{
# code working with $doc
1;
});
$thr->join;
</programlisting>
<para>
the variable <literal>$doc</literal>
refers to the exact same XML::LibXML::Document
in the spawned thread as in the main thread.
</para>
<para>
Without using mutex locks,
parallel threads may read the same document
(i.e. any node that belongs to the document),
parse files, and modify different documents.
</para>
<para>
However, if there is a chance that
some of the threads will attempt to modify a document
(or even create
new nodes based on that document,
e.g. with <literal>$doc->createElement</literal>)
that other threads may be reading at the same time,
the user is responsible for creating a mutex lock
and using it in <emphasis>both</emphasis>
the thread that modifies and
the thread that reads:
</para>
<programlisting># assumes: use threads; use threads::shared; use XML::LibXML qw(:threads_shared);
my $doc = XML::LibXML->load_xml(location => $filename);
my $mutex : shared;
my $thr = threads->new(sub{
lock $mutex;
my $el = $doc->createElement('foo');
# ...
1;
});
{
lock $mutex;
my $root = $doc->documentElement;
print $root->nodeName, "\n";
}
$thr->join;
</programlisting>
<para>Note that libxml2 uses dictionaries to store short strings and
these dictionaries are kept on a document node. Without mutex locks, it
could happen in the previous example that the thread modifies the
dictionary while other threads attempt to read from it, which could
easily lead to a crash.</para>
</sect1>
<sect1>
<title>Version Information</title>

<para>Sometimes it is useful to find out which
version of libxml2 XML::LibXML was compiled for. In most cases this
is for debugging or to check if a given installation provides
all the functionality the package needs. The functions
XML::LibXML::LIBXML_DOTTED_VERSION and
XML::LibXML::LIBXML_VERSION provide this version
information. Both functions simply pass through the values
of the similarly named macros of libxml2.
Similarly, XML::LibXML::LIBXML_RUNTIME_VERSION returns
the version of the (usually dynamically) linked libxml2.
</para>

<variablelist>
<varlistentry>
<term>XML::LibXML::LIBXML_DOTTED_VERSION</term>

<listitem>
<funcsynopsis>
<funcsynopsisinfo>$Version_String = XML::LibXML::LIBXML_DOTTED_VERSION;</funcsynopsisinfo>
</funcsynopsis>

<para>Returns the version string of the
libxml2 version XML::LibXML was compiled
for. This will be "2.6.2" for "libxml2
2.6.2".</para>
</listitem>
</varlistentry>

<varlistentry>
<term>XML::LibXML::LIBXML_VERSION</term>

<listitem>
<funcsynopsis>
<funcsynopsisinfo>$Version_ID = XML::LibXML::LIBXML_VERSION;</funcsynopsisinfo>
</funcsynopsis>

<para>Returns the version id of the libxml2
version XML::LibXML was compiled for. This
will be "20602" for "libxml2 2.6.2". Do not confuse
this version id with
$XML::LibXML::VERSION. The latter contains the
version of XML::LibXML itself while the former
contains the version of libxml2 XML::LibXML
was compiled for.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>XML::LibXML::LIBXML_RUNTIME_VERSION</term>

<listitem>
<funcsynopsis>
<funcsynopsisinfo>$DLL_Version = XML::LibXML::LIBXML_RUNTIME_VERSION;</funcsynopsisinfo>
</funcsynopsis>

<para>Returns a version string of the libxml2
which is (usually dynamically) linked by
XML::LibXML. This will be "20602" for libxml2
released as "2.6.2" and something like
"20602-CVS2032" for a CVS build of
libxml2.</para>
<para>XML::LibXML issues a warning if the version
of libxml2 dynamically linked to it is less than the version of libxml2
which it was compiled against.
</para>
</listitem>
</varlistentry>

</variablelist>
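<para>The three functions above can be combined into a small diagnostic script, for example to include in a bug report. The following is a sketch only; the variable names are arbitrary, and the comparison at the end merely mirrors the kind of check described above rather than reproducing the module's own warning.</para>

<programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML;

my $compiled = XML::LibXML::LIBXML_VERSION();          # numeric id, e.g. 20602
my $dotted   = XML::LibXML::LIBXML_DOTTED_VERSION();   # e.g. "2.6.2"
my $runtime  = XML::LibXML::LIBXML_RUNTIME_VERSION();  # may carry a suffix

# Keep only the leading digits so that a suffixed value such as
# "20602-CVS2032" can still be compared numerically.
my ($runtime_id) = $runtime =~ /^(\d+)/;

print "XML::LibXML version:      $XML::LibXML::VERSION\n";
print "compiled against libxml2: $dotted ($compiled)\n";
print "running with libxml2:     $runtime\n";

if (defined $runtime_id && $runtime_id < $compiled) {
    print "note: the libxml2 found at run time is older than the one used at build time\n";
}
]]></programlisting>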
</sect1>
<sect1>
<title>EXPORTS</title>
<para>
By default the module exports all constants and functions
listed in the :all tag, described below.
</para>
</sect1>
<sect1>
<title>EXPORT TAGS</title>
<variablelist>
<varlistentry>
<term><literal>:all</literal></term>
<listitem>
<para>Includes the tags <literal>:libxml</literal>, <literal>:encoding</literal>, and
<literal>:ns</literal> described below.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><literal>:libxml</literal></term>
<listitem>
<para>Exports integer constants for DOM node types.</para>
<programlisting>XML_ELEMENT_NODE            => 1
XML_ATTRIBUTE_NODE          => 2
XML_TEXT_NODE               => 3
XML_CDATA_SECTION_NODE      => 4
XML_ENTITY_REF_NODE         => 5
XML_ENTITY_NODE             => 6
XML_PI_NODE                 => 7
XML_COMMENT_NODE            => 8
XML_DOCUMENT_NODE           => 9
XML_DOCUMENT_TYPE_NODE      => 10
XML_DOCUMENT_FRAG_NODE      => 11
XML_NOTATION_NODE           => 12
XML_HTML_DOCUMENT_NODE      => 13
XML_DTD_NODE                => 14
XML_ELEMENT_DECL            => 15
XML_ATTRIBUTE_DECL          => 16
XML_ENTITY_DECL             => 17
XML_NAMESPACE_DECL          => 18
XML_XINCLUDE_START          => 19
XML_XINCLUDE_END            => 20</programlisting>
</listitem>
</varlistentry>
<varlistentry>
<term><literal>:encoding</literal></term>
<listitem>
<para>Exports two encoding conversion functions from XML::LibXML::Common.</para>
<programlisting>
encodeToUTF8()
decodeFromUTF8()
</programlisting>
</listitem>
</varlistentry>
<varlistentry>
<term><literal>:ns</literal></term>
<listitem>
<para>Exports two convenience constants: the implicit namespace of the
reserved <literal>xml:</literal> prefix,
and the implicit namespace for the reserved <literal>xmlns:</literal> prefix.</para>
<programlisting>
XML_XML_NS => 'http://www.w3.org/XML/1998/namespace'
XML_XMLNS_NS => 'http://www.w3.org/2000/xmlns/'
</programlisting>
</listitem>
</varlistentry>
</variablelist>
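<para>As an illustration of how the exported constants are typically used, the following sketch counts the element and text children of a node by comparing <function>nodeType</function> with the <literal>:libxml</literal> constants, and reads an <literal>xml:lang</literal> attribute through the <literal>XML_XML_NS</literal> constant. The sample markup and variable names are invented for this example.</para>

<programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML qw(:libxml :ns);

# A made-up fragment with mixed content: text, an element, text.
my $doc = XML::LibXML->load_xml(
    string => '<p xml:lang="en">one<b>two</b>three</p>'
);
my $root = $doc->documentElement;

# Classify the children of the root by their numeric node type.
my %count = (element => 0, text => 0);
for my $child ($root->childNodes) {
    my $type = $child->nodeType;
    $count{element}++ if $type == XML_ELEMENT_NODE;
    $count{text}++    if $type == XML_TEXT_NODE;
}
print "elements: $count{element}, text nodes: $count{text}\n";

# XML_XML_NS is the namespace URI bound to the reserved xml: prefix.
my $lang = $root->getAttributeNS(XML_XML_NS, 'lang');
print "xml:lang = ", (defined $lang ? $lang : '(none)'), "\n";
]]></programlisting>

<para>Comparing against the named constants rather than the bare integers keeps such code readable and independent of the numeric values listed above.</para>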
</sect1>
<sect1>
<title>Related Modules</title>

<para>The modules described in this section are not part of the XML::LibXML package itself. As they support some additional features, they are
mentioned here.</para>

<variablelist>
<varlistentry>
<term><olink targetdoc="XML::LibXSLT">XML::LibXSLT</olink></term>

<listitem>
<para>XSLT 1.0 Processor using libxslt and XML::LibXML</para>
</listitem>
</varlistentry>

<varlistentry>
<term><olink targetdoc="XML::LibXML::Iterator">XML::LibXML::Iterator</olink></term>

<listitem>
<para>XML::LibXML Implementation of the DOM Traversal Specification</para>
</listitem>
</varlistentry>

<varlistentry>
<term><olink targetdoc="XML::CompactTree::XS">XML::CompactTree::XS</olink></term>

<listitem>
<para>Uses XML::LibXML::Reader to parse an XML document
or element very efficiently into native Perl data structures, which are less flexible but
significantly faster to process than DOM.</para>
</listitem>
</varlistentry>

</variablelist>
</sect1>

<sect1>
|
|
<title>XML::LibXML and XML::GDOME</title>
|
|
|
|
<para>Note: <emphasis>THE FUNCTIONS DESCRIBED HERE ARE STILL EXPERIMENTAL</emphasis></para>
|
|
|
|
<para>Although both modules make use of libxml2's XML capabilities, the DOM implementation of both modules are not compatible. But still it is
|
|
possible to exchange nodes from one DOM to the other. The concept of this exchange is pretty similar to the function cloneNode(): The particular
|
|
node is copied on the low-level to the opposite DOM implementation.</para>
|
|
|
|
<para>Since the DOM implementations cannot coexist within one document, one is forced to copy each node that should be used. Because you are always
keeping two copies of each node, this may have quite an impact on a machine's memory usage.</para>
|
|
|
|
<para>XML::LibXML provides two functions to import or export GDOME nodes: import_GDOME() and export_GDOME(). Both functions take two parameters: the
node and a flag for recursive import. The flag works as in cloneNode().</para>
|
|
|
|
<para>The two functions allow one to export and import XML::GDOME nodes explicitly. However, XML::LibXML also allows the transparent import of
XML::GDOME nodes in functions such as appendChild(), insertAfter() and so on. While native nodes are automatically adopted in most functions,
XML::GDOME nodes are always cloned in advance. Thus, if the original node is modified after the operation, the node in the XML::LibXML document will
not reflect the change.</para>
|
|
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>import_GDOME</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$libxmlnode = XML::LibXML->import_GDOME( $node, $deep );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This clones an XML::GDOME node to an XML::LibXML node explicitly.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>export_GDOME</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$gdomenode = XML::LibXML->export_GDOME( $node, $deep );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Allows one to clone an XML::LibXML node into an
|
|
XML::GDOME node.</para>
|
|
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</sect1>
|
|
<sect1>
|
|
<title>CONTACTS</title>
|
|
|
|
<para>For bug reports, please use the CPAN request tracker on http://rt.cpan.org/NoAuth/Bugs.html?Dist=XML-LibXML</para>
|
|
<para>For suggestions etc., and other issues
|
|
related to XML::LibXML you may use the perl XML mailing list
|
|
(<email>perl-xml@listserv.ActiveState.com</email>),
|
|
where most XML-related Perl modules are discussed.
|
|
In case of problems you should check the archives of that
|
|
list first. Many problems are already discussed there. You
|
|
can find the list's archives and subscription options at
|
|
<ulink url="http://aspn.activestate.com/ASPN/Mail/Browse/Threaded/perl-xml">http://aspn.activestate.com/ASPN/Mail/Browse/Threaded/perl-xml</ulink>.
|
|
</para>
|
|
</sect1>
|
|
</chapter>
|
|
|
|
<chapter id="XML-LibXML-Parser">
|
|
<title>Parsing XML Data with XML::LibXML</title>
|
|
|
|
<titleabbrev>XML::LibXML::Parser</titleabbrev>
|
|
|
|
<sect1>
|
|
<title>Synopsis</title>
|
|
|
|
<programlisting>use XML::LibXML '1.70';
|
|
<!--
|
|
my $dom = XML::LibXML->load_xml(
|
|
location => $file_or_url,
|
|
# or string => $xml_string,
|
|
# or IO => $perl_file_handle,
|
|
# ...parser options...
|
|
);
|
|
|
|
my $html_dom = XML::LibXML->load_html(
|
|
location => $file_or_url,
|
|
# or string => $html_string,
|
|
# or IO => $perl_file_handle,
|
|
# ...parser options...
|
|
);
|
|
|
|
my $parser = XML::LibXML->new(
|
|
# ... parser options ...
|
|
);
|
|
|
|
my $doc = $parser->parse_string(<<'EOT');
|
|
<some-xml/>
|
|
EOT
|
|
my $fdoc = $parser->parse_file( $xmlfile );
|
|
|
|
my $fhdoc = $parser->parse_fh( $xmlstream );
|
|
|
|
my $fragment = $parser->parse_xml_chunk( $xml_wb_chunk );
|
|
--></programlisting>
|
|
</sect1>
|
|
|
|
<sect1>
|
|
<title>Parsing</title>
|
|
|
|
<para>An XML document is read into a data structure such as a DOM tree by a piece of software, called a parser. XML::LibXML currently provides four
|
|
different parser interfaces:</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>A DOM Pull-Parser</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>A DOM Push-Parser</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>A SAX Parser</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>A DOM based SAX Parser.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<sect2>
|
|
<title>Creating a Parser Instance</title>
|
|
|
|
<para>XML::LibXML provides an OO interface to the libxml2 parser functions. Thus you have to create a parser instance before you can parse any
|
|
XML data.</para>
|
|
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>new</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis role="synopsis">
|
|
<funcsynopsisinfo># Parser constructor</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>
|
|
$parser = XML::LibXML->new();
|
|
$parser = XML::LibXML->new(option=>value, ...);
|
|
$parser = XML::LibXML->new({option=>value, ...});</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Create a new XML and HTML parser instance.
|
|
Each parser instance holds default
|
|
values for various parser options.
|
|
Optionally,
|
|
one can pass a hash reference or
|
|
a list of option => value pairs to
|
|
set a different default set of options.
|
|
Unless specified otherwise, the options
|
|
<literal>load_ext_dtd</literal>, and
|
|
<literal>expand_entities</literal> are set to 1.
|
|
See <xref linkend="parser-options"/> for a list of libxml2 parser's options.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>DOM Parser</title>
|
|
|
|
<para>One of the common parser interfaces of XML::LibXML is the DOM parser. This parser reads XML data into a DOM-like data structure, so that each
tag can be accessed and transformed.</para>
|
|
|
|
<para>XML::LibXML's DOM parser can parse not only XML data but also (strict) HTML files. There are three ways to parse
documents: as a string, as a Perl filehandle, or as a filename/URL. The return value from each is a <xref linkend="XML-LibXML-Document"/> object, which is a DOM
object.</para>
|
|
|
|
<para>All of the functions listed below will throw an exception if the document is invalid. To prevent this from terminating your program, wrap
the call in an eval{} block.</para>
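<para>A minimal sketch of this pattern follows. The malformed input string is an assumption chosen only for illustration; any of the parsing
functions described below could be wrapped the same way.</para>

<programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML;

# Illustrative input only: <unclosed> is never closed, so this is not well-formed.
my $xml = '<root><unclosed></root>';

# eval returns undef if load_xml() throws; the exception is left in $@.
my $doc   = eval { XML::LibXML->load_xml( string => $xml ) };
my $error = $@;

if ( $error ) {
    # The exception stringifies to the parser's error message.
    warn "Could not parse document: $error";
}
else {
    print $doc->toString();
}]]></programlisting>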
|
|
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>load_xml</term>
|
|
<listitem>
|
|
<funcsynopsis role="synopsis">
|
|
<funcsynopsisinfo>
|
|
# Parsing XML</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>
|
|
$dom = XML::LibXML->load_xml(
|
|
location => $file_or_url
|
|
# parser options ...
|
|
);
|
|
$dom = XML::LibXML->load_xml(
|
|
string => $xml_string
|
|
# parser options ...
|
|
);
|
|
$dom = XML::LibXML->load_xml(
|
|
string => (\$xml_string)
|
|
# parser options ...
|
|
);
|
|
$dom = XML::LibXML->load_xml({
|
|
IO => $perl_file_handle
|
|
# parser options ...
|
|
});
|
|
$dom = $parser->load_xml(...);
|
|
</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>This function is available since XML::LibXML 1.70. It provides an easy-to-use interface to the XML parser that parses a
|
|
given file (or non-HTTPS URL), string, or input stream
|
|
to a DOM tree. The arguments
|
|
can be passed in a HASH reference
|
|
or as name => value pairs.
|
|
The function can be called
|
|
as a class method or an object method.
|
|
In both cases it internally creates a new
|
|
parser instance passing
|
|
the specified parser options;
|
|
if called as an object method,
|
|
it clones the original parser (preserving
|
|
its settings) and additionally applies
|
|
the specified options to the new parser.
|
|
See the constructor <function>new</function>
|
|
and <xref linkend="parser-options"/>
|
|
for more information.
|
|
</para>
|
|
<para>Note that, due to a limitation in the underlying libxml2
|
|
library, this call does not recognize HTTPS-based URLs. (It
|
|
will treat an HTTPS URL as a filename, likely throwing a "No such
|
|
file or directory" exception.)
|
|
</para>
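<para>The difference between the two calling styles can be sketched as follows. This is only an illustration: the sample string and the
variable names are assumptions, and <literal>no_blanks</literal> is used merely because its effect is easy to observe.</para>

<programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML;

# A parser instance with one non-default option.
my $parser = XML::LibXML->new( no_blanks => 1 );

my $xml = "<a>\n  <b/>\n</a>";

# Class method: a fresh parser with default options plus those given here.
my $dom_default = XML::LibXML->load_xml( string => $xml );

# Object method: a clone of $parser is used, so no_blanks still applies.
my $dom_noblanks = $parser->load_xml( string => $xml );

my @kids_default  = $dom_default->documentElement->childNodes;
my @kids_noblanks = $dom_noblanks->documentElement->childNodes;

# The second count is expected to be smaller, because the
# whitespace-only text nodes around <b/> are dropped.
printf "default: %d child nodes, no_blanks: %d child nodes\n",
    scalar @kids_default, scalar @kids_noblanks;]]></programlisting>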
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>load_html</term>
|
|
<listitem>
|
|
<funcsynopsis role="synopsis">
|
|
<funcsynopsisinfo># Parsing HTML</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>
|
|
$dom = XML::LibXML->load_html(...);
|
|
$dom = $parser->load_html(...);
|
|
</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>This function is available since XML::LibXML 1.70. It has the same usage as <function>load_xml</function>,
providing an interface to the HTML parser.
|
|
See <function>load_xml</function> for more information.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
|
|
<para>Parsing HTML may cause problems, especially if
the ampersand ('&amp;') is used. This is a common
problem when parsing HTML code that contains links to
CGI scripts. Such links cause the parser to throw
errors. In such cases libxml2 still parses the entire
document as if there were no error, but the error causes
XML::LibXML to stop the parsing process. However, the
document is not lost. Such HTML documents should be
parsed using the <emphasis>recover</emphasis> flag. By
default, recovery is deactivated.</para>
|
|
|
|
<para>The functions described above are implemented to
parse well-formed documents. In some cases a program
gets well-balanced XML instead of well-formed
documents (e.g. an XML fragment from a database). With
XML::LibXML it is not necessary to wrap such fragments
in the code, because XML::LibXML can also
parse well-balanced XML fragments.</para>
|
|
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>parse_balanced_chunk</term>
|
|
<listitem>
|
|
<funcsynopsis role="synopsis">
|
|
<funcsynopsisinfo># Parsing well-balanced XML chunks
|
|
</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$fragment = $parser->parse_balanced_chunk( $wbxmlstring, $encoding );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This function parses a well-balanced XML string into a <xref linkend="XML-LibXML-DocumentFragment"/>. The first argument contains the input string; the optional second argument can be used to specify the character encoding of the input (UTF-8 is assumed by default).</para>
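<para>As a hedged illustration, a fragment with two sibling elements (which would not be a well-formed document on its own) can be parsed and
then appended to an existing document. The element names and the target document are assumptions made for this sketch.</para>

<programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->new;

# Two top-level elements: well-balanced, but not a well-formed document.
my $fragment = $parser->parse_balanced_chunk(
    '<item>one</item><item>two</item>'
);

# Append the fragment's children to the root of another document.
my $doc = XML::LibXML->load_xml( string => '<list/>' );
$doc->documentElement->appendChild( $fragment );

my @items = $doc->findnodes('/list/item');
print scalar(@items), " items appended\n";]]></programlisting>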
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>parse_xml_chunk</term>
|
|
|
|
<listitem>
|
|
<para>This is the old name of parse_balanced_chunk(). Because it may cause confusion with the push parser interface, this function
should not be used anymore.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
|
|
<para>By default XML::LibXML does not process XInclude tags
|
|
within an XML Document (see options section below).
|
|
XML::LibXML allows one to post-process a document to expand
|
|
XInclude tags.</para>
|
|
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>process_xincludes</term>
|
|
<listitem>
|
|
<funcsynopsis role="synopsis">
|
|
<funcsynopsisinfo>
|
|
# Processing XInclude
|
|
</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$parser->process_xincludes( $doc );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>After a document is parsed into a DOM structure, you may want to expand the document's XInclude tags. This function processes
the given document structure and expands all XInclude tags (or throws an error) by using the flags and callbacks of the given parser
instance.</para>
|
|
|
|
<para>Note that the resulting tree contains some extra nodes (of type XML_XINCLUDE_START and XML_XINCLUDE_END) after successfully
processing the document. These nodes indicate where data was included into the original tree. If the document is serialized, these
extra nodes will not show up.</para>
|
|
|
|
<para>Remember: A Document with processed XIncludes differs from the original document after serialization, because the original
|
|
XInclude tags will not get restored!</para>
|
|
|
|
<para>If the parser flag "expand_xinclude" is set to 1, you do not need to post-process the parsed document.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>processXIncludes</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$parser->processXIncludes( $doc );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This is an alias for process_xincludes with a Java-like function name.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>parse_file</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis role="synopsis">
|
|
<funcsynopsisinfo>
|
|
# Old-style parser interfaces
|
|
</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$doc = $parser->parse_file( $xmlfilename );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This function parses an XML document from a file or network;
|
|
$xmlfilename can be either a filename or a (non-HTTPS) URL.
|
|
Note that for parsing files, this function is the fastest choice,
|
|
about 6-8 times faster than parse_fh().
|
|
</para>
|
|
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>parse_fh</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$doc = $parser->parse_fh( $io_fh );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>parse_fh() parses an IOREF or a subclass of IO::Handle.</para>
|
|
|
|
<para>Because the data comes from an open handle, libxml2's parser does not know about the base URI of the document. To set the
|
|
base URI one should use parse_fh() as follows:</para>
|
|
|
|
<programlisting>my $doc = $parser->parse_fh( $io_fh, $baseuri );</programlisting>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>parse_string</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$doc = $parser->parse_string( $xmlstring);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This function is similar to parse_fh(), but it parses an XML document that is available as a single string in memory, or alternatively as a reference to a scalar containing a string. Again,
|
|
you can pass an optional base URI to the function.</para>
|
|
|
|
<programlisting>my $doc = $parser->parse_string( $xmlstring, $baseuri );
|
|
my $doc = $parser->parse_string(\$xmlstring, $baseuri);
|
|
</programlisting>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>parse_html_file</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$doc = $parser->parse_html_file( $htmlfile, \%opts );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Similar to parse_file() but parses HTML (strict) documents;
$htmlfile can be a filename or a (non-HTTPS) URL.
|
|
</para>
|
|
<para>An optional second argument can be
|
|
used to pass some options to the HTML
|
|
parser as a HASH reference.
|
|
See options labeled with HTML in <xref linkend="parser-options"/>.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>parse_html_fh</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$doc = $parser->parse_html_fh( $io_fh, \%opts );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>Similar to parse_fh() but parses HTML (strict) streams.</para>
|
|
<para>
|
|
An optional second argument can be used
|
|
to pass some options to the HTML parser
|
|
as a HASH reference.
|
|
See options labeled with HTML in <xref linkend="parser-options"/>.
|
|
</para>
|
|
<para>
|
|
Note: the encoding option may
not work correctly with this function
in libxml2 &lt; 2.6.27 if the HTML file
declares its charset using a META tag.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>parse_html_string</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$doc = $parser->parse_html_string( $htmlstring, \%opts );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>Similar to parse_string() but parses HTML (strict) strings.</para>
|
|
<para>An optional second argument can be used to pass some options to the
|
|
HTML parser as a HASH reference.
|
|
See options labeled with HTML in <xref linkend="parser-options"/>.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Push Parser</title>
|
|
|
|
<para>XML::LibXML provides a push parser interface. Rather than pulling the data from a given source the push parser waits for the data to be
|
|
pushed into it.</para>
|
|
|
|
<para>This allows one to parse large documents without waiting for the parser to finish. The interface is especially useful if a program needs
|
|
to pre-process the incoming pieces of XML (e.g. to detect document boundaries).</para>
|
|
|
|
<para>While the XML::LibXML parse_*() functions require each input to be well-formed XML, the push parser accepts any arbitrary string that contains
some XML data. The only requirement is that all the pushed strings together form a well-formed document. With the push parser interface a
program can interrupt the parsing process as required, where the parse_*() functions do not give enough flexibility.</para>
|
|
|
|
<para>Unlike the pull parser implemented in parse_fh() or parse_file(), the push parser cannot determine the end of the document
by itself. Thus the calling program needs to indicate explicitly when the parsing is done.</para>
|
|
|
|
<para>In XML::LibXML this is done by a single function:</para>
|
|
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>parse_chunk</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis role="synopsis">
|
|
<funcsynopsisinfo>
|
|
# Push parser
|
|
</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$parser->parse_chunk($string, $terminate);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>parse_chunk() tries to parse a given chunk of data, which isn't necessarily well-balanced data. The function takes two
parameters: the chunk of data as a string and, optionally, a termination flag. If the termination flag is set to a true value (e.g. 1),
the parsing will be stopped and the resulting document will be returned, as the following example shows:</para>
|
|
|
|
<programlisting>my $parser = XML::LibXML->new;
|
|
for my $string ( "&lt;", "foo", ' bar="hello world"', "/>") {
|
|
$parser->parse_chunk( $string );
|
|
}
|
|
my $doc = $parser->parse_chunk("", 1); # terminate the parsing</programlisting>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
|
|
<para>Internally XML::LibXML provides three functions that control the push parser process:</para>
|
|
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>init_push</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$parser->init_push();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Initializes the push parser.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>push</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$parser->push(@data);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This function pushes the data stored inside the array to libxml2's parser. Each entry in @data must be a normal scalar! This method can be called repeatedly.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>finish_push</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$doc = $parser->finish_push( $recover );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This function returns the result of the parsing process. If this function is called without a parameter, it will complain about
non-well-formed documents. If $recover is 1, the push parser can be used to restore broken or non-well-formed (XML) documents. Without
the flag, a broken document is reported as an error, as the following example shows:</para>
|
|
|
|
<programlisting>eval {
|
|
    $parser->push( "&lt;foo>", "bar" );
|
|
$doc = $parser->finish_push(); # will report broken XML
|
|
};
|
|
if ( $@ ) {
|
|
# ...
|
|
}</programlisting>
|
|
|
|
<para>This can be annoying if the closing tag is missed by accident. The following code will restore the document:</para>
|
|
|
|
<programlisting>eval {
|
|
    $parser->push( "&lt;foo>", "bar" );
|
|
$doc = $parser->finish_push(1); # will return the data parsed
|
|
# unless an error happened
|
|
};
|
|
|
|
print $doc->toString(); # returns "&lt;foo&gt;bar&lt;/foo&gt;"</programlisting>
|
|
|
|
<para>Of course finish_push() will return nothing if there was no data pushed to the parser before.</para>
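<para>The three functions can be combined to feed a document to the parser piece by piece. The following is only a sketch: the in-memory
filehandle, the sample data, and the chunk size of 16 bytes are assumptions chosen to keep the example self-contained.</para>

<programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML;

# Sample data, read through an in-memory filehandle for illustration.
my $xml = '<log><entry id="1">started</entry><entry id="2">stopped</entry></log>';
open my $fh, '<', \$xml or die "Cannot open in-memory handle: $!";

my $parser = XML::LibXML->new;
$parser->init_push();

# Push the data in small pieces; chunks may end in the middle of a tag.
while ( read( $fh, my $buffer, 16 ) ) {
    $parser->push( $buffer );
}
close $fh;

# Signal the end of the input and fetch the resulting document.
my $doc = $parser->finish_push();
print $doc->documentElement->nodeName, "\n";]]></programlisting>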
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Pull Parser (Reader)</title>
|
|
<para>XML::LibXML also provides a pull-parser interface similar to the XmlReader interface in .NET.
|
|
This interface is almost streaming, and is usually faster and simpler to use than SAX.
|
|
See <xref linkend="XML-LibXML-Reader"/>.</para>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Direct SAX Parser</title>
|
|
<para>XML::LibXML provides a direct SAX parser in the <xref linkend="XML-LibXML-SAX"/> module.</para>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>DOM based SAX Parser</title>
|
|
|
|
<para>XML::LibXML also provides a DOM based SAX parser. The SAX parser is defined in
|
|
the module XML::LibXML::SAX::Parser. As it is not a stream based parser, it
|
|
parses documents into a DOM and traverses the DOM tree instead.</para>
|
|
|
|
<para>The API of this parser is exactly the same as any other Perl SAX2 parser. See XML::SAX::Intro for details.</para>
|
|
|
|
<para>Aside from the regular parsing methods, you can access the DOM tree traverser directly, using the generate() method:</para>
|
|
|
|
<programlisting>my $doc = build_yourself_a_document();
my $saxparser = XML::LibXML::SAX::Parser->new( ... );
$saxparser->generate( $doc );</programlisting>
|
|
|
|
<para>This is useful for serializing DOM trees that you have processed beforehand, for example, or that you obtained as a result of XSLT
processing.</para>
|
|
|
|
<para><emphasis>WARNING</emphasis></para>
|
|
|
|
<para>This is NOT a streaming SAX parser. As I said above, this parser reads the entire document into a DOM and serialises it. Some people
|
|
couldn't read that in the paragraph above so I've added this warning. If you want a streaming SAX parser look at the <xref linkend="XML-LibXML-SAX"/> man page.</para>
|
|
</sect2>
|
|
</sect1>
|
|
|
|
<sect1>
|
|
<title>Serialization</title>
|
|
|
|
<para>XML::LibXML provides some functions to serialize nodes and documents. The serialization functions are described on the <xref linkend="XML-LibXML-Node"/>
|
|
manpage or the <xref linkend="XML-LibXML-Document"/> manpage. XML::LibXML checks three global flags that alter the serialization process:</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>skipXMLDeclaration</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>skipDTD</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>setTagCompression</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>Of these three flags, only setTagCompression is available for all serialization functions.</para>
|
|
|
|
<para>Because XML::LibXML does not set these flags itself, one has to define them locally as the following example shows:</para>
|
|
|
|
<programlisting>local $XML::LibXML::skipXMLDeclaration = 1;
|
|
local $XML::LibXML::skipDTD = 1;
|
|
local $XML::LibXML::setTagCompression = 1;</programlisting>
|
|
|
|
<para>If skipXMLDeclaration is defined and not '0', the XML declaration is omitted during serialization.</para>
|
|
|
|
<para>If skipDTD is defined and not '0', an existing DTD would not be serialized with the document.</para>
|
|
|
|
<para>If setTagCompression is defined and not '0', empty tags are displayed as open and closing tags rather than the shortcut. For example,
the empty tag <emphasis>foo</emphasis> will be rendered as <emphasis>&lt;foo&gt;&lt;/foo&gt;</emphasis> rather than <emphasis>&lt;foo/&gt;</emphasis>.</para>
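<para>The effect of the flags can be observed by serializing the same document twice. This is a minimal sketch; the sample document and the
variable names are assumptions made for illustration.</para>

<programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml( string => '<root><empty/></root>' );

# Default serialization: XML declaration present, <empty/> kept short.
my $plain = $doc->toString();

my $altered;
{
    # The flags apply only within this block because of local().
    local $XML::LibXML::skipXMLDeclaration = 1;
    local $XML::LibXML::setTagCompression  = 1;
    $altered = $doc->toString();
}

print "default:\n$plain\n";
print "with flags:\n$altered\n";]]></programlisting>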
|
|
</sect1>
|
|
|
|
<sect1 id="parser-options">
|
|
<title>Parser Options</title>
|
|
|
|
<para>Handling of libxml2 parser options has been unified and improved in XML::LibXML 1.70.
|
|
You can now set default options for a particular parser instance by
|
|
passing them to the constructor as <literal>XML::LibXML->new({name=>value, ...})</literal>
|
|
or <literal>XML::LibXML->new(name=>value,...)</literal>.
|
|
The options can be queried and changed using the following methods (pre-1.70 interfaces such as <function>$parser->load_ext_dtd(0)</function> also exist, see below):
|
|
</para>
|
|
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>option_exists</term>
|
|
<listitem>
|
|
<funcsynopsis role="synopsis">
|
|
<funcsynopsisinfo>
|
|
# Set/query parser options
|
|
</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$parser->option_exists($name);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>Returns 1 if the current XML::LibXML version supports
|
|
the option <literal>$name</literal>, otherwise returns 0 (note that this does not necessarily mean that the option is supported
|
|
by the underlying libxml2 library).</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>get_option</term>
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$parser->get_option($name);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>Returns the current value of the parser option <literal>$name</literal>.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>set_option</term>
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$parser->set_option($name,$value);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>Sets option <literal>$name</literal> to value <literal>$value</literal>.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>set_options</term>
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$parser->set_options({$name=>$value,...});</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>Sets multiple parsing options at once.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
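<para>A short sketch of how these methods fit together is given below. The option name <literal>no_such_option</literal> is hypothetical and
serves only to show the negative case of <function>option_exists</function>.</para>

<programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML;

# Defaults for this parser instance are passed to the constructor.
my $parser = XML::LibXML->new( no_blanks => 1 );

# 'no_such_option' is a made-up name used to show an unsupported option.
for my $name (qw(no_blanks recover no_such_option)) {
    printf "%-15s %s\n", $name,
        $parser->option_exists($name) ? 'supported' : 'not supported';
}

# Change a single option, then several at once.
$parser->set_option( recover => 1 );
$parser->set_options( { line_numbers => 1, load_ext_dtd => 0 } );

printf "no_blanks=%s recover=%s line_numbers=%s\n",
    map { $parser->get_option($_) } qw(no_blanks recover line_numbers);]]></programlisting>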
|
|
<para>
|
|
IMPORTANT NOTE: This documentation reflects the parser flags available in libxml2 2.7.3.
|
|
Some options have no effect if an older version of libxml2 is used.
|
|
</para>
|
|
<para>Each of the flags listed below is labeled</para>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>/parser/</term>
|
|
<listitem>
|
|
<para>if it can be used with an <function>XML::LibXML</function>
|
|
parser object (i.e. passed to <function>XML::LibXML->new</function>, <function>XML::LibXML->set_option</function>, etc.)
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>/html/</term>
|
|
<listitem>
|
|
<para>if it can be passed to the <function>parse_html_*</function> methods</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>/reader/</term>
|
|
<listitem>
|
|
<para>if it can be used with the <function>XML::LibXML::Reader</function>.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
<para>
|
|
Unless specified otherwise, the default for boolean valued options is 0 (false).
|
|
</para>
|
|
<para>The available options are:</para>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>URI</term>
|
|
<listitem>
|
|
<para>/parser, html, reader/</para>
|
|
<para>When parsing strings or file handles, XML::LibXML doesn't know the base URI of the document. To make relative
references such as XIncludes work, one has to set a base URI, which is then used for the parsed document.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>line_numbers</term>
|
|
<listitem>
|
|
<para>/parser, html, reader/</para>
|
|
<para>If this option is activated, libxml2 will store the line number of each element node in the parsed document.
|
|
The line number can be obtained using the <function>line_number()</function> method
|
|
of the <function>XML::LibXML::Node</function> class (for non-element nodes
|
|
this may report the line number of the containing element).
|
|
The line numbers are also used for reporting positions of validation errors.
|
|
</para>
|
|
<para>IMPORTANT:
|
|
Due to limitations in the libxml2 library line numbers greater than
|
|
65535 will be returned as 65535. Unfortunately, this is a long and sad story, please see
|
|
<ulink url="http://bugzilla.gnome.org/show_bug.cgi?id=325533">http://bugzilla.gnome.org/show_bug.cgi?id=325533</ulink> for more details.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>encoding</term>
|
|
<listitem>
|
|
<para>/html/</para>
|
|
<para>character encoding of the input</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>recover</term>
|
|
<listitem>
|
|
<para>/parser, html, reader/</para>
|
|
<para>recover from errors; possible values are 0, 1, and 2</para>
|
|
<para>
|
|
A true value turns on recovery mode which
|
|
allows one to parse broken XML or HTML data.
|
|
The recovery mode allows the parser to return
|
|
the successfully parsed portion of the input document.
|
|
This is useful for almost well-formed documents, where for example
|
|
a closing tag is missing somewhere. Still,
|
|
XML::LibXML will only parse until the first fatal (non-recoverable) error occurs,
|
|
reporting recoverable parsing errors as warnings. To suppress
|
|
even these warnings, use recover=>2.</para>
|
|
<para>Note that validation is switched off automatically in recovery mode.</para>
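<para>As a hedged illustration of recovery mode, the following sketch parses a string whose closing tags are missing. The input string is
an assumption made for this example; the exact tree that is recovered may depend on the libxml2 version in use.</para>

<programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML;

# Illustrative input: <body> and <doc> are never closed.
my $broken = '<doc><title>Hello</title><body>unclosed';

# recover => 2 asks for recovery and suppresses the warnings as well.
my $doc = XML::LibXML->load_xml( string => $broken, recover => 2 );

# Whatever could be parsed is available as a regular document.
print $doc->toString();]]></programlisting>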
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>expand_entities</term>
|
|
<listitem>
|
|
<para>/parser, reader/</para>
|
|
<para>substitute entities; possible values are 0 and 1; default is 1</para>
|
|
<para>Note that although setting this flag to 0 disables entity substitution, it
does not prevent the parser from loading external entities.
When substitution of an external entity is disabled, the
entity is represented in the document tree by an XML_ENTITY_REF_NODE node
whose subtree is the content obtained by parsing the external resource.
Although this nesting is visible from the DOM,
it is transparent to the XPath data model, so it is possible to
match nodes in an unexpanded entity by the same XPath expression
as if the entity were expanded. See also ext_ent_handler.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>ext_ent_handler</term>
|
|
<listitem>
|
|
<para>/parser/</para>
|
|
<para>Provide a custom external entity handler
|
|
to be used when expand_entities is set to 1.
|
|
Possible value is a subroutine reference.
|
|
</para>
|
|
<para>This feature does not work properly in libxml2 &lt; 2.6.27!</para>
|
|
<para>The subroutine provided is called whenever
|
|
the parser needs to retrieve the content of an external entity.
|
|
It is called with two arguments: the system ID (URI) and the public ID.
|
|
The value returned by the subroutine is parsed as the content of the entity.
|
|
</para>
|
|
<para>This method can be used to completely disable entity loading,
|
|
e.g. to prevent exploits of the type described at
|
|
<ulink url="http://searchsecuritychannel.techtarget.com/generic/0,295582,sid97_gci1304703,00.html"/>,
|
|
where a service is tricked to expose its private data
|
|
by letting it parse a remote file (RSS feed) that contains an entity reference to a local
|
|
file (e.g. <literal>/etc/fstab</literal>).
|
|
</para>
|
|
<para>A more granular solution to this problem, however, is
|
|
provided by custom URL resolvers, as in
|
|
<programlisting>
|
|
my $c = XML::LibXML::InputCallback->new();
|
|
sub match { # accept file:/ URIs except for XML catalogs in /etc/xml/
|
|
my ($uri) = @_;
|
|
return ($uri=~m{^file:/}
|
|
and $uri !~ m{^file:///etc/xml/})
|
|
? 1 : 0;
|
|
}
|
|
$c->register_callbacks([ \&amp;match, sub{}, sub{}, sub{} ]);
|
|
$parser->input_callbacks($c);
|
|
</programlisting>
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>load_ext_dtd</term>
|
|
<listitem>
|
|
<para>/parser, reader/</para>
|
|
<para>load the external DTD subset while parsing; possible values are 0 and 1. Unless specified,
|
|
XML::LibXML sets this option to 1.</para>
|
|
<para>This flag is also required for DTD validation, for providing complete attributes,
and for expanding entities, regardless of whether the document has an internal subset.
Thus, switching off external DTD loading will disable entity expansion,
validation, and complete attributes on internal subsets as well.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>complete_attributes</term>
|
|
<listitem>
|
|
<para>/parser, reader/</para>
|
|
<para>create default DTD attributes; possible values are 0 and 1</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>validation</term>
|
|
<listitem>
|
|
<para>/parser, reader/</para>
|
|
<para>validate with the DTD; possible values are 0 and 1</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>suppress_errors</term>
|
|
<listitem>
|
|
<para>/parser, html, reader/</para>
|
|
<para>suppress error reports; possible values are 0 and 1</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>suppress_warnings</term>
|
|
<listitem>
|
|
<para>/parser, html, reader/</para>
|
|
<para>suppress warning reports; possible values are 0 and 1</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>pedantic_parser</term>
|
|
<listitem>
|
|
<para>/parser, html, reader/</para>
|
|
<para>pedantic error reporting; possible values are 0 and 1</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>no_blanks</term>
|
|
<listitem>
|
|
<para>/parser, html, reader/</para>
|
|
<para>remove blank nodes; possible values are 0 and 1</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>no_defdtd</term>
|
|
<listitem>
|
|
<para>/html/</para>
|
|
<para>do not add a default DOCTYPE; possible values are 0 and 1</para>
|
|
<para>the default is (0) to add a DTD when the input html lacks one</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>expand_xinclude or xinclude</term>
|
|
<listitem>
|
|
<para>/parser, reader/</para>
|
|
<para>Implement XInclude substitution; possible values are 0 and 1</para>
|
|
<para>Expands XInclude tags immediately while parsing the document.
|
|
Note that the parser will use the URI resolvers installed
|
|
via <function>XML::LibXML::InputCallback</function> to parse the included document (if any).</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>no_xinclude_nodes</term>
|
|
<listitem>
|
|
<para>/parser, reader/</para>
|
|
<para>do not generate XINCLUDE START/END nodes; possible values are 0 and 1</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>no_network</term>
|
|
<listitem>
|
|
<para>/parser, html, reader/</para>
|
|
<para>Forbid network access; possible values are 0 and 1</para>
|
|
<para>If set to true, all
|
|
attempts to fetch non-local resources (such as
|
|
DTD or external entities) will fail (unless
|
|
custom callbacks are defined).</para>
|
|
<para>It may be
|
|
necessary to use the flag <literal>recover</literal> for
|
|
processing documents requiring such resources
|
|
while networking is off.
|
|
</para>
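<para>As a hedged illustration, a parser intended for untrusted input might
combine this flag with other options described in this section. Which options
are appropriate depends on the application; the sample document is made up.</para>
<programlisting><![CDATA[use XML::LibXML;

# Sketch of a restrictive configuration for untrusted input.
my $parser = XML::LibXML->new(
    no_network      => 1,   # never fetch non-local resources
    load_ext_dtd    => 0,   # do not load the external DTD subset
    expand_entities => 0,   # leave entity references unexpanded
);

my $doc = $parser->load_xml( string => '<feed><title>News</title></feed>' );
print $doc->documentElement->nodeName, "\n";]]></programlisting>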
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>clean_namespaces</term>
|
|
<listitem>
|
|
<para>/parser, reader/</para>
|
|
<para>remove redundant namespace declarations during parsing; possible values are 0 and 1.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>no_cdata</term>
|
|
<listitem>
|
|
<para>/parser, html, reader/</para>
|
|
<para>merge CDATA as text nodes; possible values are 0 and 1</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>no_basefix</term>
|
|
<listitem>
|
|
<para>/parser, reader/</para>
|
|
<para>do not fix up XInclude xml:base URIs; possible values are 0 and 1</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>huge</term>
|
|
<listitem>
|
|
<para>/parser, html, reader/</para>
|
|
<para>relax any hardcoded limit from the parser; possible values are 0 and 1. Unless specified,
|
|
XML::LibXML sets this option to 0.</para>
|
|
<para>Note: the default value for this option was changed to protect against denial
|
|
of service through entity expansion attacks. Before enabling the option ensure
|
|
you have taken alternative measures to protect your application against this type
|
|
of attack.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>gdome</term>
|
|
<listitem>
|
|
<para>/parser/</para>
|
|
<para>THIS OPTION IS EXPERIMENTAL!</para>
|
|
<para>Although quite powerful, XML::LibXML's DOM implementation is incomplete with respect to
|
|
the DOM level 2 or level 3 specifications.
|
|
XML::GDOME is based on libxml2 as well, and provides a rather complete DOM implementation by wrapping libgdome.
|
|
This flag allows you to make
|
|
use of XML::LibXML's full parser options and XML::GDOME's DOM implementation at the same time.</para>
|
|
<para>To make use of this function, one has to install libgdome and configure XML::LibXML to use this library.
|
|
For this you need to rebuild XML::LibXML!</para>
|
|
<para>Note: this feature was not seriously tested in recent XML::LibXML releases.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
<para>For compatibility with XML::LibXML versions prior to 1.70,
|
|
the following methods are also supported for querying and setting the corresponding parser options
|
|
(called without arguments, each method returns
the current value of the corresponding parser option; called with an argument, it sets the option to the given value):
|
|
</para>
|
|
<programlisting>$parser->validation();
|
|
$parser->recover();
|
|
$parser->pedantic_parser();
|
|
$parser->line_numbers();
|
|
$parser->load_ext_dtd();
|
|
$parser->complete_attributes();
|
|
$parser->expand_xinclude();
|
|
$parser->gdome_dom();
|
|
$parser->clean_namespaces();
|
|
$parser->no_network();</programlisting>
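<para>The following sketch contrasts a legacy accessor with the generic
<literal>set_option()</literal> and <literal>get_option()</literal> methods;
the choice of options is arbitrary and only for illustration.</para>
<programlisting><![CDATA[use XML::LibXML;

my $parser = XML::LibXML->new();

# Legacy style: one method per option, acting as getter and setter.
$parser->line_numbers(1);
my $legacy = $parser->line_numbers();

# Generic style: the option name is passed as a string.
$parser->set_option( no_network => 1 );
my $generic = $parser->get_option('no_network');]]></programlisting>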
|
|
<para>The following obsolete methods trigger parser options in some
|
|
special way:</para>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>recover_silently</term>
|
|
<listitem>
|
|
<programlisting>
|
|
$parser->recover_silently(1);
|
|
</programlisting>
|
|
<para>If called without an argument,
|
|
returns true if the current value of the <literal>recover</literal> parser
|
|
option is 2 and returns false otherwise.
|
|
With a true argument sets the <literal>recover</literal> parser option to 2;
|
|
with a false argument sets the <literal>recover</literal> parser option to 0.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>expand_entities</term>
|
|
<listitem>
|
|
<programlisting>
|
|
$parser->expand_entities(0);
|
|
</programlisting>
|
|
<para>Get/set the <literal>expand_entities</literal> option.
|
|
If called with a true argument, also turns
|
|
the <literal>load_ext_dtd</literal> option to 1.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>keep_blanks</term>
|
|
<listitem>
|
|
<programlisting>
|
|
$parser->keep_blanks(0);
|
|
</programlisting>
|
|
<para>This is actually the opposite of the <literal>no_blanks</literal> parser option.
If used without an argument, retrieves the negated value of <literal>no_blanks</literal>.
If used with an argument, sets <literal>no_blanks</literal> to the opposite value.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>base_uri</term>
|
|
|
|
<listitem>
|
|
<programlisting>
|
|
$parser->base_uri( $your_base_uri );
|
|
</programlisting>
|
|
<para>Get/set the <literal>URI</literal> option.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
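<para>A small sketch of how one of these obsolete methods maps onto the
underlying option, assuming a freshly created parser:</para>
<programlisting><![CDATA[use XML::LibXML;

my $parser = XML::LibXML->new();

# keep_blanks(0) is expressed internally as no_blanks = 1.
$parser->keep_blanks(0);
my $no_blanks   = $parser->get_option('no_blanks');
my $keep_blanks = $parser->keep_blanks();   # negated view of the same flag]]></programlisting>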
|
|
</sect1>
|
|
<sect1>
|
|
<title>XML Catalogs</title>
|
|
<para><literal>libxml2</literal> supports XML catalogs.
|
|
Catalogs are used to
|
|
map remote resources to their local copies.
|
|
Using catalogs can speed up parsing processes if
|
|
many external resources from remote addresses
|
|
are loaded into the parsed documents (such as DTDs or XIncludes).
|
|
</para>
|
|
<para>
|
|
Note that libxml2 has a global pool of loaded catalogs,
|
|
so if you apply the method <literal>load_catalog</literal>
|
|
to one parser instance, all parser instances will start using the catalog
|
|
(in addition to other previously loaded catalogs).
|
|
</para>
|
|
<para>Note also that catalogs are not used
|
|
when a custom external entity handler is specified. Currently
it is not possible to make use of both
types of resolving systems at the same time.</para>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>load_catalog</term>
|
|
<listitem>
|
|
<funcsynopsis role="synopsis">
|
|
<funcsynopsisinfo>
|
|
# XML catalogs
|
|
</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$parser->load_catalog( $catalog_file );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>Loads the XML catalog file $catalog_file.</para>
|
|
<programlisting>
|
|
# Global external entity loader (similar to ext_ent_handler option
|
|
# but this works really globally, also in XML::LibXSLT include etc..)
|
|
|
|
XML::LibXML::externalEntityLoader(\&amp;my_loader);
|
|
</programlisting>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Error Reporting</title>
|
|
|
|
<para>XML::LibXML throws exceptions during parsing, validation or XPath processing (and some other occasions). These errors can be caught by using
|
|
<emphasis>eval</emphasis> blocks. The error is stored in <emphasis>$@</emphasis>.
|
|
There are two implementations: the old one throws $@ which is just a message string,
|
|
in the new one $@ is an object from the class XML::LibXML::Error;
|
|
this class overrides the operator "" so that when printed,
|
|
the object flattens to the usual error message.
|
|
</para>
|
|
|
|
<para>XML::LibXML throws errors as they occur; overlooking this is a very common source of problems when using XML::LibXML. If the eval is omitted, an error will always halt your script by
"croaking" (see the Carp man page for details).</para>
|
|
|
|
<para>Also note that an increasing number of functions throw errors if bad data is passed as arguments. If you cannot ensure that valid data is passed to XML::LibXML, you should wrap calls to
these functions in eval.</para>
|
|
|
|
<para>Note: since version 1.59, get_last_error() is no longer available in XML::LibXML for thread-safety reasons.</para>
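<para>The pattern described in this section can be sketched as follows. The
malformed input string is made up for the example, and the code handles both
the object form and the plain string form of <literal>$@</literal>.</para>
<programlisting><![CDATA[use XML::LibXML;

# The input is deliberately not well-formed, so parsing throws.
my $doc = eval { XML::LibXML->load_xml( string => '<root><open></root>' ) };
my $err = $@;

if ( ref $err and $err->isa('XML::LibXML::Error') ) {
    # Structured error object: individual fields are available.
    warn sprintf( "XML error at line %d: %s", $err->line, $err->message );
}
elsif ($err) {
    # Plain message string.
    warn "XML error: $err";
}]]></programlisting>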
|
|
</sect1>
|
|
</chapter>
|
|
|
|
<chapter id="XML-LibXML-SAX">
|
|
<title>XML::LibXML direct SAX parser</title>
|
|
|
|
<titleabbrev>XML::LibXML::SAX</titleabbrev>
|
|
|
|
<sect1>
|
|
<title>Description</title>
|
|
|
|
<para>XML::LibXML provides an interface to libxml2's direct SAX interface. Through this interface it is possible to generate SAX events directly while
|
|
parsing a document. While using the SAX parser XML::LibXML will not create a DOM Document tree.</para>
|
|
|
|
<para>Such an interface is useful if very large XML documents have to be processed and no DOM functions are required. By using this interface it is
|
|
possible to read data stored within an XML document directly into the application data structures without loading the document into memory.</para>
|
|
|
|
<para>The SAX interface of XML::LibXML is based on the famous XML::SAX interface. It uses the generic interface as provided by XML::SAX::Base.</para>
|
|
|
|
<para>In addition to the generic functions, which are only able to process entire documents, XML::LibXML::SAX provides <emphasis>parse_chunk()</emphasis>.
This method generates SAX events from well-balanced data such as is often provided by databases.</para>
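<para>As a rough sketch of reading data straight into application structures,
the following uses a hypothetical handler class
<literal>My::ElementCounter</literal> (not part of XML::LibXML) that counts
element names without building a DOM tree.</para>
<programlisting><![CDATA[use XML::LibXML::SAX;

{
    # Hypothetical final handler; only start_element is overridden.
    package My::ElementCounter;
    use base 'XML::SAX::Base';

    sub start_element {
        my ($self, $element) = @_;
        $self->{count}{ $element->{Name} }++;
    }
}

my $handler = My::ElementCounter->new();
my $parser  = XML::LibXML::SAX->new( Handler => $handler );
$parser->parse_string('<list><item/><item/></list>');

printf "%s: %d\n", $_, $handler->{count}{$_} for sort keys %{ $handler->{count} };]]></programlisting>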
|
|
</sect1>
|
|
|
|
<sect1>
|
|
<title>Features</title>
|
|
|
|
<para><emphasis>NOTE:</emphasis> This feature is experimental. </para>
|
|
|
|
<para>You can enable character data joining which may yield a
|
|
significant speed boost in your XML processing in lower markup
|
|
ratio situations by enabling the
|
|
http://xmlns.perl.org/sax/join-character-data feature of this
|
|
parser. This is done via the set_feature method like
|
|
this:
|
|
</para>
|
|
|
|
<programlisting>$p->set_feature('http://xmlns.perl.org/sax/join-character-data', 1);
|
|
</programlisting>
|
|
|
|
<para>
|
|
You can also specify a 0 to disable. The default is to have
|
|
this feature disabled.
|
|
</para>
|
|
</sect1>
|
|
</chapter>
|
|
|
|
<chapter id="XML-LibXML-SAX-Builder">
|
|
<title>Building DOM trees from SAX events.</title>
|
|
|
|
<titleabbrev>XML::LibXML::SAX::Builder</titleabbrev>
|
|
|
|
<sect1>
|
|
<title>Synopsis</title>
|
|
|
|
<programlisting>use XML::LibXML::SAX::Builder;
|
|
my $builder = XML::LibXML::SAX::Builder->new();
|
|
|
|
my $gen = XML::Generator::DBI->new(Handler => $builder, dbh => $dbh);
|
|
$gen->execute("SELECT * FROM Users");
|
|
|
|
my $doc = $builder->result();</programlisting>
|
|
</sect1>
|
|
|
|
<sect1>
|
|
<title>Description</title>
|
|
|
|
<para>This is a SAX handler that generates a DOM tree from SAX events. Usage is as above. Input is accepted from any SAX1 or SAX2 event generator.</para>
|
|
|
|
<para>Building DOM trees from SAX events is quite easy with XML::LibXML::SAX::Builder. The class is designed as a SAX2 final handler, not as a
filter!</para>
|
|
|
|
<para>Since SAX is strictly stream oriented, you should not expect anything to return from a generator. Instead you have to ask the builder instance
|
|
directly to get the document built. XML::LibXML::SAX::Builder's result() function holds the document generated from the last SAX stream.</para>
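<para>A self-contained sketch of this workflow, assuming XML::LibXML::SAX as
the event generator instead of a database driver (the input string is
made up):</para>
<programlisting><![CDATA[use XML::LibXML::SAX;
use XML::LibXML::SAX::Builder;

my $builder = XML::LibXML::SAX::Builder->new();
my $parser  = XML::LibXML::SAX->new( Handler => $builder );

# The generator only emits events; the document is fetched from the builder.
$parser->parse_string('<users><user name="alice"/></users>');
my $doc = $builder->result();

print $doc->documentElement->nodeName, "\n";]]></programlisting>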
|
|
</sect1>
|
|
</chapter>
|
|
|
|
<chapter id="XML-LibXML-DOM">
|
|
<title>XML::LibXML DOM Implementation</title>
|
|
<titleabbrev>XML::LibXML::DOM</titleabbrev>
|
|
<sect1>
|
|
<title>Description</title>
|
|
<para>XML::LibXML provides a lightweight interface to
|
|
<emphasis>modify</emphasis> a node of the document tree
|
|
generated by the XML::LibXML parser. This interface
|
|
follows as far as possible the DOM Level 3
|
|
specification. In addition to the specified functions,
|
|
XML::LibXML supports some functions that are more convenient to
use in a Perl environment.</para>
|
|
|
|
<para>One also has to remember that XML::LibXML is an
|
|
interface to libxml2 nodes which actually reside on the
|
|
C-Level of XML::LibXML. This means each node is a
|
|
reference to a structure which is different from a perl hash or
|
|
array. The only way to access these structures' values is
|
|
through the DOM interface provided by XML::LibXML. This
|
|
also means that one <emphasis>can't</emphasis> simply
|
|
inherit an XML::LibXML node and add new member variables as
|
|
if they were hash keys.</para>
|
|
|
|
<para>The DOM interface of XML::LibXML does not intend to
implement a full DOM interface as is done by XML::GDOME
for full-featured applications. Rather, it
offers a simple way to build or modify documents that are
created by XML::LibXML's parser.</para>
|
|
|
|
<para>Another target of the XML::LibXML interface is to
|
|
make the interfaces of libxml2 available to the perl
|
|
community. This also includes workarounds for some
features where libxml2 assumes more control over the
C-Level than most Perl users have.</para>
|
|
|
|
<para>One of the most important parts of the XML::LibXML
|
|
DOM interface is that the interfaces try to follow the
|
|
<ulink url="http://www.w3.org/TR/DOM-Level-3-Core/">DOM Level 3 specification</ulink> rather strictly. This means the
|
|
interface functions are named as the DOM specification
|
|
says and not what widespread Java interfaces claim to be
|
|
the standard. However, for several functions that have
only a single interface conforming to the DOM specification,
XML::LibXML provides an additional Java-style alias
interface.</para>
|
|
|
|
<para>Moreover, there are some function interfaces left over
from early stages of XML::LibXML. These interfaces are kept
for compatibility reasons
<emphasis>only</emphasis>. They might disappear in a
future version of XML::LibXML, so users are advised
to switch over to the official functions.</para>
|
|
<sect2>
|
|
<title>Encodings and XML::LibXML's DOM implementation</title>
|
|
<para>See the section on Encodings in the <emphasis>XML::LibXML</emphasis> manual page.</para>
|
|
</sect2>
|
|
<sect2>
|
|
<title>Namespaces and XML::LibXML's DOM implementation</title>
|
|
|
|
<para>XML::LibXML's DOM implementation is
|
|
limited by the DOM implementation of libxml2
|
|
which treats namespaces slightly differently than
|
|
required by the DOM Level 2 specification.
|
|
</para>
|
|
<para>According to the DOM Level 2 specification,
|
|
namespaces of elements and attributes should be
|
|
persistent, and nodes should be permanently bound to
|
|
namespace URIs as they get created; it should be
|
|
possible to manipulate the special attributes used for
|
|
declaring XML namespaces just as other attributes
|
|
without affecting the namespaces of other nodes.
|
|
In DOM Level 2, the application is responsible
|
|
for creating the special attributes consistently and/or for correct
|
|
serialization of the document.
|
|
</para>
|
|
<para>
|
|
This is inconvenient, causes problems in the serialization
of DOM to XML, and, most importantly, seems almost impossible
to implement on top of libxml2.
|
|
</para>
|
|
<para>
|
|
In libxml2, the namespace URI and prefix of a node are
|
|
provided by a pointer to a namespace declaration
|
|
(appearing as a special xmlns attribute in the XML
|
|
document). If the prefix or namespace URI of the
|
|
declaration changes, the prefix and namespace URI of all
|
|
nodes that point to it change as well. Moreover, in
|
|
contrast to DOM, a node (element or attribute) can only
|
|
be bound to a namespace URI if there is some namespace
|
|
declaration in the document to point to.
|
|
</para>
|
|
<para>
|
|
Therefore the current DOM implementation in XML::LibXML tries
|
|
to treat namespace declarations in a compromise between
|
|
reason, common sense, limitations of libxml2, and the DOM
|
|
Level 2 specification.
|
|
</para>
|
|
<para>In XML::LibXML, special attributes declaring XML namespaces
|
|
are often created automatically, usually when
|
|
a namespaced node is attached to a document
|
|
and no existing declaration of the namespace and prefix is in the
|
|
scope to be reused.
|
|
In this respect,
|
|
the XML::LibXML DOM implementation differs from the DOM
|
|
Level 2 specification according to which special
|
|
attributes for declaring the appropriate XML namespaces
|
|
should not be added when a node with a namespace prefix
|
|
and namespace URI is created.
|
|
</para>
|
|
<para>
|
|
Namespace declarations are also created when
|
|
<xref linkend="XML-LibXML-Document"/>'s
|
|
createElementNS() or createAttributeNS() functions are used. If
a namespace is not declared on the documentElement, the
namespace will be locally declared for the newly created
node. In the case of attributes this may look a bit confusing,
since these nodes cannot have namespace declarations
themselves. In this case the namespace is internally applied
|
|
to the attribute and later declared on the node the
|
|
attribute is appended to (if required).</para>
|
|
<para>The following example may explain this a bit:</para>
|
|
<programlisting>my $doc = XML::LibXML->createDocument;
|
|
my $root = $doc->createElementNS( "", "foo" );
|
|
$doc->setDocumentElement( $root );
|
|
|
|
my $attr = $doc->createAttributeNS( "bar", "bar:foo", "test" );
|
|
$root->setAttributeNodeNS( $attr );</programlisting>
|
|
|
|
<para>This piece of code will result in the following document:</para>
|
|
|
|
<programlisting>&lt;?xml version="1.0"?&gt;
&lt;foo xmlns:bar="bar" bar:foo="test"/&gt;</programlisting>
|
|
|
|
<para>The namespace is declared on the document element
|
|
during the setAttributeNodeNS() call.
|
|
</para>
|
|
<para>Namespaces can also be declared explicitly by the use of XML::LibXML::Element's setNamespace() function.
|
|
Since 1.61, they can also be manipulated with functions
|
|
setNamespaceDeclPrefix() and setNamespaceDeclURI() (not available in DOM).
|
|
Changing a URI or prefix of an existing namespace declaration
|
|
affects the namespace URI and prefix of all nodes which point to it
|
|
(that is the nodes in its scope).
|
|
</para>
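<para>The effect of changing a declaration can be sketched as follows; the
namespace URIs and the prefix <literal>ex</literal> are invented for the
example.</para>
<programlisting><![CDATA[use XML::LibXML;

my $doc  = XML::LibXML->createDocument( '1.0', 'UTF-8' );
my $root = $doc->createElementNS( 'urn:example:old', 'ex:root' );
$doc->setDocumentElement($root);

# Rewrite the URI of the declaration for prefix "ex" on $root;
# the element points to that declaration, so its namespace URI follows.
$root->setNamespaceDeclURI( 'ex', 'urn:example:new' );

print $root->namespaceURI, "\n";]]></programlisting>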
|
|
<para>It is also important to repeat the specification:
|
|
While working with namespaces you should use the namespace
|
|
aware functions instead of the simplified versions. For
|
|
example you should <emphasis>never</emphasis> use
|
|
setAttribute() but setAttributeNS().</para>
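<para>A short sketch of the namespace-aware form, using the XLink namespace
purely as an example:</para>
<programlisting><![CDATA[use XML::LibXML;

my $xlink = 'http://www.w3.org/1999/xlink';

my $doc  = XML::LibXML->createDocument( '1.0', 'UTF-8' );
my $root = $doc->createElement('root');
$doc->setDocumentElement($root);

# Namespace-aware setter: takes the namespace URI and the qualified name.
$root->setAttributeNS( $xlink, 'xlink:href', 'http://example.org/' );

# Namespace-aware getter: takes the namespace URI and the local name.
print $root->getAttributeNS( $xlink, 'href' ), "\n";]]></programlisting>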
|
|
</sect2>
|
|
</sect1>
|
|
</chapter>
|
|
|
|
<chapter id="XML-LibXML-Document">
|
|
<title>XML::LibXML DOM Document Class</title>
|
|
|
|
<titleabbrev>XML::LibXML::Document</titleabbrev>
|
|
<sect1>
|
|
<title>Synopsis</title>
|
|
<programlisting>use XML::LibXML;
|
|
# Only methods specific to Document nodes are listed here,
|
|
# see the XML::LibXML::Node manpage for other methods</programlisting>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Description</title>
|
|
<para>The Document Class is in most cases the result of a parsing process. But sometimes it is necessary to create a Document from scratch. The DOM
|
|
Document Class provides functions that conform to the DOM Core naming style.</para>
|
|
|
|
<para>It inherits all functions from <xref linkend="XML-LibXML-Node"/> as specified in the DOM specification. This enables access to the nodes
|
|
besides the root element on document level - a <function>DTD</function> for example. The support for these nodes is limited at the moment.</para>
|
|
|
|
<para>While nodes are generally bound to a document in the DOM concept, it is suggested that one should always create a node not bound to any document.
There is no need to actually include the node in the document, but once the node is bound to a document, it is fairly safe to assume that all strings have the
correct encoding. If an unbound text node with an ISO encoded string is created (e.g. with $CLASS->new()), the <function>toString</function> function
may not return the expected result.</para>
|
|
|
|
<para>To prevent such problems, it is recommended to pass all data to XML::LibXML methods
|
|
as character strings (i.e. UTF-8 encoded, with the UTF8 flag on).</para>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Methods</title>
|
|
<para>
|
|
Many functions listed here are
|
|
extensively documented in the <ulink url="http://www.w3.org/TR/DOM-Level-3-Core/">DOM Level 3 specification</ulink>. Please refer to
|
|
the specification for extensive documentation.
|
|
</para>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>new</term>
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$dom = XML::LibXML::Document->new( $version, $encoding );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>alias for createDocument()</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>createDocument</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$dom = XML::LibXML::Document->createDocument( $version, $encoding );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>The constructor for the document class. As parameters it takes the version string and (optionally) the encoding string. Simply calling
|
|
<emphasis>createDocument</emphasis>() will create the document:</para>
|
|
|
|
<programlisting>&lt;?xml version="your version" encoding="your encoding"?&gt;</programlisting>
|
|
|
|
<para>Both parameters are optional. The default value for <emphasis>$version</emphasis> is <function>1.0</function>, of course. If the
|
|
<emphasis>$encoding</emphasis> parameter is not set, the encoding will be left unset, which means UTF-8 is implied.</para>
|
|
|
|
<para>Calling <emphasis>createDocument</emphasis>() without any parameters results in the following document:</para>
|
|
|
|
<programlisting>&lt;?xml version="1.0"?&gt; </programlisting>
|
|
|
|
<para>Alternatively one can call this constructor directly from the XML::LibXML class level, to avoid some typing. This will not have any
|
|
effect on the class instance, which is always XML::LibXML::Document.</para>
|
|
|
|
<programlisting> my $document = XML::LibXML->createDocument( "1.0", "UTF-8" );</programlisting>
|
|
|
|
<para>is therefore a shortcut for</para>
|
|
|
|
<programlisting>my $document = XML::LibXML::Document->createDocument( "1.0", "UTF-8" );</programlisting>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>URI</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$strURI = $doc->URI();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Returns the URI (or filename) of the original document.
|
|
For documents obtained by parsing a string or a filehandle
|
|
without using the URI parsing argument of the corresponding <function>parse_*</function> function,
|
|
the result is a generated string unknown-XYZ where XYZ is some number;
|
|
for documents created with the constructor <function>new</function>,
|
|
the URI is undefined.
|
|
</para>
|
|
<para>The value can be modified by calling <function>setURI</function>
|
|
method on the document node.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>setURI</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$doc->setURI($strURI);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Sets the URI of the document reported by the method URI
|
|
(see also the URI argument to the various <function>parse_*</function> functions).
|
|
</para>
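<para>For instance, a document built in memory can be given a URI afterwards;
the address below is only a placeholder.</para>
<programlisting><![CDATA[use XML::LibXML;

my $doc = XML::LibXML->createDocument( '1.0', 'UTF-8' );
$doc->setDocumentElement( $doc->createElement('root') );

# Assign a URI to a document that was not read from a file.
$doc->setURI('http://example.org/generated.xml');
print $doc->URI, "\n";]]></programlisting>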
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>encoding</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$strEncoding = $doc->encoding();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>returns the encoding string of the document.</para>
|
|
|
|
<programlisting>my $doc = XML::LibXML->createDocument( "1.0", "ISO-8859-15" );
|
|
print $doc->encoding; # prints ISO-8859-15</programlisting>
|
|
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>actualEncoding</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$strEncoding = $doc->actualEncoding();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>returns the encoding in which the XML will be returned by $doc->toString().
|
|
This is usually the original encoding of the document as declared
|
|
in the XML declaration and returned by $doc->encoding.
|
|
If the original encoding is not known (e.g. if created in memory or parsed from a
|
|
XML without a declared encoding), 'UTF-8' is returned.
|
|
</para>
|
|
|
|
<programlisting>my $doc = XML::LibXML->createDocument( "1.0", "ISO-8859-15" );
|
|
print $doc->actualEncoding; # prints ISO-8859-15</programlisting>
|
|
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>setEncoding</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$doc->setEncoding($new_encoding);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This method allows one to change the declaration of
|
|
encoding in the XML declaration of the document.
|
|
The value also affects the encoding in which the
|
|
document is serialized to XML by $doc->toString().
|
|
Use setEncoding() to remove the encoding declaration.
|
|
</para>
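<para>A brief sketch of changing the declared encoding of a document created
in memory (the target encoding is chosen arbitrarily):</para>
<programlisting><![CDATA[use XML::LibXML;

my $doc = XML::LibXML->createDocument( '1.0', 'UTF-8' );
$doc->setDocumentElement( $doc->createElement('root') );

# Change the declared encoding; serialization follows the new value.
$doc->setEncoding('ISO-8859-1');
my $bytes = $doc->toString();

print $doc->encoding, "\n";]]></programlisting>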
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>version</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$strVersion = $doc->version();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>returns the version string of the document</para>
|
|
|
|
<para><emphasis>getVersion()</emphasis> is an alternative form of this function.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>standalone</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$doc->standalone</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This function returns the numerical value of the standalone attribute in the document's XML declaration. It returns <emphasis>1</emphasis> if
|
|
standalone="yes" was found, <emphasis>0</emphasis> if standalone="no" was found and <emphasis>-1</emphasis> if standalone
|
|
was not specified (default on creation).</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>setStandalone</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$doc->setStandalone($numvalue);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Through this method it is possible to alter the value of a document's standalone attribute. Set it to <emphasis>1</emphasis> to set
|
|
standalone="yes", to <emphasis>0</emphasis> to set standalone="no" or set it to <emphasis>-1</emphasis> to remove the
|
|
standalone attribute from the XML declaration.</para>
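<para>A short sketch of setting and reading the flag on a newly created
document:</para>
<programlisting><![CDATA[use XML::LibXML;

my $doc = XML::LibXML->createDocument( '1.0', 'UTF-8' );
$doc->setDocumentElement( $doc->createElement('root') );

# Request standalone="yes" in the XML declaration.
$doc->setStandalone(1);
my $flag = $doc->standalone;
my $xml  = $doc->toString();]]></programlisting>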
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>compression</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>my $compression = $doc->compression;</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>libxml2 allows reading of documents directly from gzipped files. In this case the compression variable is set to the compression level
|
|
of that file (0-8). If XML::LibXML parsed a different source or the file wasn't compressed, the returned value will be
|
|
<emphasis>-1</emphasis>.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>setCompression</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$doc->setCompression($ziplevel);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>If one intends to write the document directly to a file, it is possible to set the compression level for a given document. This level
|
|
can be in the range from 0 to 8. If XML::LibXML should not try to compress, use <emphasis>-1</emphasis> (default).</para>
|
|
|
|
<para>Note that this feature will <emphasis>only</emphasis> work if libxml2 is compiled with zlib support and toFile() is used for output.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>toString</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$docstring = $dom->toString($format);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para><emphasis>toString</emphasis> is a DOM serializing function,
|
|
so the DOM Tree is serialized into an XML string, ready for output.</para>
|
|
<para>IMPORTANT: unlike toString for other nodes, on document nodes
|
|
this function returns the XML as a byte string in the original encoding of the
|
|
document (see the actualEncoding() method)! This means you
|
|
can simply do:
|
|
</para>
|
|
<programlisting>open my $out_fh, '>', $file;
|
|
print {$out_fh} $doc->toString;</programlisting>
|
|
<para>regardless of the actual encoding of the document.
|
|
See the section on encodings in <xref linkend="XML-LibXML"/> for more details.</para>
|
|
<para>The optional <emphasis>$format</emphasis> parameter controls the indentation of the output. This parameter is expected to be an
<function>integer</function> value. If it is used, the format parameter can have three different values:</para>
|
|
|
|
<para>If $format is 0, then the document is dumped as it was originally parsed.</para>
|
|
|
|
<para>If $format is 1, libxml2 will add ignorable white space, so the nodes' content is easier to read. Existing text nodes will not be
altered.</para>
|
|
|
|
<para>If $format is 2 (or higher), libxml2 will act as with $format == 1, but it will also add a leading and a trailing line break to each text node.</para>
|
|
|
|
<para>libxml2 uses a hard-coded indentation of 2 space characters per indentation level. This value cannot be altered at run time.</para>
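<para>The difference between the first two values can be sketched with a tiny
document built in memory (element names are arbitrary):</para>
<programlisting><![CDATA[use XML::LibXML;

my $doc  = XML::LibXML->createDocument( '1.0', 'UTF-8' );
my $root = $doc->createElement('root');
$doc->setDocumentElement($root);
$root->appendChild( $doc->createElement('item') );

my $compact = $doc->toString(0);   # no white space is added
my $pretty  = $doc->toString(1);   # child elements are put on indented lines

print $compact, $pretty;]]></programlisting>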
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>toStringC14N</term>
|
|
|
|
<listitem>
|
|
<para><funcsynopsis><funcsynopsisinfo>$c14nstr = $doc->toStringC14N($comment_flag, $xpath [, $xpath_context ]); </funcsynopsisinfo></funcsynopsis>
|
|
See the documentation in <xref linkend="XML-LibXML-Node"/>.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>toStringEC14N</term>
|
|
|
|
<listitem>
|
|
<para><funcsynopsis><funcsynopsisinfo>$ec14nstr = $doc->toStringEC14N($comment_flag, $xpath [, $xpath_context ], $inclusive_prefix_list); </funcsynopsisinfo></funcsynopsis>
|
|
See the documentation in <xref linkend="XML-LibXML-Node"/>.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>serialize</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$str = $doc->serialize($format); </funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>An alias for toString(). This function name was added to be more consistent
with libxml2.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>serialize_c14n</term>
|
|
|
|
<listitem>
|
|
<para>An alias for toStringC14N().</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>serialize_exc_c14n</term>
|
|
|
|
<listitem>
|
|
<para>An alias for toStringEC14N().</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>toFile</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$state = $doc->toFile($filename, $format);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
          <para>This function is similar to toString(), but it writes the document directly to a file. This is very useful if one needs to store large documents.</para>
|
|
|
|
<para>The format parameter has the same behaviour as in toString().</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>toFH</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$state = $doc->toFH($fh, $format);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This function is similar to toString(), but it writes the document directly to a filehandle or a stream. A byte stream in the document encoding is passed to the file handle. Do NOT apply any <literal>:encoding(...)</literal> or <literal>:utf8</literal> PerlIO layer to
|
|
the filehandle! See the section on encodings in <xref linkend="XML-LibXML"/> for more details.</para>
|
|
|
|
<para>The format parameter has the same behaviour as in toString().</para>
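          <para>The following sketch writes a document to a filehandle and reads it back. The temporary file from File::Temp is an assumption made for the example; any filehandle without an encoding layer should do.</para>
          <programlisting><![CDATA[
```perl
use strict;
use warnings;
use XML::LibXML;
use File::Temp qw(tempfile);

my $doc = XML::LibXML::Document->new('1.0', 'UTF-8');
$doc->setDocumentElement($doc->createElement('root'));

# Keep the handle free of :encoding(...) and :utf8 layers; toFH() writes bytes.
my ($fh, $filename) = tempfile(UNLINK => 1);
binmode $fh;
$doc->toFH($fh, 1);
close $fh or die "close failed: $!";

# Reading the file back should give an equivalent document.
my $copy = XML::LibXML->load_xml(location => $filename);
print $copy->documentElement->nodeName, "\n";
```
]]></programlisting>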
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>toStringHTML</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$str = $document->toStringHTML();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
          <para><emphasis>toStringHTML</emphasis> serializes the tree to a byte string in the document encoding as HTML. With this method indenting is automatic and managed by libxml2 internally.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>serialize_html</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$str = $document->serialize_html();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>An alias for toStringHTML().</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>is_valid</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$bool = $dom->is_valid();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Returns either TRUE or FALSE depending on whether the DOM Tree is a valid Document or not.</para>
|
|
|
|
<para>You may also pass in a <xref linkend="XML-LibXML-Dtd"/> object, to validate against an external DTD:</para>
|
|
|
|
<programlisting> if (!$dom->is_valid($dtd)) {
|
|
warn("document is not valid!");
|
|
}</programlisting>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>validate</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$dom->validate();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
          <para>This is an exception-throwing equivalent of is_valid. If the document is not valid it will throw an exception containing the error.
            This allows much better error reporting than the plain true or false result of is_valid.</para>
|
|
|
|
          <para>Again, you may pass in a DTD object.</para>
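          <para>As a sketch of how the two methods differ in practice, the example below validates two small documents against a DTD parsed from a string. The DTD and the documents are made up for illustration.</para>
          <programlisting><![CDATA[
```perl
use strict;
use warnings;
use XML::LibXML;

# A small external DTD: the root element needs at least one item child.
my $dtd = XML::LibXML::Dtd->parse_string(
    '<!ELEMENT root (item+)> <!ELEMENT item (#PCDATA)>'
);

my $good = XML::LibXML->load_xml(string => '<root><item>a</item></root>');
my $bad  = XML::LibXML->load_xml(string => '<root/>');

my $good_ok = $good->is_valid($dtd);   # true
my $bad_ok  = $bad->is_valid($dtd);    # false

# validate() dies instead of returning false; the message ends up in $@.
my $error = '';
eval { $bad->validate($dtd); 1 } or $error = "$@";
print "validation failed: $error" if $error;
```
]]></programlisting>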
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>documentElement</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$root = $dom->documentElement();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
          <para>Returns the root element of the Document. A document can have just one root element to contain the document's data.</para>
|
|
|
|
<para>Optionally one can use <emphasis>getDocumentElement</emphasis>.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>setDocumentElement</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$dom->setDocumentElement( $root );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This function enables you to set the root element for a document. The function supports the import of a node from a different document
|
|
tree, but does not support a document fragment as $root.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>createElement</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$element = $dom->createElement( $nodename );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This function creates a new Element Node bound to the DOM with the name <function>$nodename</function>.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>createElementNS</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$element = $dom->createElementNS( $namespaceURI, $nodename );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This function creates a new Element Node bound to the DOM with the name <function>$nodename</function> and placed in the given
|
|
namespace.</para>
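          <para>A short sketch of how the qualified name is split up, assuming the SVG namespace and an illustrative <literal>svg</literal> prefix:</para>
          <programlisting><![CDATA[
```perl
use strict;
use warnings;
use XML::LibXML;

my $doc  = XML::LibXML::Document->new('1.0', 'UTF-8');
my $rect = $doc->createElementNS('http://www.w3.org/2000/svg', 'svg:rect');
$doc->setDocumentElement($rect);

my $qname  = $rect->nodeName;       # qualified name, prefix included
my $local  = $rect->localname;      # name without the prefix
my $prefix = $rect->prefix;
my $uri    = $rect->namespaceURI;

print "$qname is in namespace $uri\n";
```
]]></programlisting>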
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>createTextNode</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$text = $dom->createTextNode( $content_text );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
          <para>The equivalent of <emphasis>createElement</emphasis> for text: it creates a <emphasis>Text Node</emphasis> bound to the DOM.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>createComment</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$comment = $dom->createComment( $comment_text );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
          <para>The equivalent of <emphasis>createElement</emphasis> for comments: it creates a <emphasis>Comment Node</emphasis> bound to the DOM.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>createAttribute</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$attrnode = $doc->createAttribute($name [,$value]);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Creates a new Attribute node.</para>
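          <para>The create functions only produce nodes; they still have to be placed in the tree. A minimal sketch, with illustrative names and values:</para>
          <programlisting><![CDATA[
```perl
use strict;
use warnings;
use XML::LibXML;

my $doc  = XML::LibXML::Document->new('1.0', 'UTF-8');
my $root = $doc->createElement('note');
$doc->setDocumentElement($root);

# Nodes are created by the document, then placed in the tree explicitly.
$root->appendChild($doc->createComment(' generated '));
$root->appendChild($doc->createTextNode('Hello'));

# An attribute node is attached with setAttributeNode().
my $attr = $doc->createAttribute('lang', 'en');
$root->setAttributeNode($attr);

my $xml = $root->toString;
print "$xml\n";
```
]]></programlisting>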
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>createAttributeNS</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
            <funcsynopsisinfo>$attrnode = $doc->createAttributeNS( $namespaceURI, $name [,$value] );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Creates an Attribute bound to a namespace.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>createDocumentFragment</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$fragment = $doc->createDocumentFragment();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This function creates a DocumentFragment.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>createCDATASection</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$cdata = $dom->createCDATASection( $cdata_content );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Similar to createTextNode and createComment, this function creates a CDataSection bound to the current DOM.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>createProcessingInstruction</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>my $pi = $doc->createProcessingInstruction( $target, $data );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
          <para>Creates a processing instruction node.</para>
|
|
|
|
<para>Since this method is quite long one may use its short form <emphasis>createPI()</emphasis>.</para>
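          <para>A sketch of creating a processing instruction and placing it before the root element. The stylesheet file name is hypothetical.</para>
          <programlisting><![CDATA[
```perl
use strict;
use warnings;
use XML::LibXML;

my $doc  = XML::LibXML::Document->new('1.0', 'UTF-8');
my $root = $doc->createElement('root');
$doc->setDocumentElement($root);

# "style.xsl" is a made-up file name used only for this example.
my $pi = $doc->createProcessingInstruction(
    'xml-stylesheet', 'type="text/xsl" href="style.xsl"'
);

# Place the instruction in the prolog, before the root element.
$doc->insertBefore($pi, $root);

my $target = $pi->nodeName;   # the PI target
my $xml    = $doc->toString;
print $xml;
```
]]></programlisting>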
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>createEntityReference</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>my $entref = $doc->createEntityReference($refname);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
          <para>If a document has a DTD specified, one can create entity references by using this function. If one wants to add an entity reference to
            the document, this reference has to be created by this function.</para>
|
|
|
|
<para>An entity reference is unique to a document and cannot be passed to other documents as other nodes can be passed.</para>
|
|
|
|
          <para><emphasis>NOTE:</emphasis> Text content containing something that looks like an entity reference will not be expanded to a real
            entity reference unless it is a predefined entity.</para>
|
|
|
|
          <programlisting> my $string = "&foo;";
 $some_element->appendText( $string );
 print $some_element->toString; # the text is serialized as "&amp;foo;"</programlisting>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>createInternalSubset</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$dtd = $document->createInternalSubset( $rootnode, $public, $system);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This function creates and adds an internal subset to the given document. Because the function automatically adds the DTD to the document
|
|
there is no need to add the created node explicitly to the document.</para>
|
|
|
|
<programlisting> my $document = XML::LibXML::Document->new();
|
|
my $dtd = $document->createInternalSubset( "foo", undef, "foo.dtd" );</programlisting>
|
|
|
|
<para>will result in the following XML document:</para>
|
|
|
|
<programlisting><?xml version="1.0"?>
|
|
<!DOCTYPE foo SYSTEM "foo.dtd"> </programlisting>
|
|
|
|
          <para>By setting the public parameter it is possible to declare a PUBLIC DTD for a given document. So</para>
|
|
|
|
<programlisting>my $document = XML::LibXML::Document->new();
|
|
my $dtd = $document->createInternalSubset( "foo", "-//FOO//DTD FOO 0.1//EN", undef );
|
|
</programlisting>
|
|
|
|
<para>will cause the following declaration to be created on the document:</para>
|
|
|
|
<programlisting><?xml version="1.0"?>
|
|
<!DOCTYPE foo PUBLIC "-//FOO//DTD FOO 0.1//EN"></programlisting>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>createExternalSubset</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$dtd = $document->createExternalSubset( $rootnode_name, $publicId, $systemId);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This function is similar to <function>createInternalSubset()</function> but this DTD is considered to be external and is therefore not
|
|
added to the document itself. Nevertheless it can be used for validation purposes.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>importNode</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$document->importNode( $node );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>If a node is not part of a document, it can be imported to another document. As specified in DOM Level 2 Specification the Node will
|
|
not be altered or removed from its original document (<function>$node->cloneNode(1)</function> will get called implicitly).</para>
|
|
|
|
<para><emphasis>NOTE:</emphasis> Don't try to use importNode() to import sub-trees that contain an entity reference - even if the entity
|
|
reference is the root node of the sub-tree. This will cause serious problems to your program. This is a limitation of libxml2 and not of
|
|
XML::LibXML itself.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>adoptNode</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$document->adoptNode( $node );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
          <para>If a node is not part of a document, it can be imported to another document. As specified in DOM Level 3 Specification the Node will
            not be altered but it will be removed from its original document.</para>
|
|
|
|
<para>After a document adopted a node, the node, its attributes and all its descendants belong to the new document. Because the node does
|
|
not belong to the old document, it will be unlinked from its old location first.</para>
|
|
|
|
          <para><emphasis>NOTE:</emphasis> Don't try to use adoptNode() to import sub-trees that contain entity references - even if the entity
|
|
reference is the root node of the sub-tree. This will cause serious problems to your program. This is a limitation of libxml2 and not of
|
|
XML::LibXML itself.</para>
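          <para>The difference between importNode() and adoptNode() can be sketched with two small documents (the element names are illustrative): the first call copies a node, the second moves it.</para>
          <programlisting><![CDATA[
```perl
use strict;
use warnings;
use XML::LibXML;

my $src = XML::LibXML->load_xml(string => '<src><one/><two/></src>');
my $dst = XML::LibXML->load_xml(string => '<dst/>');
my ($one, $two) = $src->documentElement->childNodes;

# importNode() returns a copy; the original stays in the source document.
my $copy = $dst->importNode($one);
$dst->documentElement->appendChild($copy);

# adoptNode() moves the node itself; it leaves the source document.
my $moved = $dst->adoptNode($two);
$dst->documentElement->appendChild($moved);

my $src_xml = $src->documentElement->toString;
my $dst_xml = $dst->documentElement->toString;
print "$src_xml\n$dst_xml\n";
```
]]></programlisting>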
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>externalSubset</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>my $dtd = $doc->externalSubset;</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>If a document has an external subset defined it will be returned by this function.</para>
|
|
|
|
          <para><emphasis>NOTE</emphasis> Dtd nodes are not ordinary nodes in libxml2. The support for these nodes in XML::LibXML is still limited. In
            particular one may not want to use common node functions on doctype declaration nodes!</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>internalSubset</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>my $dtd = $doc->internalSubset;</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>If a document has an internal subset defined it will be returned by this function.</para>
|
|
|
|
          <para><emphasis>NOTE</emphasis> Dtd nodes are not ordinary nodes in libxml2. The support for these nodes in XML::LibXML is still limited. In
            particular one may not want to use common node functions on doctype declaration nodes!</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>setExternalSubset</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$doc->setExternalSubset($dtd);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para><emphasis>EXPERIMENTAL!</emphasis></para>
|
|
|
|
<para>This method sets a DTD node as an external subset of the given document.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>setInternalSubset</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$doc->setInternalSubset($dtd);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para><emphasis>EXPERIMENTAL!</emphasis></para>
|
|
|
|
<para>This method sets a DTD node as an internal subset of the given document.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>removeExternalSubset</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>my $dtd = $doc->removeExternalSubset();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para><emphasis>EXPERIMENTAL!</emphasis></para>
|
|
|
|
<para>If a document has an external subset defined it can be removed from the document by using this function. The removed dtd node will be
|
|
returned.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>removeInternalSubset</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>my $dtd = $doc->removeInternalSubset();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para><emphasis>EXPERIMENTAL!</emphasis></para>
|
|
|
|
<para>If a document has an internal subset defined it can be removed from the document by using this function. The removed dtd node will be
|
|
returned.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>getElementsByTagName</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>my @nodelist = $doc->getElementsByTagName($tagname);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Implements the DOM Level 2 function</para>
|
|
|
|
<para>In SCALAR context this function returns an <olink targetdoc="XML::LibXML::NodeList">XML::LibXML::NodeList</olink> object.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>getElementsByTagNameNS</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>my @nodelist = $doc->getElementsByTagNameNS($nsURI,$tagname);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Implements the DOM Level 2 function</para>
|
|
|
|
<para>In SCALAR context this function returns an <olink targetdoc="XML::LibXML::NodeList">XML::LibXML::NodeList</olink> object.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>getElementsByLocalName</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>my @nodelist = $doc->getElementsByLocalName($localname);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This allows the fetching of all nodes from a given document with the given Localname.</para>
|
|
|
|
<para>In SCALAR context this function returns an <olink targetdoc="XML::LibXML::NodeList">XML::LibXML::NodeList</olink> object.</para>
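          <para>A sketch comparing the three lookup functions on a document that mixes a namespaced and a plain element. The namespace URI <literal>urn:example:x</literal> is made up for the example.</para>
          <programlisting><![CDATA[
```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(
    string => '<r xmlns:x="urn:example:x"><x:item/><item/></r>'
);

my @plain = $doc->getElementsByTagName('item');                     # qualified name
my @in_ns = $doc->getElementsByTagNameNS('urn:example:x', 'item');  # namespace + local name
my @local = $doc->getElementsByLocalName('item');                   # local name only

# In scalar context an XML::LibXML::NodeList object is returned instead.
my $list = $doc->getElementsByLocalName('item');

printf "%d %d %d %d\n", scalar @plain, scalar @in_ns, scalar @local, $list->size;
```
]]></programlisting>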
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>getElementById</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>my $node = $doc->getElementById($id);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>Returns the element that has an ID attribute
|
|
with the given value. If no such element exists,
|
|
this returns undef.</para>
|
|
<para>Note: the ID of an element
|
|
may change while manipulating the document.
|
|
For documents with a DTD, the information about ID attributes
|
|
is only available if DTD loading/validation has been requested.
|
|
For HTML documents parsed with the HTML
|
|
parser ID detection is done
|
|
automatically. In XML documents, all "xml:id"
|
|
attributes are considered to be of type ID.
|
|
You can test ID-ness of an attribute node
|
|
with $attr->isId().
|
|
</para>
|
|
<para>In versions 1.59 and earlier this method was
|
|
called getElementsById() (plural) by
|
|
mistake. Starting from 1.60 this name is
|
|
maintained as an alias only for backward compatibility.
|
|
</para>
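          <para>A minimal sketch using an <literal>xml:id</literal> attribute, which needs no DTD. The ID values are illustrative.</para>
          <programlisting><![CDATA[
```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(
    string => '<root><p xml:id="intro">Hi</p></root>'
);

my $hit  = $doc->getElementById('intro');
my $miss = $doc->getElementById('no-such-id');   # undef when nothing matches

print $hit->textContent, "\n" if defined $hit;
```
]]></programlisting>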
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>indexElements</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$dom->indexElements();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This function causes libxml2 to stamp all elements in a document with their document position index which considerably speeds up XPath
|
|
queries for large documents. It should only be used with static documents that won't be further changed by any DOM methods, because once
|
|
a document is indexed, XPath will always prefer the index to other methods of determining the document order of nodes. XPath could therefore
|
|
return improperly ordered node-lists when applied on a document that has been changed after being indexed. It is of course possible to use
|
|
this method to re-index a modified document before using it with XPath again. This function is not a part of the DOM specification.</para>
|
|
|
|
          <para>This function returns the number of elements indexed, -1 if an error occurred, or -2 if this feature is not available in the running libxml2.</para>
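          <para>A sketch of the intended usage pattern: index once the tree is final, check the return value, then run the XPath queries.</para>
          <programlisting><![CDATA[
```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(string => '<r><a/><b/></r>');

# Index only after all DOM changes are done.
my $count = $doc->indexElements();
if ($count >= 0) {
    print "indexed $count elements\n";
} else {
    print "indexing unavailable or failed (code $count)\n";
}

my @nodes = $doc->findnodes('//*');
```
]]></programlisting>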
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</sect1>
|
|
</chapter>
|
|
|
|
<chapter id="XML-LibXML-Node">
|
|
<title>Abstract Base Class of XML::LibXML Nodes</title>
|
|
|
|
<titleabbrev>XML::LibXML::Node</titleabbrev>
|
|
<sect1>
|
|
<title>Synopsis</title>
|
|
<programlisting>use XML::LibXML;</programlisting>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Description</title>
|
|
|
|
<para>XML::LibXML::Node defines functions that are common to
|
|
all Node Types. An XML::LibXML::Node should never be created
|
|
standalone, but as an instance of a high level class such as
|
|
XML::LibXML::Element or XML::LibXML::Text. The class itself should
|
|
provide only common functionality. In XML::LibXML each node is
|
|
part either of a document or a document-fragment. Because of
|
|
        this there is no node without a parent. This may cause
|
|
confusion with "unbound" nodes.</para>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Methods</title>
|
|
<para>
|
|
Many functions listed here are
|
|
extensively documented in the <ulink url="http://www.w3.org/TR/DOM-Level-3-Core/">DOM Level 3 specification</ulink>. Please refer to
|
|
the specification for extensive documentation.
|
|
</para>
|
|
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>nodeName</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$name = $node->nodeName;</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Returns the node's name. This function is
|
|
aware of namespaces and returns the full name of
|
|
the current node (<function>prefix:localname</function>).
|
|
</para>
|
|
<para>Since 1.62 this function also returns the correct
|
|
DOM names for node types with constant names, namely:
|
|
#text, #cdata-section, #comment, #document,
|
|
#document-fragment.
|
|
</para>
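          <para>A short sketch of the names reported for different node types. The prefix and namespace URI are illustrative.</para>
          <programlisting><![CDATA[
```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(
    string => '<r xmlns:x="urn:example:x"><x:el>text<!-- c --></x:el></r>'
);
my ($el) = $doc->documentElement->childNodes;
my ($text, $comment) = $el->childNodes;

# Elements report prefix:localname; other node types report constant names.
my @names = map { $_->nodeName } ($doc, $el, $text, $comment);
print join(', ', @names), "\n";
```
]]></programlisting>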
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>setNodeName</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$node->setNodeName( $newName );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
          <para>In very limited situations, it is useful to change a node's name. In the DOM specification this should throw an error. This function is
            aware of namespaces.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>isSameNode</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$bool = $node->isSameNode( $other_node );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>returns TRUE (1) if the given nodes refer to
|
|
the same node structure, otherwise FALSE (0) is
|
|
returned.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>isEqual</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$bool = $node->isEqual( $other_node );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>deprecated version of isSameNode().</para>
|
|
|
|
<para><emphasis>NOTE</emphasis> isEqual will change behaviour to follow the DOM specification</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>unique_key</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$num = $node->unique_key;</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This function is not specified for any DOM level. It returns a key guaranteed to be unique for this node, and to always be the same value for this node. In other words, two node objects return the same key if and only if isSameNode indicates that they are the same node.</para>
|
|
|
|
<para>The returned key value is useful as a key in hashes.</para>
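          <para>One possible use, sketched below, is removing duplicates from a list of nodes that were collected by several queries:</para>
          <programlisting><![CDATA[
```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(string => '<r><a/><b/></r>');

# The first element is reached twice, through two different queries.
my @found = ($doc->findnodes('/r/*'), $doc->findnodes('//a'));

# Deduplicate by node identity rather than by Perl object identity.
my %seen;
my @unique = grep { !$seen{ $_->unique_key }++ } @found;

print scalar(@unique), " distinct nodes\n";
```
]]></programlisting>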
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>nodeValue</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$content = $node->nodeValue;</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>If the node has any content (such as stored in a <function>text node</function>) it can get requested through this function.</para>
|
|
|
|
<para><emphasis>NOTE:</emphasis> Element Nodes have no content per definition. To get the text value of an Element use textContent()
|
|
instead!</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>textContent</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$content = $node->textContent;</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
          <para>This function returns the content of all text nodes in the descendants of the given node, as specified in DOM.</para>
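          <para>The difference from nodeValue can be sketched on a small mixed-content element:</para>
          <programlisting><![CDATA[
```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(string => '<p>Hello <b>world</b></p>');
my $p   = $doc->documentElement;

my $all   = $p->textContent;             # text of all descendants, concatenated
my $first = $p->firstChild->nodeValue;   # content of the first text node only
my $none  = $p->nodeValue;               # an element has no value of its own

print "$all\n";
```
]]></programlisting>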
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>nodeType</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$type = $node->nodeType;</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Return a numeric value representing the node type of this node.
|
|
The module XML::LibXML by default exports constants
|
|
for the node types (see the EXPORT section in the
|
|
<xref linkend="XML-LibXML"/> manual page).</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>unbindNode</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$node->unbindNode();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Unbinds the Node from its siblings and Parent, but not from the Document it belongs to. If the node is not inserted into the DOM
|
|
            afterwards, it will be lost after the program terminates. From a low level view, the unbound node is stripped from the context it is in and
|
|
inserted into a (hidden) document-fragment.</para>
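          <para>A sketch of detaching a node and giving it a new parent in the same document, so that it is not lost:</para>
          <programlisting><![CDATA[
```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(string => '<r><a><x/></a><b/></r>');
my ($x)    = $doc->findnodes('//x');
my ($b_el) = $doc->findnodes('//b');

# Detach the node from its parent, then insert it somewhere else.
$x->unbindNode();
$b_el->appendChild($x);

my $xml = $doc->documentElement->toString;
print "$xml\n";
```
]]></programlisting>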
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>removeChild</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$childnode = $node->removeChild( $childnode );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This will unbind the Child Node from its parent <function>$node</function>. The function returns the unbound node. If
|
|
<function>$childnode</function> is not a child of the given Node the function will fail.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>replaceChild</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$oldnode = $node->replaceChild( $newNode, $oldNode );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
          <para>Replaces the <function>$oldNode</function> with the <function>$newNode</function>. The <function>$oldNode</function> will be unbound
            from the Node. This function differs from the DOM L2 specification in that, if the new node is not part of the document, it will
            be imported first.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>replaceNode</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$node->replaceNode($newNode);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
          <para>This function is very similar to replaceChild(), but it replaces the node itself rather than a child node. This is useful if a node
            found by an XPath function should be replaced.</para>
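          <para>A sketch of that XPath use case, with illustrative element names:</para>
          <programlisting><![CDATA[
```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(string => '<r><old/><keep/></r>');

# Replace every node matched by the XPath expression.
for my $old ($doc->findnodes('//old')) {
    $old->replaceNode($doc->createElement('new'));
}

my $xml = $doc->documentElement->toString;
print "$xml\n";
```
]]></programlisting>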
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>appendChild</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$childnode = $node->appendChild( $childnode );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
          <para>The function will add the <function>$childnode</function> to the end of <function>$node</function>'s children. The function should
            fail if the new childnode is already a child of <function>$node</function>. This function differs from the DOM L2 specification in
            that, if the new node is not part of the document, it will be imported first.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>addChild</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$childnode = $node->addChild( $childnode );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>As an alternative to appendChild() one can use the addChild() function. This function is a bit faster, because it avoids all DOM
|
|
conformity checks. Therefore this function is quite useful if one builds XML documents in memory where the order and ownership (<function>ownerDocument</function>)
|
|
is assured.</para>
|
|
|
|
<para>addChild() uses libxml2's own xmlAddChild() function. Thus it has to be used with extra care: If a text node is added to a node
|
|
and the node itself or its last childnode is as well a text node, the node to add will be merged with the one already available. The current
|
|
node will be removed from memory after this action. Because perl is not aware of this action, the perl instance is still available.
|
|
XML::LibXML will catch the loss of a node and refuse to run any function called on that node.</para>
|
|
|
|
<programlisting> my $t1 = $doc->createTextNode( "foo" );
|
|
my $t2 = $doc->createTextNode( "bar" );
|
|
$t1->addChild( $t2 ); # is OK
|
|
my $val = $t2->nodeValue(); # will fail, script dies</programlisting>
|
|
|
|
<para>Also addChild() will not check if the added node belongs to the same document as the node it will be added to. This could lead to
|
|
            inconsistent documents and in worse cases even to memory violations, if one does not keep track of this issue.</para>
|
|
|
|
<para>Although this sounds like a lot of trouble, addChild() is useful if a document is built from a stream, such as happens sometimes in
|
|
SAX handlers or filters.</para>
|
|
|
|
          <para>If you are not sure about the source of your nodes, you had better stay with appendChild(), because this function is more user friendly in
            the sense of being more error tolerant.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>addNewChild</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$node = $parent->addNewChild( $nsURI, $name );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
          <para>Similar to <function>addChild()</function>, this function uses low level libxml2 functionality to provide a faster interface for DOM
|
|
building. <emphasis>addNewChild()</emphasis> uses <function>xmlNewChild()</function> to create a new node on a given parent element.</para>
|
|
|
|
<para>addNewChild() has two parameters $nsURI and $name, where $nsURI is an (optional) namespace URI. $name is the fully qualified element
|
|
name; addNewChild() will determine the correct prefix if necessary.</para>
|
|
|
|
<para>The function returns the newly created node.</para>
|
|
|
|
<para>This function is very useful for DOM building, where a created node can be directly associated with its parent. <emphasis>NOTE</emphasis>
|
|
this function is not part of the DOM specification and its use will limit your code to XML::LibXML.</para>
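          <para>A minimal sketch of building a list of children this way. Passing <literal>undef</literal> as $nsURI is assumed here to mean that no namespace is used.</para>
          <programlisting><![CDATA[
```perl
use strict;
use warnings;
use XML::LibXML;

my $doc  = XML::LibXML::Document->new('1.0', 'UTF-8');
my $root = $doc->createElement('list');
$doc->setDocumentElement($root);

# Create and attach each child in one step.
for my $value (qw(a b)) {
    my $item = $root->addNewChild(undef, 'item');
    $item->appendText($value);
}

my $xml = $root->toString;
print "$xml\n";
```
]]></programlisting>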
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>addSibling</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$node->addSibling($newNode);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>addSibling() allows adding an additional node to the end of a nodelist, defined by the given node.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>cloneNode</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
            <funcsynopsisinfo>$newnode = $node->cloneNode( $deep );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para><emphasis>cloneNode</emphasis> creates a
|
|
copy of <function>$node</function>. When $deep is
|
|
set to 1 (true) the function will copy all
|
|
child nodes as well. If $deep is 0 only the current
|
|
          node will be copied. Note that in the case of an element,
|
|
attributes are copied even if $deep is 0.
|
|
</para>
|
|
<para>Note that the behavior of this function for $deep=0
|
|
has changed in 1.62 in order to be consistent with the DOM spec
|
|
(in older versions attributes and namespace information
|
|
was not copied for elements).</para>
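          <para>A sketch of the two modes, assuming XML::LibXML 1.62 or later:</para>
          <programlisting><![CDATA[
```perl
use strict;
use warnings;
use XML::LibXML;

my $doc  = XML::LibXML->load_xml(string => '<a id="1"><b/></a>');
my $orig = $doc->documentElement;

my $shallow = $orig->cloneNode(0);   # element and its attributes only
my $deep    = $orig->cloneNode(1);   # element, attributes and descendants

print $shallow->toString, "\n", $deep->toString, "\n";
```
]]></programlisting>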
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>parentNode</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$parentnode = $node->parentNode;</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
          <para>Simply returns the parent node of the current node.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>nextSibling</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$nextnode = $node->nextSibling();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
          <para>Returns the next sibling, if any.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>nextNonBlankSibling</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$nextnode = $node->nextNonBlankSibling();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Returns the next non-blank sibling if any (a node is blank if it is a Text or CDATA node consisting of whitespace only). This method is not defined by DOM.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>previousSibling</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$prevnode = $node->previousSibling();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
          <para>Analogous to <emphasis>getNextSibling</emphasis>, this function returns the previous sibling, if any.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>previousNonBlankSibling</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$prevnode = $node->previousNonBlankSibling();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Returns the previous non-blank sibling if any (a node is blank if it is a Text or CDATA node consisting of whitespace only). This method is not defined by DOM.</para>
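          <para>A sketch of why the non-blank variants are convenient on indented input, where whitespace-only text nodes sit between the elements:</para>
          <programlisting><![CDATA[
```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(string => "<r>\n  <a/>\n  <b/>\n</r>");
my ($first) = $doc->findnodes('/r/a');

my $next    = $first->nextSibling;               # the whitespace text node
my $next_el = $first->nextNonBlankSibling;       # skips the whitespace
my $back    = $next_el->previousNonBlankSibling; # and back again

print join(' ', map { $_->nodeName } $next, $next_el, $back), "\n";
```
]]></programlisting>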
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>hasChildNodes</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$boolean = $node->hasChildNodes();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>If the current node has child nodes this function returns TRUE (1), otherwise it returns FALSE (0, not undef).</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>firstChild</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$childnode = $node->firstChild;</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>If a node has child nodes this function will return the first node in the child list.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>lastChild</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$childnode = $node->lastChild;</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>If the <function>$node</function> has child nodes this function returns the last child node.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>ownerDocument</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$documentnode = $node->ownerDocument;</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Through this function it is always possible to access the document the current node is bound to.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>getOwner</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$node = $node->getOwner;</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This function returns the node the current node is associated with. In most cases this will be a document node or a document fragment
|
|
node.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>setOwnerDocument</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$node->setOwnerDocument( $doc );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This function binds a node to another DOM. This method unbinds the node first, if it is already bound to another document.</para>
|
|
|
|
          <para>This function is the counterpart of <xref linkend="XML-LibXML-Document"/>'s adoptNode() function, called on the node instead of on the document. Because of this, it has the same limitations
          with entity references as adoptNode().</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>insertBefore</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$node->insertBefore( $newNode, $refNode );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
          <para>The method inserts <function>$newNode</function> before <function>$refNode</function>. If <function>$refNode</function> is undefined,
          <function>$newNode</function> will be appended as the new last child of the parent node. This function differs from the DOM L2 specification in one case: if the
          new node is not part of the document, it will automatically be imported first.</para>
|
|
|
|
<para>$refNode has to be passed to the function even if it is undefined:</para>
|
|
|
|
<programlisting> $node->insertBefore( $newNode, undef ); # the same as $node->appendChild( $newNode );
|
|
$node->insertBefore( $newNode ); # wrong</programlisting>
|
|
|
|
          <para>Note that the reference node has to be a direct child of the node the function is called on. Also, $newNode is not allowed to be an
|
|
ancestor of the new parent node.</para>
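          <para>A minimal sketch of both calling forms, assuming a small document built from scratch (the element names are illustrative only):</para>
          <programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML;

my $doc  = XML::LibXML::Document->new('1.0', 'UTF-8');
my $root = $doc->createElement('root');
$doc->setDocumentElement($root);

my $second = $root->appendChild($doc->createElement('second'));

# Insert before an existing direct child of $root.
$root->insertBefore($doc->createElement('first'), $second);

# An explicit undef reference node appends at the end.
$root->insertBefore($doc->createElement('third'), undef);

my $order = join ',', map { $_->nodeName } $root->childNodes;
print "$order\n";  # first,second,third
]]></programlisting>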
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>insertAfter</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$node->insertAfter( $newNode, $refNode );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>The method inserts <function>$newNode</function> after <function>$refNode</function>. If <function>$refNode</function> is undefined,
|
|
          <function>$newNode</function> will be set as the new last child of the parent node.</para>
|
|
|
|
          <para>Note that $refNode has to be passed explicitly even if it is undef.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>findnodes</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>@nodes = $node->findnodes( $xpath_expression );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para><emphasis>findnodes</emphasis> evaluates the xpath expression (XPath 1.0) on the current node and returns the resulting node set as an array. In scalar context, returns an <olink targetdoc="XML::LibXML::NodeList">XML::LibXML::NodeList</olink> object.</para>
|
|
<para>The xpath expression can be passed either as a string, or
|
|
as a <olink targetdoc="XML::LibXML::XPathExpression">XML::LibXML::XPathExpression</olink> object.
|
|
</para>
|
|
<para><emphasis>NOTE ON NAMESPACES AND XPATH</emphasis>:</para>
|
|
<para>A common mistake about
|
|
XPath is to assume that node tests consisting of an
|
|
element name with no prefix match elements in the default
|
|
namespace. This assumption is wrong - by XPath
|
|
specification, such node tests can only match elements
|
|
that are in no (i.e. null) namespace.
|
|
</para>
|
|
<para>
|
|
So, for example, one cannot match the root element of an
|
|
XHTML document with <literal>$node->find('/html')</literal>
|
|
since <literal>'/html'</literal> would only match if the
|
|
	    root element <literal>&lt;html&gt;</literal> had no
|
|
namespace, but all XHTML elements belong to the namespace
|
|
http://www.w3.org/1999/xhtml. (Note that
|
|
<literal>xmlns="..."</literal> namespace declarations can
|
|
also be specified in a DTD, which makes the situation even worse, since
|
|
the XML document looks as if there was no default namespace).
|
|
</para>
|
|
<para>There are several possible ways to deal with namespaces in XPath:
|
|
</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
The recommended way is to use the
|
|
<xref linkend="XML-LibXML-XPathContext"/> module
|
|
to define an explicit context
|
|
for XPath evaluation, in which a document independent
|
|
prefix-to-namespace mapping can be defined. For
|
|
example:
|
|
</para>
|
|
<programlisting>my $xpc = XML::LibXML::XPathContext->new;
|
|
$xpc->registerNs('x', 'http://www.w3.org/1999/xhtml');
|
|
$xpc->find('/x:html',$node);</programlisting>
|
|
</listitem>
|
|
<listitem><para>
|
|
Another possibility is to use prefixes declared
|
|
in the queried document (if known).
|
|
If the document declares a prefix for the
|
|
namespace in question (and the context node is in the
|
|
scope of the declaration),
|
|
<function>XML::LibXML</function> allows you to use the
|
|
prefix in the XPath expression, e.g.:
|
|
</para>
|
|
<programlisting>$node->find('/x:html');</programlisting>
|
|
</listitem>
|
|
</itemizedlist>
|
|
<para>See also XML::LibXML::XPathContext->findnodes.</para>
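	  <para>The namespace pitfall described above can be reproduced with a short sketch. The one-element XHTML document and the prefix <literal>x</literal> are assumptions chosen for illustration:</para>
	  <programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(
    string => '<html xmlns="http://www.w3.org/1999/xhtml"><body/></html>');

# An unprefixed name only matches elements in no namespace.
my @plain = $doc->findnodes('/html');

# With an explicit prefix-to-namespace mapping the element is found.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs('x', 'http://www.w3.org/1999/xhtml');
my @qualified = $xpc->findnodes('/x:html');

print scalar(@plain), ' ', scalar(@qualified), "\n";  # 0 1
]]></programlisting>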
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>find</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$result = $node->find( $xpath );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para><emphasis>find</emphasis> evaluates the XPath 1.0 expression using the current node as the context of the expression, and returns the
|
|
result depending on what type of result the XPath expression had. For example, the XPath "1 * 3 + 52" results in a
|
|
<olink targetdoc="XML::LibXML::Number">XML::LibXML::Number</olink> object being returned. Other expressions might return an <olink targetdoc="XML::LibXML::Boolean">XML::LibXML::Boolean</olink>
|
|
object, or an <olink targetdoc="XML::LibXML::Literal">XML::LibXML::Literal</olink> object (a string). Each of those objects uses Perl's overload feature to "do
|
|
the right thing" in different contexts.</para>
|
|
<para>The xpath expression can be passed either as a string,
|
|
or as a <olink targetdoc="XML::LibXML::XPathExpression">XML::LibXML::XPathExpression</olink> object.
|
|
</para>
|
|
<para>See also <xref linkend="XML-LibXML-XPathContext"/>->find.</para>
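          <para>As a rough illustration of the result types mentioned above, the following sketch evaluates four expressions against a small sample document (the document itself is an assumption) and prints the class of each result:</para>
          <programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(
    string => '<list><item>a</item><item>b</item></list>');

my $number  = $doc->find('1 * 3 + 52');
my $boolean = $doc->find('count(//item) > 1');
my $literal = $doc->find('string(//item[2])');
my $nodes   = $doc->find('//item');

print ref($number),  ' ', $number->value,  "\n";  # XML::LibXML::Number 55
print ref($boolean), "\n";                        # XML::LibXML::Boolean
print ref($literal), ' ', $literal->value, "\n";  # XML::LibXML::Literal b
print ref($nodes),   ' ', $nodes->size,    "\n";  # XML::LibXML::NodeList 2
]]></programlisting>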
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>findvalue</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>print $node->findvalue( $xpath );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para><emphasis>findvalue</emphasis> is exactly equivalent to:</para>
|
|
|
|
<programlisting> $node->find( $xpath )->to_literal; </programlisting>
|
|
|
|
<para>That is, it returns the literal value of the results. This enables you to ensure that you get a string back from your search, allowing
|
|
          certain shortcuts. This could be used as the equivalent of XSLT's &lt;xsl:value-of select="some_xpath"/&gt;.</para>
|
|
<para>See also <xref linkend="XML-LibXML-XPathContext"/>->findvalue.</para>
|
|
<para>The xpath expression can be passed either as a string, or
|
|
as a <olink targetdoc="XML::LibXML::XPathExpression">XML::LibXML::XPathExpression</olink> object.
|
|
</para>
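          <para>A brief sketch of <function>findvalue</function> returning plain strings, using an illustrative sample document:</para>
          <programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(
    string => '<list><item>a</item><item>b</item></list>');

my $text  = $doc->findvalue('//item[2]');      # string value of the node
my $count = $doc->findvalue('count(//item)');  # numeric result as a string

print "$text $count\n";  # b 2
]]></programlisting>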
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>exists</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$bool = $node->exists( $xpath_expression );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>This method behaves like <emphasis>findnodes</emphasis>, except
|
|
that it only returns a boolean value (1 if the expression matches a node, 0 otherwise)
|
|
and may be faster than <emphasis>findnodes</emphasis>, because
|
|
the XPath evaluation may stop early on the first match (this is true for libxml2 >= 2.6.27).
|
|
	  </para><para>For XPath expressions that do not return a node-set,
|
|
the method returns true if the returned value is a non-zero number or a non-empty string.</para>
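	  <para>A small sketch of <function>exists</function> used as a cheap presence test; the sample document is an assumption:</para>
	  <programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(
    string => '<list><item>a</item><item>b</item></list>');

my $has_b       = $doc->exists('//item[. = "b"]');
my $has_missing = $doc->exists('//missing');

print $has_b       ? "found\n" : "not found\n";  # found
print $has_missing ? "found\n" : "not found\n";  # not found
]]></programlisting>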
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>childNodes</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>@childnodes = $node->childNodes();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para><emphasis>childNodes</emphasis> implements a more intuitive interface to the childnodes of the current node. It enables you to pass
|
|
all children directly to a <function>map</function> or <function>grep</function>. If this function is called in scalar context, a
|
|
<olink targetdoc="XML::LibXML::NodeList">XML::LibXML::NodeList</olink> object will be returned.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>nonBlankChildNodes</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>@childnodes = $node->nonBlankChildNodes();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This is like <emphasis>childNodes</emphasis>,
|
|
but returns only non-blank nodes
|
|
(where a node is blank if it is a Text or CDATA node consisting of whitespace only). This method is not defined by DOM.</para>
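	  <para>The following sketch contrasts <function>childNodes</function> and <function>nonBlankChildNodes</function> in list and scalar context. It assumes a document parsed with default settings, so the indentation between elements is kept as whitespace-only text nodes:</para>
	  <programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(
    string => "<list>\n  <item>a</item>\n  <item>b</item>\n</list>");
my $root = $doc->documentElement;

# List context: the children can be fed straight to grep or map.
my @all      = $root->childNodes;
my @elements = grep { $_->nodeType == XML_ELEMENT_NODE } $root->childNodes;
my @texts    = map  { $_->textContent } $root->nonBlankChildNodes;

# Scalar context: an XML::LibXML::NodeList object.
my $list = $root->childNodes;

print scalar(@all), ' ', scalar(@elements), ' ', $list->size, "\n";  # 5 2 5
print join(',', @texts), "\n";                                       # a,b
]]></programlisting>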
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>toString</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$xmlstring = $node->toString($format,$docencoding);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This method is similar to the method <function>toString</function> of a <xref linkend="XML-LibXML-Document"/> but for a single node. It returns a string consisting of XML serialization of the given node and all its descendants. Unlike <function>XML::LibXML::Document::toString</function>, in this case the resulting string is by default a character string (UTF-8 encoded with UTF8 flag on). An optional flag $format controls indentation, as in <function>XML::LibXML::Document::toString</function>. If the second optional $docencoding flag is true, the result will be a byte string in the document encoding (see <function>XML::LibXML::Document::actualEncoding</function>).</para>
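          <para>A minimal sketch of serializing a single node as opposed to the whole document (the sample document is illustrative only):</para>
          <programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML;

my $doc  = XML::LibXML->load_xml(string => '<list><item>a</item></list>');
my $item = $doc->documentElement->firstChild;

my $node_xml = $item->toString;  # just this node and its descendants
my $doc_xml  = $doc->toString;   # whole document, with XML declaration

print "$node_xml\n";  # <item>a</item>
]]></programlisting>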
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>toStringC14N</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$c14nstring = $node->toStringC14N();
|
|
$c14nstring = $node->toStringC14N($with_comments, $xpath_expression , $xpath_context);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>The function is similar to
|
|
toString(). Instead of simply serializing the
|
|
document tree, it transforms it as it is specified
|
|
in the XML-C14N Specification
|
|
(see <ulink url="http://www.w3.org/TR/xml-c14n">http://www.w3.org/TR/xml-c14n</ulink>).
|
|
	  Such a transformation is known as
	  canonicalization.</para>
|
|
|
|
<para>If $with_comments is 0 or not defined, the
|
|
result-document will not contain any comments that
|
|
exist in the original document. To include
|
|
	  comments in the canonicalized document,
|
|
$with_comments has to be set to 1.</para>
|
|
|
|
<para>The parameter $xpath_expression defines the
|
|
nodeset of nodes that should be visible in the
|
|
resulting document. This can be used to filter out
|
|
	  some nodes. Note that only the nodes
	  that are part of the nodeset will be included
|
|
into the result-document. Their child-nodes will
|
|
not exist in the resulting document, unless they
|
|
are part of the nodeset defined by the xpath
|
|
expression.
|
|
</para>
|
|
<para>If $xpath_expression is omitted or empty,
|
|
toStringC14N() will include all nodes in the given
|
|
sub-tree, using the following XPath expressions:
|
|
with comments
|
|
<programlisting>(. | .//node() | .//@* | .//namespace::*)</programlisting>
|
|
and without comments
|
|
<programlisting>(. | .//node() | .//@* | .//namespace::*)[not(self::comment())]</programlisting>
|
|
</para>
|
|
<para>
|
|
An optional parameter $xpath_context can be used
|
|
to pass an <xref linkend="XML-LibXML-XPathContext"/> object defining
|
|
the context for evaluation of $xpath_expression.
|
|
This is useful for mapping namespace prefixes used in the XPath expression
|
|
to namespace URIs.
|
|
Note, however, that
|
|
$node will be used as the context node for the evaluation, not
|
|
the context node of $xpath_context!
|
|
</para>
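	  <para>As a hedged sketch of what canonicalization changes, the following example serializes an illustrative document with and without comments. Attribute order and the empty-element form follow the Canonical XML rules:</para>
	  <programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(
    string => '<root b="2" a="1"><!-- note --><empty/></root>');

my $without = $doc->toStringC14N();   # comments dropped
my $with    = $doc->toStringC14N(1);  # comments kept

print "$without\n";  # <root a="1" b="2"><empty></empty></root>
print "$with\n";     # <root a="1" b="2"><!-- note --><empty></empty></root>
]]></programlisting>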
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>toStringC14N_v1_1</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$c14nstring = $node->toStringC14N_v1_1();
|
|
$c14nstring = $node->toStringC14N_v1_1($with_comments, $xpath_expression , $xpath_context);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>
|
|
This function behaves like toStringC14N() except that
|
|
it uses the "XML_C14N_1_1" constant for
|
|
canonicalising using the "C14N 1.1 spec".
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>toStringEC14N</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$ec14nstring = $node->toStringEC14N();
|
|
$ec14nstring = $node->toStringEC14N($with_comments, $xpath_expression, $inclusive_prefix_list);
|
|
$ec14nstring = $node->toStringEC14N($with_comments, $xpath_expression, $xpath_context, $inclusive_prefix_list);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>The function is similar to toStringC14N() but follows
|
|
the XML-EXC-C14N Specification (see <ulink url="http://www.w3.org/TR/xml-exc-c14n">http://www.w3.org/TR/xml-exc-c14n</ulink>)
|
|
	  for exclusive canonicalization of XML.</para>
|
|
|
|
<para>The arguments $with_comments, $xpath_expression, $xpath_context are as in toStringC14N().
|
|
An ARRAY reference can be passed as the last argument $inclusive_prefix_list,
|
|
	  listing namespace prefixes that are to be handled in the manner described by the Canonical XML Recommendation (i.e. preserved in the output even if the namespace is not used). See the specification for details.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>serialize</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
            <funcsynopsisinfo>$str = $node->serialize($format); </funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
          <para>An alias for toString(). This function name was added to be more consistent
          with libxml2.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>serialize_c14n</term>
|
|
|
|
<listitem>
|
|
<para>An alias for toStringC14N().</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>serialize_exc_c14n</term>
|
|
|
|
<listitem>
|
|
<para>An alias for toStringEC14N().</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>localname</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$localname = $node->localname;</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
          <para>Returns the local name of a tag. This is the part after the colon.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>prefix</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$nameprefix = $node->prefix;</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Returns the prefix of a tag. This is the part before the colon.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>namespaceURI</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$uri = $node->namespaceURI();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
          <para>Returns the URI of the current namespace.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>hasAttributes</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$boolean = $node->hasAttributes();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
          <para>Returns 1 (TRUE) if the current node has any attributes set, otherwise 0 (FALSE) is returned.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>attributes</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>@attributelist = $node->attributes();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This function returns all attributes and namespace declarations assigned to the given node.</para>
|
|
|
|
          <para>Because XML::LibXML does not implement namespace declarations and attributes the same way, you have to test what kind of node you are
          handling when accessing the function's result.</para>
|
|
|
|
<para>If this function is called in array context the attribute nodes are returned as an array. In scalar context, the function will return a
|
|
<olink targetdoc="XML::LibXML::NamedNodeMap">XML::LibXML::NamedNodeMap</olink> object.</para>
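          <para>One possible way to separate the two kinds of nodes is to test <function>nodeType</function>, as in this sketch. The sample element and the <literal>urn:x</literal> namespace are assumptions made for illustration:</para>
          <programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML;

my $doc  = XML::LibXML->load_xml(string => '<root xmlns:x="urn:x" a="1"/>');
my $root = $doc->documentElement;

my @all   = $root->attributes;
my @decls = grep { $_->nodeType == XML_NAMESPACE_DECL } @all;
my @attrs = grep { $_->nodeType == XML_ATTRIBUTE_NODE } @all;

print 'declaration: ', $decls[0]->declaredURI, "\n";  # urn:x
print 'attribute: ',   $attrs[0]->nodeName,    "\n";  # a
]]></programlisting>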
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>lookupNamespaceURI</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$URI = $node->lookupNamespaceURI( $prefix );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Find a namespace URI by its prefix starting at the current node.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>lookupNamespacePrefix</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$prefix = $node->lookupNamespacePrefix( $URI );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Find a namespace prefix by its URI starting at the current node.</para>
|
|
|
|
          <para><emphasis>NOTE</emphasis>: Only namespace URIs are meant to be unique. A prefix is only meaningful within a document, and a document might
          have more than one prefix defined for the same namespace.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
<varlistentry>
|
|
<term>normalize</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$node->normalize;</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This function normalizes adjacent text nodes. This function is not as strict as libxml2's xmlTextMerge() function, since it will
|
|
not free a node that is still referenced by the perl layer.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>getNamespaces</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>@nslist = $node->getNamespaces;</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>If a node has any namespaces defined, this function will return these namespaces. Note, that this will not return all namespaces that
|
|
are in scope, but only the ones declared explicitly for that node.</para>
|
|
|
|
<para>Although getNamespaces is available for all nodes, it only makes sense if used with element nodes.</para>
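          <para>A short sketch of the difference between namespaces declared on a node and namespaces merely in scope, using an illustrative document and the hypothetical namespace <literal>urn:x</literal>:</para>
          <programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML;

my $doc   = XML::LibXML->load_xml(string => '<r xmlns:x="urn:x"><x:c/></r>');
my $root  = $doc->documentElement;
my $child = $root->firstChild;

my @on_root  = $root->getNamespaces;   # declared on <r>
my @on_child = $child->getNamespaces;  # nothing declared on <x:c>

# The declaration is still in scope for the child.
my $uri    = $child->lookupNamespaceURI('x');
my $prefix = $child->lookupNamespacePrefix('urn:x');

print scalar(@on_root), ' ', scalar(@on_child), " $uri $prefix\n";  # 1 0 urn:x x
]]></programlisting>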
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>removeChildNodes</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$node->removeChildNodes();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
          <para>This function is not specified for any DOM level. It removes all child nodes from a node in a single step. Unlike the libxml2
          function itself (xmlFreeNodeList), this function will not immediately remove the nodes from memory. This prevents memory
          violations if some of the nodes are still referred to from the Perl level.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>baseURI ()</term>
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$strURI = $node->baseURI();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>
|
|
Searches for the base URL of the node. The method should work on both XML
|
|
and HTML documents even if base mechanisms for these are completely different.
|
|
It returns the base as defined in RFC 2396 sections
|
|
"5.1.1. Base URI within Document Content"
|
|
and
|
|
"5.1.2. Base URI from the Encapsulating Entity".
|
|
However it does not return the document base (5.1.3), use method <function>URI</function>
|
|
of <function>XML::LibXML::Document</function> for this.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>setBaseURI ($strURI)</term>
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$node->setBaseURI($strURI);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>This method only does something useful for an element node
|
|
in an XML document.
|
|
It sets the xml:base attribute on the node to $strURI, which
|
|
effectively sets the base URI of the node to the same value.
|
|
</para>
|
|
<para>
|
|
Note: For HTML documents this behaves as if the document was XML
|
|
which may not be desired, since it does not effectively
|
|
set the base URI of the node. See RFC 2396 appendix D
|
|
for an example of how base URI can be specified in HTML.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>nodePath</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$node->nodePath();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
          <para>This function is not specified for any DOM level. It returns a canonical, structure-based XPath expression for a given node.</para>
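          <para>As an illustration, the path returned for a node can be evaluated again to reach the same node. The sample document is an assumption:</para>
          <programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML;

my $doc    = XML::LibXML->load_xml(string => '<list><item/><item/></list>');
my $second = $doc->documentElement->lastChild;

my $path    = $second->nodePath;
my ($again) = $doc->findnodes($path);

print "$path\n";                                               # /list/item[2]
print $again->isSameNode($second) ? "same\n" : "different\n";  # same
]]></programlisting>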
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>line_number</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$lineno = $node->line_number();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
          <para>This function returns the line number where the tag was found during parsing. For a node that was added to the document later rather than parsed, the line number is 0.
|
|
Problems may occur, if a node from one document is passed to another one.</para>
|
|
<para>IMPORTANT:
|
|
Due to limitations in the libxml2 library line numbers greater than
|
|
65535 will be returned as 65535. Please see
|
|
<ulink url="http://bugzilla.gnome.org/show_bug.cgi?id=325533">http://bugzilla.gnome.org/show_bug.cgi?id=325533</ulink> for more details.
|
|
</para>
|
|
<para>Note: line_number() is special to XML::LibXML and not part of the DOM specification.</para>
|
|
|
|
<para>If the line_numbers flag of the parser was not activated before parsing, line_number() will always return 0.</para>
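          <para>A minimal sketch, assuming the <literal>line_numbers</literal> parser option is passed to <function>load_xml</function>; the sample document is illustrative only:</para>
          <programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(
    string       => "<list>\n  <item/>\n</list>",
    line_numbers => 1,
);

my ($item) = $doc->findnodes('//item');
my $added  = $doc->documentElement->appendChild($doc->createElement('new'));

print $item->line_number,  "\n";  # 2
print $added->line_number, "\n";  # 0
]]></programlisting>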
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</sect1>
|
|
</chapter>
|
|
|
|
<chapter id="XML-LibXML-Element">
|
|
<title>XML::LibXML Class for Element Nodes</title>
|
|
|
|
<titleabbrev>XML::LibXML::Element</titleabbrev>
|
|
<sect1>
|
|
<title>Synopsis</title>
|
|
<programlisting>use XML::LibXML;
|
|
# Only methods specific to Element nodes are listed here,
|
|
# see the XML::LibXML::Node manpage for other methods</programlisting>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Methods</title>
|
|
<para>
|
|
The class inherits from <xref linkend="XML-LibXML-Node"/>.
|
|
	The documentation for inherited methods is not listed here.
|
|
</para>
|
|
<para>
|
|
Many functions listed here are
|
|
extensively documented in the <ulink url="http://www.w3.org/TR/DOM-Level-3-Core/">DOM Level 3 specification</ulink>. Please refer to
|
|
the specification for extensive documentation.
|
|
</para>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>new</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$node = XML::LibXML::Element->new( $name );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This function creates a new node unbound to any DOM.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>setAttribute</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$node->setAttribute( $aname, $avalue );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
          <para>This method sets or replaces the node's attribute <function>$aname</function> to the value <function>$avalue</function>.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>setAttributeNS</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$node->setAttributeNS( $nsURI, $aname, $avalue );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>Namespace-aware version of <function>setAttribute</function>, where
|
|
<function>$nsURI</function> is a namespace URI,
|
|
<function>$aname</function> is a qualified name,
|
|
and <function>$avalue</function> is the value.
|
|
The namespace URI may be null (empty or undefined)
|
|
in order to create an attribute which has no namespace.
|
|
</para>
|
|
<para>
|
|
	    The current implementation differs from DOM in the following aspects:
|
|
</para>
|
|
<para>
|
|
If an attribute with the same local name and namespace URI already exists
|
|
on the element, but its prefix differs from the prefix of <function>$aname</function>,
|
|
then this function is supposed to change the prefix (regardless
|
|
of namespace declarations and possible collisions).
|
|
	    However, the current implementation does the opposite.
|
|
If a prefix is declared for the namespace URI in the scope
|
|
of the attribute, then the already declared prefix is used,
|
|
disregarding the prefix specified in <function>$aname</function>.
|
|
If no prefix is declared for the namespace, the function tries
|
|
to declare the prefix specified in <function>$aname</function>
|
|
and dies if the prefix is already taken by some other namespace.
|
|
</para>
|
|
<para>According to DOM Level 2 specification, this method can also be used to
|
|
create or modify special attributes used for declaring XML namespaces
|
|
(which belong to the namespace "http://www.w3.org/2000/xmlns/" and
|
|
have prefix or name "xmlns"). This should work since version 1.61,
|
|
but again the implementation differs from DOM specification in the following:
|
|
if a declaration of the same namespace prefix already exists
|
|
on the element, then changing its value via this method
|
|
automatically changes the namespace of all elements and attributes
|
|
in its scope. This is because in libxml2 the namespace URI of an element
|
|
is not static but is computed from a pointer to a namespace declaration attribute.
|
|
</para>
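	  <para>The prefix handling described above can be observed with a sketch like the following. The namespace <literal>urn:a</literal> and the prefixes are assumptions; the example only relies on the attribute being retrievable by namespace URI and local name, and prints the serialization so that the prefix actually chosen can be inspected:</para>
	  <programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML;

my $doc  = XML::LibXML->load_xml(string => '<r xmlns:a="urn:a"/>');
my $root = $doc->documentElement;

# The qualified name asks for prefix "b", but "a" is already declared
# for the same namespace URI on this element.
$root->setAttributeNS('urn:a', 'b:id', '1');

my $value = $root->getAttributeNS('urn:a', 'id');
print "$value\n";  # 1
print $root->toString, "\n";
]]></programlisting>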
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>getAttribute</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$avalue = $node->getAttribute( $aname );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>If <function>$node</function> has an attribute with the name <function>$aname</function>, the value of this attribute will get
|
|
returned.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>getAttributeNS</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$avalue = $node->getAttributeNS( $nsURI, $aname );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Retrieves an attribute value by local name and namespace URI.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>getAttributeNode</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$attrnode = $node->getAttributeNode( $aname );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Retrieve an attribute node by name. If no attribute with a given name exists, <function>undef</function> is returned.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>getAttributeNodeNS</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$attrnode = $node->getAttributeNodeNS( $namespaceURI, $aname );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Retrieves an attribute node by local name and namespace URI. If no attribute with a given localname and namespace exists, <function>undef</function> is returned.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>removeAttribute</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$node->removeAttribute( $aname );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>The method removes the attribute <function>$aname</function> from the node's attribute list, if the attribute can be found.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>removeAttributeNS</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$node->removeAttributeNS( $nsURI, $aname );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
          <para>Namespace version of <function>removeAttribute</function>.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>hasAttribute</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$boolean = $node->hasAttribute( $aname );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This function tests if the named attribute is set for the node. If the attribute is specified, TRUE (1) will be returned, otherwise the
|
|
return value is FALSE (0).</para>
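          <para>The attribute methods above can be combined as in this sketch, which uses an illustrative element created outside of any document:</para>
          <programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML;

my $item = XML::LibXML::Element->new('item');

$item->setAttribute('id', '42');
my $xml    = $item->toString;            # <item id="42"/>
my $value  = $item->getAttribute('id');  # 42
my $before = $item->hasAttribute('id');  # true

$item->removeAttribute('id');
my $after  = $item->hasAttribute('id');  # false

print "$xml $value\n";  # <item id="42"/> 42
]]></programlisting>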
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>hasAttributeNS</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$boolean = $node->hasAttributeNS( $nsURI, $aname );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
          <para>Namespace version of <function>hasAttribute</function>.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>getChildrenByTagName</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>@nodes = $node->getChildrenByTagName($tagname);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>The function gives direct access to all child elements of the current node with a given tagname, where
|
|
tagname is a qualified name, that is, in case of namespace usage it may consist of a prefix and local
|
|
name. This function makes things a lot easier if one needs
|
|
to handle big data sets. A special tagname '*' can be used to match any name.</para>
|
|
|
|
<para>If this function is called in SCALAR context, it returns the number of elements found.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>getChildrenByTagNameNS</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>@nodes = $node->getChildrenByTagNameNS($nsURI,$tagname);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Namespace version of <function>getChildrenByTagName</function>. A special nsURI '*' matches any namespace URI,
|
|
in which case the function behaves just like <function>getChildrenByLocalName</function>.</para>
|
|
|
|
<para>If this function is called in SCALAR context, it returns the number of elements found.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>getChildrenByLocalName</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>@nodes = $node->getChildrenByLocalName($localname);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>The function gives direct access to all child elements of the current node with a given local name. It makes things a lot easier if one needs
|
|
to handle big data sets. A special <function>localname</function> '*' can be used to match any local name.</para>
|
|
|
|
<para>If this function is called in SCALAR context, it returns the number of elements found.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>getElementsByTagName</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>@nodes = $node->getElementsByTagName($tagname);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This function is part of the DOM specification. It
|
|
fetches all descendants of a node with a given tagname,
|
|
where <function>tagname</function> is a qualified name,
|
|
that is, in case of namespace usage it may consist of a prefix and
|
|
local name.
|
|
A special <function>tagname</function> '*' can be used to match any tag name.
|
|
</para>
|
|
|
|
<para>In SCALAR context this function returns an <olink targetdoc="XML::LibXML::NodeList">XML::LibXML::NodeList</olink> object.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>getElementsByTagNameNS</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>@nodes = $node->getElementsByTagNameNS($nsURI,$localname);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Namespace version of <function>getElementsByTagName</function> as found in the DOM spec.
|
|
A special <function>localname</function> '*' can be used to match any local name
|
|
and <function>nsURI</function> '*' can be used to match any namespace URI.</para>
|
|
|
|
<para>In SCALAR context this function returns an <olink targetdoc="XML::LibXML::NodeList">XML::LibXML::NodeList</olink> object.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>getElementsByLocalName</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>@nodes = $node->getElementsByLocalName($localname);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This function is not found in the DOM specification. It is a mix of getElementsByTagName and getElementsByTagNameNS: it fetches all
descendant elements matching the given local name. This allows one to select elements with the same local name across namespace boundaries.</para>
|
|
|
|
<para>In SCALAR context this function returns an <olink targetdoc="XML::LibXML::NodeList">XML::LibXML::NodeList</olink> object.</para>
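<para>The difference between the namespace-aware lookup and the local-name lookup can be sketched as follows. The namespace URIs urn:a and urn:b and the element name x are assumptions made up for this illustration:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

# Three elements share the local name "x": one in urn:a, one in urn:b,
# and one in no namespace.
my $doc = XML::LibXML->load_xml(
    string => '<r xmlns:a="urn:a" xmlns:b="urn:b"><a:x/><b:x/><x/></r>'
);
my $root = $doc->documentElement;

# Restrict the match to one namespace URI.
my @in_a = $root->getElementsByTagNameNS('urn:a', 'x');

# Ignore namespaces and match on the local name alone.
my @any_ns = $root->getElementsByLocalName('x');

printf "%d in urn:a, %d with local name x\n", scalar(@in_a), scalar(@any_ns);</programlisting>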
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>appendWellBalancedChunk</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$node->appendWellBalancedChunk( $chunk );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Sometimes it is necessary to append an XML tree that is serialized as a string to a node. <emphasis>appendWellBalancedChunk</emphasis> does this,
but only if the string is <function>well-balanced</function>.</para>
|
|
|
|
<para><emphasis>Note that appendWellBalancedChunk() is retained only for compatibility reasons</emphasis>. Internally it uses</para>
|
|
|
|
<programlisting> my $fragment = $parser->parse_balanced_chunk( $chunk );
|
|
$node->appendChild( $fragment );</programlisting>
|
|
|
|
<para>This form is more explicit and makes it easier to control the flow of a script.</para>
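<para>As a self-contained sketch of the explicit form, the following wraps the parse step in eval so that a chunk that is not well-balanced can be handled by the caller. The chunk content and the root element are hypothetical:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->new;
my $doc    = $parser->parse_string('<root/>');

# Hypothetical chunk: two sibling elements without a common parent.
my $chunk = '<a>1</a><b>2</b>';

# The parse step is trapped so that a rejected chunk does not end the script.
my $fragment = eval { $parser->parse_balanced_chunk($chunk) };
if ($fragment) {
    $doc->documentElement->appendChild($fragment);
}
else {
    warn "chunk was rejected: $@";
}

print $doc->documentElement->toString, "\n";</programlisting>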
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>appendText</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$node->appendText( $PCDATA );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Alias for appendTextNode().</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>appendTextNode</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$node->appendTextNode( $PCDATA );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This wrapper function lets you add a string directly to an element node.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>appendTextChild</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$node->appendTextChild( $childname , $PCDATA );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Similar to <function>appendTextNode</function>, but it appends a new child element that contains only a <function>text node</function>,
created directly from the given element name and text content.</para>
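<para>A short sketch contrasting the two helpers, assuming a freshly created document with the hypothetical element names entry and title:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $doc  = XML::LibXML::Document->new('1.0', 'UTF-8');
my $root = $doc->createElement('entry');
$doc->setDocumentElement($root);

# Adds a new child element "title" that holds a single text node.
$root->appendTextChild('title', 'Hello');

# Adds a text node directly to the element itself.
$root->appendTextNode(' trailing text');

print $root->toString, "\n";</programlisting>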
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>setNamespace</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$node->setNamespace( $nsURI , $nsPrefix, $activate );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>setNamespace() applies a namespace to an element. The function takes
three parameters: the namespace URI, which is required, and two optional
values: the prefix, which is the namespace prefix as it should be used in
child elements or attributes, and the activate flag. If the prefix is not given,
undefined or empty, this function tries to create a
declaration of the default namespace.
</para>
|
|
<para>The activate parameter controls whether the namespace is applied to
the element itself. If this parameter is set to FALSE (0), a new namespace
declaration is simply added to the element
while the element's namespace itself is not
altered. By default, activate is TRUE (1).
In this case the namespace
|
|
is used as the node's effective
|
|
namespace. This means the namespace prefix is
|
|
added to the node name and if there was a
|
|
namespace already active for the node, it will
|
|
be replaced (but its declaration is not removed from the document).
|
|
A new namespace declaration is only created if necessary
|
|
(that is, if the element is already in the scope
|
|
of a namespace declaration associating the prefix
|
|
with the namespace URI, then this declaration is reused).
|
|
</para>
|
|
|
|
<para>The following example may clarify this:</para>
|
|
|
|
<programlisting> my $e1 = $doc->createElement("bar");
|
|
$e1->setNamespace("http://foobar.org", "foo")</programlisting>
|
|
|
|
<para>results in</para>
|
|
|
|
<programlisting> <foo:bar xmlns:foo="http://foobar.org"/></programlisting>
|
|
|
|
<para>while</para>
|
|
|
|
<programlisting> my $e2 = $doc->createElement("bar");
|
|
$e2->setNamespace("http://foobar.org", "foo",0)</programlisting>
|
|
|
|
<para>results only in</para>
|
|
|
|
<programlisting> <bar xmlns:foo="http://foobar.org"/></programlisting>
|
|
|
|
<para>By using $activate == 0 it is possible to
|
|
create multiple namespace declarations on a single
|
|
element.</para>
|
|
<para>The function fails if it is required to
|
|
create a declaration associating the prefix
|
|
with the namespace URI but the element already
|
|
carries a declaration with the same prefix but
|
|
different namespace URI.
|
|
</para>
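<para>The cases described above can be combined in one sketch. The second and third namespace URIs are assumptions made up for this illustration, and the last call is wrapped in eval because this section does not state whether a failure is reported by a false return value or by an exception:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML::Document->new('1.0', 'UTF-8');
my $e   = $doc->createElement('bar');
$doc->setDocumentElement($e);

# activate is left at its default, so the element itself moves
# into the namespace.
$e->setNamespace('http://foobar.org', 'foo');

# activate is 0: only a further declaration is added (hypothetical URI).
$e->setNamespace('http://example.org/extra', 'ex', 0);

print $e->nodeName, ' in ', $e->namespaceURI, "\n";
print $e->toString, "\n";

# Same prefix, different URI: according to the text above this must fail.
my $ok = eval { $e->setNamespace('http://example.org/other', 'foo', 0) };
print "conflicting declaration for prefix foo was refused\n" unless $ok;</programlisting>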
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>setNamespaceDeclURI</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$node->setNamespaceDeclURI( $nsPrefix, $newURI );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>EXPERIMENTAL IN 1.61 !</para>
|
|
<para>This function directly manipulates an existing namespace
|
|
declaration on an element. It takes
|
|
two parameters: the prefix by which it
|
|
looks up the namespace declaration and
|
|
a new namespace URI which replaces its previous
|
|
value.</para>
|
|
<para>It returns 1 if the namespace declaration
|
|
was found and changed, 0 otherwise.</para>
|
|
<para>All elements and attributes (even those previously
|
|
unbound from the document) for which the
|
|
namespace declaration determines their namespace
|
|
belong to the new namespace after
|
|
the change.
|
|
</para>
|
|
<para>If the new URI is undef or empty, the nodes
|
|
have no namespace and no prefix after the change.
|
|
Namespace declarations
|
|
once nulled in this way do not
|
|
further appear in the serialized output
|
|
(but do remain in the document for internal integrity
|
|
of libxml2 data structures).
|
|
</para>
|
|
<para>This function is NOT part of any DOM API.</para>
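<para>A minimal sketch of changing the URI of a declaration, assuming a hypothetical document in which the prefix p is bound to urn:old:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(
    string => '<p:root xmlns:p="urn:old"><p:child/></p:root>'
);
my $root    = $doc->documentElement;
my ($child) = $root->getChildrenByTagName('p:child');

# The declaration is looked up by its prefix and rebound to a new URI.
if ( $root->setNamespaceDeclURI('p', 'urn:new') ) {
    # Nodes that relied on the declaration report the new namespace.
    print $child->namespaceURI, "\n";
}
else {
    print "no declaration found for prefix p\n";
}</programlisting>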
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>setNamespaceDeclPrefix</term>
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$node->setNamespaceDeclPrefix( $oldPrefix, $newPrefix );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>EXPERIMENTAL IN 1.61 !</para>
|
|
<para>This function directly manipulates an existing namespace
|
|
declaration on an element. It takes
|
|
two parameters: the old prefix by which it
|
|
looks up the namespace declaration and
|
|
a new prefix which is to replace the old one.</para>
|
|
<para>The function dies with an error
|
|
if the element is in the scope of
|
|
another declaration whose prefix equals
|
|
the new prefix, or if the change would
|
|
result in a declaration with a non-empty prefix but
|
|
empty namespace URI.
|
|
Otherwise, it returns 1 if the namespace declaration
|
|
was found and changed and 0 if not found.</para>
|
|
<para>All elements and attributes (even those previously
|
|
unbound from the document) for which the
|
|
namespace declaration determines their namespace
|
|
change their prefix to the new value.
|
|
</para>
|
|
<para>If the new prefix is undef or empty,
|
|
the namespace declaration becomes
|
|
a declaration of a default namespace.
|
|
The corresponding nodes drop their namespace prefix
|
|
(but remain in the, now default, namespace).
|
|
In this case the function fails if the containing element
|
|
is in the scope of another default namespace declaration.
|
|
</para>
|
|
<para>This function is NOT part of any DOM API.</para>
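<para>Renaming a prefix can be sketched in the same way. The prefixes p and q and the URI urn:example are hypothetical, and eval is used because the function dies in the error cases listed above:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(
    string => '<p:root xmlns:p="urn:example"><p:child/></p:root>'
);
my $root = $doc->documentElement;

# Trap the exception raised when the new prefix clashes with another
# declaration that is in scope.
my $changed = eval { $root->setNamespaceDeclPrefix('p', 'q') };

if ($changed) {
    # Elements bound through the declaration now carry the new prefix.
    print $root->nodeName, "\n";
    print $root->toString, "\n";
}
else {
    print "prefix not changed: ", ( $@ || "declaration not found" ), "\n";
}</programlisting>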
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
</variablelist>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Overloading</title>
|
|
<para>XML::LibXML::Element overloads hash dereferencing to
|
|
provide access to the element's attributes. For non-namespaced
|
|
attributes, the attribute name is the hash key, and the attribute
|
|
value is the hash value. For namespaced attributes, the hash key
|
|
is qualified with the namespace URI, using Clark notation.</para>
|
|
<para>Perl's "tied hash" feature is used, which means that the
|
|
hash gives you read-write access to the element's attributes.
|
|
For more information, see <olink targetdoc="XML::LibXML::AttributeHash"
|
|
>XML::LibXML::AttributeHash</olink></para>
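<para>A brief sketch of the hash interface, assuming an installed version that provides this overloading. The element, the attribute names and the namespace URI urn:x are made up for the example:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(
    string => '<img src="a.png" xmlns:x="urn:x" x:role="icon"/>'
);
my $el = $doc->documentElement;

# Non-namespaced attribute: the plain attribute name is the key.
print $el->{src}, "\n";

# Namespaced attribute: the key is written in Clark notation, {URI}localname.
print $el->{'{urn:x}role'}, "\n";

# The hash is tied, so assignments write through to the element.
$el->{alt} = 'An icon';
print $el->getAttribute('alt'), "\n";</programlisting>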
|
|
</sect1>
|
|
</chapter>
|
|
|
|
<chapter id="XML-LibXML-Text">
|
|
<title>XML::LibXML Class for Text Nodes</title>
|
|
|
|
<titleabbrev>XML::LibXML::Text</titleabbrev>
|
|
<sect1>
|
|
<title>Synopsis</title>
|
|
<programlisting>use XML::LibXML;
|
|
# Only methods specific to Text nodes are listed here,
|
|
# see the XML::LibXML::Node manpage for other methods</programlisting>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Description</title>
|
|
|
|
<para>Unlike the DOM specification, XML::LibXML implements the text node as the base class of all character data nodes. Therefore there exists no
|
|
CharacterData class. This allows one to apply methods of text nodes also to Comments and CDATA-sections.</para>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Methods</title>
|
|
<para>
|
|
The class inherits from <xref linkend="XML-LibXML-Node"/>.
|
|
The documentation for inherited methods is not listed here.
|
|
</para>
|
|
<para>
|
|
Many functions listed here are
|
|
extensively documented in the <ulink url="http://www.w3.org/TR/DOM-Level-3-Core/">DOM Level 3 specification</ulink>. Please refer to
|
|
the specification for details.
|
|
</para>
|
|
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>new</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$text = XML::LibXML::Text->new( $content ); </funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>The constructor of the class. It creates an unbound text node.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>data</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$nodedata = $text->data;</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Although there exists the <function>nodeValue</function> attribute in the Node class, the DOM specification defines data as a separate
|
|
attribute. <function>XML::LibXML</function> implements these two not as different attributes but as aliases, just as
|
|
<function>libxml2</function> does. Therefore</para>
|
|
|
|
<programlisting> $text->data;</programlisting>
|
|
|
|
<para>and</para>
|
|
|
|
<programlisting> $text->nodeValue;</programlisting>
|
|
|
|
<para>will have the same result and are not different entities.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>setData($string)</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$text->setData( $text_content );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This function sets or replaces the text content of a node. The node has to be of the type "text", "cdata" or
|
|
"comment".</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>substringData($offset,$length)</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$text->substringData($offset, $length);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Extracts a range of data from the node. (DOM Spec) This function takes the two parameters $offset and $length and returns the
|
|
sub-string, if available.</para>
|
|
|
|
<para>If the node contains no data or $offset refers to a non-existent string index, this function will return <emphasis>undef</emphasis>.
|
|
If $length is out of range <function>substringData</function> will return the data starting at $offset instead of causing an error.</para>
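<para>A minimal sketch of extracting a range, assuming an unbound text node with made-up content. Offsets are counted from zero:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $text = XML::LibXML::Text->new('Hello, world');

# Five characters starting at offset 7, which is the final word.
my $word = $text->substringData(7, 5);
print defined $word ? $word : 'undef', "\n";

# An offset beyond the end of the data is documented above to yield undef.
my $none = $text->substringData(100, 5);
print defined $none ? $none : 'undef', "\n";</programlisting>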
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>appendData($string)</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$text->appendData( $somedata );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Appends a string to the end of the existing data. If the current text node contains no data, this function has the same effect as
|
|
<function>setData</function>.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>insertData($offset,$string)</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$text->insertData($offset, $string);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Inserts the parameter $string at the given $offset of the existing data of the node. This operation does not remove existing data;
the data that follows $offset is shifted to make room for the inserted string.</para>
|
|
|
|
<para>The $offset has to be a positive value. If $offset is out of range, <function>insertData</function> will have the same behaviour as
|
|
<function>appendData</function>.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>deleteData($offset, $length)</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$text->deleteData($offset, $length);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This method removes a chunk from the existing node data at the given offset. The $length parameter specifies how many characters should
|
|
be removed from the string.</para>
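<para>The offset-based editing methods can be sketched together as follows. The content is made up, and the comments state the data expected after each step:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $text = XML::LibXML::Text->new('Hello');

$text->appendData(' world');   # "Hello world"
$text->insertData(5, ',');     # "Hello, world"
print $text->data, "\n";

# Remove one character at offset 5, that is, the comma inserted above.
$text->deleteData(5, 1);       # "Hello world"
print $text->data, "\n";</programlisting>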
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>deleteDataString($string, [$all])</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$text->deleteDataString($remstring, $all);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This method removes a chunk from the existing node data. Since the DOM interface is awkward to use when you already know <function>which</function>
string to remove from a text node, this method allows more Perlish code :)</para>
|
|
|
|
<para>The function takes two parameters: <emphasis>$string</emphasis> and the optional <emphasis>$all</emphasis> flag. If $all is not set,
<emphasis>undef</emphasis> or <emphasis>0</emphasis>, <function>deleteDataString</function> removes only the first occurrence of
$string. If $all is <emphasis>TRUE</emphasis>, <function>deleteDataString</function> removes all occurrences of <emphasis>$string</emphasis>
from the node data.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>replaceData($offset, $length, $string)</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$text->replaceData($offset, $length, $string);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>The DOM style version to replace node data.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>replaceDataString($oldstring, $newstring, [$all])</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$text->replaceDataString($old, $new, $flag);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>The more programmer friendly version of replaceData() :)</para>
|
|
|
|
<para>Instead of giving offsets and length one can specify
|
|
the exact string (<emphasis>$oldstring</emphasis>) to
|
|
be replaced. Additionally the <emphasis>$all</emphasis>
|
|
flag allows one to replace all occurrences of
|
|
<emphasis>$oldstring</emphasis>.</para>
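<para>A short sketch of the string-based variants, using made-up content. The comments state the data expected after each call:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $text = XML::LibXML::Text->new('red, red, blue');

# Without the $all flag only the first occurrence is replaced.
$text->replaceDataString('red', 'green');      # "green, red, blue"

# With a true $all flag every remaining occurrence is replaced.
$text->replaceDataString('red', 'green', 1);   # "green, green, blue"

# deleteDataString follows the same first-or-all convention.
$text->deleteDataString(', blue');             # "green, green"

print $text->data, "\n";</programlisting>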
|
|
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>replaceDataRegEx( $search_cond, $replace_cond, $reflags )</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$text->replaceDataRegEx( $search_cond, $replace_cond, $reflags );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This method modifies the node's data by applying a
<function>simple</function> regular expression substitution.
Optionally, one can pass flags
that are appended to the substitution
statement.</para>
|
|
|
|
<para><emphasis>NOTE:</emphasis> This is a shortcut for</para>
|
|
|
|
<programlisting> my $datastr = $node->getData();
|
|
$datastr =~ s/somecond/replacement/g; # 'g' is just an example for any flag
|
|
$node->setData( $datastr );</programlisting>
|
|
|
|
<para>This function can make things easier to read for simple replacements. For more complex variants it is recommended to use the code
|
|
snippet above.</para>
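<para>A minimal sketch of a simple replacement, assuming made-up content with irregular whitespace. The pattern, the replacement and the flags are passed as plain strings:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $text = XML::LibXML::Text->new('a   b    c');

# Collapse every run of whitespace into a single space; 'g' applies the
# substitution to all matches.
$text->replaceDataRegEx('\s+', ' ', 'g');

print $text->data, "\n";</programlisting>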
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</sect1>
|
|
</chapter>
|
|
|
|
<chapter id="XML-LibXML-Comment">
|
|
<title>XML::LibXML Comment Class</title>
|
|
|
|
<titleabbrev>XML::LibXML::Comment</titleabbrev>
|
|
<sect1>
|
|
<title>Synopsis</title>
|
|
<programlisting>use XML::LibXML;
|
|
# Only methods specific to Comment nodes are listed here,
|
|
# see the XML::LibXML::Node manpage for other methods</programlisting>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Description</title>
|
|
|
|
<para>This class provides all functions of <xref linkend="XML-LibXML-Text"/>, but for comment nodes. This can be done, since only the output of the
|
|
node types is different, but not the data structure. :-)</para>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Methods</title>
|
|
<para>
|
|
The class inherits from <xref linkend="XML-LibXML-Node"/>.
|
|
The documentation for inherited methods is not listed here.
|
|
</para>
|
|
<para>
|
|
Many functions listed here are
|
|
extensively documented in the <ulink url="http://www.w3.org/TR/DOM-Level-3-Core/">DOM Level 3 specification</ulink>. Please refer to
|
|
the specification for details.
|
|
</para>
|
|
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>new</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$node = XML::LibXML::Comment->new( $content );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>The constructor is the only provided function for this package. It is required, because <emphasis>libxml2</emphasis> treats text nodes
|
|
and comment nodes slightly differently.</para>
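<para>A brief sketch showing that a comment node accepts the text node methods. The document and the comment content are hypothetical:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml( string => '<root/>' );

# Create an unbound comment node and attach it to the document element.
my $comment = XML::LibXML::Comment->new(' generated ');
$doc->documentElement->appendChild($comment);

# Methods documented for text nodes also work on the comment.
$comment->appendData('file ');

print $doc->documentElement->toString, "\n";</programlisting>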
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</sect1>
|
|
</chapter>
|
|
|
|
<chapter id="XML-LibXML-CDATASection">
|
|
<title>XML::LibXML Class for CDATA Sections</title>
|
|
|
|
<titleabbrev>XML::LibXML::CDATASection</titleabbrev>
|
|
<sect1>
|
|
<title>Synopsis</title>
|
|
<programlisting>use XML::LibXML;
|
|
# Only methods specific to CDATA nodes are listed here,
|
|
# see the XML::LibXML::Node manpage for other methods</programlisting>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Description</title>
|
|
<para>This class provides all functions of <xref linkend="XML-LibXML-Text"/>, but for CDATA nodes.</para>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Methods</title>
|
|
<para>
|
|
The class inherits from <xref linkend="XML-LibXML-Node"/>.
|
|
The documentation for inherited methods is not listed here.
|
|
</para>
|
|
<para>
|
|
Many functions listed here are
|
|
extensively documented in the <ulink url="http://www.w3.org/TR/DOM-Level-3-Core/">DOM Level 3 specification</ulink>. Please refer to
|
|
the specification for details.
|
|
</para>
|
|
|
|
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>new</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$node = XML::LibXML::CDATASection->new( $content );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>The constructor is the only provided function for this package. It is required, because <emphasis>libxml2</emphasis> treats the
|
|
different text node types slightly differently.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</sect1>
|
|
</chapter>
|
|
|
|
<chapter id="XML-LibXML-Attr">
|
|
<title>XML::LibXML Attribute Class</title>
|
|
|
|
<titleabbrev>XML::LibXML::Attr</titleabbrev>
|
|
<sect1>
|
|
<title>Synopsis</title>
|
|
<programlisting>use XML::LibXML;
|
|
# Only methods specific to Attribute nodes are listed here,
|
|
# see the XML::LibXML::Node manpage for other methods</programlisting>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Description</title>
|
|
|
|
<para>This is the interface to handle Attributes like ordinary nodes. The naming of the class relies on the W3C DOM documentation.</para>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Methods</title>
|
|
<para>
|
|
The class inherits from <xref linkend="XML-LibXML-Node"/>.
|
|
The documentation for inherited methods is not listed here.
|
|
</para>
|
|
<para>
|
|
Many functions listed here are
|
|
extensively documented in the <ulink url="http://www.w3.org/TR/DOM-Level-3-Core/">DOM Level 3 specification</ulink>. Please refer to
|
|
the specification for details.
|
|
</para>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>new</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$attr = XML::LibXML::Attr->new($name [,$value]);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Class constructor. If you need to work with ISO encoded strings, you should <emphasis>always</emphasis> use the <function>createAttribute</function>
|
|
of <xref linkend="XML-LibXML-Document"/>.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>getValue</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$string = $attr->getValue();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Returns the value stored for the attribute. If undef is returned, the attribute has no value, which is different from being
|
|
<function>not specified</function>.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>value</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$string = $attr->value;</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Alias for <emphasis>getValue()</emphasis></para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>setValue</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$attr->setValue( $string );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This is needed to set a new attribute value. If ISO encoded strings are passed as parameter, the node has to be bound to a document,
|
|
otherwise the encoding might be done incorrectly.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>getOwnerElement</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$node = $attr->getOwnerElement();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Returns the node the attribute belongs to. If the attribute is not bound to a node, undef will be returned. Overriding the underlying
implementation, the <emphasis>parentNode</emphasis> function will return undef instead of the owner element.</para>
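<para>The value accessors and the owner lookup can be sketched together. The element and attribute names are made up for the example:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $doc  = XML::LibXML->load_xml( string => '<a href="old"/>' );
my $el   = $doc->documentElement;
my $attr = $el->getAttributeNode('href');

# Read, change and read the value again.
print $attr->getValue, "\n";
$attr->setValue('new');
print $attr->value, "\n";

# The owner is reached through getOwnerElement, not through parentNode.
print $attr->getOwnerElement->nodeName, "\n";
print defined $attr->parentNode ? "has parent\n" : "no parent\n";</programlisting>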
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
<term>setNamespace</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$attr->setNamespace($nsURI, $prefix);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This function tries to bind the attribute to a given namespace.
|
|
If <function>$nsURI</function> is undefined or empty,
|
|
the function discards any previous association of the attribute with a namespace.
|
|
If the namespace was not previously declared in the context of the
|
|
attribute, this function will fail.
|
|
In this case you may wish to call setNamespace() on the ownerElement.
|
|
If the namespace URI is non-empty and
|
|
declared in the context of the attribute, but only with a different
|
|
(non-empty) prefix, then the attribute is still bound to the namespace
|
|
but gets a different prefix than <function>$prefix</function>.
|
|
The function also fails if the prefix is empty but the namespace URI
|
|
is not (because unprefixed attributes should by definition belong to
|
|
no namespace).
|
|
This function returns 1 on success, 0 otherwise.
|
|
</para>
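<para>A minimal sketch, assuming the namespace urn:x is already declared with the prefix x on the owner element (both are hypothetical). The call is wrapped in eval so that either way of reporting a failure is handled:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

# The declaration must be in scope before the attribute can be bound to it.
my $doc  = XML::LibXML->load_xml( string => '<r xmlns:x="urn:x" id="1"/>' );
my $attr = $doc->documentElement->getAttributeNode('id');

my $ok = eval { $attr->setNamespace('urn:x', 'x') };

if ($ok) {
    print $attr->nodeName, "\n";
    print $doc->documentElement->toString, "\n";
}
else {
    print "attribute could not be bound to urn:x\n";
}</programlisting>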
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
<varlistentry>
|
|
<term>isId</term>
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$bool = $attr->isId;</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>Determine whether an attribute is of type
|
|
ID. For documents with a DTD, this information
|
|
is only available if DTD loading/validation has been requested.
|
|
For HTML documents parsed with the HTML
|
|
parser ID detection is done
|
|
automatically. In XML documents, all "xml:id"
|
|
attributes are considered to be of type ID.
|
|
</para>
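<para>The xml:id case can be sketched without a DTD. The element and attribute names are made up, and the URI passed to getAttributeNodeNS is the standard XML namespace name:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(
    string => '<r><item xml:id="i1" name="n"/></r>'
);
my ($item) = $doc->documentElement->getChildrenByTagName('item');

# xml:id lives in the XML namespace; "name" is an ordinary attribute.
my $id_attr   = $item->getAttributeNodeNS(
    'http://www.w3.org/XML/1998/namespace', 'id' );
my $name_attr = $item->getAttributeNode('name');

for my $attr ( $id_attr, $name_attr ) {
    printf "%s: %s\n", $attr->nodeName, $attr->isId ? 'ID' : 'not an ID';
}</programlisting>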
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>serializeContent($docencoding)</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$string = $attr->serializeContent;</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This function is not part of DOM API. It returns attribute content
|
|
in the form in which it serializes into XML, that is
|
|
with all meta-characters properly quoted and with raw
|
|
entity references (except for entities expanded during parse time).
|
|
Setting the optional $docencoding flag to 1 enforces document
|
|
encoding for the output string (which is then passed to Perl as a
|
|
byte string). Otherwise the string is passed to Perl as (UTF-8 encoded)
|
|
characters.
|
|
</para>
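<para>A short sketch comparing the plain value with its serialized form, assuming a hypothetical attribute whose value contains double quotes:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML::Document->new('1.0', 'UTF-8');
my $el  = $doc->createElement('a');
$doc->setDocumentElement($el);

# The value contains a meta-character that needs quoting in an attribute.
$el->setAttribute('title', 'say "hi"');
my $attr = $el->getAttributeNode('title');

print $attr->getValue, "\n";           # the value as stored
print $attr->serializeContent, "\n";   # the value as written into XML</programlisting>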
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
</variablelist>
|
|
</sect1>
|
|
</chapter>
|
|
|
|
<chapter id="XML-LibXML-DocumentFragment">
|
|
<title>XML::LibXML's DOM L2 Document Fragment Implementation</title>
|
|
|
|
<titleabbrev>XML::LibXML::DocumentFragment</titleabbrev>
|
|
<sect1>
|
|
<title>Synopsis</title>
|
|
<programlisting>use XML::LibXML;</programlisting>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Description</title>
|
|
|
|
<para>This class is a helper class as described in the DOM Level 2 Specification. It is implemented as a node without name. All adding, inserting or
|
|
replacing functions are aware of document fragments now.</para>
|
|
|
|
<para>In addition, <emphasis>all</emphasis> unbound nodes (all nodes that do not belong to any document sub-tree) are implicit members of document fragments.</para>
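<para>A minimal sketch of using a fragment as a container that is moved into a tree in one step. The element names are hypothetical:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $doc  = XML::LibXML->load_xml( string => '<root/>' );
my $frag = $doc->createDocumentFragment;

# Collect several new elements in the fragment first.
$frag->appendChild( $doc->createElement($_) ) for qw(a b c);

# Appending the fragment inserts its children, not the fragment itself.
$doc->documentElement->appendChild($frag);

print $doc->documentElement->toString, "\n";</programlisting>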
|
|
</sect1>
|
|
</chapter>
|
|
|
|
<chapter id="XML-LibXML-Namespace">
|
|
<title>XML::LibXML Namespace Implementation</title>
|
|
|
|
<titleabbrev>XML::LibXML::Namespace</titleabbrev>
|
|
<sect1>
|
|
<title>Synopsis</title>
|
|
<programlisting>use XML::LibXML;
|
|
# Only methods specific to Namespace nodes are listed here,
|
|
# see the XML::LibXML::Node manpage for other methods</programlisting>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Description</title>
|
|
|
|
<para>Namespace nodes are returned by both $element->findnodes('namespace::foo') and $node->getNamespaces().</para>
|
|
|
|
<para>The namespace node API is not part of any current DOM API, and so it is quite minimal. It should be noted that namespace nodes are
|
|
<emphasis>not</emphasis> a sub class of <xref linkend="XML-LibXML-Node"/>, however Namespace nodes act a lot like attribute nodes, and similarly named methods will
|
|
return what you would expect if you treated the namespace node as an attribute. Note that in order to fix several inconsistencies between the API and the documentation, the behavior of some functions has been changed in 1.64.</para>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Methods</title>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>new</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>my $ns = XML::LibXML::Namespace->new($nsURI);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Creates a new Namespace node. Note that this is not a 'node' in the way an attribute or an element node is. Therefore you cannot
call all <xref linkend="XML-LibXML-Node"/> functions on it. All functions available for this node are listed below.</para>
|
|
|
|
<para>Optionally you can pass the prefix to the namespace constructor. If this second parameter is omitted you will create a so called
|
|
default namespace. Note, the newly created namespace is not bound to any document or node, therefore you should not expect it to be
|
|
available in an existing document.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>declaredURI</term>
|
|
<listitem>
|
|
<para>Returns the URI for this namespace.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>declaredPrefix</term>
|
|
<listitem>
|
|
<para>Returns the prefix for this namespace.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>nodeName</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>print $ns->nodeName();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Returns "xmlns:prefix", where prefix is the prefix for this namespace.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>name</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>print $ns->name();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Alias for nodeName()</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>getLocalName</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$localname = $ns->getLocalName();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Returns the local name of this node as if it were an attribute, that is, the prefix associated with the namespace.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>getData</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>print $ns->getData();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Returns the URI of the namespace, i.e. the value of this node as if it were an attribute.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>getValue</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>print $ns->getValue();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Alias for getData()</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>value</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>print $ns->value();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Alias for getData()</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>getNamespaceURI</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$known_uri = $ns->getNamespaceURI();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Returns the string "http://www.w3.org/2000/xmlns/"</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>getPrefix</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$known_prefix = $ns->getPrefix();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Returns the string "xmlns"</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>unique_key</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$key = $ns->unique_key();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>This method returns a key guaranteed to be unique for this namespace, and to always be the same value for this namespace. Two namespace objects return the same key if and only if they have the same prefix and the same URI. The returned key value is useful as a key in hashes.</para>
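<para>The accessors above can be sketched on the declarations of a parsed element. The prefixes and URIs are made up, and unique_key is assumed to be available in the installed version:</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(
    string => '<r xmlns:a="urn:a" xmlns:b="urn:b"/>'
);

my %seen;
for my $ns ( $doc->documentElement->getNamespaces ) {
    # nodeName has the form xmlns:prefix; declaredURI is the bound URI.
    printf "%s (%s) = %s\n",
        $ns->nodeName, $ns->declaredPrefix, $ns->declaredURI;

    # The key depends only on prefix and URI, so it can index a hash.
    $seen{ $ns->unique_key }++;
}

print scalar( keys %seen ), " distinct namespace declarations\n";</programlisting>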
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</sect1>
|
|
</chapter>
|
|
|
|
<chapter id="XML-LibXML-PI">
|
|
<title>XML::LibXML Processing Instructions</title>
|
|
|
|
<titleabbrev>XML::LibXML::PI</titleabbrev>
|
|
<sect1>
|
|
<title>Synopsis</title>
|
|
<programlisting>use XML::LibXML;
|
|
# Only methods specific to Processing Instruction nodes are listed here,
|
|
# see the XML::LibXML::Node manpage for other methods</programlisting>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Description</title>
|
|
|
|
            <para>XML::LibXML implements processing instructions with read and write access. The PI data is the content of the PI without the PI target (as specified in
	XML 1.0 [17]), held as a string. This string can be read with getData() as implemented in <xref linkend="XML-LibXML-Node"/>.</para>
|
|
|
|
            <para>Write access takes into account that many processing instructions carry attribute-like data. In addition to the interface defined by the DOM
	specification, setData() therefore accepts a set of named parameters. So the code segment</para>
|
|
|
|
<programlisting>my $pi = $dom->createProcessingInstruction("abc");
|
|
$pi->setData(foo=>'bar', foobar=>'foobar');
|
|
$dom->appendChild( $pi );</programlisting>
|
|
|
|
            <para>will result in the following PI in the DOM:</para>
|
|
|
|
            <programlisting>&lt;?abc foo="bar" foobar="foobar"?&gt;</programlisting>
|
|
|
|
            <para>This is how it is specified in the DOM specification. The three-step interface temporarily creates a node in Perl space. This can be avoided by
	using the insertProcessingInstruction() method. Instead of the three calls described above, the call</para>
|
|
|
|
<programlisting>$dom->insertProcessingInstruction("abc",'foo="bar" foobar="foobar"');</programlisting>
|
|
|
|
<para>will have the same result as above.</para>
|
|
|
|
<para><xref linkend="XML-LibXML-PI"/>'s implementation of setData() documented below differs a bit from the standard version as available in <xref linkend="XML-LibXML-Node"/>:</para>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>setData</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$pinode->setData( $data_string );
|
|
$pinode->setData( name=>string_value [...] );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
                        <para>This method changes the content data of a PI. In addition to the interface specified for DOM Level 2, the method provides a named-parameter interface for setting the data. The parameter list is converted into a string before it is stored as the data of the PI.</para>
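                        <para>Both call forms can be sketched as follows. The target name and the data values are illustrative only. Because the named parameters are passed as an ordinary Perl list, it seems safest not to rely on the order in which several of them appear in the resulting string; the sketch therefore passes a single pair.</para>

                        <programlisting>use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->createDocument( '1.0', 'UTF-8' );
my $pi  = $doc->createProcessingInstruction('xml-stylesheet');

# DOM-style call: pass the complete data string
$pi->setData('type="text/xsl" href="style.xsl"');
print $pi->getData(), "\n";

# Named-parameter call: the pair is turned into name="value"
$pi->setData( href => 'other.xsl' );
print $pi->getData(), "\n";

$doc->appendChild($pi);
print $pi->toString(), "\n";</programlisting>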
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</sect1>
|
|
</chapter>
|
|
|
|
<chapter id="XML-LibXML-Dtd">
|
|
<title>XML::LibXML DTD Handling</title>
|
|
|
|
<titleabbrev>XML::LibXML::Dtd</titleabbrev>
|
|
<sect1>
|
|
<title>Synopsis</title>
|
|
<programlisting>use XML::LibXML;</programlisting>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Description</title>
|
|
|
|
<para>This class holds a DTD. You may parse a DTD from either a string, or from an external SYSTEM identifier.</para>
|
|
|
|
<para>No support is available as yet for parsing from a filehandle.</para>
|
|
|
|
<para>XML::LibXML::Dtd is a sub-class of <xref linkend="XML-LibXML-Node"/>, so all the methods available to nodes (particularly toString()) are available to Dtd objects.</para>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Methods</title>
|
|
|
|
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>new</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$dtd = XML::LibXML::Dtd->new($public_id, $system_id);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Parse a DTD from the system identifier, and return a DTD object that you can pass to $doc->is_valid() or $doc->validate().</para>
|
|
|
|
<programlisting> my $dtd = XML::LibXML::Dtd->new(
|
|
"SOME // Public / ID / 1.0",
|
|
"test.dtd"
|
|
);
|
|
my $doc = XML::LibXML->new->parse_file("test.xml");
|
|
$doc->validate($dtd);</programlisting>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>parse_string</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$dtd = XML::LibXML::Dtd->parse_string($dtd_str);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
            <para>The same as new() above, except you can parse a DTD from a string. Note that parsing from a string may fail if the DTD contains external parameter-entity references with relative URLs.</para>
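            <para>A minimal sketch of parsing a DTD from a string and validating documents against it. The one-element DTD and the two sample documents are invented for illustration; is_valid() is used for a boolean answer, whereas validate() would die() with the reason.</para>

            <programlisting>use strict;
use warnings;
use XML::LibXML;

# A one-element DTD, given inline for illustration
my $dtd = XML::LibXML::Dtd->parse_string('&lt;!ELEMENT note (#PCDATA)&gt;');

my $parser  = XML::LibXML->new();
my @samples = ( '&lt;note&gt;hello&lt;/note&gt;', '&lt;memo&gt;hello&lt;/memo&gt;' );
my @valid;

for my $xml (@samples) {
    my $doc = $parser->parse_string($xml);

    # is_valid() returns a boolean instead of dying
    push @valid, $doc->is_valid($dtd) ? 1 : 0;
    printf "%-20s %s\n", $xml, $valid[-1] ? 'valid' : 'not valid';
}</programlisting>

            <para>The second document should be reported as not valid, since the DTD does not declare a memo element.</para>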
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>getName</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
              <funcsynopsisinfo>$name = $dtd->getName();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
            <para>Returns the name of the DTD; i.e., the name immediately following the DOCTYPE keyword.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>publicId</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$publicId = $dtd->publicId();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Returns the public identifier of the external subset.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>systemId</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$systemId = $dtd->systemId();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Returns the system identifier of the external subset.</para>
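            <para>The three accessors above can be tried on the DTD node of a parsed document. This sketch assumes that the DOCTYPE declaration is reachable through the document's internalSubset() method; the identifiers are hypothetical, and loading of the external subset is switched off so that note.dtd does not need to exist.</para>

            <programlisting>use strict;
use warnings;
use XML::LibXML;

# Hypothetical document; the external subset is deliberately not loaded
my $xml = '&lt;!DOCTYPE note PUBLIC "-//Example//DTD Note 1.0//EN" "note.dtd"&gt;&lt;note/&gt;';
my $doc = XML::LibXML->load_xml( string => $xml, load_ext_dtd => 0 );

my $dtd = $doc->internalSubset;
print "name:      ", $dtd->getName(),  "\n";
print "public id: ", $dtd->publicId(), "\n";
print "system id: ", $dtd->systemId(), "\n";</programlisting>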
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</sect1>
|
|
</chapter>
|
|
|
|
<chapter id="XML-LibXML-InputCallback">
|
|
<title>XML::LibXML Class for Input Callbacks</title>
|
|
|
|
<titleabbrev>XML::LibXML::InputCallback</titleabbrev>
|
|
<sect1>
|
|
<title>Synopsis</title>
|
|
<programlisting>use XML::LibXML;</programlisting>
|
|
</sect1>
|
|
|
|
<sect1>
|
|
<title>Synopsis</title>
|
|
|
|
<programlisting>my $input_callbacks = XML::LibXML::InputCallback->new();
|
|
$input_callbacks->register_callbacks([ $match_cb1, $open_cb1,
|
|
$read_cb1, $close_cb1 ] );
|
|
$input_callbacks->register_callbacks([ $match_cb2, $open_cb2,
|
|
$read_cb2, $close_cb2 ] );
|
|
$input_callbacks->register_callbacks( [ $match_cb3, $open_cb3,
|
|
$read_cb3, $close_cb3 ] );
|
|
|
|
$parser->input_callbacks( $input_callbacks );
|
|
$parser->parse_file( $some_xml_file );</programlisting>
|
|
</sect1>
|
|
|
|
<sect1>
|
|
<title>Description</title>
|
|
|
|
            <para>You may get unexpected results when loading external documents during libxml2 parsing if the location of the resource is not an
	HTTP, FTP, or relative location (an absolute path, for example). To get around this limitation, you can add your own input handlers to open, read, and
	close particular types of locations or URI classes. With these input callback handlers you can, for example, handle your own custom URI schemes.</para>
|
|
|
|
<para>The input callbacks are used whenever XML::LibXML has to get something other than externally parsed entities from somewhere. They are implemented
|
|
using a callback stack on the Perl layer in analogy to libxml2's native callback stack.</para>
|
|
|
|
            <para>The XML::LibXML::InputCallback class transparently registers the input callbacks for libxml2's parser processes.</para>
|
|
|
|
<sect2>
|
|
<title>How does XML::LibXML::InputCallback work?</title>
|
|
|
|
                <para>The libxml2 library offers a callback implementation as global functions only. To work around the problems that result from having only global
	  callbacks (for example, when the same global callback stack is manipulated by different applications running together in a single Apache
	  web-server environment), XML::LibXML::InputCallback comes with an object-oriented and a function-oriented part.</para>
|
|
|
|
                <para>The function-oriented part manipulates the global callback stack of libxml2. Its functions serve as the interface to the
	  callbacks on the C and XS layer. The object-oriented part implements operations for working with the &quot;pseudo-localized&quot; callback stack.
	  Currently, you can register and de-register callbacks on the Perl layer and initialize them on a per-parser basis.</para>
|
|
|
|
<sect3>
|
|
<title>Callback Groups</title>
|
|
|
|
<para>The libxml2 input callbacks come in groups. One group contains a URI matcher (<emphasis>match</emphasis>), a data stream constructor (<emphasis>open</emphasis>),
|
|
a data stream reader (<emphasis>read</emphasis>), and a data stream destructor (<emphasis>close</emphasis>). The callbacks can be
|
|
manipulated on a per group basis only.</para>
|
|
</sect3>
|
|
|
|
<sect3>
|
|
<title>The Parser Process</title>
|
|
|
|
                    <para>The parser process works on an XML data stream in which links to other resources can be embedded, for example links to external
	    DTDs or XIncludes. Those resources are identified by URIs. The callback implementation of libxml2 assumes that one callback
	    group can handle a certain number of URIs and a certain URI scheme. By default, callback handlers for <emphasis>file://*</emphasis>,
	    <emphasis>file://*.gz</emphasis>, <emphasis>http://*</emphasis> and <emphasis>ftp://*</emphasis> are registered.</para>
|
|
|
|
<para>Callback groups in the callback stack are processed from top to bottom, meaning that callback groups registered later will be
|
|
processed before the earlier registered ones.</para>
|
|
|
|
                    <para>While parsing the data stream, the libxml2 parser checks whether a registered callback group will handle a URI; if none will, the URI
	    is interpreted as <emphasis>file://URI</emphasis>. To handle a URI, the <emphasis>match</emphasis> callback has to return
	    '1'. If that happens, the handling of the URI is passed to that callback group. Next, the URI is passed to the
	    <emphasis>open</emphasis> callback, which should return a <emphasis>reference</emphasis> to the data stream if it successfully opened the
	    file, and '0' otherwise. If opening the stream was successful, the <emphasis>read</emphasis> callback is called repeatedly until it
	    returns an empty string. After the last read, the <emphasis>close</emphasis> callback is called to close the stream.</para>
|
|
</sect3>
|
|
|
|
<sect3>
|
|
<title>Organisation of callback groups in XML::LibXML::InputCallback</title>
|
|
|
|
                    <para>Callback groups are implemented as a stack (Array); each entry holds a reference to an array of the callbacks. To the libxml2 library, the
	    XML::LibXML::InputCallback callback implementation appears as one single callback group. The Perl implementation, however, allows one to manage
	    different callback stacks on a per libxml2-parser basis.</para>
|
|
</sect3>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Using XML::LibXML::InputCallback</title>
|
|
|
|
<para>After object instantiation using the parameter-less constructor, you can register callback groups.</para>
|
|
|
|
<programlisting>my $input_callbacks = XML::LibXML::InputCallback->new();
|
|
$input_callbacks->register_callbacks([ $match_cb1, $open_cb1,
|
|
$read_cb1, $close_cb1 ] );
|
|
$input_callbacks->register_callbacks([ $match_cb2, $open_cb2,
|
|
$read_cb2, $close_cb2 ] );
|
|
$input_callbacks->register_callbacks( [ $match_cb3, $open_cb3,
|
|
$read_cb3, $close_cb3 ] );
|
|
|
|
$parser->input_callbacks( $input_callbacks );
|
|
$parser->parse_file( $some_xml_file );</programlisting>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>What about the old callback system prior to XML::LibXML::InputCallback?</title>
|
|
|
|
                <para>In XML::LibXML versions prior to 1.59 - i.e. without the XML::LibXML::InputCallback module - you could define your callbacks either
	  globally or locally. You can still do that using XML::LibXML::InputCallback, and in addition you can define the callbacks on a per-parser
	  basis!</para>
|
|
|
|
                <para>If you use the old callback interface through global callbacks, XML::LibXML::InputCallback will treat them with a lower priority than the
	  ones registered using the new interface. The global callbacks will not override the callback groups registered using the new interface. Local
	  callbacks are attached to a specific parser instance and are therefore treated with the highest priority. If the <emphasis>match</emphasis>
	  callback of the callback group registered as local variable is identical to one of the callback groups registered using the new interface, that
	  callback group will be replaced.</para>
|
|
|
|
                <para>Users of the old callback implementation whose <emphasis>open</emphasis> callback returned a plain string will have to adapt their code
	  to return a reference to that string after upgrading to version &gt;= 1.59. The new callback system can only deal with the
	  <emphasis>open</emphasis> callback returning a reference!</para>
|
|
</sect2>
|
|
</sect1>
|
|
|
|
<sect1>
|
|
<title>Interface Description</title>
|
|
|
|
<sect2>
|
|
<title>Global Variables</title>
|
|
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>$_CUR_CB</term>
|
|
|
|
<listitem>
|
|
<para>Stores the current callback and can be used as shortcut to access the callback stack.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>@_GLOBAL_CALLBACKS</term>
|
|
|
|
<listitem>
|
|
<para>Stores all callback groups for the current parser process.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>@_CB_STACK</term>
|
|
|
|
<listitem>
|
|
<para>Stores the currently used callback group. Used to prevent parser errors when dealing with nested XML data.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Global Callbacks</title>
|
|
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>_callback_match</term>
|
|
|
|
<listitem>
|
|
<para>Implements the interface for the <emphasis>match</emphasis> callback at C-level and for the selection of the callback group
|
|
from the callbacks defined at the Perl-level.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>_callback_open</term>
|
|
|
|
<listitem>
|
|
<para>Forwards the <emphasis>open</emphasis> callback from libxml2 to the corresponding callback function at the Perl-level.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>_callback_read</term>
|
|
|
|
<listitem>
|
|
<para>Forwards the read request to the corresponding callback function at the Perl-level and returns the result to libxml2.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>_callback_close</term>
|
|
|
|
<listitem>
|
|
                            <para>Forwards the <emphasis>close</emphasis> callback from libxml2 to the corresponding callback function at the Perl-level.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Class methods</title>
|
|
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>new()</term>
|
|
|
|
<listitem>
|
|
<para>A simple constructor.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>register_callbacks( [ $match_cb, $open_cb, $read_cb, $close_cb ])</term>
|
|
|
|
<listitem>
|
|
                            <para>The four callbacks <emphasis>have</emphasis> to be given as an array reference in the above order: <emphasis>match</emphasis>,
	      <emphasis>open</emphasis>, <emphasis>read</emphasis>, <emphasis>close</emphasis>!</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>unregister_callbacks( [ $match_cb, $open_cb, $read_cb, $close_cb ])</term>
|
|
|
|
<listitem>
|
|
                            <para>With no arguments given, <function>unregister_callbacks()</function> will delete the last registered callback group from the
	      stack. If four callbacks are passed as an array reference, the callback group to unregister will be identified by the
	      <emphasis>match</emphasis> callback and deleted from the callback stack. Note that if several identical <emphasis>match</emphasis>
	      callbacks are defined in different callback groups, ALL of them will be deleted from the stack.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>init_callbacks( $parser )</term>
|
|
|
|
<listitem>
|
|
<para>Initializes the callback system for the provided parser before starting a parsing process.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>cleanup_callbacks()</term>
|
|
|
|
<listitem>
|
|
<para>Resets global variables and the libxml2 callback stack.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>lib_init_callbacks()</term>
|
|
|
|
<listitem>
|
|
<para>Used internally for callback registration at C-level.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>lib_cleanup_callbacks()</term>
|
|
|
|
<listitem>
|
|
<para>Used internally for callback resetting at the C-level.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
|
|
<para/>
|
|
</sect2>
|
|
</sect1>
|
|
|
|
<sect1>
|
|
<title>Example callbacks</title>
|
|
|
|
            <para>The following purely fictitious example uses a MyScheme::Handler object that responds to methods similar to those of an IO::Handle.</para>
|
|
|
|
<programlisting>
|
|
# Define the four callback functions
|
|
sub match_uri {
|
|
my $uri = shift;
|
|
    return $uri =~ /^myscheme:/; # trigger our callback group for 'myscheme' URIs
|
|
}
|
|
|
|
sub open_uri {
|
|
my $uri = shift;
|
|
my $handler = MyScheme::Handler->new($uri);
|
|
return $handler;
|
|
}
|
|
|
|
# The returned $buffer will be parsed by the libxml2 parser
|
|
sub read_uri {
|
|
my $handler = shift;
|
|
my $length = shift;
|
|
my $buffer;
|
|
read($handler, $buffer, $length);
|
|
return $buffer; # $buffer will be an empty string '' if read() is done
|
|
}
|
|
|
|
# Close the handle associated with the resource.
|
|
sub close_uri {
|
|
my $handler = shift;
|
|
close($handler);
|
|
}
|
|
|
|
# Register them with an instance of XML::LibXML::InputCallback
|
|
my $input_callbacks = XML::LibXML::InputCallback->new();
|
|
$input_callbacks->register_callbacks([ \&match_uri, \&open_uri,
|
|
\&read_uri, \&close_uri ] );
|
|
|
|
# Register the callback group at a parser instance
|
|
$parser->input_callbacks( $input_callbacks );
|
|
|
|
# $some_xml_file will be parsed using our callbacks
|
|
$parser->parse_file( $some_xml_file );
|
|
|
|
|
|
</programlisting>
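
            <para>For comparison, here is a self-contained sketch that needs no handler class: it serves a document from an in-memory hash under a made-up <emphasis>mem:</emphasis> scheme and pulls it in through an XInclude. The scheme name, the hash, and the sample documents are assumptions chosen for illustration; the <emphasis>open</emphasis> callback returns a reference to a string, as the callback system requires, and the <emphasis>read</emphasis> callback consumes that string piece by piece.</para>

            <programlisting>use strict;
use warnings;
use XML::LibXML;

# Hypothetical in-memory "documents", addressed by a made-up mem: scheme
my %resource = ( 'mem:greeting' => '&lt;greeting&gt;hello&lt;/greeting&gt;' );

my $input_callbacks = XML::LibXML::InputCallback->new();
$input_callbacks->register_callbacks([
    # match: claim every URI in the mem: scheme
    sub { $_[0] =~ /^mem:/ ? 1 : 0 },
    # open: return a reference to a private copy of the data, or 0 on failure
    sub { my $data = $resource{ $_[0] }; defined $data ? \$data : 0 },
    # read: hand out up to $length characters and remove them from the copy
    sub { my ( $ref, $length ) = @_; substr( $$ref, 0, $length, '' ) },
    # close: nothing to release for an in-memory string
    sub { 1 },
]);

my $parser = XML::LibXML->new();
$parser->input_callbacks($input_callbacks);

my $doc = $parser->parse_string(
    '&lt;doc xmlns:xi="http://www.w3.org/2001/XInclude"&gt;'
  . '&lt;xi:include href="mem:greeting"/&gt;&lt;/doc&gt;'
);

# The include should be resolved through the callbacks registered above
$parser->process_xincludes($doc);
print $doc->toString();</programlisting>

            <para>The read callback returns an empty string once the copy is exhausted, which is the signal described earlier for the end of the stream.</para>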
|
|
</sect1>
|
|
</chapter>
|
|
|
|
<chapter id="XML-LibXML-RelaxNG">
|
|
<title>RelaxNG Schema Validation</title>
|
|
|
|
<titleabbrev>XML::LibXML::RelaxNG</titleabbrev>
|
|
<sect1>
|
|
<title>Synopsis</title>
|
|
<programlisting>use XML::LibXML;
|
|
$doc = XML::LibXML->new->parse_file($url);</programlisting>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Description</title>
|
|
|
|
<para>The XML::LibXML::RelaxNG class is a tiny frontend to libxml2's RelaxNG implementation. Currently it supports only schema parsing and document
|
|
validation.</para>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Methods</title>
|
|
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>new</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$rngschema = XML::LibXML::RelaxNG->new( location => $filename_or_url, no_network => 1 );
|
|
$rngschema = XML::LibXML::RelaxNG->new( string => $xmlschemastring, no_network => 1 );
|
|
$rngschema = XML::LibXML::RelaxNG->new( DOM => $doc, no_network => 1 );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
            <para>The constructor of XML::LibXML::RelaxNG needs to be called with a list of parameters. One of the location, string, or DOM parameters is required to
	specify the source of the schema. Setting the optional parameter no_network to 1 prevents the parser from accessing the network, and setting the optional
	parameter recover to 1 prevents the parser from calling die() on errors.</para>
|
|
|
|
            <para>It is important that each schema has only a single source.</para>
|
|
|
|
<para>The location parameter allows one to parse a schema
|
|
from the filesystem or a (non-HTTPS) URL.</para>
|
|
|
|
<para>The string parameter will parse the schema from the given XML string.</para>
|
|
|
|
<para>The DOM parameter allows one to parse the schema from a pre-parsed <xref linkend="XML-LibXML-Document"/>.</para>
|
|
|
|
            <para>Note that the constructor will die() if the schema does not meet the constraints of the RelaxNG specification.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>validate</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>eval { $rngschema->validate( $doc ); };</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
            <para>This function validates a (parsed) document against the given RelaxNG schema. The argument of this function should be an XML::LibXML::Document
	    object. If this function succeeds, it returns 0; otherwise it will die() and report the errors found. For this reason, validate() should always
	    be called inside an eval block.</para>
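
            <para>A minimal sketch of the eval pattern, using a deliberately tiny schema given as a string. The schema and the two sample documents are invented for illustration; because validate() returns 0 on success, the eval block ends with an explicit 1 so that success and failure can be told apart.</para>

            <programlisting>use strict;
use warnings;
use XML::LibXML;

# A deliberately tiny schema: a single note element containing text
my $rng_source = '&lt;element name="note" xmlns="http://relaxng.org/ns/structure/1.0"&gt;'
               . '&lt;text/&gt;&lt;/element&gt;';
my $rngschema = XML::LibXML::RelaxNG->new( string => $rng_source, no_network => 1 );

my @outcome;
for my $xml ( '&lt;note&gt;hello&lt;/note&gt;', '&lt;memo/&gt;' ) {
    my $doc = XML::LibXML->load_xml( string => $xml );

    # validate() dies on failure, so the trailing 1 is only reached on success
    my $ok = eval { $rngschema->validate($doc); 1 };
    push @outcome, $ok ? 'valid' : 'not valid';
    print "$xml: $outcome[-1]\n";
}</programlisting>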
|
|
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</sect1>
|
|
</chapter>
|
|
|
|
<chapter id="XML-LibXML-Schema">
|
|
<title>XML Schema Validation</title>
|
|
|
|
<titleabbrev>XML::LibXML::Schema</titleabbrev>
|
|
<sect1>
|
|
<title>Synopsis</title>
|
|
<programlisting>use XML::LibXML;
|
|
$doc = XML::LibXML->new->parse_file($url);</programlisting>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Description</title>
|
|
|
|
<para>The XML::LibXML::Schema class is a tiny frontend to libxml2's XML Schema implementation. Currently it supports only schema parsing and
|
|
document validation. As of 2.6.32, libxml2 only supports decimal types up to 24 digits (the standard requires at least 18).
|
|
</para>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Methods</title>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>new</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$xmlschema = XML::LibXML::Schema->new( location => $filename_or_url, no_network => 1 );
|
|
$xmlschema = XML::LibXML::Schema->new( string => $xmlschemastring, no_network => 1 );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
            <para>The constructor of XML::LibXML::Schema needs to be called with a list of parameters. Either the location or the string parameter is required to
	specify the source of the schema. Setting the optional parameter no_network to 1 prevents the parser from accessing the network, and setting the optional
	parameter recover to 1 prevents the parser from calling die() on errors.</para>
|
|
|
|
            <para>It is important that each schema has only a single source.</para>
|
|
|
|
<para>The location parameter allows one to parse a schema
|
|
from the filesystem or a (non-HTTPS) URL.</para>
|
|
|
|
<para>The string parameter will parse the schema from the given XML string.</para>
|
|
|
|
            <para>Note that the constructor will die() if the schema does not meet the constraints of the XML Schema specification.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>validate</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>eval { $xmlschema->validate( $doc ); };</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
            <para>This function validates a (parsed) document against the given XML Schema. The argument of this function should be a <xref
	    linkend="XML-LibXML-Document"/> object. If this function succeeds, it returns 0; otherwise it will die() and report the errors found.
	    For this reason, validate() should always be called inside an eval block.</para>
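
            <para>The same pattern for XML Schema, sketched with an invented one-element schema that requires an integer. Both the schema and the sample documents are assumptions made for illustration.</para>

            <programlisting>use strict;
use warnings;
use XML::LibXML;

# Hypothetical schema: a count element whose content must be an integer
my $xsd_source = '&lt;xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"&gt;'
               . '&lt;xs:element name="count" type="xs:integer"/&gt;'
               . '&lt;/xs:schema&gt;';
my $xmlschema = XML::LibXML::Schema->new( string => $xsd_source, no_network => 1 );

my @outcome;
for my $xml ( '&lt;count&gt;3&lt;/count&gt;', '&lt;count&gt;three&lt;/count&gt;' ) {
    my $doc = XML::LibXML->load_xml( string => $xml );

    # validate() dies on failure, so the trailing 1 is only reached on success
    my $ok = eval { $xmlschema->validate($doc); 1 };
    push @outcome, $ok ? 'valid' : 'not valid';
    print "$xml: $outcome[-1]\n";
}</programlisting>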
|
|
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</sect1>
|
|
</chapter>
|
|
<chapter id="XML-LibXML-XPathContext">
|
|
<title>XPath Evaluation</title>
|
|
<titleabbrev>XML::LibXML::XPathContext</titleabbrev>
|
|
<sect1>
|
|
<title>Description</title>
|
|
      <para>
	The XML::LibXML::XPathContext class provides an almost complete interface to libxml2's XPath implementation.
	With XML::LibXML::XPathContext, it is possible to evaluate XPath expressions in the context of an arbitrary node,
	context size, and context position, with a user-defined namespace-prefix mapping, custom XPath functions written in Perl,
	and even a custom XPath variable resolver.
      </para>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Examples</title>
|
|
<sect2>
|
|
<title>Namespaces</title>
|
|
	<para>This example demonstrates the <function>registerNs()</function> method.
	  It finds all paragraph nodes in an XHTML document.</para>
|
|
<programlisting>my $xc = XML::LibXML::XPathContext->new($xhtml_doc);
|
|
$xc->registerNs('xhtml', 'http://www.w3.org/1999/xhtml');
|
|
my @nodes = $xc->findnodes('//xhtml:p');</programlisting>
|
|
</sect2>
|
|
<sect2>
|
|
<title>Custom XPath functions</title>
|
|
	<para>This example demonstrates the <function>registerFunction()</function> method
	  by defining a function that filters nodes based on a Perl regular expression:</para>
|
|
<programlisting>sub grep_nodes {
|
|
my ($nodelist,$regexp) = @_;
|
|
my $result = XML::LibXML::NodeList->new;
|
|
for my $node ($nodelist->get_nodelist()) {
|
|
$result->push($node) if $node->textContent =~ $regexp;
|
|
}
|
|
return $result;
|
|
};
|
|
|
|
my $xc = XML::LibXML::XPathContext->new($node);
|
|
$xc->registerFunction('grep_nodes', \&grep_nodes);
|
|
my @nodes = $xc->findnodes('//section[grep_nodes(para,"\bsearch(ing|es)?\b")]');</programlisting>
|
|
</sect2>
|
|
<sect2>
|
|
<title>Variables</title>
|
|
	<para>This example demonstrates the <function>registerVarLookupFunc()</function>
	  method. We use XPath variables to recycle results of previous evaluations:</para>
|
|
<programlisting>sub var_lookup {
|
|
  my ($data, $varname, $ns) = @_;   # argument order as documented for registerVarLookupFunc
|
|
return $data->{$varname};
|
|
}
|
|
|
|
my $areas = XML::LibXML->new->parse_file('areas.xml');
|
|
my $empl = XML::LibXML->new->parse_file('employees.xml');
|
|
|
|
my $xc = XML::LibXML::XPathContext->new($empl);
|
|
|
|
my %variables = (
|
|
A => $xc->find('/employees/employee[@salary>10000]'),
|
|
  B => $areas->find('/areas/area[district="Brooklyn"]/street'),
|
|
);
|
|
|
|
# get names of employees from $A working in an area listed in $B
|
|
$xc->registerVarLookupFunc(\&var_lookup, \%variables);
|
|
my @nodes = $xc->findnodes('$A[work_area/street = $B]/name');
|
|
</programlisting>
|
|
</sect2>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Methods</title>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>new</term>
|
|
<listitem>
|
|
<funcsynopsis><funcsynopsisinfo>my $xpc = XML::LibXML::XPathContext->new();</funcsynopsisinfo></funcsynopsis>
|
|
<para>Creates a new XML::LibXML::XPathContext object
|
|
without a context node.</para>
|
|
<funcsynopsis><funcsynopsisinfo>my $xpc = XML::LibXML::XPathContext->new($node);</funcsynopsisinfo></funcsynopsis>
|
|
<para>Creates a new XML::LibXML::XPathContext object with
|
|
the context node set to <literal>$node</literal>.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>registerNs</term>
|
|
<listitem><funcsynopsis><funcsynopsisinfo>$xpc->registerNs($prefix, $namespace_uri)</funcsynopsisinfo></funcsynopsis>
|
|
<para>Registers namespace <literal>$prefix</literal> to
|
|
<literal>$namespace_uri</literal>.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>unregisterNs</term>
|
|
<listitem><funcsynopsis><funcsynopsisinfo>$xpc->unregisterNs($prefix)</funcsynopsisinfo></funcsynopsis>
|
|
<para>Unregisters namespace <literal>$prefix</literal>.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>lookupNs</term>
|
|
<listitem><funcsynopsis><funcsynopsisinfo>$uri = $xpc->lookupNs($prefix)</funcsynopsisinfo></funcsynopsis>
|
|
	    <para>Returns the namespace URI registered with
	      <literal>$prefix</literal>. If <literal>$prefix</literal>
	      is not registered to any namespace URI, returns
	      <literal>undef</literal>.</para>
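
	    <para>A short sketch of the three namespace methods working together. The one-element document and the choice of prefix are assumptions made for illustration.</para>

	    <programlisting>use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml( string => '&lt;root/&gt;' );
my $xpc = XML::LibXML::XPathContext->new($doc);

$xpc->registerNs( 'xhtml', 'http://www.w3.org/1999/xhtml' );
my $before = $xpc->lookupNs('xhtml');

$xpc->unregisterNs('xhtml');
my $after = $xpc->lookupNs('xhtml');

print "before: $before\n";
print "after:  ", ( defined $after ? $after : 'undef' ), "\n";</programlisting>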
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>registerVarLookupFunc</term>
|
|
<listitem><funcsynopsis><funcsynopsisinfo>$xpc->registerVarLookupFunc($callback, $data)</funcsynopsisinfo></funcsynopsis>
|
|
<para>Registers variable lookup function
|
|
<literal>$callback</literal>. The registered function is
|
|
executed by the XPath engine each time an XPath variable
|
|
is evaluated. It takes three arguments:
|
|
<literal>$data</literal>, variable name, and variable
|
|
ns-URI and must return one value: a number or string or
|
|
any <literal>XML::LibXML::</literal> object that can be a result
|
|
of findnodes: Boolean, Literal, Number, Node
|
|
(e.g. Document, Element, etc.), or NodeList. For
|
|
convenience, simple (non-blessed) array references
|
|
containing only <xref linkend="XML-LibXML-Node"/> objects can be
|
|
used instead of an <olink targetdoc="XML::LibXML::NodeList">XML::LibXML::NodeList</olink>.</para>
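
	    <para>A minimal, self-contained sketch of a lookup function that follows the argument order described above (data first, then variable name and namespace URI). The document, the variable name, and the threshold are invented for illustration.</para>

	    <programlisting>use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml( string =>
    '&lt;staff&gt;&lt;e salary="9000"&gt;Ann&lt;/e&gt;&lt;e salary="12000"&gt;Bob&lt;/e&gt;&lt;/staff&gt;' );
my $xpc = XML::LibXML::XPathContext->new($doc);

# Hypothetical variable table handed to the lookup function as $data
my %variables = ( limit => 10000 );

$xpc->registerVarLookupFunc(
    sub {
        my ( $data, $name, $uri ) = @_;
        return $data->{$name};    # resolve $limit from the table
    },
    \%variables
);

# Select the employee whose salary exceeds $limit
my $name = $xpc->findvalue('/staff/e[@salary &gt; $limit]');
print "$name\n";</programlisting>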
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>getVarLookupData</term>
|
|
<listitem>
|
|
<funcsynopsis><funcsynopsisinfo>$data = $xpc->getVarLookupData();</funcsynopsisinfo></funcsynopsis>
|
|
<para>
|
|
Returns the data that have been associated with a
|
|
variable lookup function during a previous call to
|
|
<literal>registerVarLookupFunc</literal>.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>getVarLookupFunc</term>
|
|
<listitem>
|
|
<funcsynopsis><funcsynopsisinfo>$callback = $xpc->getVarLookupFunc();</funcsynopsisinfo></funcsynopsis>
|
|
<para>
|
|
Returns the variable lookup function previously registered with
|
|
<literal>registerVarLookupFunc</literal>.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>unregisterVarLookupFunc</term>
|
|
<listitem>
|
|
	    <funcsynopsis><funcsynopsisinfo>$xpc->unregisterVarLookupFunc();</funcsynopsisinfo></funcsynopsis>
|
|
<para>Unregisters variable lookup function and the associated lookup data.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>registerFunctionNS</term>
|
|
<listitem><funcsynopsis><funcsynopsisinfo>$xpc->registerFunctionNS($name, $uri, $callback)</funcsynopsisinfo></funcsynopsis>
|
|
<para>Registers an extension function
|
|
<literal>$name</literal> in <literal>$uri</literal>
|
|
namespace. <literal>$callback</literal> must be a CODE
|
|
reference. The arguments of the callback function are
|
|
either simple scalars or <literal>XML::LibXML::*</literal> objects
|
|
depending on the XPath argument types. The function is
|
|
responsible for checking the argument number and
|
|
types. Result of the callback code must be a single
|
|
value of the following types: a simple scalar
|
|
(number, string) or an arbitrary <literal>XML::LibXML::*</literal>
|
|
object that can be a result of findnodes: Boolean,
|
|
Literal, Number, Node (e.g. Document, Element, etc.), or
|
|
NodeList. For convenience, simple (non-blessed) array
|
|
references containing only <xref linkend="XML-LibXML-Node"/>
|
|
objects can be used instead of a
|
|
<olink targetdoc="XML::LibXML::NodeList">XML::LibXML::NodeList</olink>.</para>
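
	    <para>As a sketch of the points above, the following registers a function in a namespace, checks its argument count, and accepts either a node list or a plain value. The namespace URI, the prefix, and the function name are hypothetical; the prefix has to be registered with registerNs() before the function can be called from an XPath expression.</para>

	    <programlisting>use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml( string =>
    '&lt;list&gt;&lt;item&gt;apple&lt;/item&gt;&lt;item&gt;banana&lt;/item&gt;&lt;/list&gt;' );
my $xpc = XML::LibXML::XPathContext->new($doc);

# Hypothetical namespace for our extension functions
my $uri = 'urn:example:functions';
$xpc->registerNs( 'my', $uri );

$xpc->registerFunctionNS( 'upper', $uri, sub {
    # The callback itself is responsible for checking its arguments
    die "my:upper() expects exactly one argument\n" unless @_ == 1;
    my ($arg) = @_;

    # Node lists and other XML::LibXML objects offer to_literal();
    # plain scalars are used as they are
    my $text = ( ref($arg) and $arg->can('to_literal') ) ? $arg->to_literal : "$arg";
    return uc $text;
} );

my $shout = $xpc->findvalue('my:upper(/list/item[2])');
print "$shout\n";</programlisting>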
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>unregisterFunctionNS</term>
|
|
<listitem><funcsynopsis><funcsynopsisinfo>$xpc->unregisterFunctionNS($name, $uri)</funcsynopsisinfo></funcsynopsis>
|
|
<para>
|
|
Unregisters extension function <literal>$name</literal>
|
|
in <literal>$uri</literal> namespace. Has the same
|
|
effect as passing <literal>undef</literal> as
|
|
<literal>$callback</literal> to
|
|
registerFunctionNS.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>registerFunction</term>
|
|
<listitem><funcsynopsis><funcsynopsisinfo>$xpc->registerFunction($name, $callback)</funcsynopsisinfo></funcsynopsis>
|
|
<para>Same as <literal>registerFunctionNS</literal> but
|
|
without a namespace.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>unregisterFunction</term>
|
|
<listitem><funcsynopsis><funcsynopsisinfo>$xpc->unregisterFunction($name)</funcsynopsisinfo></funcsynopsis>
|
|
<para>Same as <literal>unregisterFunctionNS</literal> but
|
|
without a namespace.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>findnodes</term>
|
|
<listitem><funcsynopsis><funcsynopsisinfo>@nodes = $xpc->findnodes($xpath)</funcsynopsisinfo></funcsynopsis>
|
|
<funcsynopsis><funcsynopsisinfo>@nodes = $xpc->findnodes($xpath, $context_node )</funcsynopsisinfo></funcsynopsis>
|
|
<funcsynopsis><funcsynopsisinfo>$nodelist = $xpc->findnodes($xpath, $context_node )</funcsynopsisinfo></funcsynopsis>
|
|
<para>Performs the xpath statement on the current node and
|
|
returns the result as an array. In scalar context,
|
|
returns an <olink targetdoc="XML::LibXML::NodeList">XML::LibXML::NodeList</olink> object. Optionally, a
|
|
node may be passed as a second argument to set the
|
|
context node for the query.</para>
|
|
<para>The xpath expression can be passed either as a string, or
|
|
                    as an <olink targetdoc="XML::LibXML::XPathExpression">XML::LibXML::XPathExpression</olink> object.
|
|
</para>
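                    <para>The difference between list and scalar context can be
                    sketched as follows; the sample document is made up for
                    illustration only.</para>
                    <programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

my $doc  = XML::LibXML->load_xml(string => '<r><i>a</i><i>b</i></r>');
my $root = $doc->documentElement;
my $xpc  = XML::LibXML::XPathContext->new($doc);

# List context: a plain Perl list of nodes.
my @nodes = $xpc->findnodes('/r/i');

# Scalar context: an XML::LibXML::NodeList object. The relative path is
# evaluated against the explicitly supplied context node.
my $list = $xpc->findnodes('i', $root);

print scalar(@nodes), " ", $list->size, "\n";
]]></programlisting>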
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>find</term>
|
|
<listitem><funcsynopsis><funcsynopsisinfo>$object = $xpc->find($xpath )</funcsynopsisinfo></funcsynopsis>
|
|
<funcsynopsis><funcsynopsisinfo>$object = $xpc->find($xpath, $context_node )</funcsynopsisinfo></funcsynopsis>
|
|
<para>Performs the xpath expression using the current node
|
|
as the context of the expression, and returns the result
|
|
depending on what type of result the XPath expression
|
|
had. For example, the XPath <literal>1 * 3 +
|
|
52</literal> results in an <olink targetdoc="XML::LibXML::Number">XML::LibXML::Number</olink> object
|
|
being returned. Other expressions might return a
|
|
<olink targetdoc="XML::LibXML::Boolean">XML::LibXML::Boolean</olink> object, or a
|
|
<olink targetdoc="XML::LibXML::Literal">XML::LibXML::Literal</olink> object (a string). Each of those
|
|
objects uses Perl's overload feature to ``do the right
|
|
thing'' in different contexts. Optionally, a node may be
|
|
passed as a second argument to set the context node for
|
|
the query.</para>
|
|
<para>The xpath expression can be passed either as a string, or
|
|
                    as an <olink targetdoc="XML::LibXML::XPathExpression">XML::LibXML::XPathExpression</olink> object.
|
|
</para>
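                    <para>As a rough illustration of how the returned class depends on
                    the expression, consider the following sketch (the sample
                    document is an assumption made for the example):</para>
                    <programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

my $doc = XML::LibXML->load_xml(string => '<r><i>a</i><i>b</i></r>');
my $xpc = XML::LibXML::XPathContext->new($doc);

my $number  = $xpc->find('1 * 3 + 52');        # arithmetic expression
my $boolean = $xpc->find('count(/r/i) > 1');   # comparison
my $nodes   = $xpc->find('/r/i');              # location path

print ref($number), " ", $number->value, "\n";
print ref($boolean), "\n";
print ref($nodes), " ", $nodes->size, "\n";
]]></programlisting>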
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>findvalue</term>
|
|
<listitem><funcsynopsis><funcsynopsisinfo>$value = $xpc->findvalue($xpath )</funcsynopsisinfo></funcsynopsis>
|
|
<funcsynopsis><funcsynopsisinfo>$value = $xpc->findvalue($xpath, $context_node )</funcsynopsisinfo></funcsynopsis>
|
|
<para>Is exactly equivalent to:</para>
|
|
<programlisting>$xpc->find( $xpath, $context_node )->to_literal;</programlisting>
|
|
<para>That is, it returns the literal value of the
|
|
results. This enables you to ensure that you get a string
|
|
back from your search, allowing certain shortcuts. This
|
|
                    could be used as the equivalent of <literal>&lt;xsl:value-of
                    select="some_xpath"/&gt;</literal>. Optionally, a node may be
|
|
passed in the second argument to set the context node for
|
|
the query.</para>
|
|
<para>The xpath expression can be passed either as a string, or
|
|
                    as an <olink targetdoc="XML::LibXML::XPathExpression">XML::LibXML::XPathExpression</olink> object.
|
|
</para>
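                    <para>A short sketch of the equivalence, also showing a
                    pre-compiled expression being passed instead of a string. The
                    sample document and variable names are illustrative
                    assumptions.</para>
                    <programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

my $doc = XML::LibXML->load_xml(string => '<r><i>a</i><i>b</i></r>');
my $xpc = XML::LibXML::XPathContext->new($doc);

# findvalue() yields the literal (string) value of the result ...
my $value = $xpc->findvalue('/r/i[2]');

# ... which corresponds to calling to_literal on the result of find().
my $same = $xpc->find('/r/i[2]')->to_literal;

# A compiled XPath expression may be used in place of the string.
my $expr  = XML::LibXML::XPathExpression->new('string(/r/i[1])');
my $first = $xpc->findvalue($expr);

print "$value $same $first\n";
]]></programlisting>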
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>exists</term>
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$bool = $xpc->exists( $xpath_expression, $context_node );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>This method behaves like <emphasis>findnodes</emphasis>, except
|
|
that it only returns a boolean value (1 if the expression matches a node, 0 otherwise)
|
|
and may be faster than <emphasis>findnodes</emphasis>, because
|
|
the XPath evaluation may stop early on the first match (this is true for libxml2 >= 2.6.27).
|
|
                    </para><para>For XPath expressions that do not return a node-set,
|
|
the method returns true if the returned value is a non-zero number or a non-empty string.</para>
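                    <para>Both cases can be sketched as follows, assuming a small
                    sample document invented for this example:</para>
                    <programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

my $doc  = XML::LibXML->load_xml(string => '<r><i>a</i><i>b</i></r>');
my $root = $doc->documentElement;
my $xpc  = XML::LibXML::XPathContext->new($doc);

# Node-set expressions: true if at least one node matches.
my $has_i = $xpc->exists('i', $root);
my $has_x = $xpc->exists('x', $root);

# Non-node-set expression: true because the number is non-zero.
my $nonzero = $xpc->exists('count(i)', $root);

print join(' ', map { $_ ? 'yes' : 'no' } $has_i, $has_x, $nonzero), "\n";
]]></programlisting>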
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>setContextNode</term>
|
|
<listitem><funcsynopsis><funcsynopsisinfo>$xpc->setContextNode($node)</funcsynopsisinfo></funcsynopsis>
|
|
<para>Set the current context node.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>getContextNode</term>
|
|
<listitem>
|
|
<funcsynopsis><funcsynopsisinfo>my $node = $xpc->getContextNode;</funcsynopsisinfo></funcsynopsis>
|
|
<para>Get the current context node.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>setContextPosition</term>
|
|
<listitem><funcsynopsis><funcsynopsisinfo>$xpc->setContextPosition($position)</funcsynopsisinfo></funcsynopsis>
|
|
<para>
|
|
Set the current context position. By default, this
|
|
value is -1 (and evaluating XPath function
|
|
<literal>position()</literal> in the initial context
|
|
raises an XPath error), but can be set to any value up
|
|
                    to context size. This is mainly useful for making the
                    XPath engine report a chosen position when the
                    <literal>position()</literal> XPath function is
                    called. Setting this value to -1 restores the default
|
|
behavior.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>getContextPosition</term>
|
|
<listitem>
|
|
<funcsynopsis><funcsynopsisinfo>my $position = $xpc->getContextPosition;</funcsynopsisinfo></funcsynopsis>
|
|
<para>Get the current context position.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>setContextSize</term>
|
|
<listitem><funcsynopsis><funcsynopsisinfo>$xpc->setContextSize($size)</funcsynopsisinfo></funcsynopsis>
|
|
<para>
|
|
Set the current context size. By default, this value is -1 (and
|
|
evaluating XPath function <literal>last()</literal> in
|
|
the initial context raises an XPath error), but can be
|
|
                    set to any non-negative value. This is mainly useful
                    for making the XPath engine report a chosen size when the
                    <literal>last()</literal> XPath function is called. If
|
|
context size is set to 0, position is automatically also
|
|
set to 0. If context size is positive, position is
|
|
automatically set to 1. Setting context size to -1
|
|
restores the default behavior.</para>
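                    <para>The interplay of context size and context position might
                    look like this in practice. This is only a sketch; the values 5
                    and 3 are arbitrary.</para>
                    <programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

my $doc = XML::LibXML->load_xml(string => '<r/>');
my $xpc = XML::LibXML::XPathContext->new($doc);

# A positive size also resets the position to 1.
$xpc->setContextSize(5);
my $initial = $xpc->getContextPosition;

# The position may be anything up to the context size.
$xpc->setContextPosition(3);

my $position = $xpc->findvalue('position()');
my $last     = $xpc->findvalue('last()');
print "$initial $position $last\n";
]]></programlisting>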
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>getContextSize</term>
|
|
<listitem>
|
|
<funcsynopsis><funcsynopsisinfo>my $size = $xpc->getContextSize;</funcsynopsisinfo></funcsynopsis>
|
|
<para>Get the current context size.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Bugs And Caveats</title>
|
|
<para>
|
|
XML::LibXML::XPathContext objects
|
|
<emphasis>are</emphasis> reentrant, meaning that you can call
|
|
methods of an XML::LibXML::XPathContext even from XPath
|
|
extension functions registered with the same object or from a
|
|
            variable lookup function. On the other hand, you should avoid
            registering new extension functions, namespaces, or a
            variable lookup function from within an extension function or a
            variable lookup function; this behavior is untested.
|
|
</para>
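        <para>A minimal sketch of such a reentrant call follows. The function name
        <literal>first_item</literal> and the sample document are hypothetical.
        Note that the closure holds a reference to the context object, so the
        example unregisters the function afterwards to break the resulting
        reference cycle.</para>
        <programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

my $doc = XML::LibXML->load_xml(string => '<r><i>a</i><i>b</i></r>');
my $xpc = XML::LibXML::XPathContext->new($doc);

# The callback calls back into the same context that is evaluating it.
$xpc->registerFunction('first_item', sub { return $xpc->findvalue('/r/i[1]') });

my $value = $xpc->findvalue('concat(first_item(), "!")');
print "$value\n";

# Drop the callback so that $xpc is no longer referenced by its own closure.
$xpc->unregisterFunction('first_item');
]]></programlisting>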
|
|
</sect1>
|
|
<sect1>
|
|
<title>Authors</title>
|
|
<para>Ilya Martynov and Petr Pajas, based on
|
|
XML::LibXML and XML::LibXSLT code by Matt Sergeant and
|
|
Christian Glahn.</para>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Historical remark</title>
|
|
<para>Prior to XML::LibXML 1.61 this module was distributed separately
|
|
for maintenance reasons.
|
|
</para>
|
|
</sect1>
|
|
</chapter>
|
|
<chapter id="XML-LibXML-Reader">
|
|
<title>XML::LibXML::Reader - interface to libxml2 pull parser</title>
|
|
<titleabbrev>XML::LibXML::Reader</titleabbrev>
|
|
<sect1>
|
|
<title>Synopsis</title>
|
|
<programlisting>use XML::LibXML::Reader;</programlisting>
|
|
<programlisting>my $reader = XML::LibXML::Reader->new(location => "file.xml")
|
|
or die "cannot read file.xml\n";
|
|
while ($reader->read) {
|
|
processNode($reader);
|
|
}</programlisting>
|
|
<programlisting>
|
|
sub processNode {
|
|
my $reader = shift;
|
|
printf "%d %d %s %d\n", ($reader->depth,
|
|
$reader->nodeType,
|
|
$reader->name,
|
|
$reader->isEmptyElement);
|
|
}
|
|
</programlisting>
|
|
<para>or</para>
|
|
<programlisting>
|
|
my $reader = XML::LibXML::Reader->new(location => "file.xml")
|
|
or die "cannot read file.xml\n";
|
|
$reader->preservePattern('//table/tr');
|
|
$reader->finish;
|
|
print $reader->document->toString(1);
|
|
</programlisting>
|
|
</sect1>
|
|
<sect1>
|
|
<title>DESCRIPTION</title>
|
|
        <para>This is a Perl interface to libxml2's pull-parser implementation
|
|
xmlTextReader
|
|
<emphasis>http://xmlsoft.org/html/libxml-xmlreader.html</emphasis>.
|
|
This feature requires at least libxml2-2.6.21.
|
|
Pull-parsers (such as StAX in Java, or XmlReader in C#) use an iterator
|
|
approach to parse XML documents. They are easier to program than
|
|
        event-based parsers (SAX) and much more lightweight than
        tree-based parsers (DOM), which load the complete tree into
|
|
memory.</para>
|
|
<para>The Reader acts as a cursor going forward on the document
|
|
stream and stopping at each node on the way. At every point,
|
|
the DOM-like methods of the Reader object allow one to examine the
|
|
        current node (name, namespace, attributes, etc.).</para>
|
|
<para>The user's code keeps control of the progress and simply
|
|
calls the <literal>read()</literal> function repeatedly to
|
|
progress to the next node in the document order. Other
|
|
functions provide means for skipping complete sub-trees, or
|
|
nodes until a specific element, etc.</para>
|
|
        <para>At any time, only a very limited portion of the
        document is kept in memory, which makes the API more
|
|
memory-efficient than using DOM. However, it is also possible
|
|
to mix Reader with DOM. At every point the user may copy the
|
|
current node (optionally expanded into a complete sub-tree)
|
|
        from the processed document to another DOM tree, or
        instruct the Reader to collect a sub-document in the form of a DOM
|
|
tree consisting of selected nodes.</para>
|
|
<para>Reader API also supports namespaces, xml:base, entity
|
|
        handling, and DTD validation. RelaxNG and W3C XML Schema
        validation are available through the <literal>RelaxNG</literal>
        and <literal>Schema</literal> constructor options.</para>
|
|
<para>The naming of methods compared to libxml2 and C#
|
|
XmlTextReader has been changed slightly to match the
|
|
conventions of XML::LibXML. Some functions have been changed
|
|
or added with respect to the C interface.</para>
|
|
</sect1>
|
|
<sect1>
|
|
<title>CONSTRUCTOR</title>
|
|
        <para>Depending on the XML source, the Reader object can be created with any of:</para>
|
|
<programlisting>
|
|
my $reader = XML::LibXML::Reader->new( location => "file.xml", ... );
|
|
my $reader = XML::LibXML::Reader->new( string => $xml_string, ... );
|
|
my $reader = XML::LibXML::Reader->new( IO => $file_handle, ... );
|
|
my $reader = XML::LibXML::Reader->new( FD => fileno(STDIN), ... );
|
|
my $reader = XML::LibXML::Reader->new( DOM => $dom, ... );
|
|
</programlisting>
|
|
<para>where ... are (optional) reader options described below in <xref linkend="reader-parsing-options"/>
|
|
or various parser options described in <xref linkend="XML-LibXML-Parser"/>.
|
|
The constructor recognizes the following XML sources:</para>
|
|
<sect2>
|
|
<title>Source specification</title>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>location</term>
|
|
<listitem>
|
|
<para>Read XML from a local file or (non-HTTPS) URL.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>string</term>
|
|
<listitem>
|
|
<para>Read XML from a string.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>IO</term>
|
|
<listitem>
|
|
                <para>Read XML from a Perl IO filehandle.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>FD</term>
|
|
<listitem>
|
|
<para>Read XML from a file descriptor (bypasses Perl I/O
|
|
layer, only applicable to filehandles for regular
|
|
files or pipes). Possibly faster than IO.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>DOM</term>
|
|
<listitem>
|
|
<para>Use reader API to walk through a pre-parsed
|
|
<xref linkend="XML-LibXML-Document"/>.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
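          <para>As an illustration, the following sketch creates one reader over
          a string and another over an already parsed document, then advances
          each to its first node. The sample XML is invented for the
          example.</para>
          <programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Reader;

my $xml = '<r><i>a</i></r>';

# Reader over an in-memory string.
my $from_string = XML::LibXML::Reader->new(string => $xml);

# Reader walking a pre-parsed XML::LibXML::Document.
my $dom      = XML::LibXML->load_xml(string => $xml);
my $from_dom = XML::LibXML::Reader->new(DOM => $dom);

my @first;
for my $reader ($from_string, $from_dom) {
    $reader->read or die "nothing to read\n";
    push @first, $reader->name;
}
print "@first\n";
]]></programlisting>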
|
|
</sect2>
|
|
<sect2 id="reader-parsing-options">
|
|
<title>Reader options</title>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>encoding => $encoding</term>
|
|
<listitem>
|
|
                <para>Override the document encoding.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>RelaxNG => $rng_schema</term>
|
|
<listitem>
|
|
                <para>Can be used to pass either a <xref linkend="XML-LibXML-RelaxNG"/>
|
|
object or a filename or (non-HTTPS) URL of a RelaxNG schema to the
|
|
constructor. The schema is then used to validate the
|
|
document as it is processed.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>Schema => $xsd_schema</term>
|
|
<listitem>
|
|
                <para>Can be used to pass either a <xref linkend="XML-LibXML-Schema"/>
|
|
object or a filename or (non-HTTPS) URL of a W3C XSD schema to the
|
|
constructor. The schema is then used to validate the
|
|
document as it is processed.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>...</term>
|
|
<listitem>
|
|
                <para>The reader further supports various
|
|
parser options described in
|
|
<xref linkend="XML-LibXML-Parser"/> (specifically those
|
|
labeled by /reader/).
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
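          <para>A possible use of the <literal>RelaxNG</literal> option is
          sketched below. The inline schema and document are assumptions made
          for the example; the schema object is kept in scope for as long as
          the reader is in use.</para>
          <programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Reader;

# Hypothetical schema: an <r> element containing any number of <i> elements.
my $rng = XML::LibXML::RelaxNG->new(string => q{
<element name="r" xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
    <element name="i"><text/></element>
  </zeroOrMore>
</element>
});

my $reader = XML::LibXML::Reader->new(
    string  => '<r><i>a</i></r>',
    RelaxNG => $rng,
);

# Validation happens while the document is being read.
$reader->finish;
print "valid: ", $reader->isValid, "\n";
]]></programlisting>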
|
|
</sect2>
|
|
</sect1>
|
|
<sect1>
|
|
<title>METHODS CONTROLLING PARSING PROGRESS</title>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>read ()</term>
|
|
<listitem>
|
|
<para>Moves the position to the next node in the stream,
|
|
exposing its properties.</para>
|
|
                    <para>Returns 1 if the node was read successfully, 0 if
                    there are no more nodes to read, or -1 in case of
                    error.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>readAttributeValue ()</term>
|
|
<listitem>
|
|
<para>Parses an attribute value into one or more Text and
|
|
EntityReference nodes.</para>
|
|
<para>Returns 1 in case of success, 0 if the reader was not positioned on an attribute node or all the attribute values have been read, or -1 in case of error.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>readState ()</term>
|
|
<listitem>
|
|
<para>Gets the read state of the reader. Returns the state
|
|
value, or -1 in case of error. The module exports
|
|
constants for the Reader states, see STATES
|
|
below.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>depth ()</term>
|
|
<listitem>
|
|
                    <para>Returns the depth of the current node in the tree, starting at 0 for
                    the root node.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>next ()</term>
|
|
<listitem>
|
|
<para>Skip to the node following the current one in the
|
|
document order while avoiding the sub-tree if any.
|
|
                    Returns 1 if the node was read successfully, 0 if there
                    are no more nodes to read, or -1 in case of error.</para>
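                    <para>The difference from <literal>read()</literal> can be
                    sketched with a small, made-up document:</para>
                    <programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML::Reader;

my $reader = XML::LibXML::Reader->new(string => '<r><a><b/></a><c/></r>');

$reader->read;    # positioned on <r>
$reader->read;    # positioned on <a>
$reader->next;    # skips the sub-tree of <a>, including <b/>

my $name = $reader->name;
print "$name\n";
]]></programlisting>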
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>nextElement (localname?,nsURI?)</term>
|
|
<listitem>
|
|
<para>Skip nodes following the current one in the document
|
|
order until a specific element is reached. The element's
|
|
name must be equal to a given localname if defined, and
|
|
                    its namespace must be equal to a given nsURI if defined.
|
|
Either of the arguments can be undefined (or omitted, in
|
|
case of the latter or both).</para>
|
|
                    <para>Returns 1 if the element was found, 0 if there are no more nodes to read, or -1 in case of error.</para>
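                    <para>A typical loop over all elements with a given local name
                    might look like the following sketch (sample document
                    assumed):</para>
                    <programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML::Reader;

my $reader = XML::LibXML::Reader->new(string => '<r><i>a</i><x/><i>b</i></r>');

my @content;
# Jump from one <i> element to the next, ignoring everything in between.
while ($reader->nextElement('i') == 1) {
    push @content, $reader->readInnerXml;
}
print join(',', @content), "\n";
]]></programlisting>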
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>nextPatternMatch (compiled_pattern)</term>
|
|
<listitem>
|
|
<para>Skip nodes following the current one in the document
|
|
order until an element matching a given
|
|
compiled pattern is reached. See
|
|
<xref linkend="XML-LibXML-Pattern"/> for information on
|
|
compiled patterns. See also the <literal>matchesPattern</literal>
|
|
method.</para>
|
|
                    <para>Returns 1 if the element was found, 0 if there are no more nodes to read, or -1 in case of error.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>skipSiblings ()</term>
|
|
<listitem>
|
|
<para>Skip all nodes on the same or lower level until the
|
|
first node on a higher level is reached. In particular,
|
|
if the current node occurs in an element, the reader
|
|
stops at the end tag of the parent element, otherwise it
|
|
stops at a node immediately following the parent
|
|
node.</para>
|
|
<para>Returns 1 if successful, 0 if end of the document is reached, or -1 in case of error.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>nextSibling ()</term>
|
|
<listitem>
|
|
<para>It skips to the node following the current one in
|
|
the document order while avoiding the sub-tree if
|
|
any.</para>
|
|
                    <para>Returns 1 if the node was read successfully, 0 if
                    there are no more nodes to read, or -1 in case of
                    error.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>nextSiblingElement (name?,nsURI?)</term>
|
|
<listitem>
|
|
<para>Like nextElement but only processes sibling elements
|
|
of the current node (moving forward using
|
|
<literal>nextSibling ()</literal> rather than
|
|
<literal>read ()</literal>, internally).</para>
|
|
<para>Returns 1 if the element was found, 0 if there is no
|
|
more sibling nodes, or -1 in case of error.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>finish ()</term>
|
|
<listitem>
|
|
<para>Skip all remaining nodes in the document, reaching end of the document.</para>
|
|
<para>Returns 1 if successful, 0 in case of error.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>close ()</term>
|
|
<listitem>
|
|
<para>This method releases any resources allocated by the
|
|
current instance and closes any underlying input. It
|
|
returns 0 on failure and 1 on success. This method is
|
|
automatically called by the destructor when the reader
|
|
is forgotten, therefore you do not have to call it
|
|
directly.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</sect1>
|
|
<sect1>
|
|
<title>METHODS EXTRACTING INFORMATION</title>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>name ()</term>
|
|
<listitem>
|
|
<para>Returns the qualified name of the current node, equal to (Prefix:)LocalName.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>nodeType ()</term>
|
|
<listitem>
|
|
<para>Returns the type of the current node. See NODE TYPES below.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>localName ()</term>
|
|
<listitem>
|
|
<para>Returns the local name of the node.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>prefix ()</term>
|
|
<listitem>
|
|
<para>Returns the prefix of the namespace associated with the node.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>namespaceURI ()</term>
|
|
<listitem>
|
|
<para>Returns the URI defining the namespace associated with the node.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>isEmptyElement ()</term>
|
|
<listitem>
|
|
                    <para>Checks whether the current node is an empty element. Note
                    that only the self-closing form <literal>&lt;a/&gt;</literal> is considered
                    empty, while <literal>&lt;a&gt;&lt;/a&gt;</literal> is not.</para>
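                    <para>The distinction can be observed with a sketch such as the
                    following, where the element names are arbitrary:</para>
                    <programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML::Reader;

my $reader = XML::LibXML::Reader->new(string => '<r><a/><b></b></r>');

my %empty;
while ($reader->read) {
    # Only look at start tags; end tags and text are skipped.
    next unless $reader->nodeType == XML_READER_TYPE_ELEMENT;
    $empty{ $reader->name } = $reader->isEmptyElement ? 1 : 0;
}
print "a=$empty{a} b=$empty{b}\n";
]]></programlisting>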
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>hasValue ()</term>
|
|
<listitem>
|
|
<para>Returns true if the node can have a text value.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>value ()</term>
|
|
<listitem>
|
|
<para>Provides the text value of the node if present or undef if not available.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>readInnerXml ()</term>
|
|
<listitem>
|
|
<para>Reads the contents of the current node, including
|
|
child nodes and markup. Returns a string containing the
|
|
XML of the node's content, or undef if the current node
|
|
is neither an element nor attribute, or has no child
|
|
nodes.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>readOuterXml ()</term>
|
|
<listitem>
|
|
<para>Reads the contents of the current node, including
|
|
child nodes and markup.</para>
|
|
<para>Returns a string containing the XML of the node
|
|
including its content, or undef if the current node is
|
|
neither an element nor attribute.</para>
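                    <para>The two methods differ only in whether the markup of the
                    current node itself is included. A brief sketch, using an
                    invented document:</para>
                    <programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML::Reader;

my $reader = XML::LibXML::Reader->new(string => '<r><i k="v">a<b/></i></r>');
$reader->nextElement('i');

my $inner = $reader->readInnerXml;   # content of <i> only
my $outer = $reader->readOuterXml;   # <i> start tag, content and end tag

print "inner: $inner\n";
print "outer: $outer\n";
]]></programlisting>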
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>nodePath()</term>
|
|
<listitem>
|
|
                    <para>Returns a canonical location path leading from the root node
                    to the current node. Namespaced
|
|
elements are matched by '*', because there is no way to declare
|
|
prefixes within XPath patterns. Unlike
|
|
<literal>XML::LibXML::Node::nodePath()</literal>, this function
|
|
does not provide sibling counts (i.e. instead of e.g. '/a/b[1]' and '/a/b[2]'
|
|
you get '/a/b' for both matches).
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>matchesPattern(compiled_pattern)</term>
|
|
<listitem>
|
|
<para>Returns a true value if the current
|
|
node matches a compiled pattern.
|
|
See <xref linkend="XML-LibXML-Pattern"/> for information on
|
|
compiled patterns. See also the <literal>nextPatternMatch</literal>
|
|
method.</para>
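                    <para>The following sketch combines
                    <literal>matchesPattern</literal> with
                    <literal>nodePath</literal>. The pattern and document are
                    assumptions chosen for illustration; end tags are filtered out
                    explicitly so that each element is considered only once.</para>
                    <programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Pattern;
use XML::LibXML::Reader;

my $reader  = XML::LibXML::Reader->new(string => '<r><i/><x><i/></x><i/></r>');
my $pattern = XML::LibXML::Pattern->new('/r/i');

my @paths;
while ($reader->read) {
    next unless $reader->nodeType == XML_READER_TYPE_ELEMENT;
    # Only <i> elements directly below <r> match; the nested one does not.
    push @paths, $reader->nodePath if $reader->matchesPattern($pattern);
}
print join(' ', @paths), "\n";
]]></programlisting>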
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</sect1>
|
|
<sect1>
|
|
<title>METHODS EXTRACTING DOM NODES</title>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>document ()</term>
|
|
<listitem>
|
|
<para>Provides access to the document tree built by the
|
|
reader. This function can be used to collect the
|
|
preserved nodes (see <literal>preserveNode()</literal>
|
|
and preservePattern).</para>
|
|
<para>CAUTION: Never use this function to modify the tree
|
|
unless reading of the whole document is
|
|
completed!</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>copyCurrentNode (deep)</term>
|
|
<listitem>
|
|
                    <para>This function is similar to the DOM function
                    <literal>cloneNode()</literal>. It returns a copy of the
|
|
currently processed node as a corresponding DOM object.
|
|
Use deep = 1 to obtain the full sub-tree.</para>
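                    <para>Mixing the Reader with DOM in this way could look like the
                    sketch below, which collects deep copies of selected elements
                    while streaming. The document is made up for the example.</para>
                    <programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML::Reader;

my $reader = XML::LibXML::Reader->new(string => '<r><i>a</i><x/><i>b</i></r>');

my @copies;
while ($reader->nextElement('i') == 1) {
    # deep = 1: copy the element together with its whole sub-tree
    push @copies, $reader->copyCurrentNode(1);
}

# The copies are ordinary DOM nodes and remain usable after streaming.
my $text = join ',', map { $_->textContent } @copies;
print "$text\n";
]]></programlisting>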
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>preserveNode ()</term>
|
|
<listitem>
|
|
<para>This tells the XML Reader to preserve the current
|
|
node in the document tree. A document tree consisting of
|
|
the preserved nodes and their content can be obtained
|
|
using the method <literal>document()</literal> once
|
|
parsing is finished.</para>
|
|
                    <para>Returns the node or undef in case of error.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>preservePattern (pattern,\%ns_map)</term>
|
|
<listitem>
|
|
<para>This tells the XML Reader to preserve all nodes
|
|
matched by the pattern (which is a streaming XPath
|
|
subset). A document tree consisting of the preserved
|
|
nodes and their content can be obtained using the method
|
|
<literal>document()</literal> once parsing is
|
|
finished.</para>
|
|
<para>An optional second argument can be used to provide a
|
|
HASH reference mapping prefixes used by the XPath to
|
|
namespace URIs.</para>
|
|
<para>The XPath subset available with this function is
|
|
described at</para>
|
|
<programlisting>http://www.w3.org/TR/xmlschema-1/#Selector</programlisting>
|
|
<para>and matches the production</para>
|
|
<programlisting>Path ::= ('.//')? ( Step '/' )* ( Step | '@' NameTest )</programlisting>
|
|
<para>Returns a positive number in case of success and -1
|
|
                    in case of error.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</sect1>
|
|
<sect1>
|
|
<title>METHODS PROCESSING ATTRIBUTES</title>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>attributeCount ()</term>
|
|
<listitem>
|
|
<para>Provides the number of attributes of the current
|
|
node.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>hasAttributes ()</term>
|
|
<listitem>
|
|
<para>Whether the node has attributes.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>getAttribute (name)</term>
|
|
<listitem>
|
|
<para>Provides the value of the attribute with the
|
|
specified qualified name.</para>
|
|
<para>Returns a string containing the value of the
|
|
specified attribute, or undef in case of error.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>getAttributeNs (localName, namespaceURI)</term>
|
|
<listitem>
|
|
<para>Provides the value of the specified
|
|
attribute.</para>
|
|
<para>Returns a string containing the value of the
|
|
specified attribute, or undef in case of error.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>getAttributeNo (no)</term>
|
|
<listitem>
|
|
<para>Provides the value of the attribute with the
|
|
specified index relative to the containing
|
|
element.</para>
|
|
<para>Returns a string containing the value of the
|
|
specified attribute, or undef in case of error.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>isDefault ()</term>
|
|
<listitem>
|
|
<para>Returns true if the current attribute node was
|
|
generated from the default value defined in the
|
|
DTD.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>moveToAttribute (name)</term>
|
|
<listitem>
|
|
                    <para>Moves the position to the attribute with the
                    specified qualified name.</para>
|
|
<para>Returns 1 in case of success, -1 in case of error, 0
|
|
                    if not found.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>moveToAttributeNo (no)</term>
|
|
<listitem>
|
|
<para>Moves the position to the attribute with the
|
|
specified index relative to the containing
|
|
element.</para>
|
|
<para>Returns 1 in case of success, -1 in case of error, 0
|
|
                    if not found.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>moveToAttributeNs (localName,namespaceURI)</term>
|
|
<listitem>
|
|
<para>Moves the position to the attribute with the
|
|
specified local name and namespace URI.</para>
|
|
<para>Returns 1 in case of success, -1 in case of error, 0
|
|
                    if not found.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>moveToFirstAttribute ()</term>
|
|
<listitem>
|
|
<para>Moves the position to the first attribute associated
|
|
with the current node.</para>
|
|
<para>Returns 1 in case of success, -1 in case of error, 0
|
|
                    if not found.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>moveToNextAttribute ()</term>
|
|
<listitem>
|
|
<para>Moves the position to the next attribute associated
|
|
with the current node.</para>
|
|
<para>Returns 1 in case of success, -1 in case of error, 0
|
|
                    if not found.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>moveToElement ()</term>
|
|
<listitem>
|
|
<para>Moves the position to the node that contains the
|
|
current attribute node.</para>
|
|
<para>Returns 1 in case of success, -1 in case of error, 0
|
|
                    if not moved.</para>
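                    <para>The attribute methods can be combined as in the following
                    sketch, which reads one attribute directly and then walks all
                    attributes of the element. The attribute names are
                    arbitrary.</para>
                    <programlisting><![CDATA[use strict;
use warnings;
use XML::LibXML::Reader;

my $reader = XML::LibXML::Reader->new(string => '<r a="1" b="2"/>');
$reader->read;    # positioned on <r>

# Direct lookup by qualified name.
my $direct = $reader->getAttribute('a');

# Walk over all attributes, then return to the owning element.
my %attr;
if ($reader->moveToFirstAttribute == 1) {
    do {
        $attr{ $reader->name } = $reader->value;
    } while ($reader->moveToNextAttribute == 1);
    $reader->moveToElement;
}

my $back_on = $reader->name;
print "$direct $attr{a} $attr{b} $back_on\n";
]]></programlisting>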
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>isNamespaceDecl ()</term>
|
|
<listitem>
|
|
<para>Determine whether the current node is a namespace
|
|
declaration rather than a regular attribute.</para>
|
|
<para>Returns 1 if the current node is a namespace
|
|
declaration, 0 if it is a regular attribute or other
|
|
type of node, or -1 in case of error.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</sect1>
|
|
<sect1>
|
|
<title>OTHER METHODS</title>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>lookupNamespace (prefix)</term>
|
|
<listitem>
|
|
<para>Resolves a namespace prefix in the scope of the
|
|
current element.</para>
|
|
<para>Returns a string containing the namespace URI to
|
|
which the prefix maps or undef in case of error.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>encoding ()</term>
|
|
<listitem>
|
|
<para>Returns a string containing the encoding of the
|
|
document or undef in case of error.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>standalone ()</term>
|
|
<listitem>
|
|
<para>Determine the standalone status of the document
|
|
being read. Returns 1 if the document was declared to be
|
|
standalone, 0 if it was declared to be not standalone,
|
|
or -1 if the document did not specify its standalone
|
|
status or in case of error.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>xmlVersion ()</term>
|
|
<listitem>
|
|
<para>Determine the XML version of the document being
|
|
read. Returns a string containing the XML version of the
|
|
document or undef in case of error.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>baseURI ()</term>
|
|
<listitem>
|
|
                    <para>Returns the base URI of the current node.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>isValid ()</term>
|
|
<listitem>
|
|
<para>Retrieve the validity status from the parser.</para>
|
|
                    <para>Returns 1 if valid, 0 if not, and -1 in case of
|
|
error.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>xmlLang ()</term>
|
|
<listitem>
|
|
                    <para>Returns the xml:lang scope within which the current node
|
|
resides.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>lineNumber ()</term>
|
|
<listitem>
|
|
<para>Provide the line number of the current parsing
|
|
point.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>columnNumber ()</term>
|
|
<listitem>
|
|
<para>Provide the column number of the current parsing
|
|
point.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>byteConsumed ()</term>
|
|
<listitem>
|
|
<para>This function provides the current index of the
|
|
                    parser relative to the start of the current entity. The
                    value is counted in bytes from the beginning
|
|
starting at zero and finishing at the size in bytes of
|
|
the file if parsing a file. The function is of constant
|
|
cost if the input is UTF-8 but can be costly if run on
|
|
non-UTF-8 input.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>setParserProp (prop => value, ...)</term>
|
|
<listitem>
|
|
<para>Change the parser processing behaviour by changing
|
|
some of its internal properties. The following
|
|
properties are available with this function:
|
|
``load_ext_dtd'', ``complete_attributes'',
|
|
``validation'', ``expand_entities''.</para>
|
|
<para>Since some of the properties can only be changed
|
|
before any read has been done, it is best to set the
|
|
                    parsing properties in the constructor.</para>
|
|
<para>Returns 0 if the call was successful, or -1 in case
|
|
                    of error.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>getParserProp (prop)</term>
|
|
<listitem>
|
|
                    <para>Gets the value of an internal parser property. The
|
|
following property names can be used: ``load_ext_dtd'',
|
|
``complete_attributes'', ``validation'',
|
|
``expand_entities''.</para>
|
|
<para>Returns the value, usually 0 or 1, or -1 in case of
|
|
error.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</sect1>
|
|
<sect1>
|
|
<title>DESTRUCTION</title>
|
|
<para>XML::LibXML takes care of the reader object destruction
|
|
when the last reference to the reader object goes out of
|
|
scope. The document tree is preserved, though, if either of
|
|
$reader->document or $reader->preserveNode was used and
|
|
references to the document tree exist.</para>
|
|
</sect1>
|
|
<sect1>
|
|
<title>NODE TYPES</title>
|
|
<para>The reader interface provides the following constants for
|
|
node types (the constant symbols are exported by default or if
|
|
tag <literal>:types</literal> is used).</para>
|
|
<programlisting>XML_READER_TYPE_NONE => 0
|
|
XML_READER_TYPE_ELEMENT => 1
|
|
XML_READER_TYPE_ATTRIBUTE => 2
|
|
XML_READER_TYPE_TEXT => 3
|
|
XML_READER_TYPE_CDATA => 4
|
|
XML_READER_TYPE_ENTITY_REFERENCE => 5
|
|
XML_READER_TYPE_ENTITY => 6
|
|
XML_READER_TYPE_PROCESSING_INSTRUCTION => 7
|
|
XML_READER_TYPE_COMMENT => 8
|
|
XML_READER_TYPE_DOCUMENT => 9
|
|
XML_READER_TYPE_DOCUMENT_TYPE => 10
|
|
XML_READER_TYPE_DOCUMENT_FRAGMENT => 11
|
|
XML_READER_TYPE_NOTATION => 12
|
|
XML_READER_TYPE_WHITESPACE => 13
|
|
XML_READER_TYPE_SIGNIFICANT_WHITESPACE => 14
|
|
XML_READER_TYPE_END_ELEMENT => 15
|
|
XML_READER_TYPE_END_ENTITY => 16
|
|
XML_READER_TYPE_XML_DECLARATION => 17
|
|
</programlisting>
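<para>The constants are typically compared with the value returned
by <function>nodeType</function>. The following sketch collects the
names of all start-element nodes; it reads from a small DOM built
in place, and the element names are illustrative only.</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Reader;    # exports the XML_READER_TYPE_* constants

# Build a small document: element "a" containing element "b" with text.
my $dom  = XML::LibXML::Document->new('1.0', 'UTF-8');
my $root = $dom->createElement('a');
$dom->setDocumentElement($root);
$root->appendTextChild('b', 'some text');

my $reader = XML::LibXML::Reader->new(DOM => $dom);

# Keep only start-element nodes; text and end-element nodes are skipped.
my @elements;
while ($reader->read) {
    if ($reader->nodeType == XML_READER_TYPE_ELEMENT) {
        push @elements, $reader->name;
    }
}
print join(', ', @elements), "\n";
</programlisting>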
|
|
</sect1>
|
|
<sect1>
|
|
<title>STATES</title>
|
|
<para>The following constants represent the values returned by
|
|
<literal>readState()</literal>. They are exported by default,
|
|
or if tag <literal>:states</literal> is used:</para>
|
|
<programlisting>XML_READER_NONE => -1
|
|
XML_READER_START => 0
|
|
XML_READER_ELEMENT => 1
|
|
XML_READER_END => 2
|
|
XML_READER_EMPTY => 3
|
|
XML_READER_BACKTRACK => 4
|
|
XML_READER_DONE => 5
|
|
XML_READER_ERROR => 6
|
|
</programlisting>
|
|
</sect1>
|
|
<sect1>
|
|
<title>SEE ALSO</title>
|
|
<para><xref linkend="XML-LibXML-Pattern"/> for information about compiled patterns.</para>
|
|
<para>http://xmlsoft.org/html/libxml-xmlreader.html</para>
|
|
<para>http://dotgnu.org/pnetlib-doc/System/Xml/XmlTextReader.html</para>
|
|
</sect1>
|
|
<sect1>
|
|
<title>ORIGINAL IMPLEMENTATION</title>
|
|
<para>Heiko Klein, &lt;H.Klein@gmx.net&gt; and Petr Pajas</para>
|
|
</sect1>
|
|
</chapter>
|
|
<chapter id="XML-LibXML-XPathExpression">
|
|
<title>XML::LibXML::XPathExpression - interface to libxml2 pre-compiled XPath expressions</title>
|
|
<titleabbrev>XML::LibXML::XPathExpression</titleabbrev>
|
|
<sect1>
|
|
<title>Synopsis</title>
|
|
<programlisting>use XML::LibXML;
|
|
my $compiled_xpath = XML::LibXML::XPathExpression->new('//foo[@bar="baz"][position()&lt;4]');
|
|
|
|
# interface from XML::LibXML::Node
|
|
|
|
my $result = $node->find($compiled_xpath);
|
|
my @nodes = $node->findnodes($compiled_xpath);
|
|
my $value = $node->findvalue($compiled_xpath);
|
|
|
|
# interface from XML::LibXML::XPathContext
|
|
|
|
my $result = $xpc->find($compiled_xpath,$node);
|
|
my @nodes = $xpc->findnodes($compiled_xpath,$node);
|
|
my $value = $xpc->findvalue($compiled_xpath,$node);
|
|
</programlisting>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Description</title>
|
|
<para>This is a perl interface to libxml2's pre-compiled XPath expressions.
|
|
Pre-compiling an XPath expression can give some performance
|
|
benefit if the same XPath query is evaluated many times.
|
|
<function>XML::LibXML::XPathExpression</function> objects
|
|
can be passed to all <function>find...</function>
|
|
functions of <function>XML::LibXML</function>
|
|
that expect an XPath expression.
|
|
</para>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>new()</term>
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$compiled = XML::LibXML::XPathExpression->new( xpath_string );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>The constructor takes an XPath 1.0 expression as a string
|
|
and returns an object representing the pre-compiled
|
|
expression (the actual data structure is internal to libxml2).
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
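<para>A minimal sketch of the intended use: the expression is
compiled once and then evaluated against several context nodes.
The document structure and element names below are made up for
illustration.</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

# Build a list with three groups holding 1, 2 and 3 items.
my $dom  = XML::LibXML::Document->new('1.0', 'UTF-8');
my $list = $dom->createElement('list');
$dom->setDocumentElement($list);
for my $n (1 .. 3) {
    my $group = $list->addNewChild(undef, 'group');
    $group->addNewChild(undef, 'item') for 1 .. $n;
}

# Compile once ...
my $count_items = XML::LibXML::XPathExpression->new('count(item)');

# ... and reuse the compiled expression for every group.
my @counts = map { $_->findvalue($count_items) } $list->childNodes;
print "@counts\n";
</programlisting>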
|
|
</sect1>
|
|
</chapter>
|
|
<chapter id="XML-LibXML-Pattern">
|
|
<title>XML::LibXML::Pattern - interface to libxml2 XPath patterns</title>
|
|
<titleabbrev>XML::LibXML::Pattern</titleabbrev>
|
|
<sect1>
|
|
<title>Synopsis</title>
|
|
<programlisting>use XML::LibXML;
|
|
my $pattern = XML::LibXML::Pattern->new('/x:html/x:body//x:div', { 'x' => 'http://www.w3.org/1999/xhtml' });
|
|
# test a match on an XML::LibXML::Node $node
|
|
|
|
if ($pattern->matchesNode($node)) { ... }
|
|
|
|
# or on an XML::LibXML::Reader
|
|
|
|
if ($reader->matchesPattern($pattern)) { ... }
|
|
|
|
# or skip reading all nodes that do not match
|
|
|
|
print $reader->nodePath while $reader->nextPatternMatch($pattern);
|
|
</programlisting>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Description</title>
|
|
<para>This is a perl interface to libxml2's pattern matching support
|
|
<emphasis>http://xmlsoft.org/html/libxml-pattern.html</emphasis>.
|
|
This feature requires recent versions of libxml2.</para>
|
|
<para>Patterns are a small subset of the XPath language, which is limited
|
|
to (disjunctions of) location paths involving the child and descendant axes in abbreviated form
|
|
as described by the extended BNF given below:
|
|
</para>
|
|
<programlisting>Selector ::= Path ( '|' Path )*
|
|
Path ::= ('.//' | '//' | '/' )? Step ( '/' Step )*
|
|
Step ::= '.' | NameTest
|
|
NameTest ::= QName | '*' | NCName ':' '*'</programlisting>
|
|
<para>For readability, whitespace may be used in selector XPath expressions even though not explicitly allowed by the grammar: whitespace may be freely added within patterns before or after any token, where</para>
|
|
<programlisting>token ::= '.' | '/' | '//' | '|' | NameTest</programlisting>
|
|
<para>Note that no predicates or attribute tests are allowed.</para>
|
|
<para>Patterns are particularly useful for stream parsing provided via the <literal>XML::LibXML::Reader</literal> interface.</para>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>new()</term>
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$pattern = XML::LibXML::Pattern->new( pattern, { prefix => namespace_URI, ... } );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>The constructor of a pattern takes a pattern expression (as described
|
|
by the BNF grammar above) and an optional HASH reference mapping
|
|
prefixes to namespace URIs. The method returns a compiled pattern object.
|
|
</para>
|
|
<para>
|
|
Note that if the document
|
|
has a default namespace, it must still be given a prefix in order
|
|
to be matched (as demanded by the XPath 1.0 specification). For example,
|
|
to match an element <literal>&lt;a xmlns="http://foo.bar"&gt;&lt;/a&gt;</literal>, one
|
|
should use a pattern like this:
|
|
</para>
|
|
<programlisting>$pattern = XML::LibXML::Pattern->new( 'foo:a', { foo => 'http://foo.bar' });</programlisting>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>matchesNode($node)</term>
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$bool = $pattern->matchesNode($node);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>Given an XML::LibXML::Node object, returns a true value if
|
|
the node is matched by the compiled pattern expression.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
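<para>The namespace rule described for the constructor can be
tried out with a short sketch. It builds an element in the default
namespace <literal>http://foo.bar</literal> and tests it against a
prefixed and an unprefixed pattern; only the prefixed pattern is
expected to match. The variable names are illustrative.</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

# An element "a" in the default namespace http://foo.bar.
my $dom  = XML::LibXML::Document->new('1.0', 'UTF-8');
my $elem = $dom->createElementNS('http://foo.bar', 'a');
$dom->setDocumentElement($elem);

# The same local name, with and without a namespace mapping.
my $with_prefix    = XML::LibXML::Pattern->new('foo:a', { foo => 'http://foo.bar' });
my $without_prefix = XML::LibXML::Pattern->new('a');

print "prefixed pattern matches\n"   if $with_prefix->matchesNode($elem);
print "unprefixed pattern matches\n" if $without_prefix->matchesNode($elem);
</programlisting>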
|
|
</sect1>
|
|
<sect1>
|
|
<title>SEE ALSO</title>
|
|
<para><xref linkend="XML-LibXML-Reader"/> for other methods involving compiled patterns.</para>
|
|
</sect1>
|
|
</chapter>
|
|
<chapter id="XML-LibXML-RegExp">
|
|
<title>XML::LibXML::RegExp - interface to libxml2 regular expressions</title>
|
|
<titleabbrev>XML::LibXML::RegExp</titleabbrev>
|
|
<sect1>
|
|
<title>Synopsis</title>
|
|
<programlisting>use XML::LibXML;
|
|
my $compiled_re = XML::LibXML::RegExp->new('[0-9]{5}(-[0-9]{4})?');
|
|
if ($compiled_re->isDeterministic()) { ... }
|
|
if ($compiled_re->matches($string)) { ... }
|
|
</programlisting>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Description</title>
|
|
<para>This is a perl interface to libxml2's implementation of regular expressions, which are used e.g. for validation of XML Schema simple types (pattern facet).</para>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>new()</term>
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$compiled_re = XML::LibXML::RegExp->new( $regexp_str );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>The constructor takes a string containing a regular expression
|
|
and returns a compiled regexp object.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>matches($string)</term>
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$bool = $compiled_re->matches($string);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>Given a string value, returns a true value if
|
|
the value is matched by the compiled regular expression.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>isDeterministic()</term>
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$bool = $compiled_re->isDeterministic();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>Returns a true value if the regular expression is deterministic; returns false otherwise. (See the definition of determinism in the <ulink url="http://www.w3.org/TR/REC-xml/#determinism">XML spec</ulink>)</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
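<para>As a brief sketch, the expression from the synopsis can be
used to check candidate strings. Like the XML Schema pattern facet,
the expression has to match the whole string, so no anchors are
needed. The candidate values are made up for illustration.</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

# Five digits, optionally followed by a dash and four more digits.
my $zip = XML::LibXML::RegExp->new('[0-9]{5}(-[0-9]{4})?');

for my $candidate ('12345', '12345-6789', '1234', 'x12345') {
    printf "%-12s %s\n", $candidate,
        $zip->matches($candidate) ? 'matches' : 'does not match';
}
</programlisting>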
|
|
</sect1>
|
|
</chapter>
|
|
<chapter id="XML-LibXML-NamedNodeMap">
|
|
<title>A map for named nodes</title>
|
|
|
|
<titleabbrev>XML::LibXML::NamedNodeMap</titleabbrev>
|
|
<sect1>
|
|
<title>Synopsis</title>
|
|
<programlisting>use XML::LibXML;
|
|
my $map = XML::LibXML::NamedNodeMap->new(@nodes);
|
|
|
|
my $nodes_list = $map->nodes();
|
|
|
|
my $node_with_index_2 = $map->item(2);
|
|
</programlisting>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Description</title>
|
|
|
|
<para>XML::LibXML::NamedNodeMap maps nodes' names to nodes.</para>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Methods</title>
|
|
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>length</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>my $length = $map->length;</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Returns the number of nodes in the map.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>nodes</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>my $nodes_ref = $map->nodes();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Returns a reference to the list of nodes.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>item</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>my $node_2 = $map->item(2);</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Returns the node with the index of the argument
|
|
(starting from 0).</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>getNamedItem</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>my $node = $map->getNamedItem('phone_number');</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Returns the node with the given name.</para>
|
|
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>setNamedItem</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$map->setNamedItem($new_node)</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Sets the node with the same name as
|
|
<literal>$new_node</literal> to
|
|
<literal>$new_node</literal>.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>removeNamedItem</term>
|
|
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$map->removeNamedItem($name)</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
|
|
<para>Remove the item with the name
|
|
<literal>$name</literal>.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>getNamedItemNS</term>
|
|
|
|
<listitem><para>
|
|
<emphasis>Not implemented yet.</emphasis>
|
|
</para></listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>setNamedItemNS</term>
|
|
|
|
<listitem><para>
|
|
<emphasis>Not implemented yet.</emphasis>
|
|
</para></listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>removeNamedItemNS</term>
|
|
|
|
<listitem><para>
|
|
<emphasis>Not implemented yet.</emphasis>
|
|
</para></listitem>
|
|
</varlistentry>
|
|
</variablelist>
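<para>A map is most commonly obtained by calling
<function>attributes</function> on an element in scalar context.
The sketch below uses that route; the element and attribute names
are illustrative.</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

my $contact = XML::LibXML::Element->new('contact');
$contact->setAttribute(name         => 'Ann');
$contact->setAttribute(phone_number => '555-0100');

# In scalar context, attributes() returns an XML::LibXML::NamedNodeMap.
my $map = $contact->attributes;

print "attributes: ", $map->length, "\n";
print "phone:      ", $map->getNamedItem('phone_number')->value, "\n";
print "first:      ", $map->item(0)->nodeName, "\n";
</programlisting>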
|
|
</sect1>
|
|
</chapter>
|
|
<chapter id="XML-LibXML-Error">
|
|
<title>Structured Errors</title>
|
|
<titleabbrev>XML::LibXML::Error</titleabbrev>
|
|
<sect1>
|
|
<title>Synopsis</title>
|
|
<programlisting>
|
|
eval { ... };
|
|
if (ref($@)) {
|
|
# handle a structured error (XML::LibXML::Error object)
|
|
} elsif ($@) {
|
|
# error, but not an XML::LibXML::Error object
|
|
} else {
|
|
# no error
|
|
}
|
|
</programlisting>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Description</title>
|
|
<para>The
|
|
XML::LibXML::Error class is a tiny frontend to
|
|
<emphasis>libxml2</emphasis>'s structured error support. If
|
|
XML::LibXML is compiled with structured error support, all errors
|
|
reported by libxml2 are transformed to XML::LibXML::Error
|
|
objects. These objects automatically serialize to the
|
|
corresponding error messages when printed or used in a string
|
|
operation, but as objects, can also be used to get detailed and
|
|
structured information about the error that occurred.
|
|
</para>
|
|
<para>Unlike most other XML::LibXML objects, XML::LibXML::Error
|
|
doesn't wrap an underlying <emphasis>libxml2</emphasis>
|
|
structure directly, but rather transforms it to a blessed Perl
|
|
hash reference containing the individual fields of the
|
|
structured error information as hash key-value pairs. Individual
|
|
items (fields) of a structured error can either be
|
|
obtained directly as $@->{field}, or using autoloaded
|
|
methods such as $@->field() (where field is the field
|
|
name). XML::LibXML::Error objects have the following fields:
|
|
domain, code, level, file, line, nodename, message, str1, str2,
|
|
str3, num1, num2, and _prev (some of them may be undefined).
|
|
</para>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>$XML::LibXML::Error::WARNINGS</term>
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$XML::LibXML::Error::WARNINGS=1;</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>Traditionally, XML::LibXML suppressed parser
warnings by setting libxml2's global variable
xmlGetWarningsDefaultValue to 0. Since 1.70, XML::LibXML
no longer changes libxml2's global variables but, for
backward compatibility, still suppresses warnings by
default. Set this variable to 1 to have these warnings
reported via Perl's <literal>warn</literal>, or to 2 to
have them reported via <literal>die</literal>.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>as_string</term>
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$message = $@->as_string();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>This function serializes an XML::LibXML::Error
|
|
object to a string containing the full error message
|
|
close to the message produced by <emphasis>libxml2</emphasis> default error
|
|
handlers and tools like xmllint. This method is also used
|
|
to overload the "" operator on XML::LibXML::Error, so it is
automatically called whenever an XML::LibXML::Error object
|
|
is treated as a string (e.g. in print $@).
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>dump</term>
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>print $@->dump();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>This function serializes an XML::LibXML::Error to a
|
|
string displaying all fields of the error structure
|
|
individually on separate lines of the form 'name' => 'value'.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>domain</term>
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$error_domain = $@->domain();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>Returns a string containing information about what part
|
|
of the library raised the error. Can be one of:
|
|
"parser", "tree", "namespace", "validity", "HTML parser",
|
|
"memory", "output", "I/O", "ftp", "http",
|
|
"XInclude", "XPath", "xpointer", "regexp", "Schemas
|
|
datatype",
|
|
"Schemas parser", "Schemas validity",
|
|
"Relax-NG parser", "Relax-NG validity", "Catalog",
|
|
"C14N", "XSLT", "validity".
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>code</term>
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$error_code = $@->code();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>Returns the actual libxml2 error code.
|
|
The XML::LibXML::ErrNo module defines
|
|
constants for individual error codes. Currently
|
|
libxml2 uses over 480 different error codes.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>message</term>
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$error_message = $@->message();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>Returns a human-readable informative error
|
|
message.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>level</term>
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$error_level = $@->level();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>Returns an integer value describing the severity of
the error. XML::LibXML::Error defines the following
|
|
constants:
|
|
</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>XML_ERR_NONE = 0</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>XML_ERR_WARNING = 1 : A simple warning.</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>XML_ERR_ERROR = 2 : A recoverable error.</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>XML_ERR_FATAL = 3 : A fatal error.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>file</term>
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$filename = $@->file();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>Returns the filename of the file being processed when
|
|
the error occurred.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>line</term>
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$line = $@->line();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>The line number, if available.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>nodename</term>
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$nodename = $@->nodename();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>Name of the node where error occurred, if available.
|
|
When this field is non-empty, libxml2 actually returned a
|
|
physical pointer to the specified node. Due to memory
|
|
management issues, it is very difficult to implement a
|
|
way to expose the pointer to the Perl level as a
|
|
XML::LibXML::Node. For this reason, XML::LibXML::Error
|
|
currently only exposes the name of the node.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>str1</term>
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$error_str1 = $@->str1();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>Error specific. Extra string information.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>str2</term>
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$error_str2 = $@->str2();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>Error specific. Extra string information.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>str3</term>
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$error_str3 = $@->str3();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>Error specific. Extra string information.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>num1</term>
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$error_num1 = $@->num1();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>Error specific. Extra numeric information.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>num2</term>
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$error_num2 = $@->num2();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>In recent libxml2 versions, this
|
|
value contains a column number of the error or 0 if N/A.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>context</term>
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$string = $@->context();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>For parsing errors, this field contains
|
|
about 80 characters of the XML near the place
|
|
where the error occurred. The field
|
|
<function>$@->column()</function>
|
|
contains the corresponding offset.
|
|
Where N/A, the field is undefined.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>column</term>
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$offset = $@->column();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>See <function>$@->context()</function> above.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>_prev</term>
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$previous_error = $@->_prev();</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>This field can possibly hold a reference to another
|
|
XML::LibXML::Error object representing an error which
|
|
occurred just before this error.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
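<para>The accessors above can be combined as in the following
sketch, which provokes a parse error with input that is not XML
and then inspects the resulting error object. It assumes
XML::LibXML was compiled with structured error support; the input
string is illustrative.</para>
<programlisting>use strict;
use warnings;
use XML::LibXML;

# Parsing fails because the string contains no markup at all.
my $doc = eval { XML::LibXML->load_xml(string => 'this is not XML') };
my $err = $@;    # copy $@ before anything else can overwrite it

if (ref $err and $err->isa('XML::LibXML::Error')) {
    printf "domain:  %s\n", $err->domain;
    printf "level:   %d\n", $err->level;
    printf "line:    %d\n", $err->line;
    print  "message: ", $err->message;

    # Follow the chain of earlier errors, if there is one.
    my $prev = $err->_prev;
    while (ref $prev) {
        print "earlier: ", $prev->message;
        $prev = $prev->_prev;
    }
}
elsif ($err) {
    print "plain error: $err\n";
}
</programlisting>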
|
|
</sect1>
|
|
</chapter>
|
|
<chapter id="XML-LibXML-ErrNo">
|
|
<title>Structured Errors</title>
|
|
<titleabbrev>XML::LibXML::ErrNo</titleabbrev>
|
|
<sect1>
|
|
<title>Description</title>
|
|
<para>This module is based on the xmlerror.h libxml2 C header file.
|
|
It defines symbolic constants for all libxml2 error codes.
|
|
Currently libxml2 uses over 480 different error codes.
|
|
See also XML::LibXML::Error.
|
|
</para>
|
|
</sect1>
|
|
</chapter>
|
|
<chapter id="XML-LibXML-Common">
|
|
<title>Constants and Character Encoding Routines</title>
|
|
<titleabbrev>XML::LibXML::Common</titleabbrev>
|
|
<sect1>
|
|
<title>Synopsis</title>
|
|
<programlisting>use XML::LibXML::Common;</programlisting>
|
|
</sect1>
|
|
<sect1>
|
|
<title>Description</title>
|
|
<para>
|
|
XML::LibXML::Common defines constants for all node types
|
|
and provides an interface to libxml2 charset conversion
|
|
functions.
|
|
</para>
|
|
<para>Since XML::LibXML uses its own node type definitions,
|
|
one may want to use XML::LibXML::Common in its compatibility
|
|
mode:
|
|
</para>
|
|
<sect2>
|
|
<title>Exporter TAGS</title>
|
|
<programlisting>use XML::LibXML::Common qw(:libxml);</programlisting>
|
|
<para>The <literal>:libxml</literal> tag selects the XML::LibXML compatibility mode, which defines the
|
|
old 'XML_' node-type definitions.</para>
|
|
<programlisting>use XML::LibXML::Common qw(:gdome);</programlisting>
|
|
<para>The <literal>:gdome</literal> tag selects the XML::GDOME compatibility mode, which defines the
|
|
old 'GDOME_' node-type definitions.</para>
|
|
<programlisting>use XML::LibXML::Common qw(:w3c);</programlisting>
|
|
<para>This uses the nodetype definition names as specified for DOM.</para>
|
|
<programlisting>use XML::LibXML::Common qw(:encoding);</programlisting>
|
|
<para>
|
|
This tag can be used to export only the charset encoding functions of XML::LibXML::Common.
|
|
</para>
|
|
</sect2>
|
|
<sect2>
|
|
<title>Exports</title>
|
|
<para>
|
|
By default the W3 definitions as defined in the DOM specifications and
|
|
the encoding functions are exported by XML::LibXML::Common.
|
|
</para>
|
|
</sect2>
|
|
<sect2>
|
|
<title>Encoding functions</title>
|
|
<para>
|
|
To encode or decode a string to or from UTF-8, XML::LibXML::Common exports
|
|
two functions, which provide an interface to the encoding support in <literal>libxml2</literal>.
|
|
Which encodings are supported by these functions depends
|
|
on how <literal>libxml2</literal> was compiled. UTF-16 is
|
|
always supported and on most installations, ISO encodings are
|
|
supported as well.
|
|
</para>
|
|
<para>
|
|
This interface was useful for older versions of Perl.
|
|
Since Perl >= 5.8 provides similar functions via the <literal>Encode</literal> module,
|
|
it is probably a good idea to use those instead.
|
|
</para>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>encodeToUTF8</term>
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$encodedstring = encodeToUTF8( $name_of_encoding, $string_to_encode );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>The function converts a byte string from the specified encoding to a UTF-8 encoded character string.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term>decodeFromUTF8</term>
|
|
<listitem>
|
|
<funcsynopsis>
|
|
<funcsynopsisinfo>$decodedstring = decodeFromUTF8($name_of_encoding, $string_to_decode );</funcsynopsisinfo>
|
|
</funcsynopsis>
|
|
<para>
|
|
This function converts a UTF-8 encoded character string to a specified
|
|
encoding. Note that the conversion can raise an error if the
|
|
given string contains characters that cannot be represented in the target encoding.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
<para>Both functions report their errors on standard error and
croak() when an error occurs. To catch the error and keep the
entire script from being stopped by an encoding error, call the
encoding function from within an eval block.</para>
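<para>A minimal sketch of that pattern follows. It round-trips a
short ISO-8859-1 byte string through both functions inside an eval
block; it assumes the installed libxml2 supports ISO-8859-1, and
the sample string is illustrative.</para>
<programlisting>use strict;
use warnings;
use XML::LibXML::Common qw(:encoding);

# "cafe" with an e-acute, encoded as ISO-8859-1 bytes.
my $latin1_bytes = "caf\xE9";

# Guard both conversions so a failure does not stop the script.
my $round_trip = eval {
    my $utf8 = encodeToUTF8('iso-8859-1', $latin1_bytes);
    decodeFromUTF8('iso-8859-1', $utf8);
};

if ($@) {
    warn "conversion failed: $@";
}
elsif ($round_trip eq $latin1_bytes) {
    print "round trip ok\n";
}
</programlisting>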
|
|
</sect2>
|
|
<sect2>
|
|
<title>A note on history</title>
|
|
<para>
|
|
Before XML::LibXML 1.70, this class was available as a
|
|
separate CPAN distribution, intended to provide functionality
|
|
shared between XML::LibXML, XML::GDOME, and possibly other
|
|
modules. Since there seems to be no progress in this
|
|
direction, we decided to merge XML::LibXML::Common 0.13 and
|
|
XML::LibXML 1.70 to one CPAN distribution.
|
|
</para>
|
|
<para>The merge also naturally eliminates a practical and
|
|
urgent problem experienced by many XML::LibXML users on certain
|
|
platforms, namely mysterious misbehavior of XML::LibXML
|
|
occurring if the installed (often pre-packaged) version of
|
|
XML::LibXML::Common was compiled against an older version of
|
|
libxml2 than XML::LibXML.
|
|
</para>
|
|
</sect2>
|
|
</sect1>
|
|
</chapter>
|
|
</book>
|
|
|
|
|