301 lines
9.6 KiB
Plaintext
301 lines
9.6 KiB
Plaintext
=head1 NAME
|
|
|
|
XML::LibXML::InputCallback - XML::LibXML Class for Input Callbacks
|
|
|
|
=head1 SYNOPSIS
|
|
|
|
|
|
|
|
use XML::LibXML;
|
|
|
|
|
|
=head1 DESCRIPTION
|
|
|
|
You may get unexpected results if you are trying to load external documents
|
|
during libxml2 parsing if the location of the resource is not a HTTP, FTP or
|
|
relative location but a absolute path for example. To get around this
|
|
limitation, you may add your own input handler to open, read and close
|
|
particular types of locations or URI classes. Using this input callback
|
|
handlers, you can handle your own custom URI schemes for example.
|
|
|
|
The input callbacks are used whenever XML::LibXML has to get something other
|
|
than externally parsed entities from somewhere. They are implemented using a
|
|
callback stack on the Perl layer in analogy to libxml2's native callback stack.
|
|
|
|
The XML::LibXML::InputCallback class transparently registers the input
|
|
callbacks for the libxml2's parser processes.
|
|
|
|
|
|
=head2 How does XML::LibXML::InputCallback work?
|
|
|
|
The libxml2 library offers a callback implementation as global functions only.
|
|
To work-around the troubles resulting in having only global callbacks - for
|
|
example, if the same global callback stack is manipulated by different
|
|
applications running together in a single Apache Web-server environment -,
|
|
XML::LibXML::InputCallback comes with a object-oriented and a function-oriented
|
|
part.
|
|
|
|
Using the function-oriented part the global callback stack of libxml2 can be
|
|
manipulated. Those functions can be used as interface to the callbacks on the
|
|
C- and XS Layer. At the object-oriented part, operations for working with the
|
|
"pseudo-localized" callback stack are implemented. Currently, you can register
|
|
and de-register callbacks on the Perl layer and initialize them on a per parser
|
|
basis.
|
|
|
|
|
|
=head3 Callback Groups
|
|
|
|
The libxml2 input callbacks come in groups. One group contains a URI matcher (I<<<<<< match >>>>>>), a data stream constructor (I<<<<<< open >>>>>>), a data stream reader (I<<<<<< read >>>>>>), and a data stream destructor (I<<<<<< close >>>>>>). The callbacks can be manipulated on a per group basis only.
|
|
|
|
|
|
=head3 The Parser Process
|
|
|
|
The parser process works on an XML data stream, along which, links to other
|
|
resources can be embedded. This can be links to external DTDs or XIncludes for
|
|
example. Those resources are identified by URIs. The callback implementation of
|
|
libxml2 assumes that one callback group can handle a certain amount of URIs and
|
|
a certain URI scheme. Per default, callback handlers for I<<<<<< file://* >>>>>>, I<<<<<< file:://*.gz >>>>>>, I<<<<<< http://* >>>>>> and I<<<<<< ftp://* >>>>>> are registered.
|
|
|
|
Callback groups in the callback stack are processed from top to bottom, meaning
|
|
that callback groups registered later will be processed before the earlier
|
|
registered ones.
|
|
|
|
While parsing the data stream, the libxml2 parser checks if a registered
|
|
callback group will handle a URI - if they will not, the URI will be
|
|
interpreted as I<<<<<< file://URI >>>>>>. To handle a URI, the I<<<<<< match >>>>>> callback will have to return '1'. If that happens, the handling of the URI will
|
|
be passed to that callback group. Next, the URI will be passed to the I<<<<<< open >>>>>> callback, which should return a I<<<<<< reference >>>>>> to the data stream if it successfully opened the file, '0' otherwise. If
|
|
opening the stream was successful, the I<<<<<< read >>>>>> callback will be called repeatedly until it returns an empty string. After the
|
|
read callback, the I<<<<<< close >>>>>> callback will be called to close the stream.
|
|
|
|
|
|
=head3 Organisation of callback groups in XML::LibXML::InputCallback
|
|
|
|
Callback groups are implemented as a stack (Array), each entry holds a
|
|
reference to an array of the callbacks. For the libxml2 library, the
|
|
XML::LibXML::InputCallback callback implementation appears as one single
|
|
callback group. The Perl implementation however allows one to manage different
|
|
callback stacks on a per libxml2-parser basis.
|
|
|
|
|
|
=head2 Using XML::LibXML::InputCallback
|
|
|
|
After object instantiation using the parameter-less constructor, you can
|
|
register callback groups.
|
|
|
|
|
|
|
|
my $input_callbacks = XML::LibXML::InputCallback->new();
|
|
$input_callbacks->register_callbacks([ $match_cb1, $open_cb1,
|
|
$read_cb1, $close_cb1 ] );
|
|
$input_callbacks->register_callbacks([ $match_cb2, $open_cb2,
|
|
$read_cb2, $close_cb2 ] );
|
|
$input_callbacks->register_callbacks( [ $match_cb3, $open_cb3,
|
|
$read_cb3, $close_cb3 ] );
|
|
|
|
$parser->input_callbacks( $input_callbacks );
|
|
$parser->parse_file( $some_xml_file );
|
|
|
|
|
|
=head2 What about the old callback system prior to XML::LibXML::InputCallback?
|
|
|
|
In XML::LibXML versions prior to 1.59 - i.e. without the
|
|
XML::LibXML::InputCallback module - you could define your callbacks either
|
|
using globally or locally. You still can do that using
|
|
XML::LibXML::InputCallback, and in addition to that you can define the
|
|
callbacks on a per parser basis!
|
|
|
|
If you use the old callback interface through global callbacks,
|
|
XML::LibXML::InputCallback will treat them with a lower priority as the ones
|
|
registered using the new interface. The global callbacks will not override the
|
|
callback groups registered using the new interface. Local callbacks are
|
|
attached to a specific parser instance, therefore they are treated with highest
|
|
priority. If the I<<<<<< match >>>>>> callback of the callback group registered as local variable is identical to one
|
|
of the callback groups registered using the new interface, that callback group
|
|
will be replaced.
|
|
|
|
Users of the old callback implementation whose I<<<<<< open >>>>>> callback returned a plain string, will have to adapt their code to return a
|
|
reference to that string after upgrading to version >= 1.59. The new callback
|
|
system can only deal with the I<<<<<< open >>>>>> callback returning a reference!
|
|
|
|
|
|
=head1 INTERFACE DESCRIPTION
|
|
|
|
|
|
=head2 Global Variables
|
|
|
|
=over 4
|
|
|
|
=item $_CUR_CB
|
|
|
|
Stores the current callback and can be used as shortcut to access the callback
|
|
stack.
|
|
|
|
|
|
=item @_GLOBAL_CALLBACKS
|
|
|
|
Stores all callback groups for the current parser process.
|
|
|
|
|
|
=item @_CB_STACK
|
|
|
|
Stores the currently used callback group. Used to prevent parser errors when
|
|
dealing with nested XML data.
|
|
|
|
|
|
|
|
=back
|
|
|
|
|
|
=head2 Global Callbacks
|
|
|
|
=over 4
|
|
|
|
=item _callback_match
|
|
|
|
Implements the interface for the I<<<<<< match >>>>>> callback at C-level and for the selection of the callback group from the
|
|
callbacks defined at the Perl-level.
|
|
|
|
|
|
=item _callback_open
|
|
|
|
Forwards the I<<<<<< open >>>>>> callback from libxml2 to the corresponding callback function at the Perl-level.
|
|
|
|
|
|
=item _callback_read
|
|
|
|
Forwards the read request to the corresponding callback function at the
|
|
Perl-level and returns the result to libxml2.
|
|
|
|
|
|
=item _callback_close
|
|
|
|
Forwards the I<<<<<< close >>>>>> callback from libxml2 to the corresponding callback function at the
|
|
Perl-level..
|
|
|
|
|
|
|
|
=back
|
|
|
|
|
|
=head2 Class methods
|
|
|
|
=over 4
|
|
|
|
=item new()
|
|
|
|
A simple constructor.
|
|
|
|
|
|
=item register_callbacks( [ $match_cb, $open_cb, $read_cb, $close_cb ])
|
|
|
|
The four callbacks I<<<<<< have >>>>>> to be given as array reference in the above order I<<<<<< match >>>>>>, I<<<<<< open >>>>>>, I<<<<<< read >>>>>>, I<<<<<< close >>>>>>!
|
|
|
|
|
|
=item unregister_callbacks( [ $match_cb, $open_cb, $read_cb, $close_cb ])
|
|
|
|
With no arguments given, C<<<<<< unregister_callbacks() >>>>>> will delete the last registered callback group from the stack. If four
|
|
callbacks are passed as array reference, the callback group to unregister will
|
|
be identified by the I<<<<<< match >>>>>> callback and deleted from the callback stack. Note that if several identical I<<<<<< match >>>>>> callbacks are defined in different callback groups, ALL of them will be deleted
|
|
from the stack.
|
|
|
|
|
|
=item init_callbacks( $parser )
|
|
|
|
Initializes the callback system for the provided parser before starting a
|
|
parsing process.
|
|
|
|
|
|
=item cleanup_callbacks()
|
|
|
|
Resets global variables and the libxml2 callback stack.
|
|
|
|
|
|
=item lib_init_callbacks()
|
|
|
|
Used internally for callback registration at C-level.
|
|
|
|
|
|
=item lib_cleanup_callbacks()
|
|
|
|
Used internally for callback resetting at the C-level.
|
|
|
|
|
|
|
|
=back
|
|
|
|
|
|
|
|
|
|
=head1 EXAMPLE CALLBACKS
|
|
|
|
The following example is a purely fictitious example that uses a
|
|
MyScheme::Handler object that responds to methods similar to an IO::Handle.
|
|
|
|
|
|
|
|
# Define the four callback functions
|
|
sub match_uri {
|
|
my $uri = shift;
|
|
return $uri =~ /^myscheme:/; # trigger our callback group at a 'myscheme' URIs
|
|
}
|
|
|
|
sub open_uri {
|
|
my $uri = shift;
|
|
my $handler = MyScheme::Handler->new($uri);
|
|
return $handler;
|
|
}
|
|
|
|
# The returned $buffer will be parsed by the libxml2 parser
|
|
sub read_uri {
|
|
my $handler = shift;
|
|
my $length = shift;
|
|
my $buffer;
|
|
read($handler, $buffer, $length);
|
|
return $buffer; # $buffer will be an empty string '' if read() is done
|
|
}
|
|
|
|
# Close the handle associated with the resource.
|
|
sub close_uri {
|
|
my $handler = shift;
|
|
close($handler);
|
|
}
|
|
|
|
# Register them with a instance of XML::LibXML::InputCallback
|
|
my $input_callbacks = XML::LibXML::InputCallback->new();
|
|
$input_callbacks->register_callbacks([ \&match_uri, \&open_uri,
|
|
\&read_uri, \&close_uri ] );
|
|
|
|
# Register the callback group at a parser instance
|
|
$parser->input_callbacks( $input_callbacks );
|
|
|
|
# $some_xml_file will be parsed using our callbacks
|
|
$parser->parse_file( $some_xml_file );
|
|
|
|
=head1 AUTHORS
|
|
|
|
Matt Sergeant,
|
|
Christian Glahn,
|
|
Petr Pajas
|
|
|
|
|
|
=head1 VERSION
|
|
|
|
2.0207
|
|
|
|
=head1 COPYRIGHT
|
|
|
|
2001-2007, AxKit.com Ltd.
|
|
|
|
2002-2006, Christian Glahn.
|
|
|
|
2006-2009, Petr Pajas.
|
|
|
|
=cut
|
|
|
|
|
|
=head1 LICENSE
|
|
|
|
This program is free software; you can redistribute it and/or modify it under
|
|
the same terms as Perl itself.
|
|
|