package Daizu::Preview;
use warnings;
use strict;

use base 'Exporter';
our @EXPORT_OK = qw(
    output_preview
    adjust_preview_links_html adjust_preview_links_css
    adjust_link_for_preview
    script_link
);

use utf8;
use HTML::Parser ();
use URI;
use Daizu::File;
use Daizu::HTML qw( html_escape_attr );
use Daizu::Util qw( url_encode db_row_exists db_row_id db_select );

=head1 NAME

Daizu::Preview - functions for generating preview versions of output content

=head1 DESCRIPTION

This code is used by the preview CGI script to filter output so that
links refer back to the preview. It is this code which makes it possible
to get preview versions not only of an HTML page, but also of all the
CSS, images, and linked pages which it references.

=head1 CONSTANTS

=over

=item %PREVIEW_FILTER

A hash mapping MIME types (lowercase) to functions which can filter
files for previewing. The following functions, defined below, are
provided so far:

=over

=item text/html, application/xhtml+xml

C<adjust_preview_links_html()>

=item text/css

C<adjust_preview_links_css()>

=back

=cut

our %PREVIEW_FILTER = (
    'text/html'             => \&adjust_preview_links_html,
    'application/xhtml+xml' => \&adjust_preview_links_html,
    'text/css'              => \&adjust_preview_links_css,
);

# TODO document, and provide some way to configure this.
our %ENABLE_SSI = (
    'text/html'             => undef,
    'application/xhtml+xml' => undef,
);

=item %HTML_URL_ATTR

This hash is used to identify attributes in an HTML document which
contain a link which may need to be adjusted to make the preview work
(so that, for example, links to other pages or to embedded images point
at the preview versions rather than at the live site).

Each key is the name of an element and the name of one of its attributes,
in lowercase and separated by a colon. The values are either C<uri> if
the attribute is expected to contain a single URI, or C<uri-list> if it
might contain a whitespace-separated list of URIs.

This is derived from the HTML 4.01 specification, with a few additional
values to support non-standard or obsolete elements and attributes.
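
As an illustration of how the C<uri> and C<uri-list> values differ in
practice, here is a minimal sketch, not taken from Daizu itself, of
rewriting a single attribute value. The helper C<rewrite_attr_value()>
and the C<$adjust> callback are hypothetical, and only a few entries of
the table are copied so that the example runs on its own. A C<uri> value
is passed to the callback whole, while a C<uri-list> value is split on
whitespace and each URI is adjusted separately.

```perl

    use strict;
    use warnings;

    # Hypothetical excerpt of %HTML_URL_ATTR; the real hash has many more entries.
    my %url_attr = (
        'a:href'         => 'uri',
        'img:src'        => 'uri',
        'applet:archive' => 'uri-list',
        'object:archive' => 'uri-list',
    );

    # Hypothetical helper: apply $adjust to every URI in one attribute value.
    # Attributes not listed in %url_attr are returned unchanged.
    sub rewrite_attr_value {
        my ($elem, $attr, $value, $adjust) = @_;
        my $kind = $url_attr{ lc($elem) . ':' . lc($attr) };
        return $value unless defined $kind;

        # A single URI is adjusted as a whole.
        return $adjust->($value) if $kind eq 'uri';

        # 'uri-list': split ' ' skips leading whitespace, so no empty
        # fields are produced; each URI is adjusted and rejoined with spaces.
        return join ' ', map { $adjust->($_) } split ' ', $value;
    }

    my $to_preview = sub { "/preview/$_[0]" };    # hypothetical adjuster
    print rewrite_attr_value('applet', 'archive', 'a.jar  b.jar', $to_preview), "\n";
    # prints "/preview/a.jar /preview/b.jar"

```

Attributes that are not in the table, such as C<a:name>, come back
untouched. Rejoining a list with single spaces is a simplification of
this sketch: the original whitespace between the URIs is not preserved.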

Note: this information is provided here, rather than using
L<%HTML::Tagset::linkElements|HTML::Tagset/hash %HTML::Tagset::linkElements>,
because that doesn't have enough information. It doesn't distinguish
base URIs (which we don't want to change), and it doesn't note whether
there can be multiple URIs in an attribute.

The C<profile> attribute (on the C<head> element) isn't included because
the spec says it can be used either as a globally unique ID or as a
dereferenceable link, so we have to assume that it's already available
at the URL. That's fine, because nobody ever uses it.

The C<usemap> attribute is a URI, but isn't included because it has to
point to a C<map> element inside the document.

TODO - implement using 'codebase' attribute as base URL.

TODO - if using the value of applet:codebase it must be validated to make
sure it's a subdirectory of the directory that would contain the current
document, for security reasons. See: L

=cut

our %HTML_URL_ATTR = (
    'a:href'            => 'uri',
    'applet:archive'    => 'uri-list',  # relative to applet:codebase
    'applet:code'       => 'uri',       # relative to applet:codebase
    'applet:object'     => 'uri',       # relative to applet:codebase
    'area:href'         => 'uri',
    'blockquote:cite'   => 'uri',
    'body:background'   => 'uri',
    'del:cite'          => 'uri',
    'form:action'       => 'uri',
    'frame:longdesc'    => 'uri',
    'frame:src'         => 'uri',
    'iframe:longdesc'   => 'uri',
    'iframe:src'        => 'uri',
    'img:longdesc'      => 'uri',
    'img:src'           => 'uri',
    'input:src'         => 'uri',
    'ins:cite'          => 'uri',
    'link:href'         => 'uri',
    'object:codebase'   => 'uri',
    'object:archive'    => 'uri-list',  # relative to object:codebase
    'object:classid'    => 'uri',       # relative to object:codebase
    'object:data'       => 'uri',       # relative to object:codebase
    'q:cite'            => 'uri',
    'script:src'        => 'uri',

    # These aren't defined in HTML 4.01, but were added from HTML::Tagset
    # for compatibility with other HTML.
    'bgsound:src'       => 'uri',
    'embed:pluginspage' => 'uri',
    'embed:src'         => 'uri',
    'ilayer:background' => 'uri',
    'img:lowsrc'        => 'uri',
    'isindex:action'    => 'uri',
    'layer:background'  => 'uri',
    'layer:src'         => 'uri',
    #'script:for'       => 'uri',       # XXX - what's this mean?
    'table:background'  => 'uri',
    'td:background'     => 'uri',
    'th:background'     => 'uri',
    'tr:background'     => 'uri',
    'xmp:href'          => 'uri',
);

=back

=head1 FUNCTIONS

The following functions are available for export from this module.
None of them are exported by default.

=over

=item output_preview($cms, $url, $file, $generator, $method, $argument, $type, $fh)

Generate the output for C<$file> (a L<Daizu::File> object) which is meant
to be published at C<$url> (a simple string or L<URI> object). The output
will be generated by calling C<$method> on the C<$generator> object, and
using C<$argument>.

The output will sometimes (depending on the expected MIME type given by
C<$type>) be filtered to adjust embedded links so that they point to
preview versions instead of the live site. Links will be adjusted if they
point to known URLs for the working copy. Other URLs will be made
absolute, based on C<$url>. L<%PREVIEW_FILTER|/%PREVIEW_FILTER> is used
to determine whether the files need to be filtered, and which function
to use for the filtering.

The finished (possibly filtered) output is printed to C<$fh>. The file
handle will be adjusted with C<binmode> to expect raw or utf8 output,
depending on whether the content type is a text or binary one.

=cut

sub output_preview {
    my ($cms, $url, $file, $generator, $method, $argument, $type, $outfh) = @_;
    $url = URI->new($url) unless ref $url;
    $type = 'application/octet-stream' unless defined $type;

    binmode $outfh or die "binmode error: $!";

    my $preview_function = $PREVIEW_FILTER{$type};
    if ($preview_function) {
        # Write it to memory so that the URLs can be adjusted.
        my $content = '';
        open my $fh, '>', \$content or die $!;
        binmode $fh or die "binmode error: $!";

        my $url_info = {
            generator => ref($generator),
            url       => $url,
            method    => $method,
            argument  => $argument,
            type      => $type,
            fh        => $fh,
        };
        $generator->$method($file, [ $url_info ]);

        if (defined $url_info->{fh}) {
            close $fh or die $!;
        }

        $preview_function->($cms, $file->{wc_id}, $url, $content, $outfh);
    }
    else {
        # Write it directly to the output without filtering.
        $generator->$method($file, [ {
            url      => $url,
            method   => $method,
            argument => $argument,
            type     => $type,
            fh       => $outfh,
        } ]);
    }
}

=item adjust_preview_links_html($cms, $wc_id, $base_url, $html, $fh)

Given a string containing HTML in C<$html>, parse it and adjust any
attributes which are meant to contain URIs so that they use the correct
form of links for a preview. The output is written to C<$fh>.

Exactly which attributes are adjusted depends on the contents of
L<%HTML_URL_ATTR|/%HTML_URL_ATTR>.

In addition, inline CSS code in C