

Daizu::Preview - functions for generating preview versions of output content


This code is used by the CGI script preview.cgi to filter output so that links refer back to the preview. It is this code which makes it possible to preview not only an HTML page, but also get preview versions of all the CSS, images, and linked pages which it references.



A hash mapping MIME types (lowercase) to functions which can filter files for previewing. The following functions, defined below, are provided so far:






This hash is used to identify attributes in an HTML document which contain a link which may need to be adjusted to make the preview work (so that for example links to other pages or to embedded images are pointed at the preview versions rather than ones on the live site).

Each key is the name of an element and the name of one of its attributes, in lowercase and separated by a colon. The values are either uri if the attribute is expected to contain a single URI, or uri-list if it might contain a whitespace-separated list of URIs.
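As an illustration of the key and value format described above, here is a minimal sketch. The entries are an illustrative subset taken from the HTML 4.01 specification, not a copy of the module's actual hash, and both `%html_url_attr` and `url_attr_type` are hypothetical names.

```perl
use strict;
use warnings;

# Illustrative subset only (an assumption): the real %HTML_URL_ATTR in
# Daizu::Preview has more entries and may differ in detail.
my %html_url_attr = (
    'a:href'         => 'uri',
    'img:src'        => 'uri',
    'link:href'      => 'uri',
    'object:archive' => 'uri-list',   # space-separated list in HTML 4.01
);

# Hypothetical helper: report how an attribute's value should be treated.
# Returns 'uri', 'uri-list', or undef if the attribute is not a link.
sub url_attr_type {
    my ($element, $attribute) = @_;
    return $html_url_attr{lc "$element:$attribute"};
}
```

Lowercasing the lookup key means that markup written as `<IMG SRC=...>` finds the same entry as `<img src=...>`.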

This is derived from the HTML 4.01 specification, with a few additional values to support non-standard or obsolete elements and attributes.

Note: this information is provided here, rather than using %HTML::Tagset::linkElements because that doesn't have enough information. It doesn't distinguish base URIs (which we don't want to change) and it doesn't note whether there can be multiple URIs in an attribute.

The profile attribute (on the head element) isn't included because the spec says it can be used either as a globally unique ID or as a dereferenceable link, so we have to assume that it's already available at the URL. That's fine, because nobody ever uses it.

The usemap attribute is a URI, but isn't included because it has to point to a map element inside the document.

TODO - implement using 'codebase' attribute as base URL.

TODO - if using the value of applet:codebase it must be validated to make sure it's a subdirectory of the directory that would contain the current document, for security reasons. See:


The following functions are available for export from this module. None of them are exported by default.

output_preview($cms, $url, $file, $generator, $method, $argument, $type, $fh)

Generate the output for $file (a Daizu::File object) which is meant to be published at $url (a simple string or URI object). The output will be generated by calling $method on the $generator object, and using $argument.

The output will sometimes (depending on the expected MIME type given by $type) be filtered to adjust embedded links so that they point to preview versions instead of the live site. Links will be adjusted if they point to known URLs for the working copy. Other URLs will be made absolute, based on $url.

%PREVIEW_FILTER is used to determine whether the files need to be filtered, and which function to use for the filtering.

The finished (possibly filtered) output is printed to $fh. The file handle will be adjusted with binmode to expect raw or utf8 output, depending on whether the content type is a text or binary one.
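The lookup and output steps described above might be sketched roughly as follows. This is not the module's code: `%preview_filter`, `is_text_type` and `print_preview` are hypothetical names, the filters are stand-ins which copy their input unchanged, and the rule that only `text/*` types count as text is an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-in filters keyed by lowercase MIME type. The real
# filters rewrite links; these just copy the content through.
my %preview_filter = (
    'text/html' => sub { my ($content, $fh) = @_; print {$fh} $content },
    'text/css'  => sub { my ($content, $fh) = @_; print {$fh} $content },
);

# Assumption: a type is "text" if it is text/*. The module may use a
# different rule.
sub is_text_type {
    my ($type) = @_;
    return $type =~ m{^text/}i ? 1 : 0;
}

# Set the handle's layer, then either filter the content or print it as is.
sub print_preview {
    my ($content, $type, $fh) = @_;
    binmode $fh, is_text_type($type) ? ':utf8' : ':raw';

    my $filter = $preview_filter{lc $type};
    if ($filter) { $filter->($content, $fh) }
    else         { print {$fh} $content }

    return $filter ? 'filtered' : 'unfiltered';
}
```

Types with no entry in the hash, such as images, pass through untouched, which is why only formats that can contain links need a filter function.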

Given a string containing HTML in $html, parse it and adjust any attributes which are meant to contain URIs to use the correct form of links for a preview. The output is written to $fh.

Exactly which attributes are adjusted depends on the contents of %HTML_URL_ATTR.

In addition, inline CSS code in style elements is filtered through the CSS filtering function described below, so that CSS links are adjusted as well.

Filter CSS (cascading style sheet) code in $css, replacing links with ones which point to the preview (if appropriate) or are absolute. This means that if your CSS file references background images, or includes other stylesheets, it will still work while previewing output.

The filtering is done with a simple lexical analyser, which looks for url() values and @import commands. It knows enough to skip over string literals and comments which happen to contain things which might look like these, but it doesn't make any great effort to understand the CSS syntax.
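To make the idea concrete, here is a minimal sketch of that kind of scanner, written as a single substitution. It is not the module's implementation: `filter_css_urls` and its `$rewrite` callback are hypothetical, and the sketch ignores details such as escaped characters or whitespace inside unquoted `url()` values.

```perl
use strict;
use warnings;

# Hypothetical sketch. At each position the pattern tries, in order:
# a comment, a quoted @import, a url(...) value, or any other string
# literal. Comments and strings are copied through unchanged, so
# URL-like text inside them is never rewritten.
sub filter_css_urls {
    my ($css, $rewrite) = @_;

    $css =~ s{
        (?<comment> /\* .*? \*/ )
      | (?<ipre> \@import \s+ ) (?<iq> ["'] ) (?<iurl> .*? ) \k<iq>
      | (?<upre> \b url \( \s* ) (?<uq> ["']? ) (?<uurl> [^"'()\s]+ ) \k<uq> (?<upost> \s* \) )
      | (?<string> " (?: [^"\\] | \\. )* " | ' (?: [^'\\] | \\. )* ' )
    }{
        my %m = %+;    # copy captures before calling the callback
        defined $m{comment} ? $m{comment}
      : defined $m{string}  ? $m{string}
      : defined $m{iurl}    ? "$m{ipre}$m{iq}" . $rewrite->($m{iurl}) . $m{iq}
      :                       "$m{upre}$m{uq}" . $rewrite->($m{uurl}) . "$m{uq}$m{upost}";
    }gsxie;

    return $css;
}
```

A caller would pass a callback which maps each URL to its preview form, for example `filter_css_urls($css, sub { "/preview/$_[0]" })`. Because the comment and string alternatives consume their whole token, a `url()` inside either is left alone.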

Called by the filtering functions above to adjust a link.

$value_type should be either uri if $urls is expected to contain a single URI, or uri-list if it might contain a whitespace-separated list of URIs.

Returns a replacement for the value in $urls, which can be substituted back into the filtered content.
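The difference between the two value types can be sketched as below. The function name `adjust_link_value` and the `$rewrite` callback are hypothetical; the callback stands in for whatever per-URL adjustment the real function performs.

```perl
use strict;
use warnings;

# Hypothetical helper: apply $rewrite to a single URI, or to each URI in
# a whitespace-separated list.
sub adjust_link_value {
    my ($value_type, $urls, $rewrite) = @_;

    if ($value_type eq 'uri') {
        return $rewrite->($urls);
    }
    elsif ($value_type eq 'uri-list') {
        # split ' ' skips leading whitespace and splits on runs of it,
        # so the list comes back normalised to single spaces.
        return join ' ', map { $rewrite->($_) } split ' ', $urls;
    }

    die "unknown value type '$value_type'";
}
```

One consequence of this approach is that a uri-list value has its whitespace normalised in the output, which is harmless for HTML attributes of that kind.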

script_link($cms, $wc_id, %args)

Return a properly encoded URL with query parameters which refers to the current CGI script (based on the SCRIPT_NAME environment variable). The keys and values in %args will be given as CGI parameters.

If $wc_id is provided, and there is no wc argument in %args, then a wc argument may be added automatically. It's assumed that this argument will default to the live working copy ID, so it isn't added if $wc_id is the same as that.
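A rough sketch of this behaviour follows, under several assumptions: the live working copy ID is passed in directly as `$live_wc_id` instead of being obtained from `$cms`, parameters are joined with `&` in sorted order, and `encode_component` and `sketch_script_link` are hypothetical names. The real function may differ on all of these points.

```perl
use strict;
use warnings;

# Percent-encode everything except RFC 3986 unreserved characters.
sub encode_component {
    my ($text) = @_;
    utf8::encode($text);    # encode as UTF-8 bytes first
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf '%%%02X', ord $1/ge;
    return $text;
}

# Hypothetical stand-in for script_link(). $live_wc_id replaces the $cms
# argument, which is an assumption made to keep the sketch self-contained.
sub sketch_script_link {
    my ($live_wc_id, $wc_id, %args) = @_;

    # Add wc only when it would differ from the default.
    $args{wc} = $wc_id
        if defined $wc_id && !exists $args{wc} && $wc_id != $live_wc_id;

    my $query = join '&',
        map { encode_component($_) . '=' . encode_component($args{$_}) }
        sort keys %args;

    my $script = $ENV{SCRIPT_NAME} // '';
    return length($query) ? "$script?$query" : $script;
}
```

Sorting the keys is only there to make the output deterministic; the order of CGI parameters does not otherwise matter.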


This software is copyright 2006 Geoff Richards. For licensing information see this page: