package Daizu::Gen;
use warnings;
use strict;

use Carp qw( croak );
use Carp::Assert qw( assert DEBUG );
use Template;
use XML::LibXML;
use Compress::Zlib qw( gzopen $gzerrno );
use URI;
use Encode qw( decode encode );
use File::Temp qw( tempfile );
use Daizu::TTProvider;
use Daizu::HTML qw( dom_body_to_html4 );
use Daizu::Util qw(
    trim like_escape pgregex_escape
    w3c_datetime parse_db_datetime
    db_row_id db_select
    add_xml_elem xml_attr
    daizu_data_dir
);

=head1 NAME

Daizu::Gen - default generator class

=head1 DESCRIPTION

This class, and subclasses of it, are responsible for deciding which URLs
should be created (generated) from each file or directory in a working copy,
and generating the output which will be served for those URLs.

This class itself is used by default, but you can use a different generator
class by setting the C<daizu:generator> property to the name of a Perl class.
If you set it on a file, it will affect only that file.  If you set it on
a directory then it will affect that directory and all its descendants,
unless they themselves have a C<daizu:generator> property.

The name of the generator class used for each file and directory is stored
alongside that file's record in the database.

When an object of a generator class is instantiated, it must be given
a 'root file', which is the file on which the C<daizu:generator> property
was set (or a top-level file or directory, if no such property applies).

This class creates URLs based on the C<daizu:url> property, and the names
of files and directories.  The results will be similar to the URLs the files
would have if they were served directly from the filesystem by a webserver.
Files with names like C<_index.html> (anything starting with C<_index>
followed by a dot) are special in that the filename will not appear as part
of the URL.  Instead the URL will end with a trailing slash (C</>).

With this generator class only files generate URLs.  Directories are ignored,
except when a sitemap XML file is configured as described below.
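
The URL scheme described above can be illustrated with a small standalone
sketch.  It is not part of this class: C<sketch_base_url> is a hypothetical
helper and the I<www.example.com> URLs are made up for the example.  It only
assumes the L<URI> module, and resolves a name against the base URL of its
parent directory in roughly the way described:

=for syntax-highlight perl

    use strict;
    use warnings;
    use URI;

    # Hypothetical helper, not part of Daizu::Gen.  Directories get a
    # trailing slash, '_index.*' files share their directory's URL, and
    # any other file simply appends its name.
    sub sketch_base_url
    {
        my ($parent_base, $name, $is_dir) = @_;
        my $rel = $is_dir                ? "$name/"
                : $name =~ /^_index\./   ? ''
                :                          $name;
        return URI->new($rel)->abs($parent_base);
    }

    my $docs = sketch_base_url('http://www.example.com/', 'docs', 1);
    print "docs => $docs\n";
    for my $name ('about.html', '_index.html') {
        my $url = sketch_base_url($docs, $name, 0);
        print "$name => $url\n";
    }

Under these assumptions I<_index.html> ends up with the same URL as the
I<docs> directory itself, while I<about.html> gets its own URL underneath it.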

=head1 CONFIGURATION

The only configuration information which this generator currently makes
use of is the C<xml-sitemap> element, given as a child of this generator's
configuration element.

The sitemap URL will be generated from the directory to which the
configuration applies (the generator's root directory).  It must be
a directory, not a plain file.  With the default settings the sitemap's URL
is that directory's base URL followed by I<sitemap.xml.gz>.  You can give
this URL to Google, or any other search engine which supports the sitemaps
format, to help their robots find URLs on your website.

The C<xml-sitemap> element may have an optional C<url> attribute, which
should be a relative or absolute URL at which to publish the sitemap file.
Its default value is I<sitemap.xml.gz>.

=head1 SUBCLASSING

To write your own generator class, inherit from this one and override
some of the following methods:

=over

=item L<custom_base_url($file)|/$gen-E<gt>custom_base_url($file)>

If you want to modify the basic URL scheme then you might want to provide
your own algorithm for deciding what URLs to use.  You could instead override
C<base_url> itself, but usually it's best to leave that alone.  It will
handle things like URLs explicitly set with the C<daizu:url> property,
and ignoring things in I<_hide> directories, and just call your
C<custom_base_url> method for the rest.

=item L<custom_urls_info($file)|/$gen-E<gt>custom_urls_info($file)>

You would only need to override this if you want to make fairly big
changes to the URL scheme.  If you just want to change the URLs of
a particular type of file then you might be able to do that by overriding
one of the simpler C<*_urls_info> functions listed next.  The base-class
implementation of this function just chooses between them.

You almost certainly don't want to override
L<urls_info($file)|/$gen-E<gt>urls_info($file)>, since that's just
a wrapper around this function which tidies up the results.

=item L<article_urls_info($file)|/$gen-E<gt>article_urls_info($file)>,
L<unprocessed_urls_info($file)|/$gen-E<gt>unprocessed_urls_info($file)>,
L<dir_urls_info($file)|/$gen-E<gt>dir_urls_info($file)>,
L<root_dir_urls_info($file)|/$gen-E<gt>root_dir_urls_info($file)>

Override one or more of these to change which URLs are produced for
particular types of files, such as articles or directories.  For example
the blog generator overrides one of these to add URLs for the blog homepage,
feeds, etc.

=item L<article_template_overrides($file, $url_info)|/$gen-E<gt>article_template_overrides($file, $url_info)>,
L<article_template_variables($file, $url_info)|/$gen-E<gt>article_template_variables($file, $url_info)>

These are called by the L<article($file, $urls)|/$gen-E<gt>article($file, $urls)>
method.  The base-class ones don't do anything, but you can override them
to provide extra information to the templates or to replace a standard
template with a different one (if you want to change one aspect of the
page structure for your articles).  Doing this should allow you to avoid
writing your own C<article> generator method.

=item L<navigation_menu($file, $url)|/$gen-E<gt>navigation_menu($file, $url)>

Override this to change the menu items which will be displayed by the
navigation menu template.  Of course if you want to provide a radically
different kind of navigation then you may need to replace that template
with a different one.  If you do that, it's probably a good idea to
override this method with one that does no work, to avoid generating
menu items which won't be used.

=back

The constructor can accept additional options, and will just store them
in the object hash, so you probably won't need to override that.

=head1 METHODS

=over

=item Daizu::Gen-E<gt>new(%options)

Return a new generator object.  Requires the following options:

=over

=item cms

A L<Daizu> object.

=item root_file

A L<Daizu::File> object for the file on which this generator was specified,
or a top-level directory if there was no specification of which generator
was in use.  So usually this file will have a C<daizu:generator> property
naming this class.

=item config_elem

The XML DOM node (an L<XML::LibXML::Element> object) of a C<generator>
element in the Daizu CMS configuration file, or C<undef> if there is no
appropriate configuration provided.

=back

=cut

sub new
{
    my ($class, %option) = @_;

    for (qw( cms root_file )) {
        croak "missing required option '$_'"
            unless defined $option{$_};
    }

    return bless \%option, $class;
}

=item $gen-E<gt>base_url($file)

Return a single URL for C<$file>, as a L<URI> object.  This 'base URL'
is typically used as the basis for any other URLs the file might generate.

Files with a C<daizu:url> property will take that as their base URL.

Directories can have base URLs even if they don't actually generate any
URLs in the publication process, since those URLs are used to build URLs
for any content they contain.  Directory URLs end in a forward slash.

Files with names starting with I<_index.> have a base URL identical to
their parent directory.

Returns C<undef> if there is no URL for this file.
This can happen if the file's name is I<_hide> or I<_template>, or if it is
contained in a directory with a name like that, or if there is no
C<daizu:url> property for the file or any of its ancestors.

Subclasses should typically not override this, but instead override
L<custom_base_url($file)|/$gen-E<gt>custom_base_url($file)>, as the
blog generator does for example.

=cut

sub base_url
{
    my ($self, $file) = @_;
    croak 'usage: $gen->base_url($file)' unless defined $file;

    # Files in directories like '_hide' don't have URLs.
    return undef if $file->{name} =~ /\A(?:$Daizu::HIDING_FILENAMES)\z/o;

    # URL set with daizu:url property.
    return URI->new($file->{custom_url}) if defined $file->{custom_url};

    # No user-defined URL at top-level.
    return undef unless defined $file->{parent_id};

    return $self->custom_base_url($file);
}

=item $gen-E<gt>custom_base_url($file)

Override this method in a subclass if you want to use a custom URL scheme,
for example one based on publication dates instead of file and directory
names.

This method is called by L<base_url($file)|/$gen-E<gt>base_url($file)>.
By the time it has been called, checks have already been done for the
C<daizu:url> property and the special names like I<_hide>.  If these don't
determine the URL, or absence of one, then the C<custom_base_url> method
should supply one, or return C<undef> if the file shouldn't have a base URL.

If this is called then C<$file> is guaranteed to have a parent, but its
parent's base URL hasn't been determined, so it may not have one.

The default implementation just uses the base URL of the parent and the
name of the file or directory in the obvious way.

=cut

sub custom_base_url
{
    my ($self, $file) = @_;
    my $parent = $file->parent;
    my $parent_base = $parent->generator->base_url($parent);
    return undef unless defined $parent_base;
    return URI->new($file->{is_dir}              ? "$file->{name}/" :
                    $file->{name} =~ /^_index\./ ? ''               :
                                                   $file->{name})
              ->abs($parent_base);
}

=item $gen-E<gt>urls_info($file)

Return a list of URLs generated by C<$file> (a L<Daizu::File> object).
May return nothing if the file doesn't generate any URLs.

This method calls the L<base_url($file)|/$gen-E<gt>base_url($file)> and
L<custom_urls_info($file)|/$gen-E<gt>custom_urls_info($file)> methods to
do the actual work.  All it does is resolve relative URLs and fill in some
missing information, so you're more likely to need to override those two,
or one of the C<*_urls_info> methods below, if you want to build a new
generator class with a different URL scheme.  This is what the
L<Daizu::Gen::Blog> generator does.

Each URL value returned is actually a reference to a hash containing
the following keys, which are all required:

=over

=item url

The actual URL as a L<URI> object.  This will always be an absolute URL.

=item generator

The name of the class of generator which was used to create these URLs.

=item method

The name of the method which should be called to generate the output
for this file at this URL.

TODO - reference to docs for API of generator methods

=item argument

Some value which determines exactly which one of a set of URLs of the same
basic type this is.  For example if there were several URLs for an article,
one for each of several pages, then they would probably have the same
generator and method, but the page number would be stored as the argument.

The argument is always defined.  It will be the empty string if
L<custom_urls_info($file)|/$gen-E<gt>custom_urls_info($file)>
didn't supply an argument value.

=item type

The MIME type which the resource should be served with.

=back

This method returns nothing if the file has no URLs, for example if it
has no base URL (which might happen if it is in an I<_hide> directory).

=cut

sub urls_info
{
    my ($self, $file) = @_;

    my $base_url = $self->base_url($file);
    return unless defined $base_url;

    my @url = $self->custom_urls_info($file);

    # Resolve relative URLs against the file's base URL, and in the
    # process turn them into URI objects in case that's useful.
    # Also store the name of the generator class, and make sure there's
    # an argument, even if it's just the empty string.
    for (@url) {
        assert(defined $_->{url} && defined $_->{method}) if DEBUG;
        $_->{url} = URI->new_abs($_->{url}, $base_url);
        $_->{generator} = ref $self unless defined $_->{generator};
        $_->{argument} = '' unless defined $_->{argument};
    }

    # Check that the requirements for an article's permalink are met.
    assert(!$file->{article} ||
           (@url && $url[0]{method} eq 'article' &&
            $url[0]{argument} eq '' && $url[0]{type} eq 'text/html'))
        if DEBUG;

    return @url;
}

=item $gen-E<gt>custom_urls_info($file)

This is called by the L<urls_info($file)|/$gen-E<gt>urls_info($file)> method
above, and does the actual work of supplying the URLs.  It should also
return a list of hashes for the URLs generated by C<$file>, but is allowed
to be a bit more lazy.  The following are the differences it may make
in the return value (although note that it is permissible for this method to
return exactly the same values as for
L<urls_info($file)|/$gen-E<gt>urls_info($file)> if it wishes):

=over

=item *

The C<url> value doesn't have to be an absolute URL, and doesn't have
to be a L<URI> object.  If the URL desired is the same as the value
returned by the L<base_url($file)|/$gen-E<gt>base_url($file)> method,
then this value can simply be the empty string.

=item *

The C<generator> value may be omitted or undefined, in which case it will
default to the class name of C<$gen>.

=item *

The C<argument> value may be omitted or undefined, in which case it will
default to the empty string.

=back

The Daizu::Gen implementation of the method simply calls the four
C<*_urls_info> methods listed next as appropriate, so usually subclasses
should override those instead of this method.

=cut

sub custom_urls_info
{
    my ($self, $file) = @_;

    my @urls;
    if ($file->{is_dir}) {
        push @urls, $self->root_dir_urls_info($file)
            if $file->{id} == $self->{root_file}{id};
        push @urls, $self->dir_urls_info($file);
    }
    else {
        if ($file->{article}) {
            push @urls, $self->article_urls_info($file);
        }
        else {
            push @urls, $self->unprocessed_urls_info($file);
        }
    }

    return @urls;
}

=item $gen-E<gt>article_urls_info($file)

Return a list of URLs for an article.
C<$file> must be a L<Daizu::File> object for a file which is an article.

Uses the file's C<article_urls> method to do the work, so this is just
a simple wrapper to allow subclasses to override it.

The return value is as specified for
L<custom_urls_info($file)|/$gen-E<gt>custom_urls_info($file)>.

=cut

sub article_urls_info
{
    my ($self, $file) = @_;
    return $file->article_urls;
}

=item $gen-E<gt>unprocessed_urls_info($file)

Return a list of URLs for the non-article non-directory file in C<$file>,
which must be a L<Daizu::File> object.

This base-class implementation returns a single URL which uses the
L<unprocessed($file, $urls)|/$gen-E<gt>unprocessed($file, $urls)> method
in this class.

The return value is as specified for
L<custom_urls_info($file)|/$gen-E<gt>custom_urls_info($file)>.
The content type, if not defined by the file, will default to
C<application/octet-stream>.

=cut

sub unprocessed_urls_info
{
    my ($self, $file) = @_;
    my $type = $file->{content_type};
    $type = 'application/octet-stream' unless defined $type;
    return {
        generator => 'Daizu::Gen',
        url => '',
        method => 'unprocessed',
        type => $type,
    };
}

=item $gen-E<gt>dir_urls_info($file)

Return a list of URLs for the directory specified by C<$file>,
which should be a L<Daizu::File> object.

This base-class implementation returns no URLs.

The return value is as specified for
L<custom_urls_info($file)|/$gen-E<gt>custom_urls_info($file)>.

=cut

sub dir_urls_info { () }

=item $gen-E<gt>root_dir_urls_info($file)

Return a list of URLs for the directory C<$file>, which should be
a L<Daizu::File> object for the root directory of the generator
(the directory which has the C<daizu:generator> property or a top-level
directory).

This base-class implementation returns no URLs unless the configuration
specifies that an XML sitemap should be published, in which case it returns
a single URL for the sitemap file, using the
L<xml_sitemap($file, $urls)|/$gen-E<gt>xml_sitemap($file, $urls)> method.

If a file, rather than a directory, has a C<daizu:generator> property,
then this method isn't called and the file isn't distinguished in any way
for being the 'root file'.

The return value is as specified for
L<custom_urls_info($file)|/$gen-E<gt>custom_urls_info($file)>.

If you override this to add other URLs you can still allow sitemaps to
be published from the root directory by calling the superclass version,
like this:

=for syntax-highlight perl

    sub root_dir_urls_info
    {
        my ($self, $file) = @_;
        my @url = $self->SUPER::root_dir_urls_info($file);
        # Add your own URLs here:
        push @url, { ... };
        return @url;
    }

=cut

sub root_dir_urls_info
{
    my ($self, $file) = @_;

    my $conf = $self->{config_elem};
    return unless defined $conf;

    my ($elem, $extra) = $conf->getChildrenByTagNameNS($Daizu::CONFIG_NS,
                                                       'xml-sitemap');
    return unless defined $elem;
    my $config_filename = $self->{cms}{config_filename};
    die "$config_filename: only one XML sitemap allowed on $file->{path}"
        if defined $extra;

    my $url = trim(xml_attr($config_filename, $elem, 'url', 'sitemap.xml.gz'));
    return {
        url => $url,
        method => 'xml_sitemap',
        type => 'application/xml',
    };
}

=item $gen-E<gt>generate_web_page($file, $url, $template_overrides, $template_vars)

Use L