BuzzSaw::DataSource - A Moose role which defines the BuzzSaw data source interface


This documentation refers to BuzzSaw::DataSource version 0.12.0


    package BuzzSaw::DataSource::Example;
    use Moose;

    with 'BuzzSaw::DataSource';

    sub next_entry {
        my ($self) = @_;
        ....
        return $line;
    }

    sub reset {
        my ($self) = @_;
        ....
    }


This is a Moose role which defines the methods that must be implemented by any BuzzSaw data source class. It also provides a number of common attributes which all data sources require.

A data source is what the name implies: the class provides a standard interface to a set of log data. Each data source has an associated parser which is known to be capable of parsing the particular format of data found within that source. This means that different types of log files (e.g. syslog, postgresql and apache) must be represented by different data sources, even though they are all sets of files.

There is no requirement that the data be stored in files; it could just as easily be stored in and retrieved from a database. As long as the data source returns data in the same way, one complete entry at a time, it will work.

A BuzzSaw data source is expected to work like a stream. Each time the next entry is requested, the method should automatically move on until all entries in all resources are exhausted. For example, the Files data source automatically moves on from one file to the next whenever the end-of-file is reached.

The BuzzSaw project provides a suite of tools for processing log file entries. Entries in files are parsed and filtered into a set of events of interest which are stored in a database. A report generation framework is also available which makes it easy to generate regular reports regarding the events discovered.


The following attributes are common to all classes which implement this interface.


This attribute holds a reference to the BuzzSaw::DB object. When the DataSource object is created you can pass in a string, which is treated as a configuration file name and used to create the BuzzSaw::DB object via the new_with_config class method. Alternatively, a hash can be given, which is used as the set of parameters with which to create the new BuzzSaw::DB object.


This attribute holds a reference to an object of a class which implements the BuzzSaw::Parser role. If a string is passed in, it is treated as a class name in the BuzzSaw::Parser namespace. Short names are allowed: for example, passing in RFC3339 results in a new BuzzSaw::Parser::RFC3339 object being created.
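The short-name handling described above can be illustrated with a minimal sketch. The helper name resolve_parser_class is hypothetical and is not part of BuzzSaw, and the rule that a name already inside the namespace is passed through unchanged is an assumption; the real module may resolve names differently.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BuzzSaw: expand a short parser name
# into a full class name in the BuzzSaw::Parser namespace.
sub resolve_parser_class {
    my ($name) = @_;
    my $prefix = 'BuzzSaw::Parser::';

    # Assumption: names already inside the namespace are left unchanged.
    return $name if index( $name, $prefix ) == 0;

    # Otherwise treat the string as a short name and add the prefix.
    return $prefix . $name;
}

print resolve_parser_class('RFC3339'), "\n";
print resolve_parser_class('BuzzSaw::Parser::RFC3339'), "\n";
```

Both calls print the same fully qualified class name, which could then be loaded and instantiated.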


The readall attribute is a boolean value which controls whether or not all files should be read. If it is set to true (i.e. a value of 1) then the code which normally attempts to avoid re-reading previously seen files is not used. The default value is false (i.e. a value of 0).


Any class which implements this role must provide the following two methods.

$entry = $source->next_entry

This method returns the next entry from the stream of log entries as a simple string. For example, with the Files data source - which works through all lines in a set of files - this will return the next line in the file.

This method should use the BuzzSaw::DB object start_processing and register_log methods to avoid re-reading sources (unless the readall attribute is true). It is also expected to begin and end DB transactions at appropriate times. For example, the Files data source starts a transaction when a file is opened and ends the transaction when the file is closed. This is designed to strike a balance between efficiency and the need to commit regularly to avoid the potential for data loss.

Note that this method does NOT return a parsed entry; it returns the simple string which is the next single complete log entry. When the data source is exhausted it returns undef.


$source->reset

This method must reset the position of all internal iterators (if any) to their initial values, leaving the data source back at its original starting position. Note that this does not imply that a second parsing would be identical to the first (e.g. files may have disappeared in the meantime).
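As a rough illustration of the stream behaviour expected from the two required methods, here is a minimal sketch written as a plain Perl class so that it runs without Moose or BuzzSaw installed. The class name Example::LineStream and its internal fields are hypothetical. The sketch deliberately leaves out the parser, the BuzzSaw::DB bookkeeping and the transaction handling that a real data source is expected to perform.

```perl
use strict;
use warnings;

package Example::LineStream;    # hypothetical class, not part of BuzzSaw

sub new {
    my ( $class, @files ) = @_;
    my $self = bless { files => [@files] }, $class;
    $self->reset;
    return $self;
}

# Return the next line as a plain string, or undef once all files are exhausted.
sub next_entry {
    my ($self) = @_;

    while (1) {
        if ( !defined $self->{fh} ) {
            # No file is open: stop if none remain, otherwise open the next one.
            return if $self->{index} >= @{ $self->{files} };
            my $file = $self->{files}[ $self->{index}++ ];
            open my $fh, '<', $file or die "Cannot open $file: $!";
            $self->{fh} = $fh;
        }

        my $line = readline( $self->{fh} );
        if ( defined $line ) {
            chomp $line;
            return $line;
        }

        # End of file reached: close it and move on to the next file.
        close( $self->{fh} );
        $self->{fh} = undef;
    }
}

# Put the internal iterators back to their initial values.
sub reset {
    my ($self) = @_;
    close( $self->{fh} ) if defined $self->{fh};
    $self->{fh}    = undef;
    $self->{index} = 0;
    return;
}

package main;

use File::Temp qw(tempfile);

# Build two small temporary "log files" to read from.
my @files;
for my $lines ( [ 'a1', 'a2' ], ['b1'] ) {
    my ( $fh, $name ) = tempfile( UNLINK => 1 );
    print {$fh} "$_\n" for @{$lines};
    close $fh or die "Cannot close $name: $!";
    push @files, $name;
}

my $source = Example::LineStream->new(@files);

my @first;
while ( defined( my $entry = $source->next_entry ) ) {
    push @first, $entry;
}

$source->reset;

my @second;
while ( defined( my $entry = $source->next_entry ) ) {
    push @second, $entry;
}

print "first pass:  @first\n";
print "second pass: @second\n";
```

The caller only ever loops until next_entry returns undef; moving from one file to the next is handled entirely inside the data source.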

The following methods are provided because they are useful to most data sources.

$sum = $source->checksum_file($file)

This returns a string which is the base-64 encoded SHA-256 digest of the contents of the specified file.

$sum = $source->checksum_data($data)

This returns a string which is the base-64 encoded SHA-256 digest of the specified data.
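For reference, digests of this kind can be computed directly with the Digest::SHA module. The sketch below uses standalone functions that borrow the method names purely for illustration; they are not the role's actual implementation, which may differ in details such as error handling or base-64 padding (Digest::SHA itself does not pad its base-64 output).

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_base64);
use File::Temp qw(tempfile);

# Illustrative stand-in: base-64 encoded SHA-256 digest of a string.
sub checksum_data {
    my ($data) = @_;
    return sha256_base64($data);
}

# Illustrative stand-in: base-64 encoded SHA-256 digest of a file's contents.
sub checksum_file {
    my ($file) = @_;
    my $sha = Digest::SHA->new(256);
    $sha->addfile($file);
    return $sha->b64digest;
}

# Write some sample data to a temporary file and checksum it both ways.
my $content = "an example log line\n";
my ( $fh, $name ) = tempfile( UNLINK => 1 );
print {$fh} $content;
close $fh or die "Cannot close $name: $!";

my $data_sum = checksum_data($content);
my $file_sum = checksum_file($name);

print "data: $data_sum\n";
print "file: $file_sum\n";
```

The two digests match when the file contains exactly the data passed to checksum_data, which is what makes a checksum usable for recognising previously seen content.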


This module is powered by Moose. It also requires MooseX::Types, MooseX::Log::Log4perl and MooseX::SimpleConfig.

The Digest::SHA module is also required.


BuzzSaw, BuzzSaw::DataSource::Files, DataSource::Importer, BuzzSaw::DB, BuzzSaw::Parser


This is the list of platforms on which we have tested this software. We expect this software to work on any Unix-like platform which is supported by Perl.



Please report any bugs or problems (or praise!) to the author. Feedback and patches are also always very welcome.


    Stephen Quinney


    Copyright (C) 2012 University of Edinburgh. All rights reserved.

This library is free software; you can redistribute it and/or modify it under the terms of the GPL, version 2 or later.
