James E Keenan

NAME

Parse::File::Metadata - For plain-text files that contain both metadata and data records, parse metadata first

SYNOPSIS

    use Parse::File::Metadata;

    $metaref = {};
    @rules = (
        {
            rule => sub { exists $metaref->{d}; },
            label => q{'d' key must exist},
        },
        {
            rule => sub { $metaref->{d} =~ /^\d+$/; },
            label => q{'d' key must be non-negative integer},
        },
        {
            rule => sub { exists $metaref->{f}; },
            label => q{'f' key must exist},
        },
    );

    $self = Parse::File::Metadata->new( {
        file            => 'path/to/myfile',
        header_split    => '\s*=\s*',
        metaref         => $metaref,
        rules           => \@rules,
    } );

    $dataprocess = sub { my @fields = split /,/, $_[0], -1; print "@fields\n"; };

    $self->process_metadata_and_proceed( $dataprocess );

    $self->process_metadata_only();

    $metadata_out = $self->get_metadata();

    $exception = $self->get_exception();

DESCRIPTION

This module is useful when you have to parse a plain-text file that meets the following conditions:

  • The file consists of two types of records:

    • A header section consisting of key-value pairs which constitute, in some sense, metadata.

    • A body section consisting mainly or entirely of data records, which may be either delimited or fixed-width.

    • The header and the body are separated by one or more empty records.

  • Your program must parse the metadata first, then make a decision on the basis of the metadata whether to proceed with parsing of the data. The metadata may or may not be used in the parsing of the data.
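The file layout described above can be sketched in a few lines of core Perl. This is only an illustration under stated assumptions, not the module's own code: split_header_and_body() is a hypothetical helper, and the sample records are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Parse::File::Metadata: divide the records
# of a file into header records and body records at the first empty record.
sub split_header_and_body {
    my @lines = @_;
    my ( @header, @body );
    my $in_body = 0;
    for my $line (@lines) {
        if ( $line =~ /^\s*$/ ) {
            # An empty record after at least one header record ends the
            # header; any further empty records are simply skipped.
            $in_body = 1 if @header;
            next;
        }
        if ($in_body) { push @body,   $line }
        else          { push @header, $line }
    }
    return ( \@header, \@body );
}

my ( $header, $body ) = split_header_and_body(
    'd=1234567890', 'f=,', '', '', 'some,body,loves,me', 'could,it,be,you',
);
print scalar(@$header), " header records, ", scalar(@$body), " data records\n";
```

Keeping the two groups apart is what allows a program to inspect the metadata before it touches any data record.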

Example

Below is a plain-text file in which the header consists of key-value pairs delimited by = signs. The key is the text to the left of the first delimiter. Everything to the right is part of the value (including any additional delimiter characters).

The body consists of comma-delimited strings. Whether in the body or the header, comments begin with a # sign and are ignored.

    # comment
    a=alpha
    b=beta,charlie,delta
    c=epsilon   zeta    eta
    d=1234567890
    e=This is a string
    f=,
    
    some,body,loves,me
    I,wonder,wonder,who
    could,it,be,you
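The key-value rule for header records can be illustrated with Perl's built-in split and a limit of 2. This is a sketch of the idea rather than the module's implementation; the pattern is the header_split value from the SYNOPSIS, while the record 'e = x=y' and the comment test are assumptions added for the example.

```perl
use strict;
use warnings;

my $header_split = '\s*=\s*';    # same pattern as in the SYNOPSIS

my %metadata;
for my $record ( '# comment', 'a=alpha', 'b=beta,charlie,delta', 'e = x=y' ) {
    next if $record =~ /^\s*#/;    # assumption: comments are recognized like this
    # A limit of 2 splits at the first delimiter only, so any further
    # delimiter characters remain part of the value.
    my ( $key, $value ) = split /$header_split/, $record, 2;
    $metadata{$key} = $value;
}

print "$_ => $metadata{$_}\n" for sort keys %metadata;
```

The last record shows that a second = sign stays inside the value instead of producing a third field.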

Suppose you are told that you should proceed to parse the body if and only if the following conditions are met in the header:

  • There must be a metadata element keyed on d.

  • The value of metadata element d must be a non-negative integer.

  • There must be a metadata element keyed on f.

This file would meet all three criteria and the program would proceed to parse the three data records.

If, however, metadata element f were commented out:

    #f=,

the file would no longer meet the criteria and the program would cease before parsing the data records.
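The decision described above can be sketched without the module. The helper failed_labels() below is hypothetical and only mimics the idea of testing a metadata hash against the three criteria; the hash stands in for a header in which f has been commented out.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's code: run each rule in order and
# collect the labels of those that return a false value.
sub failed_labels {
    my @rules = @_;
    return [ map { $_->{label} } grep { !$_->{rule}->() } @rules ];
}

# Metadata as it would look with the 'f' record commented out.
my $metaref = { a => 'alpha', d => '1234567890' };

my @rules = (
    {
        rule  => sub { exists $metaref->{d} },
        label => q{'d' key must exist},
    },
    {
        rule  => sub { $metaref->{d} =~ /^\d+$/ },
        label => q{'d' key must be non-negative integer},
    },
    {
        rule  => sub { exists $metaref->{f} },
        label => q{'f' key must exist},
    },
);

my $failures = failed_labels(@rules);
print @$failures ? "Stop: @$failures\n" : "Proceed to data records\n";
```

Only the third rule fails here, so the sketch reports that label and would not go on to the data records.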

METHODS

new()

  • Purpose

    Parse::File::Metadata constructor. Validates input.

  • Arguments

        $self = Parse::File::Metadata->new( {
            file            => 'path/to/myfile',
            header_split    => '\s*=\s*',
            metaref         => $metaref,
            rules           => \@rules,
        } );

    Single hash reference. Hash has the following elements:

    • file

      Path, relative or absolute, to the file needing parsing.

    • header_split

      Single-quoted string holding a Perl 5 regular expression used to split each metadata record into key and value.

    • metaref

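      Reference to an empty hash. The hash is populated with the metadata parsed from the header, which is why the rules can refer to it.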

    • rules

      Reference to an array of hashrefs. Each such hashref has two elements:

      • rule

        Reference to a subroutine describing a criterion which the header must pass before parsing of the body begins. The subroutine returns a true value when the criterion is met and a false value when the criterion is not met.

      • label

        A human-friendly string which will be used to populate exceptions if the criteria are not met.

      The rules are applied in the order specified in the array.

  • Return Value

    Parse::File::Metadata object.
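One point worth illustrating about rule subroutines: a rule that pattern-matches a key which may be absent will trigger an "uninitialized value" warning under use warnings. The following sketch shows a defensive variant of the 'd' rule. It is a suggestion, not something the module requires, and the $SIG{__WARN__} handler is there only to show that no warning is raised.

```perl
use strict;
use warnings;

my $metaref = {};    # 'd' is absent from the metadata

# Check definedness first, so the pattern match never sees undef.
my $rule = {
    rule  => sub { defined $metaref->{d} and $metaref->{d} =~ /^\d+$/ },
    label => q{'d' key must be non-negative integer},
};

# Capture warnings instead of printing them, purely for demonstration.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my $passed = $rule->{rule}->() ? 1 : 0;
print "passed=$passed warnings=", scalar(@warnings), "\n";
```

Written this way, the rule fails cleanly regardless of whether an earlier rule has already reported the missing key.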

process_metadata_and_proceed()

  • Purpose

    Process metadata rows found in file header and test the resulting hash against the criteria specified in the rules. If all criteria are met, proceed to parse the data rows with the subroutine specified as argument to this method.

  • Arguments

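        $dataprocess = sub { my @fields = split /,/, $_[0], -1; print "@fields\n"; };

        $self->process_metadata_and_proceed( $dataprocess );

    Reference to a subroutine which processes a single data record. The record is passed to the subroutine as its first argument.

  • Return Values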

    None. Use the get_metadata() and get_exception() methods to retrieve the parsed metadata and the labels of any rules that failed.
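The data-processing subroutine shown above passes -1 as the third argument to split. The short sketch below, using a made-up record, shows why that matters for delimited data: without a negative limit, Perl discards trailing empty fields.

```perl
use strict;
use warnings;

my $record = 'some,body,,';    # made-up record with two empty trailing fields

my @default = split /,/, $record;        # trailing empty fields removed
my @keep    = split /,/, $record, -1;    # trailing empty fields preserved

printf "default: %d fields, limit -1: %d fields\n",
    scalar @default, scalar @keep;
```

For records with a fixed number of columns, the -1 form keeps the field count stable even when the last columns are empty.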

process_metadata_only()

  • Purpose

    Same as process_metadata_and_proceed(), except that it returns before beginning any processing of the data records.

  • Arguments

        $self->process_metadata_only();

    None.

  • Return Values

    None.

get_metadata()

  • Purpose

    Access metadata in file's header section.

  • Arguments

        $metadata_out = $self->get_metadata();

    None.

  • Return Values

    Reference to a hash of the metadata found in the file's header.

get_exception()

  • Purpose

    Access reasons, if any, why file failed to meet specified criteria.

  • Arguments

        $exception = $self->get_exception();

    None.

  • Return Values

    Reference to an array holding the labels of the rules which the metadata failed.

SUPPORT

https://rt.cpan.org

AUTHOR

    James E Keenan
    CPAN ID: jkeenan
    Perl Seminar NY
    jkeenan@cpan.org
    http://thenceforward.net/perl/modules/Parse-File-Metadata

COPYRIGHT

Copyright 2010 James E Keenan

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the LICENSE file included with this module.

SEE ALSO

perl(1).