Marvin Humphrey
and 1 contributor

NAME

KinoSearch::Analysis::Analyzer - Base class for analyzers.

SYNOPSIS

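    # abstract base class -- must be subclassed

    package MyAnalyzer;
    use strict;
    use warnings;
    use base qw( KinoSearch::Analysis::Analyzer );

    # Rewrite the text of every token in the batch, in place.
    sub analyze {
        my ( $self, $token_batch ) = @_;

        while ( my $token = $token_batch->next ) {
            my $new_text = transform( $token->get_text );
            $token->set_text($new_text);
        }

        return $token_batch;
    }

    # Takes the text of one token and returns the replacement text.
    sub transform {
        # ...
    }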

DESCRIPTION

In KinoSearch, an Analyzer is a filter which processes text, transforming it from one form into another. For instance, an analyzer might break up a long text into smaller pieces (Tokenizer), or it might convert text to lowercase (LCNormalizer).
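
To make the "filter" idea concrete, here is a minimal sketch in plain Perl that chains two such transformations. It does not use KinoSearch at all: split_into_words and lowercase_all are hypothetical helper names invented for this illustration, and the \w+ pattern is an assumption rather than the pattern Tokenizer actually uses.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a tokenizing filter: break one long string
# into word-like pieces. The \w+ pattern is an assumption.
sub split_into_words {
    my ($text) = @_;
    return [ $text =~ /(\w+)/g ];
}

# Hypothetical stand-in for a lowercasing filter.
sub lowercase_all {
    my ($pieces) = @_;
    return [ map { lc } @$pieces ];
}

# Each filter consumes the output of the one before it.
my $pieces = lowercase_all( split_into_words("Eats, Shoots and Leaves") );
print join( '|', @$pieces ), "\n";    # prints "eats|shoots|and|leaves"
```

The point is only that each stage takes text in one form and hands it on in another, so stages can be composed in sequence.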

SUBCLASSING

All Analyzer subclasses must provide an analyze method.

analyze

analyze() takes a single TokenBatch as input and returns a TokenBatch: either the same one (presumably transformed in some way) or a new one.
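
The two return styles can be sketched as follows. So that the example runs without KinoSearch installed, it uses MockToken and MockBatch, which are hypothetical stand-ins written for this illustration and not the real Token and TokenBatch classes. Their next, get_text and set_text methods mirror the calls in the SYNOPSIS; rewind and append are assumptions.

```perl
use strict;
use warnings;

# Hypothetical stand-ins, NOT the real KinoSearch classes.
package MockToken;
sub new      { my ( $class, $text ) = @_; return bless { text => $text }, $class }
sub get_text { $_[0]{text} }
sub set_text { $_[0]{text} = $_[1] }

package MockBatch;
sub new    { my $class = shift; return bless { tokens => [@_], tick => 0 }, $class }
sub next   { my $self = shift; return $self->{tokens}[ $self->{tick}++ ] }
sub rewind { $_[0]{tick} = 0 }
sub append { my $self = shift; push @{ $self->{tokens} }, @_ }

# Style 1: modify the tokens in place and return the same batch.
package UpcaseInPlace;
sub analyze {
    my ( $self, $batch ) = @_;
    while ( my $token = $batch->next ) {
        $token->set_text( uc $token->get_text );
    }
    $batch->rewind;    # let the next consumer iterate from the start
    return $batch;
}

# Style 2: build and return a new batch, here dropping short tokens.
package DropShort;
sub analyze {
    my ( $self, $batch ) = @_;
    my $new_batch = MockBatch->new;
    while ( my $token = $batch->next ) {
        $new_batch->append($token) if length( $token->get_text ) > 2;
    }
    return $new_batch;
}

package main;

my $input = MockBatch->new( map { MockToken->new($_) } qw( an apple a day ) );
my $same  = UpcaseInPlace->analyze($input);
my $fresh = DropShort->analyze($same);

print $same == $input ? "same batch\n" : "new batch\n";               # prints "same batch"
print join( ' ', map { $_->get_text } @{ $fresh->{tokens} } ), "\n";  # prints "APPLE DAY"
```

Returning the same batch suits one-to-one rewrites such as case folding, while returning a new batch is the natural choice when the number of tokens changes.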

COPYRIGHT

Copyright 2005-2007 Marvin Humphrey

LICENSE, DISCLAIMER, BUGS, etc.

See KinoSearch version 0.20.