NAME

KinoSearch::InvIndexer - Build inverted indexes.

SYNOPSIS

    use KinoSearch::InvIndexer;
    use MySchema;

    my $invindexer = KinoSearch::InvIndexer->new(
        invindex => MySchema->clobber('/path/to/invindex'),
    );

    while ( my ( $title, $content ) = each %source_docs ) {
        $invindexer->add_doc({
            title   => $title,
            content => $content,
        });
    }

    $invindexer->finish;

DESCRIPTION

The InvIndexer class is KinoSearch's primary tool for managing the content of inverted indexes, which may later be searched using KinoSearch::Searcher.

Concurrency

Only one InvIndexer may write to an invindex at a time. If a write lock cannot be secured, new() will throw an exception.
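Because new() throws when the write lock is unavailable, a caller may want to trap that exception and try again later. The following is a minimal sketch of one way to do so, not part of KinoSearch: build_with_retry() is a hypothetical helper, and since the exception text is not specified here it retries on any failure rather than on lock errors only.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of KinoSearch: call $build until it
# returns an object, or rethrow the last exception after $attempts tries.
sub build_with_retry {
    my ( $build, $attempts, $delay ) = @_;
    my $error;
    for my $try ( 1 .. $attempts ) {
        my $obj = eval { $build->() };
        return $obj if defined $obj;
        $error = $@ || "constructor returned nothing\n";
        # Pause between attempts, but not after the final one.
        sleep $delay if $delay and $try < $attempts;
    }
    die $error;
}

# Intended use (assumes $invindex has already been obtained from a Schema):
#
#   my $invindexer = build_with_retry(
#       sub { KinoSearch::InvIndexer->new( invindex => $invindex ) },
#       5,    # attempts
#       2,    # seconds to wait between attempts
#   );
```

Passing the constructor call as a code reference keeps the helper independent of how the invindex was obtained.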

If an invindex is located on a shared volume, each writer application must identify itself by passing a LockFactory to InvIndexer's constructor (the lock_factory parameter); otherwise, index corruption will occur.

METHODS

new

    my $invindex = MySchema->clobber('/path/to/invindex');
    my $invindexer = KinoSearch::InvIndexer->new(
        invindex     => $invindex,  # required
        lock_factory => $factory    # default: created internally 
    );

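Constructor. Takes labeled parameters:

  • invindex - Required. The invindex to write to, such as the object returned by MySchema->clobber().

  • lock_factory - Optional. A LockFactory; if omitted, one is created internally.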

add_doc

    $invindexer->add_doc( { field_name => $field_value } );
    # or ...
    $invindexer->add_doc( { field_name => $field_value }, boost => 2.5 );

Add a document to the invindex. The first argument must be a reference to a hash of field_name => field_value pairs. Ownership of the hash is assumed by the InvIndexer object, so do not modify or reuse it after the call.

After the hashref, labeled parameters are accepted.

  • boost - A scoring multiplier. A value greater than 1 makes the document score better against a given query relative to other documents; a value less than 1 makes it score worse.
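Since the InvIndexer takes ownership of each hash it is given, it can be convenient to copy application records into a fresh hash before handing them over. A minimal sketch under that assumption follows; add_records() and the $boost_for callback are hypothetical names, and the helper relies only on the add_doc() calling convention shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: feed plain records to any object that has an
# add_doc() method with the calling convention documented above.
sub add_records {
    my ( $invindexer, $records, $boost_for ) = @_;
    my $count = 0;
    for my $record (@$records) {
        # Shallow-copy into a fresh hash, since the indexer takes
        # ownership of whatever hashref it is handed.
        my %doc   = %$record;
        my $boost = $boost_for->($record);
        if ( defined $boost ) {
            $invindexer->add_doc( \%doc, boost => $boost );
        }
        else {
            $invindexer->add_doc( \%doc );
        }
        $count++;
    }
    return $count;
}

# Intended use, boosting documents whose title mentions "FAQ":
#
#   add_records( $invindexer, \@records,
#       sub { $_[0]{title} =~ /FAQ/ ? 2.5 : undef } );
```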

add_invindexes

    $invindexer->add_invindexes( $another_invindex, $yet_another_invindex );

Absorb existing invindexes into this one. The other invindexes must use the same Schema as the invindex which was supplied to new().

delete_by_term

    $invindexer->delete_by_term( $field_name, $term_text );

Mark documents which contain the supplied term as deleted, so that they will be excluded from search results. The change is not apparent to search apps until a new Searcher is opened after finish() completes.

If the field is associated with an analyzer, $term_text will be processed automatically (so don't pre-process it yourself).

$field_name must identify an indexed field, or an error will occur.
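A common use of delete_by_term() is replacing a document that carries a unique identifier. The sketch below shows one way this might be arranged; replace_doc() is a hypothetical helper, and it assumes the identifier field is indexed and that any analyzer leaves its value as a single, unchanged term. How a deletion interacts with documents added earlier in the same session is not stated here, so the helper deletes first and adds second.

```perl
use strict;
use warnings;

# Hypothetical helper: replace whichever documents carry the same
# identifier as $doc. Works with any object offering delete_by_term()
# and add_doc() as documented above.
sub replace_doc {
    my ( $invindexer, $id_field, $doc ) = @_;
    my $id = $doc->{$id_field};
    die "document has no '$id_field' value\n" unless defined $id;

    # Mark the old version(s) as deleted, then queue the new version.
    $invindexer->delete_by_term( $id_field, $id );
    $invindexer->add_doc($doc);
    return $id;
}

# Intended use ('url' is an assumed field name):
#
#   replace_doc( $invindexer, 'url',
#       { url => $url, title => $title, content => $content } );
#   $invindexer->finish;
```

As noted above, searchers will not see either change until finish() has completed and a new Searcher has been opened.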

finish

    $invindexer->finish( 
        optimize => 1, # default: 0
    );

Finish processing any changes made to the invindex and commit. Until the commit happens near the end of finish(), none of the changes made during an indexing session are permanent.

Calling finish() invalidates the InvIndexer, so if you want to make more changes you'll need a new one.

Takes one labeled parameter:

  • optimize - If optimize is set to 1, the invindex will be collapsed to its most compact form. This process may take a while, but it yields the fastest queries at search time. Defaults to 0.
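Because finish() invalidates the InvIndexer, work that is committed in several sessions needs a fresh object for each one. The following sketch illustrates that pattern and optimizes only in the final session; index_in_sessions() and the $new_indexer callback are hypothetical, and the choice to optimize last is an assumption rather than a documented recommendation.

```perl
use strict;
use warnings;

# Hypothetical helper: commit each batch of documents in its own
# session, creating a new indexer per session because finish()
# invalidates the old one.
sub index_in_sessions {
    my ( $new_indexer, $batches ) = @_;
    my $last = $#$batches;
    for my $i ( 0 .. $last ) {
        my $invindexer = $new_indexer->();    # fresh object per session
        for my $record ( @{ $batches->[$i] } ) {
            my %doc = %$record;               # indexer takes ownership
            $invindexer->add_doc( \%doc );
        }
        # Only pay the cost of optimizing once, in the final session.
        $invindexer->finish( optimize => ( $i == $last ? 1 : 0 ) );
    }
    return scalar @$batches;
}

# Intended use (assumes $invindex has already been obtained from a Schema):
#
#   index_in_sessions(
#       sub { KinoSearch::InvIndexer->new( invindex => $invindex ) },
#       \@batches,
#   );
```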

COPYRIGHT

Copyright 2005-2007 Marvin Humphrey

LICENSE, DISCLAIMER, BUGS, etc.

See KinoSearch version 0.20.