11 Mar 2006 05:47:55 UTC
Distribution: KinoSearch
License: perl_5
Contributor: Marvin Humphrey <marvin at rectangular dot com>
Dependencies: Clone, Lingua::Stem::Snowball, Lingua::StopWords, Sort::External, and possibly others.
NAME
KinoSearch::InvIndexer - build inverted indexes
WARNING
KinoSearch is alpha test software. The API and the file format are subject to change.
SYNOPSIS
use strict;
use warnings;
use KinoSearch::InvIndexer;
use KinoSearch::Analysis::PolyAnalyzer;

# Sample data for illustration: title => body text.
my %source_documents = (
    'First title'  => 'Body text of the first document.',
    'Second title' => 'Body text of the second document.',
);

my $analyzer = KinoSearch::Analysis::PolyAnalyzer->new( language => 'en' );

my $invindexer = KinoSearch::InvIndexer->new(
    invindex => '/path/to/invindex',
    create   => 1,
    analyzer => $analyzer,
);

$invindexer->spec_field(
    name  => 'title',
    boost => 3,
);
$invindexer->spec_field( name => 'bodytext' );

while ( my ( $title, $bodytext ) = each %source_documents ) {
    my $doc = $invindexer->new_doc;
    $doc->set_value( title    => $title );
    $doc->set_value( bodytext => $bodytext );
    $invindexer->add_doc($doc);
}

$invindexer->finish;
DESCRIPTION
The InvIndexer class is KinoSearch's primary tool for creating and modifying inverted indexes, which may be searched using KinoSearch::Searcher.
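An inverted index maps each term to the list of documents containing it, which is what makes term lookups cheap at search time. The pure-Perl sketch below builds a toy, in-memory version purely to illustrate that idea. It does not use KinoSearch and makes no claim about KinoSearch's actual file format or internal data structures; the sample documents and the docs_for() helper are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample documents: doc id => text.
my %docs = (
    1 => 'The quick brown fox',
    2 => 'The lazy dog',
    3 => 'A quick dog',
);

# Toy inverted index: term => list of doc ids that contain the term.
my %postings;
for my $id ( sort { $a <=> $b } keys %docs ) {
    my %seen;    # record each term at most once per document
    for my $term ( split /\W+/, lc $docs{$id} ) {
        next if $term eq '' || $seen{$term}++;
        push @{ $postings{$term} }, $id;
    }
}

# Hypothetical lookup helper: which documents contain this term?
sub docs_for {
    my ($term) = @_;
    return @{ $postings{ lc $term } || [] };
}

print "quick: @{[ docs_for('quick') ]}\n";    # quick: 1 3
print "dog: @{[ docs_for('dog') ]}\n";        # dog: 2 3
```

Because ids are visited in ascending order, each posting list comes out sorted, which is the property real search engines rely on when intersecting lists.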
METHODS
new
my $invindexer = KinoSearch::InvIndexer->new(
    invindex => '/path/to/invindex',    # required
    create   => 1,                      # default: 0
    analyzer => $analyzer,              # default: no-op Analyzer
);
Create an InvIndexer object.
invindex - either a filepath or an object belonging to an InvIndex subclass, such as KinoSearch::Store::FSInvIndex or KinoSearch::Store::RAMInvIndex.
create - create a new invindex, clobbering an existing one if necessary.
analyzer - an object which subclasses KinoSearch::Analysis::Analyzer, such as a PolyAnalyzer.
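To make the analyzer's role concrete, here is a hypothetical pure-Perl function, toy_analyze(), that lowercases text and splits it into word tokens. It only illustrates the kind of transformation an analyzer performs; it is not KinoSearch's Analyzer interface, and KinoSearch's real analyzers (PolyAnalyzer among them) are objects and may do considerably more, such as stemming.

```perl
use strict;
use warnings;

# Hypothetical toy analyzer, NOT KinoSearch's API: lowercase the text,
# split on runs of non-word characters, and drop any empty tokens.
sub toy_analyze {
    my ($text) = @_;
    return grep { length } split /\W+/, lc $text;
}

my @tokens = toy_analyze('Hello, World! Hello again.');
print join( '|', @tokens ), "\n";    # hello|world|hello|again
```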
spec_field
$invindexer->spec_field(
    name       => 'url',    # required
    boost      => 1,        # default: 1
    analyzer   => undef,    # default: analyzer spec'd in new()
    indexed    => 1,        # default: 1
    analyzed   => 0,        # default: 1
    stored     => 0,        # default: 1
    compressed => 0,        # default: 0
    vectorized => 0,        # default: see below
);
Define a field. This is analogous to defining a field in a database.
name - the field's name.
boost - a multiplier that determines how much the field contributes to a document's score.
analyzer - By default, all indexed fields are analyzed using the analyzer that was supplied to new(). Supplying an alternate for a given field overrides the primary analyzer.
indexed - index the field, so that it can be searched later.
analyzed - analyze the field, using the relevant Analyzer. Fields such as "category" or "product_number" might be indexed but not analyzed.
stored - store the field, so that it can be retrieved when the document turns up in a search.
compressed - compress the stored field, using the zlib compression algorithm.
vectorized - store the field's "term vectors", which are required by KinoSearch::Highlight::Highlighter for excerpt selection and search term highlighting. By default, if a field is marked as stored, it will be vectorized as well.
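The defaults listed above, including the rule that vectorized follows stored unless set explicitly, can be restated as a small pure-Perl helper. resolve_field_spec() is hypothetical and not part of KinoSearch; it only encodes the documented defaults so that their interaction is easy to check.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of KinoSearch: apply the defaults that the
# spec_field() documentation lists to a set of caller-supplied arguments.
sub resolve_field_spec {
    my (%args) = @_;
    die "name is required\n" unless defined $args{name};
    my %spec = (
        boost      => 1,
        analyzer   => undef,    # undef stands for "use the analyzer given to new()"
        indexed    => 1,
        analyzed   => 1,
        stored     => 1,
        compressed => 0,
        %args,                  # explicit arguments override the defaults
    );
    # vectorized defaults to whatever 'stored' resolved to.
    $spec{vectorized} = $spec{stored} unless exists $args{vectorized};
    return \%spec;
}

my $url  = resolve_field_spec( name => 'url',      analyzed => 0 );
my $body = resolve_field_spec( name => 'bodytext', stored   => 0 );
print "url vectorized: $url->{vectorized}\n";      # url vectorized: 1
print "body vectorized: $body->{vectorized}\n";    # body vectorized: 0
```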
new_doc
my $doc = $invindexer->new_doc;
Spawn an empty KinoSearch::Document::Doc object, primed to accept values for the fields spec'd by spec_field.
add_doc
$invindexer->add_doc($doc);
Add a document to the invindex.
finish
$invindexer->finish;
Finish the invindex. After finish() has been called, the InvIndexer is invalid and must not be used again.
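The calling sequence (new_doc, then add_doc for each document, then a single finish) and the rule that finish() invalidates the object can be mimicked with a tiny in-memory class. ToyIndexer is hypothetical: it does not reproduce anything KinoSearch::InvIndexer does internally, and its new_doc() returns a plain hash reference rather than a KinoSearch::Document::Doc with set_value(). It only illustrates the call order and the "no use after finish" contract.

```perl
use strict;
use warnings;

package ToyIndexer;    # hypothetical class, NOT KinoSearch::InvIndexer

sub new { return bless { docs => [], finished => 0 }, shift }

# An empty "document": field name => value.
sub new_doc { return {} }

sub add_doc {
    my ( $self, $doc ) = @_;
    die "indexer already finished\n" if $self->{finished};
    push @{ $self->{docs} }, { %$doc };    # store a shallow copy
    return;
}

sub finish {
    my ($self) = @_;
    die "indexer already finished\n" if $self->{finished};
    $self->{finished} = 1;                 # from here on the object is unusable
    return scalar @{ $self->{docs} };
}

package main;

my $indexer = ToyIndexer->new;
my $doc     = $indexer->new_doc;
$doc->{title} = 'Alpha';
$indexer->add_doc($doc);
my $count = $indexer->finish;
print "documents: $count\n";    # documents: 1

# Any further use fails, mirroring the invalidation described above.
my $ok = eval { $indexer->add_doc( $indexer->new_doc ); 1 };
print "add after finish: ", ( $ok ? 'allowed' : 'refused' ), "\n";    # add after finish: refused
```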
COPYRIGHT
Copyright 2005-2006 Marvin Humphrey
LICENSE, DISCLAIMER, BUGS, etc.
See KinoSearch version 0.08.