27 Oct 2007 19:46:10 UTC
- Development release
- Distribution: KinoSearch
- License: perl_5
- CREAMYG Marvin Humphrey and 1 contributors
- Marvin Humphrey <marvin at rectangular dot com>
KinoSearch::Index::IndexReader - Read from an inverted index.
    my $reader = KinoSearch::Index::IndexReader->open(
        invindex => MySchema->read('/path/to/invindex'),
    );
IndexReader is the interface through which Searchers access the content of an InvIndex.
IndexReader objects always represent a snapshot of an invindex as it existed at the moment the reader was created. If you want the search results to reflect modifications to an InvIndex, you must create a new IndexReader after the update process completes.
When an IndexReader is created, a small portion of the InvIndex is loaded into memory; additional sort caches are filled as relevant queries arrive. For large document collections, the warmup time may become noticeable, in which case reusing the reader is likely to speed up your search application.
Caching an IndexReader (or a Searcher which contains an IndexReader) is especially helpful when running a high-activity app in a persistent environment, as under mod_perl or FastCGI.
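The caching idea can be sketched roughly as follows. This is a minimal illustration, not part of KinoSearch: cached_reader(), expire_reader() and the %reader_cache hash are hypothetical names, and the opener here is a stand-in code ref so that the example runs without an index. In a real application the opener would wrap the IndexReader->open() call shown in the synopsis.

```perl
use strict;
use warnings;

# Hypothetical per-process cache, keyed by index path. In a persistent
# environment (mod_perl, FastCGI) this hash survives across requests.
my %reader_cache;

# Return the cached reader for $path, building it with $opener on first use.
# $opener is a code ref; in real code it would call
# KinoSearch::Index::IndexReader->open(...).
sub cached_reader {
    my ( $path, $opener ) = @_;
    $reader_cache{$path} ||= $opener->($path);
    return $reader_cache{$path};
}

# Forget the cached reader, e.g. after an index update has completed, so
# that the next request opens a fresh snapshot.
sub expire_reader {
    my ($path) = @_;
    delete $reader_cache{$path};
    return;
}

# Demonstration with a stand-in opener that counts how often it runs.
my $opens  = 0;
my $opener = sub { $opens++; return { path => $_[0] } };

my $first  = cached_reader( '/path/to/invindex', $opener );
my $second = cached_reader( '/path/to/invindex', $opener );    # reused
expire_reader('/path/to/invindex');
my $third = cached_reader( '/path/to/invindex', $opener );     # reopened

print "opener ran $opens times\n";    # prints "opener ran 2 times"
```

Because a reader is a snapshot, some expiry step of this kind is needed whenever searches should see a completed update.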
When a file is no longer in use by an index, InvIndexer attempts to delete it as part of a cleanup routine triggered by the call to finish(). It is possible that at the moment an InvIndexer attempts to delete files that it no longer thinks are needed, a Searcher is in fact using them. This is particularly likely in a persistent environment, where Searchers/IndexReaders are cached and reused.
Ordinarily, this is not a problem.
On a typical Unix volume, the file will be deleted in name only: any process which holds an open filehandle against that file will continue to have access, and the file won't actually get vaporized until the last filehandle is cleared. Thanks to "delete on last close semantics", an InvIndexer can't truly delete the file out from underneath an active Searcher.
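The Unix behavior can be observed with a small self-contained script. It uses only core Perl and a scratch file from File::Temp as a stand-in for an index file; nothing here is KinoSearch code.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Create a scratch file standing in for an index file.
my ( $write_fh, $filename ) = tempfile();
print {$write_fh} "segment data\n";
close $write_fh or die "close failed: $!";

# A "searcher" opens the file for reading.
open( my $read_fh, '<', $filename ) or die "open failed: $!";

# An "indexer" deletes it. On a typical Unix volume only the name goes
# away; on Windows this unlink is expected to fail while $read_fh is open.
my $unlinked    = unlink $filename;
my $name_exists = -e $filename ? 1 : 0;

# The open filehandle can still read the content.
my $line = <$read_fh>;
close $read_fh;

# Clean up on platforms where the first unlink failed.
unlink $filename if -e $filename;

print "unlinked=$unlinked name_exists=$name_exists line=$line";
```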
On Windows, KinoSearch will attempt the file deletion, but an error will occur if any process holds an open handle. That's fine; InvIndexer runs these unlink() calls within an eval block, and if the attempt fails it will just try again the next time around.
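The "try again next time" pattern might look roughly like the sketch below. It is not InvIndexer's actual code: purge_obsolete() is a hypothetical helper, and a temporary directory stands in for a file that cannot be deleted, since Perl's unlink() does not remove directories unless run as superuser with -U.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile tempdir );

# Try to delete each obsolete path. Return the paths that could not be
# deleted so that the caller can retry them on the next cleanup pass.
sub purge_obsolete {
    my (@candidates) = @_;
    my @still_pending;
    for my $path (@candidates) {
        next unless -e $path;    # already gone, nothing to retry
        my $ok = eval {
            unlink($path) or die "can't unlink '$path': $!\n";
            1;
        };
        push @still_pending, $path unless $ok;
    }
    return @still_pending;
}

# One path that can be deleted, and one that cannot.
my ( $fh, $deletable ) = tempfile();
close $fh;
my $stubborn = tempdir( CLEANUP => 1 );

my @pending = purge_obsolete( $deletable, $stubborn );
print "still pending: @pending\n";
```

The failed deletion is trapped by the eval and simply carried over, so a locked file never aborts the cleanup pass.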
On NFS, however, the system breaks, because NFS allows files to be deleted out from underneath an active process. Should this happen, the unlucky IndexReader will crash with a "Stale NFS filehandle" exception.
Under normal circumstances, it is neither necessary nor desirable for IndexReaders to secure read locks against an index, but for NFS we have to make an exception. KinoSearch::Store::LockFactory exists for this reason; supplying a LockFactory instance to IndexReader's constructor activates an internal locking mechanism and prevents concurrent indexing processes from deleting files that are needed by active readers.
LockFactory is implemented using lockfiles located in the index directory, so your reader applications must have write access to that directory. Stale lock files from crashed processes are ordinarily cleared away the next time the same machine -- as identified by the agent_id parameter supplied to LockFactory's constructor -- opens another IndexReader. (The classic technique of timing out lock files does not work because search processes may lie dormant indefinitely.) However, please be aware that if the last thing a given machine does is crash, lock files belonging to it may persist, preventing deletion of obsolete index data.
    my $reader = KinoSearch::Index::IndexReader->open(
        invindex     => MySchema->read('/path/to/invindex'),
        lock_factory => $lock_factory,
    );
IndexReader is an abstract base class; open() functions like a constructor, but actually returns one of two possible subclasses: SegReader, which reads a single segment, and MultiReader, which channels the output of several SegReaders. Since each segment is a self-contained inverted index, a SegReader is in effect a complete index reader.
open() takes labeled parameters.
invindex - An object which isa KinoSearch::InvIndex.
lock_factory - An object which isa KinoSearch::Store::LockFactory. Read-locking is off by default; supplying lock_factory turns it on.
my $max_doc = $reader->max_doc;
Returns one greater than the maximum document number in the invindex.
my $docs_available = $reader->num_docs;
Returns the number of documents currently accessible. Equivalent to max_doc() minus deletions.
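The relationship between the two counts can be illustrated with plain arithmetic. The sketch assumes document numbers start at 0 and uses a made-up array of deletion flags; it does not touch a real index.

```perl
use strict;
use warnings;

# Hypothetical deletion flags for document numbers 0..5 (1 = deleted).
my @deleted = ( 0, 1, 0, 0, 1, 0 );

# One greater than the highest document number (5), under the 0-based
# numbering assumed here.
my $max_doc = scalar @deleted;

# grep in scalar context returns the number of matching elements.
my $num_deleted = grep { $_ } @deleted;

# Documents currently accessible: max_doc minus deletions.
my $num_docs = $max_doc - $num_deleted;

print "max_doc=$max_doc num_docs=$num_docs\n";    # prints "max_doc=6 num_docs=4"
```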
Copyright 2005-2007 Marvin Humphrey
See KinoSearch version 0.20.