27 Oct 2007 19:35:43 UTC
Marvin Humphrey <marvin at rectangular dot com>
Add many tokens to the batch, by supplying the string to be tokenized, and arrays of token starts and token ends (specified in bytes).
NAME
KinoSearch::Analysis::TokenBatch - a collection of tokens
SYNOPSIS
    while ( $batch->next ) {
        $batch->set_text( lc( $batch->get_text ) );
    }
EXPERIMENTAL API
TokenBatch's API should be considered experimental and is likely to change.
DESCRIPTION
A TokenBatch is a collection of Tokens which you can add to, then iterate over.
METHODS
new
my $batch = KinoSearch::Analysis::TokenBatch->new;
Constructor.
append
$batch->append( $text, $start_offset, $end_offset, $pos_inc );
Add a Token to the end of the batch. Accepts either three or four arguments: text, start_offset, end_offset, and an optional position increment which defaults to 1 if not supplied. For a description of what these arguments mean, see the docs for Token.
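As a rough illustration of where the three required arguments might come from, the sketch below pulls word tokens out of a string with a regex and records each match's start and end. The helper name token_args is hypothetical and not part of KinoSearch. The sample text is plain ASCII, so byte and character offsets coincide; consult the Token docs for the unit the real API expects.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of KinoSearch): find each run of word
# characters and record its text, start offset and end offset.
sub token_args {
    my ($source) = @_;
    my @args;
    while ( $source =~ /(\w+)/g ) {
        # $-[0] and $+[0] hold the start and end of the most recent match.
        push @args, [ $1, $-[0], $+[0] ];
    }
    return @args;
}

my @args = token_args("Hello brave world");

# Each triple matches the three-argument form of append, so with a real
# batch one could presumably write:
#   $batch->append(@$_) for @args;
printf "%-6s %2d %2d\n", @$_ for @args;
```

Leaving out the fourth argument relies on the documented default position increment of 1.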
next
    while ( $batch->next ) {
        # ...
    }
Proceed to the next token in the TokenBatch. Returns true if the TokenBatch ends up located at a valid token.
ACCESSOR METHODS
All of TokenBatch's accessor methods affect the current Token. Calling any of these methods when the TokenBatch is not located at a valid Token will trigger an exception.
set_text get_text
Set/get the text of the current Token.
set_start_offset get_start_offset
Set/get the start_offset of the current Token.
set_end_offset get_end_offset
Set/get the end_offset of the current Token.
set_pos_inc get_pos_inc
Set/get the position increment of the current Token.
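To make the documented behaviour concrete without requiring KinoSearch to be installed, here is a minimal pure-Perl stand-in. The package TokenBatchSketch, its internal fields and its error message are all assumptions made for illustration; this is not the real implementation. It models only what the text above states: append takes three or four arguments, pos_inc defaults to 1, next returns true while the batch sits on a valid token, and accessors throw when it does not.

```perl
use strict;
use warnings;

package TokenBatchSketch;    # hypothetical stand-in, NOT the real class

sub new { return bless { tokens => [], tick => -1 }, shift }

sub append {
    my ( $self, $text, $start, $end, $pos_inc ) = @_;
    push @{ $self->{tokens} }, {
        text         => $text,
        start_offset => $start,
        end_offset   => $end,
        # Documented default: 1 when no position increment is supplied.
        pos_inc      => defined $pos_inc ? $pos_inc : 1,
    };
    return;
}

sub next {
    my $self = shift;
    # Advance until one step past the last token, then stay there.
    $self->{tick}++ if $self->{tick} < @{ $self->{tokens} };
    return $self->{tick} < @{ $self->{tokens} };
}

# Accessors go through this check, so they die off a valid token.
sub _current {
    my $self = shift;
    my $tick = $self->{tick};
    die "not located at a valid token\n"
        if $tick < 0 || $tick >= @{ $self->{tokens} };
    return $self->{tokens}[$tick];
}

sub get_text    { $_[0]->_current->{text} }
sub set_text    { $_[0]->_current->{text} = $_[1] }
sub get_pos_inc { $_[0]->_current->{pos_inc} }

package main;

my $batch = TokenBatchSketch->new;
$batch->append( "Hello", 0, 5 );        # pos_inc falls back to 1
$batch->append( "World", 6, 11, 2 );    # explicit pos_inc

my @seen;
while ( $batch->next ) {
    $batch->set_text( lc( $batch->get_text ) );
    push @seen, $batch->get_text . "/" . $batch->get_pos_inc;
}
print "@seen\n";

# Past the last token the accessors throw, so guard the call with eval.
my $ok = eval { $batch->get_text; 1 };
print "accessor after the last token died: $@" unless $ok;
```

Wrapping an accessor call in eval, as in the last two lines, is one way to guard against the exception when it is unclear whether the batch is positioned on a token.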
COPYRIGHT
Copyright 2005-2007 Marvin Humphrey
LICENSE, DISCLAIMER, BUGS, etc.
See KinoSearch version 0.162.