05 Dec 2006 00:52:44 UTC
Marvin Humphrey <marvin at rectangular dot com>
NAME
KinoSearch::Analysis::Token - unit of text
SYNOPSIS
# private class - no public API
PRIVATE CLASS
You can't actually instantiate a Token object at the Perl level -- however, you can affect individual Tokens within a TokenBatch by way of TokenBatch's (experimental) API.
DESCRIPTION
Token is the fundamental unit used by KinoSearch's Analyzer subclasses. Each Token has 4 attributes: text, start_offset, end_offset, and pos_inc (for position increment).
The text of a token is a string.
A Token's start_offset and end_offset locate it within a larger text, even if the Token's text attribute gets modified -- by stemming, for instance. The Token for "beating" in the text "beating a dead horse" begins life with a start_offset of 0 and an end_offset of 7; after stemming, the text is "beat", but the end_offset is still 7.
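As a minimal sketch of the point above, the snippet below models a Token as a plain Perl hash. The hash layout and the toy_stem helper are hypothetical stand-ins for illustration only: Token is a private class, and none of this is KinoSearch's actual API or its real stemmer. It shows that changing the text leaves the offsets pointing at the original word in the source string.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Token: a plain hash holding the four
# attributes named in the DESCRIPTION. Not KinoSearch's internal structure.
my $source = "beating a dead horse";
my %token  = (
    text         => "beating",
    start_offset => 0,
    end_offset   => 7,
    pos_inc      => 1,
);

# Toy "stemmer" for illustration only: strips a trailing "ing".
sub toy_stem {
    my ($word) = @_;
    $word =~ s/ing\z//;
    return $word;
}

# Stemming modifies the text attribute but not the offsets.
$token{text} = toy_stem( $token{text} );

# The unchanged offsets still select the original word in the source text.
my $original = substr(
    $source,
    $token{start_offset},
    $token{end_offset} - $token{start_offset},
);

print "$token{text} <- $original\n";    # beat <- beating
```

Because the offsets are kept separate from the text, a consumer such as a highlighter could still locate "beating" in the original document even though the indexed term is "beat".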
The position increment, which defaults to 1, is an advanced tool for manipulating phrase matching. Ordinarily, Tokens are assigned consecutive position numbers: 0, 1, and 2 for "three blind mice". However, if you set the position increment for "blind" to, say, 1000, then the three tokens will end up assigned to positions 0, 1, and 1001 -- and will no longer produce a phrase match for the query '"three blind mice"'.
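The arithmetic can be sketched as follows. The assign_positions helper is hypothetical and not part of KinoSearch; it assumes each token takes the running position, which then advances by that token's pos_inc. That ordering is an assumption chosen because it reproduces the positions quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of KinoSearch. Assumption: a token is
# assigned the running position, and the running position then advances
# by that token's pos_inc.
sub assign_positions {
    my @pos_incs = @_;
    my $next     = 0;
    my @positions;
    for my $pos_inc (@pos_incs) {
        push @positions, $next;
        $next += $pos_inc;
    }
    return @positions;
}

# "three blind mice" with the default increment of 1 for every token.
my @default = assign_positions( 1, 1, 1 );

# The same tokens, with the increment for "blind" set to 1000.
my @gapped = assign_positions( 1, 1000, 1 );

print "default: @default\n";    # default: 0 1 2
print "gapped:  @gapped\n";     # gapped:  0 1 1001
```

In the second case "mice" no longer sits at the position immediately after "blind", which is why the phrase query stops matching.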
COPYRIGHT
Copyright 2006 Marvin Humphrey
LICENSE, DISCLAIMER, BUGS, etc.
See KinoSearch version 0.15.