04 Sep 2014 00:12:31 UTC
- Distribution: Lucy
- License: apache_2_0
- Perl: v5.8.3
- The Apache Lucy Project <dev at lucy dot apache dot org>
NAME
Lucy::Analysis::Normalizer - Unicode normalization, case folding and accent stripping.
SYNOPSIS
    my $normalizer = Lucy::Analysis::Normalizer->new;

    my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new(
        analyzers => [ $tokenizer, $normalizer, $stemmer ],
    );
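The synopsis leaves $tokenizer and $stemmer undefined. The following is a minimal sketch of one way to fill them in and attach the resulting analyzer to a schema field. The choice of StandardTokenizer, the English SnowballStemmer, and the field name 'content' are assumptions made for illustration, not requirements of Normalizer.

```perl
use strict;
use warnings;
use Lucy::Analysis::StandardTokenizer;
use Lucy::Analysis::Normalizer;
use Lucy::Analysis::SnowballStemmer;
use Lucy::Analysis::PolyAnalyzer;
use Lucy::Plan::Schema;
use Lucy::Plan::FullTextType;

# Assumed components: any tokenizer and stemmer could be used here.
my $tokenizer  = Lucy::Analysis::StandardTokenizer->new;
my $normalizer = Lucy::Analysis::Normalizer->new;
my $stemmer    = Lucy::Analysis::SnowballStemmer->new( language => 'en' );

# Order matters: tokenize first, then normalize, then stem.
my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new(
    analyzers => [ $tokenizer, $normalizer, $stemmer ],
);

# 'content' is a hypothetical field name.
my $schema = Lucy::Plan::Schema->new;
my $type   = Lucy::Plan::FullTextType->new(
    analyzer      => $polyanalyzer,
    highlightable => 1,
);
$schema->spec_field( name => 'content', type => $type );
```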
DESCRIPTION
Normalizer is an Analyzer which normalizes tokens to one of the Unicode normalization forms. Optionally, it performs Unicode case folding and converts accented characters to their base character.
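To make the three operations concrete, here is a rough sketch of their effect on a single string, using only Perl core modules. It is not how Normalizer is implemented. The helper name sketch_normalize is hypothetical, the normalization form is fixed to NFKC for brevity, and accent stripping is approximated by decomposing the text and removing nonspacing marks. The fc function requires Perl 5.16 or later.

```perl
use strict;
use warnings;
use feature 'fc';
use Unicode::Normalize qw(NFKC NFD);

# Hypothetical helper approximating normalization, case folding
# and accent stripping on a plain string.
sub sketch_normalize {
    my ( $text, %opt ) = @_;
    my $out = NFKC($text);                # compatibility composition
    $out = fc($out) if $opt{case_fold};   # Unicode case folding
    if ( $opt{strip_accents} ) {
        $out = NFD($out);                 # split base characters from marks
        $out =~ s/\p{Mn}+//g;             # drop nonspacing marks (accents)
        $out = NFKC($out);                # return to the chosen form
    }
    return $out;
}

# U+FF23 is FULLWIDTH LATIN CAPITAL LETTER C, U+00E9 is "e" with acute.
my $result = sketch_normalize(
    "\x{FF23}af\x{E9}",
    case_fold     => 1,
    strip_accents => 1,
);
print "$result\n";    # prints "cafe"
```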
If you use highlighting, Normalizer should be run after tokenization because it might add or remove characters.
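The following sketch shows how normalization can change the length of a string, using the core Unicode::Normalize module rather than Lucy itself. The presumed consequence is that character offsets computed on normalized text would no longer line up with the original text that a highlighter has to mark up.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFKC);

my $ligature  = "\x{FB01}nd";    # U+FB01 LATIN SMALL LIGATURE FI, then "nd"
my $combining = "e\x{301}";      # "e" followed by U+0301 COMBINING ACUTE ACCENT

# NFKC expands the ligature to "fi", adding a character.
printf "%d -> %d\n", length($ligature), length( NFKC($ligature) );    # 3 -> 4

# NFKC composes "e" and the accent into U+00E9, removing a character.
printf "%d -> %d\n", length($combining), length( NFKC($combining) );  # 2 -> 1
```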
CONSTRUCTORS
new( [labeled params] )
    my $normalizer = Lucy::Analysis::Normalizer->new(
        normalization_form => 'NFKC',
        case_fold          => 1,
        strip_accents      => 0,
    );
normalization_form - Unicode normalization form. Must be one of 'NFC', 'NFKC', 'NFD', or 'NFKD'. Defaults to 'NFKC'.
case_fold - Perform case folding. Defaults to true.
strip_accents - Strip accents. Defaults to false.
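As an aid to choosing a normalization_form, the sketch below prints the code points that each of the four forms produces for the same input. It uses the core Unicode::Normalize module as a stand-in and makes no claim about Lucy's internals. The helper name codepoints is hypothetical.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD NFKC NFKD);

# Hypothetical helper: render a string as a list of U+XXXX code points.
sub codepoints {
    my ($text) = @_;
    return join ' ', map { sprintf 'U+%04X', ord } split //, $text;
}

# U+FB01 is the "fi" ligature, U+00E9 is "e" with acute accent.
my $text  = "\x{FB01}\x{E9}";
my %forms = ( NFC => \&NFC, NFD => \&NFD, NFKC => \&NFKC, NFKD => \&NFKD );

for my $name (qw(NFC NFD NFKC NFKD)) {
    printf "%-4s %s\n", $name, codepoints( $forms{$name}->($text) );
}
# NFC  U+FB01 U+00E9
# NFD  U+FB01 U+0065 U+0301
# NFKC U+0066 U+0069 U+00E9
# NFKD U+0066 U+0069 U+0065 U+0301
```

The compatibility forms (NFKC, NFKD) replace the ligature with plain letters, while the decomposed forms (NFD, NFKD) split the accented letter into a base character and a combining mark.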
INHERITANCE
Lucy::Analysis::Normalizer isa Lucy::Analysis::Analyzer isa Clownfish::Obj.