27 Feb 2018 08:40:23 UTC
- Distribution: Lucy
- License: apache_2_0
- Perl: v5.8.3
- Released by NWELLNHOF (Nick Wellnhofer) and 1 contributor
- The Apache Lucy Project <dev at lucy dot apache dot org>
Lucy::Analysis::Normalizer - Unicode normalization, case folding and accent stripping.
    my $normalizer = Lucy::Analysis::Normalizer->new;

    my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new(
        analyzers => [ $tokenizer, $normalizer, $stemmer ],
    );
Normalizer is an Analyzer which normalizes tokens to one of the Unicode normalization forms. Optionally, it performs Unicode case folding and converts accented characters to their base character.
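To make the three operations concrete, here is a minimal sketch that approximates them with core Perl modules only. It is not Lucy's implementation: normalize_token is a hypothetical helper, lc() stands in for full Unicode case folding, and accent stripping is approximated by decomposing the text and removing nonspacing marks.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFKC NFD);

# Hypothetical helper (not part of Lucy): mimics normalization,
# case folding and accent stripping on a single token text.
sub normalize_token {
    my ($text, %opt) = @_;
    $text = NFKC($text);                  # compatibility composition
    $text = lc $text if $opt{case_fold};  # lc() as a stand-in for case folding
    if ($opt{strip_accents}) {
        $text = NFD($text);               # separate base characters from marks
        $text =~ s/\p{Mn}+//g;            # drop nonspacing marks (the accents)
        $text = NFKC($text);              # recompose whatever remains
    }
    return $text;
}

# "Cafe" followed by U+0301 COMBINING ACUTE ACCENT.
my $input    = "Cafe\x{301}";
my $folded   = normalize_token($input, case_fold => 1);
my $stripped = normalize_token($input, case_fold => 1, strip_accents => 1);

binmode STDOUT, ':encoding(UTF-8)';
print "$folded\n";    # lowercased, with the accent composed into U+00E9
print "$stripped\n";  # lowercased, with the accent removed
```

Note that the composed result is one character shorter than the input, which is the kind of length change the highlighting advice in this document is concerned with.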
If you use highlighting, run Normalizer after tokenization. Normalization might add or remove characters, so running it first would leave the token offsets out of step with the original text.
    my $normalizer = Lucy::Analysis::Normalizer->new(
        normalization_form => 'NFKC',
        case_fold          => 1,
        strip_accents      => 0,
    );
Create a new Normalizer.
normalization_form - Unicode normalization form, can be one of ‘NFC’, ‘NFKC’, ‘NFD’, ‘NFKD’. Defaults to ‘NFKC’.
case_fold - Perform case folding, default is true.
strip_accents - Strip accents, default is false.
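The four normalization forms differ in whether they compose or decompose characters and whether they replace compatibility characters such as ligatures. The following sketch uses the core Unicode::Normalize module rather than Lucy itself to show the effect of each form on one sample string; the sample input is chosen purely for illustration.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD NFKC NFKD);

# U+FB01 LATIN SMALL LIGATURE FI, then "e" + U+0301 COMBINING ACUTE ACCENT.
my $input = "\x{FB01}e\x{301}";

my %forms = (
    NFC  => NFC($input),
    NFD  => NFD($input),
    NFKC => NFKC($input),
    NFKD => NFKD($input),
);

# Print each result as a list of code points so the differences are visible.
for my $name (sort keys %forms) {
    my $points = join ' ', map { sprintf 'U+%04X', ord($_) } split //, $forms{$name};
    print "$name: $points\n";
}
# NFC:  U+FB01 U+00E9
# NFD:  U+FB01 U+0065 U+0301
# NFKC: U+0066 U+0069 U+00E9
# NFKD: U+0066 U+0069 U+0065 U+0301
```

Only the compatibility forms (NFKC, NFKD) split the ligature into plain "f" and "i", so text normalized this way can match a query for "fi" typed without the ligature.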
my $inversion = $normalizer->transform($inversion);
Takes a single Inversion as input and returns an Inversion: either the same one (presumably transformed in some way) or a new one.
inversion - An inversion.
Lucy::Analysis::Normalizer isa Lucy::Analysis::Analyzer isa Clownfish::Obj.