27 Feb 2018 08:40:23 UTC
- Distribution: Lucy
- License: apache_2_0
- Perl: v5.8.3
- The Apache Lucy Project <dev at lucy dot apache dot org>
- Dependencies
- Clownfish
NAME
gen_word_break_data.pl - Generate word break table and tests
SYNOPSIS
perl gen_word_break_data.pl [-c] UCD_SRC_DIR
DESCRIPTION
This script generates the tables used to look up Unicode word break properties for the StandardTokenizer. It also converts the word break test suite from the UCD to JSON.
UCD_SRC_DIR should point to a directory containing the files WordBreakProperty.txt, WordBreakTest.txt, and DerivedCoreProperties.txt from the Unicode Character Database available at http://www.unicode.org/Public/6.3.0/ucd/.
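The UCD property files named above share a simple line format: a code point or a `first..last` range, a semicolon, the property value, and an optional `#` comment. As a rough illustration of reading that format, here is a minimal Python sketch. It is not the script's actual implementation, and the function name `parse_property_lines` is hypothetical.

```python
def parse_property_lines(lines):
    """Yield (first, last, property) for each data line of a UCD property file.

    Hypothetical helper for illustration; not taken from gen_word_break_data.pl.
    """
    for line in lines:
        # Everything after '#' is a comment; blank lines carry no data.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        code_range, prop = (field.strip() for field in data.split(";", 1))
        if ".." in code_range:
            first, last = code_range.split("..")
        else:
            first = last = code_range
        yield int(first, 16), int(last, 16), prop


# Sample lines in the WordBreakProperty.txt format.
sample = [
    "# WordBreakProperty (excerpt)",
    "000D          ; CR # Cc       <control-000D>",
    "0041..005A    ; ALetter # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z",
    "",
]
ranges = list(parse_property_lines(sample))
```

Each tuple gives an inclusive code point range and its property value, which is the raw material a lookup table would be built from.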
OUTPUT FILES
modules/unicode/ucd/WordBreak.tab
modules/unicode/ucd/WordBreakTest.json
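Lines in WordBreakTest.txt alternate break markers with hexadecimal code points: `÷` marks a permitted break and `×` marks a position where no break is allowed. The Python sketch below shows one way such a line could be turned into a JSON record. The record layout (`text` and `breaks` keys) and the name `parse_break_test` are assumptions made for illustration; the layout of the real WordBreakTest.json may differ.

```python
import json


def parse_break_test(line):
    """Convert one test line into a dict with the text and its break offsets.

    Hypothetical record layout; not the format written by the actual script.
    """
    tokens = line.split("#", 1)[0].split()
    markers = tokens[0::2]  # marker before each code point, plus a final one
    code_points = [int(cp, 16) for cp in tokens[1::2]]
    text = "".join(chr(cp) for cp in code_points)
    # Offsets are counted in code points; "\u00f7" is the division sign.
    breaks = [i for i, marker in enumerate(markers) if marker == "\u00f7"]
    return {"text": text, "breaks": breaks}


# An illustrative line in the test file's format.
record = parse_break_test("\u00f7 0061 \u00d7 0062 \u00f7 0020 \u00f7\t# comment")
encoded = json.dumps(record)
```

Storing offsets rather than pre-split words keeps the record independent of how a tokenizer chooses to report its results.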
OPTIONS
-c
Show total table size for different shift values
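To see why the shift value affects table size, consider a two-stage lookup table: code points are grouped into blocks of `2**shift` entries, identical blocks are stored once, and an index maps each block number to its stored block. The Python sketch below estimates the total entry count under that assumption. Whether the script uses exactly this layout is not stated here, and `two_stage_size` is a hypothetical name.

```python
def two_stage_size(values, shift):
    """Return the entry count of a two-stage table with 2**shift-sized blocks.

    Assumes a plain two-stage layout; the actual script may use another scheme.
    """
    block_size = 1 << shift
    unique_blocks = set()
    num_blocks = 0
    for start in range(0, len(values), block_size):
        block = tuple(values[start:start + block_size])
        # Pad a short final block so all blocks have the same length.
        block += (0,) * (block_size - len(block))
        unique_blocks.add(block)
        num_blocks += 1
    # One index entry per block, plus the storage for each distinct block.
    return num_blocks + len(unique_blocks) * block_size


# Toy property map for U+0000..U+00FF: 1 for A-Z, 0 elsewhere.
values = [0] * 0x41 + [1] * 26 + [0] * (0x100 - 0x5B)

for shift in range(2, 9):
    print(shift, two_stage_size(values, shift))
```

Small shifts inflate the index, while large shifts reduce the chance that blocks repeat, so the smallest total usually lies somewhere in between.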