Text::TokenStream - lexer to break text up into user-defined tokens
    use Text::TokenStream;
    use Text::TokenStream::Lexer;

    my $lexer = Text::TokenStream::Lexer->new(
        whitespace => [qr/\s+/],
        rules => [
            word => qr/\w+/,
            sym => qr/[^\w\s]+/,
        ],
    );

    my $stream = Text::TokenStream->new(
        lexer => $lexer,
        input => "foo *",
    );

    my $tok1 = $stream->next; # --> "word" token containing "foo"
    my $tok2 = $stream->next; # --> "sym" token containing "*"
This class is part of a collection of classes that act together to lex (aka scan) an input text into a stream of tokens.
This token stream class provides the stream interface, along with a notion of the "current position" in the input text, and position-aware error reporting. It composes Text::TokenStream::Role::Stream, a role that lists the methods this class provides. That makes it easy to write a parser class that holds a token stream in an attribute and delegates the tokenizer methods to it (as with the handles option of a Moo attribute).
The basic lexer machinery is found in Text::TokenStream::Lexer; it is separated out from the token stream so that it can be reused across many inputs.
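As an illustration of that reuse, the following sketch builds one lexer (with the rules from the synopsis) and hands it to two separate streams. It uses only the new and collect_all calls mentioned in this document; the two input strings are made up for the example.

```perl
use strict;
use warnings;
use Text::TokenStream;
use Text::TokenStream::Lexer;

# One lexer, built once, with the same rules as the synopsis.
my $lexer = Text::TokenStream::Lexer->new(
    whitespace => [qr/\s+/],
    rules => [
        word => qr/\w+/,
        sym  => qr/[^\w\s]+/,
    ],
);

# Each input gets its own stream, and therefore its own current position.
my @counts;
for my $input ("foo *", "bar + baz") {
    my $stream = Text::TokenStream->new(lexer => $lexer, input => $input);
    my @tokens = $stream->collect_all;
    push @counts, scalar @tokens;
}

print "token counts: @counts\n";
```

Because the lexer holds the rules and the stream holds the input and position, neither stream affects the other.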
Tokens are instances of a class, Text::TokenStream::Token by default.
This class uses Moo, and inherits the standard new constructor.
An instance of Text::TokenStream::Lexer; required; read-only. Will be used to find tokens in the input.
Str; required; read-only. The text that will be lexed into a stream of tokens.
A Maybe[Path]; read-only. Can be coerced from a string. If a defined value is present, it should contain the name of the file that the input was read from, and that name will be used in any error messages.
The name of a class that inherits from Text::TokenStream::Token; defaults to Text::TokenStream::Token itself; read-only. Tokens found in the input will be constructed as instances of this class.
Takes no arguments. Returns a list of all remaining tokens found in the input.
In the current implementation, this method is provided by Text::TokenStream::Role::Stream.
Takes a single argument indicating a token to match, as with Text::TokenStream::Token#matches. Scans through the input until it finds a token that matches the argument, and returns a list of all tokens before the matching one. If no remaining token in the input matches the argument, behaves as "collect_all".
Takes a listified hash of token attributes, and creates a token instance from them by calling the constructor of the stream's token class.
If you have particularly complex needs, you may wish to override this method in a subclass.
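One way such an override might look is sketched below, assuming a Moo subclass. The class name My::CountingStream and its created attribute are hypothetical, invented for this example; the wrapper simply counts calls and then defers to the inherited "create_token".

```perl
use strict;
use warnings;
use Text::TokenStream;
use Text::TokenStream::Lexer;

# Hypothetical subclass, not part of the distribution.
package My::CountingStream {
    use Moo;
    extends 'Text::TokenStream';

    # Number of tokens created so far (an attribute made up for this sketch).
    has created => (is => 'rw', default => 0);

    # Wrap create_token: bump the counter, then build the token as usual.
    around create_token => sub {
        my ($orig, $self, %data) = @_;
        $self->created($self->created + 1);
        return $self->$orig(%data);
    };
}

my $lexer = Text::TokenStream::Lexer->new(
    whitespace => [qr/\s+/],
    rules => [
        word => qr/\w+/,
        sym  => qr/[^\w\s]+/,
    ],
);

my $stream = My::CountingStream->new(lexer => $lexer, input => "foo *");
my @tokens = $stream->collect_all;
print "tokens created: ", $stream->created, "\n";
```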
Takes no arguments. Returns the 0-based position of the first input character that hasn't yet been returned by "next".
Takes multiple arguments, that are concatenated into an error message. (If no arguments are supplied, acts as if you'd supplied the string "Something's wrong".) Throws an exception, reporting the locus of the error as the current input position (using 1-based line and column numbers).
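To make the relationship between the 0-based input position and the 1-based line and column numbers concrete, here is a small standalone sketch. The helper line_and_column is hypothetical and is not claimed to be how this class computes the locus internally; it only shows the arithmetic.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::TokenStream: map a 0-based
# character offset in $text to a 1-based (line, column) pair.
sub line_and_column {
    my ($text, $offset) = @_;
    my $before = substr $text, 0, $offset;

    # Lines are counted from 1, so add one to the number of newlines seen.
    my $line = 1 + ($before =~ tr/\n//);

    # rindex returns -1 when there is no earlier newline, which makes the
    # first character of the first line come out as column 1.
    my $last_newline = rindex $before, "\n";
    my $column = $offset - $last_newline;

    return ($line, $column);
}

# The "*" in "foo\nbar *" sits at offset 8.
my ($line, $column) = line_and_column("foo\nbar *", 8);
print "line $line, column $column\n";
```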
Takes a single positive-integer argument. Attempts to fill an internal buffer of already-lexed tokens so that it contains that many tokens. Returns a boolean that is true iff there were enough tokens to do that.
Takes zero or more arguments, each of which indicates a token to match, as with Text::TokenStream::Token#matches. Returns a boolean that is true iff there's at least one more token in the input, and it matches the argument.
Takes no arguments. Returns the next token found in the input, and advances the current position past it; if no tokens remain, returns undef. The token instance is created by "create_token".
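Since "next" returns undef once the input is exhausted, a plain while loop is enough to drain a stream. The sketch below assumes that tokens expose type and text accessors (as the synopsis comments suggest, but not stated in this document); the input string is made up.

```perl
use strict;
use warnings;
use Text::TokenStream;
use Text::TokenStream::Lexer;

my $lexer = Text::TokenStream::Lexer->new(
    whitespace => [qr/\s+/],
    rules => [
        word => qr/\w+/,
        sym  => qr/[^\w\s]+/,
    ],
);

my $stream = Text::TokenStream->new(lexer => $lexer, input => "foo * bar");

# Keep calling next until it returns undef (no tokens remain).
my @seen;
while (defined(my $tok = $stream->next)) {
    # Assumption: the token class provides type and text accessors.
    push @seen, $tok->type . ": " . $tok->text;
}

print "$_\n" for @seen;
```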
Takes a single argument indicating a token to match, as with Text::TokenStream::Token#matches, and an optional string argument describing the current position (for example, "in expression", or "after keyword"). If there are no more tokens in the input, reports an error at the current position, using "err". Otherwise, if the next token doesn't match the argument, reports an error at the position of that token, using "token_err". Otherwise, the next token matches what is being looked for, so that token is returned.
Takes no arguments. Returns the token that the next call to "next" would return, without advancing the current input position, so a subsequent "next" call returns that same token.
An internal buffer is used to ensure that every token is lexed only once.
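The lookahead behaviour can be sketched as follows. The example assumes the method described above is exposed under the name peek, and that tokens have a text accessor; neither name is spelled out in this document.

```perl
use strict;
use warnings;
use Text::TokenStream;
use Text::TokenStream::Lexer;

my $lexer = Text::TokenStream::Lexer->new(
    whitespace => [qr/\s+/],
    rules => [
        word => qr/\w+/,
        sym  => qr/[^\w\s]+/,
    ],
);

my $stream = Text::TokenStream->new(lexer => $lexer, input => "foo *");

# Assumption: the lookahead method is called peek.
my $peeked = $stream->peek;   # look at the next token without consuming it
my $taken  = $stream->next;   # now consume it
my $after  = $stream->next;   # the token that follows

# Assumption: tokens provide a text accessor.
print "peeked: ", $peeked->text, "\n";
print "taken:  ", $taken->text, "\n";
print "after:  ", $after->text, "\n";
```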
Takes a single argument indicating a token to match, as with Text::TokenStream::Token#matches. If there are no more tokens in the input, or the next token doesn't match the argument, returns false; otherwise, advances past the next token, and returns true.
Takes a token as an argument, followed by multiple arguments that are concatenated into an error message. (If no non-token arguments are supplied, acts as if you'd supplied the string "Something's wrong".) Throws an exception, reporting the locus of the error as the position of the token (using 1-based line and column numbers).
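Both error methods throw, so a caller that wants to recover has to trap the exception. The sketch below uses "err" and "token_err" as described above and catches each with eval; the exact layout of the resulting message is not specified here, so the example only captures it as a string.

```perl
use strict;
use warnings;
use Text::TokenStream;
use Text::TokenStream::Lexer;

my $lexer = Text::TokenStream::Lexer->new(
    whitespace => [qr/\s+/],
    rules => [
        word => qr/\w+/,
        sym  => qr/[^\w\s]+/,
    ],
);

my $stream = Text::TokenStream->new(lexer => $lexer, input => "foo\n  *");
my $tok = $stream->next;   # the "foo" token on the first line

# err reports the current input position; its arguments are concatenated.
my $err_message = '';
eval { $stream->err("expected ", "a word"); 1 } or $err_message = "$@";

# token_err reports the position of the token passed as its first argument.
my $token_err_message = '';
eval { $stream->token_err($tok, "unexpected word"); 1 }
    or $token_err_message = "$@";

print "err:       $err_message\n";
print "token_err: $token_err_message\n";
```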
Aaron Crane, <email@example.com>
Copyright 2021 Aaron Crane.
This library is free software and may be distributed under the same terms as perl itself. See http://dev.perl.org/licenses/.