NAME

Text::Tokenizer - Perl extension for tokenizing text (config) files

SYNOPSIS

  use Text::Tokenizer ':all';

  # open the file and add it to the tokenizer inputs
  open(F_CONFIG, "input.conf") or die "cannot open input.conf: $!";
  $tok_id	= tokenizer_new(F_CONFIG);
  tokenizer_options(TOK_OPT_NOUNESCAPE|TOK_OPT_PASSCOMMENT);

  while(1)
  {
	($str, $tok, $line, $err, $errline)	= tokenizer_scan();
	last if($tok == TOK_ERROR || $tok == TOK_EOF);

	if($tok == TOK_TEXT)		{ 	}
	elsif($tok == TOK_BLANK)	{ 	}
	elsif($tok == TOK_DQUOTE)	{ $str	= "\"$str\"";	}
	elsif($tok == TOK_SQUOTE)	{ $str	= "\'$str\'";	}
	elsif($tok == TOK_SIQUOTE)	{ $str	= "\`$str\'";	}
	elsif($tok == TOK_IQUOTE)	{ $str	= "\`$str\`";	}
	elsif($tok == TOK_EOL)		{	}
	elsif($tok == TOK_COMMENT)	{	}
	elsif($tok == TOK_UNDEF)
		{ last;		}
	else	{ last;	};
	print $str;
  }
  tokenizer_delete($tok_id);

DESCRIPTION

Text::Tokenizer is a very fast lexical analyzer that splits input text from a file or buffer into basic tokens:

NORMAL TEXT
DOUBLE QUOTED "TEXT"
SINGLE QUOTED 'TEXT'
INVERSE QUOTED `TEXT`
SINGLE-INVERSE QUOTED `TEXT'
WHITESPACE TEXT
#COMMENTS
END OF LINE
END OF FILE

EXPORT

Nothing is exported by default. Import the methods and constants you need by name, or use ':all' to import all constants and methods.
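As a minimal sketch of a selective import, the names are listed explicitly instead of using ':all'. It assumes that every documented method and constant can be requested by name, as the paragraph above suggests; the particular selection is only an illustration.

```perl
use strict;
use warnings;

# Import only the names this script uses (assumed to be exportable
# on request, as described in the EXPORT section).
use Text::Tokenizer qw(
    tokenizer_new_strbuf
    tokenizer_scan
    tokenizer_delete
    TOK_EOF
    TOK_ERROR
);

# The imported names are now visible in the current package.
my $imported = defined &tokenizer_scan ? 1 : 0;
print "tokenizer_scan imported: $imported\n";
```

Listing names keeps the namespace small, at the cost of extending the list whenever another token type has to be checked.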

CONSTANTS

TOKEN TYPES

Token types that the tokenizer returns.

TOK_UNDEF: Undefined token (tokenizer error)
TOK_TEXT: Normal_text
TOK_DQUOTE: "Double quoted text"
TOK_SQUOTE: 'Single quoted text'
TOK_IQUOTE: `Inverse quoted text`
TOK_SIQUOTE: `Single-inverse quoted text'
TOK_BLANK: Whitespace text
TOK_COMMENT: #Comment
TOK_EOL: End of Line
TOK_EOF: End of File
TOK_ERROR: Error condition (see ERROR TYPES)

ERROR TYPES

Error codes that the tokenizer returns when an error occurs.

NOERR: No error
UNCLOSED_DQUOTE: Unclosed double quote found
UNCLOSED_SQUOTE: Unclosed single quote found
UNCLOSED_IQUOTE: Unclosed inverse quote found
NOCONTEXT: Failed to allocate tokenizer context (FATAL ERROR)

TOKENIZER OPTIONS

Options configurable for the tokenizer. They should be OR-ed together when passed to tokenizer_options.

TOK_OPT_DEFAULT: Default option set; equal to TOK_OPT_NOUNESCAPE.
TOK_OPT_NONE: Set no options. The tokenizer falls back to its default behaviour: it does not unescape anything and does not pass comments to you.
TOK_OPT_NOUNESCAPE: Disable unescaping of characters and lines.
TOK_OPT_SIQUOTE: Enable looking for `single-inverse quote' combination.
TOK_OPT_UNESCAPE: Unescape chars & lines.
TOK_OPT_UNESCAPE_CHARS: Unescape chars
TOK_OPT_UNESCAPE_LINES: Unescape lines
TOK_OPT_PASSCOMMENT: Enable comment passing to user routines.
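The following sketch shows one way to combine options with a bitwise OR before handing them to tokenizer_options. The chosen combination is only an example, and the variable name $options is an assumption of this sketch. The constants are written with empty parentheses so that Perl parses each one as a plain call however the module defines it.

```perl
use strict;
use warnings;
use Text::Tokenizer ':all';

# Example combination: unescape characters and have comments
# returned as TOK_COMMENT tokens.
my $options = TOK_OPT_UNESCAPE_CHARS() | TOK_OPT_PASSCOMMENT();

# Hand the combined flags to the tokenizer.
tokenizer_options($options);

# A single flag can be tested again with a bitwise AND.
my $passes_comments = ($options & TOK_OPT_PASSCOMMENT()) ? 1 : 0;
print "comments passed: $passes_comments\n";
```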

METHODS

$options = tokenizer_options(OPTIONS)

Set tokenizer options.

$tok_id = tokenizer_new(FILE_HANDLE)

Create a new tokenizer instance (context) that reads from FILE_HANDLE. The instance is identified by the returned $tok_id.

$tok_id = tokenizer_new_strbuf(BUFFER, LENGTH)

Create a new tokenizer instance from the string BUFFER, which is LENGTH characters long. Returns the id of the new instance.
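A minimal sketch of tokenizing an in-memory string follows. It assumes that a newly created instance becomes the current one, as the SYNOPSIS does with tokenizer_new; the sample text and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Text::Tokenizer ':all';

# Hypothetical configuration text held in memory.
my $config = qq{name "example value"\n};

# LENGTH is passed explicitly, following the documented signature.
my $tok_id = tokenizer_new_strbuf($config, length $config);

my @tokens;
while (1) {
    my ($string, $type) = tokenizer_scan();

    # Stop at end of input or on any error condition.
    last if $type == TOK_EOF() || $type == TOK_ERROR() || $type == TOK_UNDEF();

    # Keep plain and double-quoted text, skip whitespace and line ends.
    push @tokens, $string if $type == TOK_TEXT() || $type == TOK_DQUOTE();
}

# Release the instance once the buffer is exhausted.
tokenizer_delete($tok_id);

print "$_\n" for @tokens;
```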

@tok = tokenizer_scan()

Scan the current tokenizer instance and return the first token found:

@tok = ($string, $type, $line, $error, $error_line)

$string - the token string found
$type - its type (see TOKEN TYPES)
$line - current line
$error - the error code, if an error occurred (see ERROR TYPES)
$error_line - line number where the error begins (unclosed quote position)
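The sketch below unpacks all five return values and reports an error with its starting line. The input deliberately contains an unclosed double quote; how exactly the tokenizer reports it is not verified here, so the loop simply reacts to TOK_ERROR or TOK_UNDEF if either is returned. It also assumes that the new instance becomes the current one, as in the SYNOPSIS.

```perl
use strict;
use warnings;
use Text::Tokenizer ':all';

# Hypothetical input with a double quote that is never closed.
my $input  = qq{key "unterminated\n};
my $tok_id = tokenizer_new_strbuf($input, length $input);

my $failed = 0;
my $count  = 0;
while (1) {
    my ($string, $type, $line, $error, $error_line) = tokenizer_scan();

    if ($type == TOK_ERROR() || $type == TOK_UNDEF()) {
        # $error holds the error code, $error_line where the problem starts.
        warn sprintf("tokenizer error %s starting at line %s\n",
            $error // 'unknown', $error_line // 'unknown');
        $failed = 1;
        last;
    }
    last if $type == TOK_EOF();

    $count++;    # a regular token was returned
}
tokenizer_delete($tok_id);

print "tokens before stop: $count, failed: $failed\n";
```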

tokenizer_exists(TOK_ID)

Test if tokenizer instance exists.

tokenizer_switch(TOK_ID)

Switch to another tokenizer instance (for example, when processing an include statement).
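Include handling could be sketched with a stack of instance ids, as below. The "include NAME" directive, the %source table and the helper open_source are inventions of this sketch and not features of the module. Each new instance is switched to explicitly, and the enclosing instance is resumed once the included one reaches EOF.

```perl
use strict;
use warnings;
use Text::Tokenizer ':all';

# Hypothetical in-memory "files", keyed by name.
my %source = (
    main  => "first\ninclude extra\nlast\n",
    extra => "middle\n",
);

# Create an instance for a named source and make it the current one.
sub open_source {
    my ($name) = @_;
    my $id = tokenizer_new_strbuf($source{$name}, length $source{$name});
    tokenizer_switch($id);
    return $id;
}

my @stack     = (open_source('main'));    # innermost instance is last
my @words;
my $want_name = 0;                        # true right after "include"

while (@stack) {
    my ($string, $type) = tokenizer_scan();
    last if $type == TOK_ERROR() || $type == TOK_UNDEF();

    if ($type == TOK_EOF()) {
        # This source is finished: delete it, resume the enclosing one.
        tokenizer_delete(pop @stack);
        tokenizer_switch($stack[-1]) if @stack;
    }
    elsif ($type == TOK_TEXT() && $want_name) {
        $want_name = 0;
        push @stack, open_source($string) if exists $source{$string};
    }
    elsif ($type == TOK_TEXT() && $string eq 'include') {
        $want_name = 1;
    }
    elsif ($type == TOK_TEXT()) {
        push @words, $string;
    }
}

# After an error, release whatever instances are still registered.
tokenizer_delete($_) for grep { tokenizer_exists($_) } @stack;

print "@words\n";
```

Keeping the ids on a stack means nested includes unwind in the reverse order in which they were opened.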

tokenizer_delete(TOK_ID)

Delete a tokenizer instance. You have to do this exactly on EOF to release the connection to the file or buffer.

tokenizer_flush(TOK_ID)

Flush tokenizer instance. This function discards the instance buffer's contents, so the next time the scanner attempts to match a token from the buffer, it will have to fill it.

AUTHOR

Samuel Behan, <sam@frida.fri.utc.sk>

COPYRIGHT AND LICENSE

This library is free software; you can redistribute it and/or modify it under the terms of the GNU GPL v2.
