NAME

Text::Tokenizer - Perl extension for tokenizing text (config) files

SYNOPSIS

  use Text::Tokenizer ':all';

  # open the file and add it to the tokenizer inputs
  open(F_CONFIG, "input.conf") || die("failed to open input.conf");
  $tok_id	= tokenizer_new(F_CONFIG);
  tokenizer_options(TOK_OPT_NOUNESCAPE|TOK_OPT_PASSCOMMENT);

  while(1)
  {
	($string, $tok_type, $line, $err, $errline)	= tokenizer_scan();
	last if($tok_type == TOK_ERROR || $tok_type == TOK_EOF);

	if($tok_type == TOK_TEXT)	{ 	}
	elsif($tok_type == TOK_BLANK)	{ 	}
	elsif($tok_type == TOK_DQUOTE)	{ $string	= "\"$string\"";	}
	elsif($tok_type == TOK_SQUOTE)	{ $string	= "\'$string\'";	}
	elsif($tok_type == TOK_SIQUOTE)	{ $string	= "\`$string\'";	}
	elsif($tok_type == TOK_IQUOTE)	{ $string	= "\`$string\`";	}
	elsif($tok_type == TOK_EOL)	{ $string	= "\n";		}
	elsif($tok_type == TOK_COMMENT)	{	}
	elsif($tok_type == TOK_UNDEF)
		{ last;	}
	else	{ last;	};
	print $string;
  }
  tokenizer_delete($tok_id);


  A more complex example of using Text::Tokenizer can be found in passwd_exp, a tool for password
  expiration notification (http://devel.dob.sk/passwd_exp).

DESCRIPTION

Text::Tokenizer is a fast lexical analyzer that splits input text from a file or buffer into basic tokens:

NORMAL TEXT
DOUBLE QUOTED "TEXT"
SINGLE QUOTED 'TEXT'
INVERSE QUOTED `TEXT`
SINGLE-INVERSE QUOTED `TEXT'
WHITESPACE TEXT
#COMMENTS
END OF LINE
END OF FILE

EXPORT

None by default. You have to import methods or constants selectively, or use ':all' to import all constants and methods.

CONSTANTS

TOKEN TYPES

Token types that the tokenizer returns.

TOK_UNDEF: Undefined token (tokenizer error)
TOK_TEXT: Normal_text
TOK_DQUOTE: "Double quoted text"
TOK_SQUOTE: 'Single quoted text'
TOK_IQUOTE: `Inverse quoted text`
TOK_SIQUOTE: `Single-inverse quoted text'
TOK_BLANK: Whitespace text
TOK_COMMENT: #Comment
TOK_EOL: End of Line
TOK_EOF: End of File
TOK_ERROR: Error condition (see ERROR TYPES)

ERROR TYPES

Error codes that the tokenizer returns when an error occurs.

NOERR: No error
UNCLOSED_DQUOTE: Unclosed double quote found
UNCLOSED_SQUOTE: Unclosed single quote found
UNCLOSED_IQUOTE: Unclosed inverse quote found
NOCONTEXT: Failed to allocate tokenizer context (FATAL ERROR)

TOKENIZER OPTIONS

Configurable tokenizer options. OR them together when passing them to tokenizer_options.

TOK_OPT_DEFAULT: Default option set; equal to TOK_OPT_NOUNESCAPE.
TOK_OPT_NONE: Set no options. The tokenizer keeps its default behaviour: it will not unescape anything and will not pass comments to you.
TOK_OPT_NOUNESCAPE: Disable unescaping of characters and lines.
TOK_OPT_SIQUOTE: Enable looking for `single-inverse quote' combination.
TOK_OPT_UNESCAPE: Unescape chars & lines.
TOK_OPT_UNESCAPE_CHARS: Unescape chars (inside of quotes only)
TOK_OPT_UNESCAPE_LINES: Unescape lines (inside of quotes only)
TOK_OPT_PASSCOMMENT: Enable comment passing to user routines.
TOK_OPT_UNESCAPE_NQ_LINES: Unescape lines (outside of quotes). An escaped end of line will not terminate value processing, so escaped multiline text is returned as a single-line string.
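
As a rough illustration of combining options, the sketch below ORs two flags together and passes them in a single call. It follows the SYNOPSIS in setting options after an instance has been created; whether that ordering is required is an assumption, and the sample buffer and variable names are made up for illustration.

```perl
use strict;
use warnings;
use Text::Tokenizer ':all';

# Hypothetical sample input: a comment line followed by a quoted value.
my $buf    = "# a comment\nkey \"va\\tlue\"\n";
my $tok_id = tokenizer_new_strbuf($buf, length($buf));

# Combine flags with bitwise OR: unescape chars and lines,
# and pass comments to the caller.
tokenizer_options(TOK_OPT_UNESCAPE | TOK_OPT_PASSCOMMENT);

while (1) {
    my ($string, $type) = tokenizer_scan();
    last if $type == TOK_ERROR || $type == TOK_EOF || $type == TOK_UNDEF;
    print "comment: $string\n" if $type == TOK_COMMENT;
}
tokenizer_delete($tok_id);
```

Without TOK_OPT_PASSCOMMENT, the option list above suggests that the loop would never see a TOK_COMMENT token.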

METHODS

$options = tokenizer_options(OPTIONS)

Set tokenizer options.

$tok_id = tokenizer_new(FILE_HANDLE)

Create a new tokenizer instance (context) that reads from FILE_HANDLE. Returns the instance id, $tok_id.

$tok_id = tokenizer_new_strbuf(BUFFER, LENGTH)

Create a new tokenizer instance from the string BUFFER, which is LENGTH characters long. Returns its tokenizer instance id.
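
A minimal sketch of scanning an in-memory string rather than a file. The sample input and the @tokens layout are made up for illustration; only the functions and constants documented here are used.

```perl
use strict;
use warnings;
use Text::Tokenizer ':all';

my $buf    = "name = 'value'\n";    # sample input, made up for illustration
my $tok_id = tokenizer_new_strbuf($buf, length($buf));

my @tokens;
while (1) {
    my ($string, $type, $line) = tokenizer_scan();
    last if $type == TOK_ERROR || $type == TOK_EOF || $type == TOK_UNDEF;
    next if $type == TOK_BLANK || $type == TOK_EOL;    # skip whitespace and line ends
    push @tokens, [$type, $string, $line];
}
tokenizer_delete($tok_id);    # release the tokenizer's reference to $buf

# Print each kept token with its line number and numeric type.
printf "line %d: type %d: %s\n", $_->[2], $_->[0], $_->[1] for @tokens;
```

$buf is kept in scope until tokenizer_delete is called, since the instance holds a reference to it.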

@tok = tokenizer_scan()

Scan the current tokenizer instance and return the next token found. @tok = ($string, $type, $line, $error, $error_line)

$string - found token string
$type - its type
$line - current line
$error - equals error code if error occurs
$error_line - line number where error begins (unclosed quote position)
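
The returned list can be unpacked and checked for errors roughly as follows. This is a sketch, not a definitive pattern: the input is a made-up buffer whose second line opens a double quote that is never closed, which presumably produces a TOK_ERROR token. The error code is printed numerically so the example does not depend on how the ERROR TYPES constants are exported.

```perl
use strict;
use warnings;
use Text::Tokenizer ':all';

my $buf    = "ok\n\"never closed\n";    # second line opens a quote that is never closed
my $tok_id = tokenizer_new_strbuf($buf, length($buf));

while (1) {
    my ($string, $type, $line, $error, $error_line) = tokenizer_scan();
    if ($type == TOK_ERROR) {
        # $error holds one of the ERROR TYPES codes;
        # $error_line is the line where the problem began.
        warn "tokenizer error code $error, starting at line $error_line\n";
        last;
    }
    last if $type == TOK_EOF || $type == TOK_UNDEF;
}
tokenizer_delete($tok_id);
```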

tokenizer_exists(TOK_ID)

Test if tokenizer instance exists.

tokenizer_switch(TOK_ID)

Switch to another tokenizer instance (like when you perform include statement).
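
One way to handle an include statement is to keep a stack of instance ids and switch to the innermost one before every scan. The sketch below uses two made-up string buffers in place of files and treats the bare word include as the include statement; both are illustrative assumptions, as is the expectation that each instance resumes where it left off after a switch.

```perl
use strict;
use warnings;
use Text::Tokenizer ':all';

my $outer_buf = "alpha\ninclude\nomega\n";    # stands in for the main file
my $inner_buf = "beta\n";                     # stands in for the included file

my @stack = (tokenizer_new_strbuf($outer_buf, length($outer_buf)));
my @words;

while (@stack) {
    tokenizer_switch($stack[-1]);             # make the innermost input current
    my ($string, $type) = tokenizer_scan();

    if ($type == TOK_EOF || $type == TOK_ERROR || $type == TOK_UNDEF) {
        tokenizer_delete(pop @stack);         # done with this input; resume the parent, if any
        next;
    }
    next unless $type == TOK_TEXT;

    if ($string eq 'include') {
        # Hypothetical include handling: push a new instance for the inner input.
        push @stack, tokenizer_new_strbuf($inner_buf, length($inner_buf));
    }
    else {
        push @words, $string;
    }
}
print "@words\n";
```

Calling tokenizer_switch before every scan avoids relying on which instance is current after tokenizer_new_strbuf or tokenizer_delete.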

tokenizer_delete(TOK_ID)

Delete tokenizer instance. You have to do this exactly on EOF to release the tokenizer's reference to the file or buffer.

tokenizer_flush(TOK_ID)

Flush tokenizer instance. This function discards the instance buffer's contents, so the next time the scanner attempts to match a token from the buffer, it will have to fill it.

AUTHOR

Samuel Behan, (http://devel.dob.sk)

COPYRIGHT AND LICENSE

This library is free software; you can redistribute it and/or modify it under the terms of GNU/GPL v3.
