NAME

Text::Tokenizer - Perl extension for tokenizing text (config) files

SYNOPSIS

  use Text::Tokenizer ':all';

  # open a file and add it to the tokenizer inputs
  open(F_CONFIG, "input.conf") or die "Cannot open input.conf: $!";
  $tok_id	= tokenizer_new(F_CONFIG);
  tokenizer_options(TOK_OPT_NOUNESCAPE|TOK_OPT_PASSCOMMENT);

  while(1)
  {
	# $str is the token text, $tok its type (one of the TOK_* constants)
	($str, $tok, $line, $err, $errline)	= tokenizer_scan();
	last if($tok == TOK_ERROR || $tok == TOK_EOF);

	# put back the quotes that the tokenizer strips from quoted tokens
	if($tok == TOK_TEXT)		{ 	}
	elsif($tok == TOK_BLANK)	{ 	}
	elsif($tok == TOK_DQUOTE)	{ $str	= "\"$str\"";	}
	elsif($tok == TOK_SQUOTE)	{ $str	= "\'$str\'";	}
	elsif($tok == TOK_SIQUOTE)	{ $str	= "\`$str\'";	}
	elsif($tok == TOK_IQUOTE)	{ $str	= "\`$str\`";	}
	elsif($tok == TOK_EOL)		{	}
	elsif($tok == TOK_COMMENT)	{	}
	elsif($tok == TOK_UNDEF)
		{ last;		}
	else	{ last;	};
	print $str;
  }
  tokenizer_delete($tok_id);

DESCRIPTION

Text::Tokenizer is a very fast lexical analyzer that splits input text from a file or buffer into basic tokens:

  • NORMAL TEXT

  • DOUBLE QUOTED "TEXT"

  • SINGLE QUOTED 'TEXT'

  • INVERSE QUOTED `TEXT`

  • SINGLE-INVERSE QUOTED `TEXT'

  • WHITESPACE TEXT

  • #COMMENTS

  • END OF LINE

  • END OF FILE

EXPORT

None by default. Import the methods or constants you need by name, or use ':all' to import all constants and methods.

CONSTANTS

TOKEN TYPES

Token types that the tokenizer returns.

TOK_UNDEF

Undefined token (tokenizer error)

TOK_TEXT

Normal_text

TOK_DQUOTE

"Double quoted text"

TOK_SQUOTE

'Single quoted text'

TOK_IQUOTE

`Inverse quoted text`

TOK_SIQUOTE

`Single-inverse quoted text'

TOK_BLANK

Whitespace text

TOK_COMMENT

#Comment

TOK_EOL

End of Line

TOK_EOF

End of File

TOK_ERROR

Error condition (see ERROR TYPES)

ERROR TYPES

Error codes that the tokenizer returns when an error occurs.

NOERR

No error

UNCLOSED_DQUOTE

Unclosed double quote found

UNCLOSED_SQUOTE

Unclosed single quote found

UNCLOSED_IQUOTE

Unclosed inverse quote found

NOCONTEXT

Failed to allocate tokenizer context (FATAL ERROR)
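The error codes above can be turned into readable diagnostics. The following is a minimal sketch, not part of the module: the helper name scan_or_die and the message texts are made up for illustration, and it assumes that ':all' exports the error constants under the names listed above and that an instance has already been created and is current.

```perl
use strict;
use warnings;
use Text::Tokenizer ':all';

# Map the documented error codes to readable messages (texts are our own).
my %ERROR_TEXT = (
    NOERR()           => 'no error',
    UNCLOSED_DQUOTE() => 'unclosed double quote',
    UNCLOSED_SQUOTE() => 'unclosed single quote',
    UNCLOSED_IQUOTE() => 'unclosed inverse quote',
    NOCONTEXT()       => 'failed to allocate tokenizer context',
);

# Hypothetical helper: scan the current instance up to TOK_EOF and die on
# the first scan error. Returns the number of tokens read.
sub scan_or_die {
    my $count = 0;
    while (1) {
        my ($string, $type, $line, $err, $errline) = tokenizer_scan();
        return $count if $type == TOK_EOF;
        if ($type == TOK_ERROR || $type == TOK_UNDEF) {
            my $why = (defined $err && exists $ERROR_TEXT{$err})
                ? $ERROR_TEXT{$err}
                : 'unknown error';
            my $where = defined $errline ? $errline : '?';
            die "Tokenizer error: $why, starting at line $where\n";
        }
        $count++;
    }
}
```

Wrapping the call in eval { scan_or_die() } lets the caller report the message in $@ and still call tokenizer_delete afterwards.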

TOKENIZER OPTIONS

Configurable tokenizer options. Combine them with a bitwise OR (|) when passing them to tokenizer_options.

TOK_OPT_DEFAULT

Default option set; equal to TOK_OPT_NOUNESCAPE.

TOK_OPT_NONE

Set no options. The tokenizer falls back to its default behaviour: it does not unescape anything and does not pass comments to you.

TOK_OPT_NOUNESCAPE

Disable unescaping of characters and lines.

TOK_OPT_SIQUOTE

Enable recognition of the `single-inverse quote' combination.

TOK_OPT_UNESCAPE

Unescape characters and lines.

TOK_OPT_UNESCAPE_CHARS

Unescape chars

TOK_OPT_UNESCAPE_LINES

Unescape lines

TOK_OPT_PASSCOMMENT

Enable comment passing to user routines.

METHODS

$options = tokenizer_options(OPTIONS)

Set tokenizer options.
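As an illustration of OR-ing options, the sketch below asks for comments in addition to the default option set and collects them. The helper name comments_in is hypothetical; the sketch follows the SYNOPSIS in calling tokenizer_options after the instance is created and assumes the new instance is the current one.

```perl
use strict;
use warnings;
use Text::Tokenizer ':all';

# Hypothetical helper: return the comment tokens found in $buffer.
sub comments_in {
    my ($buffer) = @_;
    my $tok_id = tokenizer_new_strbuf($buffer, length $buffer);

    # Keep the default option set and additionally request comments.
    tokenizer_options(TOK_OPT_DEFAULT | TOK_OPT_PASSCOMMENT);

    my @comments;
    while (1) {
        my ($string, $type) = tokenizer_scan();
        last if $type == TOK_EOF || $type == TOK_ERROR || $type == TOK_UNDEF;
        push @comments, $string if $type == TOK_COMMENT;
    }
    tokenizer_delete($tok_id);
    return @comments;
}
```

Without TOK_OPT_PASSCOMMENT the loop would, per the option descriptions, never see a TOK_COMMENT token.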

$tok_id = tokenizer_new(FILE_HANDLE)

Create a new tokenizer instance (context) that reads from FILE_HANDLE. Returns the id of the new instance ($tok_id).

$tok_id = tokenizer_new_strbuf(BUFFER, LENGTH)

Create a new tokenizer instance from the string BUFFER, which is LENGTH characters long. Returns the id of the new instance.
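Tokenizing an in-memory string might look like the sketch below. The helper name tokenize_string is an assumption for illustration, and the code assumes, as the SYNOPSIS suggests for tokenizer_new, that a newly created instance becomes the current one for tokenizer_scan.

```perl
use strict;
use warnings;
use Text::Tokenizer ':all';

# Hypothetical helper: return a list of [string, type] pairs for $buffer.
sub tokenize_string {
    my ($buffer) = @_;
    my $tok_id = tokenizer_new_strbuf($buffer, length $buffer);

    my @tokens;
    while (1) {
        my ($string, $type) = tokenizer_scan();
        # Stop at the end of the buffer or on any scan error.
        last if $type == TOK_EOF || $type == TOK_ERROR || $type == TOK_UNDEF;
        push @tokens, [$string, $type];
    }

    # Release the instance once the buffer is exhausted.
    tokenizer_delete($tok_id);
    return @tokens;
}
```

A caller can then filter the pairs by type, for example keeping only the entries whose type is TOK_TEXT or TOK_DQUOTE.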

@tok = tokenizer_scan()

Scan the current tokenizer instance and return the next token found:

  @tok = ($string, $type, $line, $error, $error_line)

$string - the token string found
$type - its type
$line - current line
$error - the error code, if an error occurred
$error_line - line number where the error begins (position of the unclosed quote)

tokenizer_exists(TOK_ID)

Test if tokenizer instance exists.

tokenizer_switch(TOK_ID)

Switch to another tokenizer instance (for example, when you process an include statement).
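One way to handle includes is to keep a stack of instances and switch back when an included file reaches EOF. This is only a sketch under several assumptions: the directive syntax (the word include followed by a file name) is hypothetical, the helper name scan_with_includes is made up, and tokenizer_new is assumed to accept a lexical file handle as well as the bareword handle shown in the SYNOPSIS.

```perl
use strict;
use warnings;
use Text::Tokenizer ':all';

# Scan $path, following a HYPOTHETICAL directive of the form
#     include "other.conf"
# and call $on_token->($string, $type) for every other token.
sub scan_with_includes {
    my ($path, $on_token) = @_;
    my @stack;    # [instance id, file handle]; the last entry is active

    my $enter = sub {
        my ($file) = @_;
        open(my $fh, '<', $file) or die "Cannot open $file: $!";
        my $id = tokenizer_new($fh);
        tokenizer_switch($id);    # make sure the new instance is current
        push @stack, [$id, $fh];
    };

    $enter->($path);
    my $want_file = 0;    # true right after the word "include"
    while (@stack) {
        my ($string, $type, $line, $err, $errline) = tokenizer_scan();

        if ($type == TOK_EOF) {
            # This file is done: release it and resume the including file.
            my ($id, $fh) = @{ pop @stack };
            tokenizer_delete($id);
            close $fh;
            tokenizer_switch($stack[-1][0]) if @stack;
            next;
        }
        die "Tokenizer error in scan_with_includes\n"
            if $type == TOK_ERROR || $type == TOK_UNDEF;

        if ($want_file) {
            next if $type == TOK_BLANK;    # skip the space after "include"
            $want_file = 0;
            $enter->($string);             # token text is the file name
            next;
        }
        if ($type == TOK_TEXT && $string eq 'include') {
            $want_file = 1;
            next;
        }
        $on_token->($string, $type);
    }
}
```

The stack keeps each file handle alive for as long as its instance exists, and every instance is deleted at its own EOF, as tokenizer_delete requires.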

tokenizer_delete(TOK_ID)

Delete a tokenizer instance. You have to do this exactly at EOF to release the connection to the file or buffer.

tokenizer_flush(TOK_ID)

Flush tokenizer instance. This function discards the instance buffer's contents, so the next time the scanner attempts to match a token from the buffer, it will have to fill it.

SEE ALSO

This tokenizer is based on code generated by flex - fast lexical analyzer generator (http://lex.sourceforge.net).

AUTHOR

Samuel Behan, <sam(at)frida.fri.utc.sk>

COPYRIGHT AND LICENSE

Copyright 2003-2004 by Samuel Behan

This library is free software; you can redistribute it and/or modify it under the terms of the GNU GPL v2.
