NAME
CodeGen::Cpppp::CParser - C Parser Utility Library
METHODS
tokenize
@tokens= $class->tokenize($string);
@tokens= $class->tokenize(\$string);
@tokens= $class->tokenize(\$string, $max_tokens);
Parse some number of C language tokens from the input string, and update the regex pos() of the string so that you can resume parsing more tokens later. Because this updates the pos() of the string, you can pass the string as a reference to make it clearer to readers what is happening.
If $max_tokens is given, at most that many tokens will be returned.
Whitespace is ignored (not returned as a token) except for whitespace contained in a 'directive' token. The body of a directive needs to be tokenized further.
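The resume behaviour described above can be used to process a large input a few tokens at a time. The following is a minimal sketch, not part of the module: tokenize_in_batches is a hypothetical helper name, and it assumes only what is documented here, namely that tokenize accepts a string reference and a token limit, returns a list of tokens, and advances pos() on the string.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by CodeGen::Cpppp::CParser).
# Calls $tokenizer->tokenize repeatedly, at most $batch tokens per call,
# and hands each batch to $callback until no more tokens are returned.
sub tokenize_in_batches {
    my ($tokenizer, $src_ref, $batch, $callback) = @_;
    my $last_pos = pos($$src_ref) // 0;
    while (1) {
        my @tokens = $tokenizer->tokenize($src_ref, $batch);
        last unless @tokens;            # input exhausted
        $callback->(@tokens);
        # Defensive guard (an assumption, not documented behaviour):
        # stop if pos() was reset or did not advance, to avoid looping forever.
        my $new_pos = pos($$src_ref);
        last unless defined $new_pos && $new_pos > $last_pos;
        $last_pos = $new_pos;
    }
    return;
}
```

With the module loaded, a call such as tokenize_in_batches('CodeGen::Cpppp::CParser', \$source, 16, sub { ... }) would hand the callback up to 16 tokens at a time. Passing \$source rather than $source makes it visible at the call site that the string's pos() is being modified.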
Each token is an arrayref of the form:
[ $type, $value, $offset, $length, $error=undef ]
$type:   'directive', 'comment', 'string', 'char', 'real', 'integer',
         'keyword', 'ident', 'unknown', or any punctuation character
$value:  for constants, the decoded string or numeric value;
         for directives and comments, the body text;
         for punctuation, a copy of $type;
         for unknown, the exact character that didn't parse
$offset: the character offset within the source $string
$length: the number of characters occupied in the source $string
$error:  if the token is invalid in some way but is still unambiguously that
         type of token (e.g. an unclosed string or unclosed comment), it is
         returned with a 5th element containing the error message
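As an illustration of the token layout, the sketch below unpacks each arrayref into named variables and reports the optional fifth element. The tokens are hand-written sample data for the source text `int x = "abc` and are assumptions, including the decoded value and the error message text; real tokens would come from tokenize. describe_token is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical sample tokens in the documented shape:
# [ $type, $value, $offset, $length, $error ]
my @tokens = (
    [ 'keyword', 'int', 0, 3 ],
    [ 'ident',   'x',   4, 1 ],
    [ '=',       '=',   6, 1 ],
    [ 'string',  'abc', 8, 4, 'unclosed string' ],  # message text is made up
);

# Format one token as a single line of text.
sub describe_token {
    my ($type, $value, $offset, $length, $error) = @{ $_[0] };
    my $text = sprintf '%-8s %-6s at %d+%d', $type, "'$value'", $offset, $length;
    # The 5th element is only present (defined) for invalid tokens.
    $text .= " (error: $error)" if defined $error;
    return $text;
}

print describe_token($_), "\n" for @tokens;
```

Checking `defined $error` rather than the array length works whether a valid token has four elements or an explicit undef in the fifth position.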
For some tokens, you will need to inspect substr($string, $offset, $length)
to get the full details, like the suffixes on integer constants.
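A minimal sketch of that substr inspection, assuming a hand-written token for the constant in `unsigned long n = 42UL;`. The helper name integer_suffix is hypothetical, and the pattern only recognises the classic u/U and l/L suffix letters.

```perl
use strict;
use warnings;

my $source = 'unsigned long n = 42UL;';
# Hypothetical token, in the documented shape, for the constant 42UL.
my $token = [ 'integer', 42, 18, 4 ];

# Return the suffix letters of an integer token, or '' if there are none.
sub integer_suffix {
    my ($source_ref, $tok) = @_;
    my (undef, undef, $offset, $length) = @$tok;
    # The decoded $value has lost the suffix, so look at the raw source text.
    my $raw = substr($$source_ref, $offset, $length);
    my ($suffix) = $raw =~ /([uUlL]+)\z/;
    return $suffix // '';
}

print integer_suffix(\$source, $token), "\n";
```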
Consecutive string tokens are not merged, since the parser needs to handle that step after preprocessor macros are substituted.
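If you do need merged strings, one possible approach is a pass over the token list once macro substitution is finished. The sketch below is an assumption about how a caller might do this, not the module's own parser logic: merge_adjacent_strings is a hypothetical helper, the sample tokens are hand-written for `puts("foo" "bar");`, and error elements on the second of two merged tokens are ignored.

```perl
use strict;
use warnings;

# Hypothetical sample tokens for the source text: puts("foo" "bar");
my @tokens = (
    [ 'ident',  'puts', 0,  4 ],
    [ '(',      '(',    4,  1 ],
    [ 'string', 'foo',  5,  5 ],
    [ 'string', 'bar',  11, 5 ],
    [ ')',      ')',    16, 1 ],
    [ ';',      ';',    17, 1 ],
);

# Return a new token list in which runs of adjacent 'string' tokens are
# joined into one token that spans the whole run in the source.
sub merge_adjacent_strings {
    my @merged;
    for my $tok (@_) {
        my $prev = $merged[-1];
        if ($prev && $prev->[0] eq 'string' && $tok->[0] eq 'string') {
            $prev->[1] .= $tok->[1];                          # join values
            $prev->[3] = $tok->[2] + $tok->[3] - $prev->[2];  # new length
        }
        else {
            push @merged, [ @$tok ];  # copy, so the input tokens are untouched
        }
    }
    return @merged;
}

my @merged = merge_adjacent_strings(@tokens);
print scalar(@merged), " tokens after merging\n";
```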
AUTHOR
Michael Conrad <mike@nrdvana.net>
VERSION
version 0.004
COPYRIGHT AND LICENSE
This software is copyright (c) 2024 by Michael Conrad.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.