NAME

CodeGen::Cpppp::CParser - C Parser Utility Library

METHODS

tokenize

@tokens= $class->tokenize($string);
@tokens= $class->tokenize(\$string);
@tokens= $class->tokenize(\$string, $max_tokens);

Parse some number of C language tokens from the input string, and update the string's regex pos() so that a later call can resume parsing where this one stopped. Because the call modifies pos() of the string, you can pass the string as a reference to make that side effect clearer to readers.

If $max_tokens is given, at most that many tokens will be returned.
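The resumable behavior described above rests on Perl's pos() and the \G anchor. The following is a minimal sketch of that mechanism only: next_words is a hypothetical helper that splits on whitespace, and it is not this module's implementation or part of its API.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of CodeGen::Cpppp::CParser: returns up to
# $max whitespace-separated words, resuming from pos($$ref).
sub next_words {
    my ($ref, $max) = @_;
    my @words;
    while (!defined $max || @words < $max) {
        $$ref =~ /\G\s*/gc;              # skip whitespace; /c keeps pos on failure
        $$ref =~ /\G(\S+)/gc or last;    # consume one word, or stop at end of input
        push @words, $1;
    }
    return @words;
}

my $src = "int x = 42 ;";
my @first     = next_words(\$src, 2);   # ('int', 'x')
my $resume_at = pos($src);              # 5, just past the 'x'
my @rest      = next_words(\$src);      # ('=', '42', ';')
```

Passing \$src rather than $src does not change what happens to pos(); it only signals to the reader that the call has a side effect on the caller's string.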

Whitespace is ignored (not returned as a token) except for whitespace contained in a 'directive' token. The body of a directive still needs to be tokenized further.

Each token is an arrayref of the form:

[ $type, $value, $offset, $length, $error=undef ]

$type:   'directive', 'comment', 'string', 'char', 'real', 'integer',
         'keyword', 'ident', 'unknown', or any punctuation character

$value:  for constants, this is the decoded string or numeric value
         for directives and comments, it is the body text
         for punctuation, it is a copy of $type
         for unknown, it is the exact character that didn't parse

$offset: the character offset within the source $string

$length: the number of characters occupied in the source $string

$error: if the token is invalid in some way, but is still unambiguously that
        type of token (e.g. an unclosed string or unclosed comment), it is
        returned with a 5th element containing the error message.
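As an illustration of the token layout, this sketch unpacks each arrayref and collects the tokens that carry an error. The tokens are hand-built in the documented shape so the snippet runs without the module; in real use they would come from tokenize. The error text and the value given for the unclosed string are made-up placeholders, not output captured from this module.

```perl
use strict;
use warnings;

# Hand-built tokens in the documented [type, value, offset, length, error]
# shape. The error message and the 'abc' value are illustrative assumptions.
my $src    = 'x = "abc';
my @tokens = (
    [ 'ident',  'x',   0, 1 ],
    [ '=',      '=',   2, 1 ],
    [ 'string', 'abc', 4, 4, 'unterminated string' ],
);

my @problems;
for my $tok (@tokens) {
    my ($type, $value, $offset, $length, $error) = @$tok;
    next unless defined $error;          # valid tokens have no 5th element
    push @problems, sprintf("%s at offset %d: %s", $type, $offset, $error);
}
print "$_\n" for @problems;
```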

For some tokens, you will need to inspect substr($string, $offset, $length) to get the full details, like the suffixes on integer constants.
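For example, the suffix of an integer constant could be recovered roughly as follows. The token is hand-built in the documented shape, and the suffix regex is a simplification chosen for this sketch, not something the module provides.

```perl
use strict;
use warnings;

my $src = 'n = 10UL;';
# Hand-built 'integer' token: decoded value 10, source text at offset 4, length 4.
my $tok = [ 'integer', 10, 4, 4 ];

my ($type, $value, $offset, $length) = @$tok;
my $text = substr($src, $offset, $length);     # '10UL'

# Simplified suffix match (an assumption of this sketch): trailing u/U/l/L.
my ($suffix) = $text =~ /([uUlL]+)\z/;         # 'UL'; undef when no suffix
```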

Consecutive string tokens are not merged, since the parser needs to handle that step after preprocessor macros are substituted.
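A caller that has already performed macro substitution could merge adjacent string tokens itself. The sketch below shows one possible way to do that; merge_adjacent_strings is a hypothetical helper, not part of this module, and it ignores the optional error element for brevity.

```perl
use strict;
use warnings;

# Hypothetical helper: joins runs of adjacent 'string' tokens into one token
# whose offset/length span the whole run. Input tokens are not modified.
sub merge_adjacent_strings {
    my @out;
    for my $tok (@_) {
        if (@out && $tok->[0] eq 'string' && $out[-1][0] eq 'string') {
            my $prev = $out[-1];
            $prev->[1] .= $tok->[1];                          # concatenate values
            $prev->[3]  = $tok->[2] + $tok->[3] - $prev->[2]; # extend source span
        }
        else {
            push @out, [ @$tok ];                             # shallow copy
        }
    }
    return @out;
}

# Hand-built tokens for the source text: puts("a" "b");
my @tokens = (
    [ 'ident',  'puts', 0,  4 ],
    [ '(',      '(',    4,  1 ],
    [ 'string', 'a',    5,  3 ],
    [ 'string', 'b',    9,  3 ],
    [ ')',      ')',    12, 1 ],
    [ ';',      ';',    13, 1 ],
);
my @merged = merge_adjacent_strings(@tokens);
# $merged[2] is now [ 'string', 'ab', 5, 7 ]
```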

AUTHOR

Michael Conrad <mike@nrdvana.net>

VERSION

version 0.005

COPYRIGHT AND LICENSE

This software is copyright (c) 2024 by Michael Conrad.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.