NOTE
This document is probably no longer current.
PARSING
This document explains the algorithm used by parse.imc to parse a hunk of Tcl. This is a from-scratch implementation based on the Tcl man page. This was, btw, a heck of a lot easier when I had Perl 5's regexps to do things with. =-)
First, in __main, we read in the input file and shove it into a string, which then gets passed to the __parse sub. (Or we read stdin; we're not picky.)
Bracketed footnote numbers such as [8] refer to the numbered rules in the Tcl man page.
STATE_MACHINE
There are several states that our parser can be in:
BEGIN_SCOPE
This is where we begin. Create a lexical scope in which to store variables.
Backslash-newline substitution is performed on the whole string. [8]
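A minimal Python sketch of this pre-pass, assuming the Tcl rule that a backslash, the newline after it, and any spaces or tabs that follow collapse to a single space. The name join_continuation_lines is hypothetical; parse.imc does this in PIR, not Python.

```python
import re

# A backslash-newline only counts when an even number of backslashes
# (zero, two, ...) precedes it: "\\\\\n" is an escaped backslash followed
# by an ordinary newline and must be left alone.
_CONTINUATION = re.compile(r"(?<!\\)((?:\\\\)*)\\\n[ \t]*")


def join_continuation_lines(script: str) -> str:
    """Replace each backslash-newline (plus trailing spaces/tabs) with one space."""
    # Keep any escaped backslash pairs (group 1) and append the single space.
    return _CONTINUATION.sub(lambda m: m.group(1) + " ", script)
```

Because this runs before any word is scanned, it also applies between braces, which matches the man page's description of backslash-newline as a separate pre-pass.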
BEGIN_COMMAND
We clear out the Array that is holding our command.
BEGIN_WORD
Skip any leading whitespace. If a newline or a ; is found, goto END_COMMAND. [1]
If the first character of the first word is a #, then it's a comment: ignore all characters until the next newline, and goto BEGIN_COMMAND. [9]
If the first character of a word is a double quote, the word consists of all the characters up to the next unescaped double quote (an escaped \" does not end the word). Append it to the command Array and goto BEGIN_WORD. [4] Any character other than whitespace or a command separator immediately after the closing quote is an error.
If the first character of a word is a {, the word consists of all the characters between the { and its matching }. Append the word to the command Array and goto BEGIN_WORD. [5] Braces nest: the unescaped { and } characters must balance, and escaped ones are not counted. Any character other than whitespace or a command separator immediately after the closing } is an error.
If there are no more characters, goto END_SCOPE
On any other character, fall through to MIDDLE_WORD.
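The quoted and braced cases of BEGIN_WORD can be sketched in Python as below. The function names are hypothetical, and the quote scanner is a simplification: it does not notice that a [...] inside quotes may contain quotes of its own.

```python
WORD_TERMINATORS = " \t\n;"


def scan_braced(src: str, pos: int) -> tuple[str, int]:
    """src[pos] is '{'. Return (text between the braces, index after the '}')."""
    depth = 0
    i = pos
    while i < len(src):
        ch = src[i]
        if ch == "\\":          # \{ and \} do not count toward nesting
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return src[pos + 1 : i], i + 1
        i += 1
    raise SyntaxError("missing close-brace")


def scan_quoted(src: str, pos: int) -> tuple[str, int]:
    """src[pos] is '"'. Return (text between the quotes, index after the closing '"')."""
    i = pos + 1
    while i < len(src):
        if src[i] == "\\":      # \" does not end the word
            i += 2
        elif src[i] == '"':
            return src[pos + 1 : i], i + 1
        else:
            i += 1
    raise SyntaxError('missing "')


def check_word_end(src: str, end: int) -> None:
    """Only whitespace, a command separator, or end of input may follow."""
    if end < len(src) and src[end] not in WORD_TERMINATORS:
        raise SyntaxError("extra characters after close-brace or close-quote")
```

Escapes are left in the returned text untouched; whether they get cooked later depends on the word type (braced words are never substituted).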
MIDDLE_WORD
We're in the middle of getting a word. Any whitespace indicates END_WORD. A ; or \n indicates END_COMMAND.
If an unescaped [ is present, the word extends at least to the ] that closes it (brackets nest). Grab these characters and goto MIDDLE_WORD.
If ${ appears, the word extends at least to the next }. Grab these characters and goto MIDDLE_WORD.
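A Python sketch of MIDDLE_WORD, assuming a bare word ends at a space, tab, newline or ;, except inside [...] or ${...}. The names are hypothetical, and skip_brackets simplifies real Tcl by not looking inside braces or quotes within the nested script.

```python
WORD_TERMINATORS = " \t\n;"


def skip_brackets(src: str, pos: int) -> int:
    """src[pos] is '['. Return the index just past the ']' that closes it."""
    depth = 0
    i = pos
    while i < len(src):
        ch = src[i]
        if ch == "\\":
            i += 2
            continue
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise SyntaxError("missing close-bracket")


def scan_bare(src: str, pos: int) -> tuple[str, int]:
    """Scan an unquoted word starting at src[pos]; return (raw word, end index)."""
    i = pos
    while i < len(src) and src[i] not in WORD_TERMINATORS:
        if src[i] == "\\":
            i = min(i + 2, len(src))      # keep the escape raw; it is cooked later
        elif src[i] == "[":
            i = skip_brackets(src, i)     # whitespace inside [...] stays in the word
        elif src.startswith("${", i):
            close = src.find("}", i + 2)  # ${name} ends at the first }
            if close == -1:
                raise SyntaxError("missing close-brace for variable name")
            i = close + 1
        else:
            i += 1
    return src[pos:i], i
```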
END_WORD
We've reached the end of a word. Add it to the array of words. goto BEGIN_WORD
END_COMMAND
We've reached the end of a command: append any outstanding word to the command array.
Append the command array to the array of commands and goto BEGIN_COMMAND.
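Putting the states together, here is a compact, self-contained Python tokenizer that follows the same state machine. It is a sketch, not a port of parse.imc: the function names are hypothetical, each word is tagged with a hypothetical kind ("braced", "quoted" or "bare") so later stages know which words to leave alone, and it shares the simplifications noted for the individual states.

```python
SPACE = " \t"
SEPARATORS = "\n;"


def _match(src: str, i: int, opener: str, closer: str) -> int:
    """Return the index just past the closer that balances src[i] == opener."""
    depth = 0
    while i < len(src):
        if src[i] == "\\":
            i += 2
            continue
        if src[i] == opener:
            depth += 1
        elif src[i] == closer:
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise SyntaxError(f"missing {closer}")


def _end_of_quote(src: str, i: int) -> int:
    """Return the index just past the unescaped '"' that closes src[i]."""
    i += 1
    while i < len(src):
        if src[i] == "\\":
            i += 2
        elif src[i] == '"':
            return i + 1
        else:
            i += 1
    raise SyntaxError('missing "')


def parse_script(src: str) -> list[list[tuple[str, str]]]:
    """Split a script into commands; each word is (kind, raw text)."""
    commands: list[list[tuple[str, str]]] = []
    command: list[tuple[str, str]] = []            # BEGIN_COMMAND
    i = 0
    while True:
        while i < len(src) and src[i] in SPACE:    # BEGIN_WORD: skip whitespace
            i += 1
        if i >= len(src) or src[i] in SEPARATORS:  # END_COMMAND (or END_SCOPE)
            if command:
                commands.append(command)
            command = []
            if i >= len(src):
                return commands
            i += 1
            continue
        ch = src[i]
        if ch == "#" and not command:              # comment: skip to the newline
            newline = src.find("\n", i)
            i = len(src) if newline == -1 else newline
            continue
        if ch in '{"':
            end = _match(src, i, "{", "}") if ch == "{" else _end_of_quote(src, i)
            if end < len(src) and src[end] not in SPACE + SEPARATORS:
                raise SyntaxError("extra characters after close-brace or close-quote")
            kind = "braced" if ch == "{" else "quoted"
            command.append((kind, src[i + 1 : end - 1]))
            i = end
            continue
        start = i                                  # MIDDLE_WORD
        while i < len(src) and src[i] not in SPACE + SEPARATORS:
            if src[i] == "\\":
                i += 2
            elif src[i] == "[":
                i = _match(src, i, "[", "]")
            elif src.startswith("${", i):
                close = src.find("}", i)
                if close == -1:
                    raise SyntaxError("missing close-brace for variable name")
                i = close + 1
            else:
                i += 1
        command.append(("bare", src[start:i]))     # END_WORD
```

For example, parse_script('set a {x y}; puts "hi $a"') yields two commands, the first being [("bare", "set"), ("bare", "a"), ("braced", "x y")].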
END_SCOPE
We now have an array of arrays: one array per command, holding the raw text of that command's words. Next we need to perform various substitutions on the words. (In a future version, this is where we'd compile the code. For now, we'll just interpret it.)
RUN_COMMAND
Take the next command array off the array of commands (commands run in source order). For each word in the command array, we need to make sure each character of text is processed only once. To do this, we keep a linked list of { state, start, len } segments, and each round of substitution operates only on raw segments. When a substitution occurs, the list is segmented further: a raw segment may be broken up into multiple alternating raw and cooked segments. No substitutions are done on words that were delimited by {}.
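A minimal Python sketch of the raw/cooked segment idea. It is simplified in two labeled ways: each Segment stores its text directly instead of { start, len } offsets into the word, and a Python list stands in for the linked list. Segment, substitute_round and join_segments are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Segment:
    cooked: bool  # True once this text came out of a substitution
    text: str


def substitute_round(segments: list[Segment], pattern: re.Pattern,
                     replace: Callable[[re.Match], str]) -> list[Segment]:
    """Run one substitution over the raw segments; cooked ones pass through."""
    out: list[Segment] = []
    for seg in segments:
        if seg.cooked:
            out.append(seg)
            continue
        pos = 0
        for m in pattern.finditer(seg.text):
            if m.start() > pos:                       # raw text before the match
                out.append(Segment(False, seg.text[pos:m.start()]))
            out.append(Segment(True, replace(m)))     # substituted text is cooked
            pos = m.end()
        if pos < len(seg.text):                       # raw text after the last match
            out.append(Segment(False, seg.text[pos:]))
    return out


def join_segments(segments: list[Segment]) -> str:
    return "".join(seg.text for seg in segments)
```

If $x holds the text "$y", substituting variables in "a$x-$y" gives "a$y-2"; running the same round again leaves the cooked "$y" alone, which is exactly the "each character once" guarantee.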
- Command substitution: All characters between a [ and its matching ] are treated as a Tcl script and run through a separate invocation of the parser. The script's result replaces the bracketed text.
- Variable substitution: Wherever there's a $, text in one of the following forms is replaced with the corresponding variable's value: $name, $name(index), and ${name}. The index has command, variable, and backslash substitution performed on it before it's used to look up a value.
- Backslash substitution: The various \ escapes are replaced, except for backslash-newline, which is handled before anything else when we first get our script.
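A Python sketch of variable and backslash substitution on one raw word. Instead of separate rounds, it makes a single left-to-right pass, which gives each character one look by construction. It deliberately leaves out command substitution, $name(index), namespace-qualified names, and the numeric escapes (\xhh, \uhhhh, octal); cook and the plain dict used as a variable store are hypothetical.

```python
import re

# Single-character escapes; any other "\c" just drops the backslash.
_BACKSLASH = {"n": "\n", "t": "\t", "r": "\r", "a": "\a",
              "b": "\b", "f": "\f", "v": "\v"}
# Simplified: letters, digits and underscores only.
_VARNAME = re.compile(r"[A-Za-z0-9_]+")


def cook(word: str, variables: dict[str, str]) -> str:
    """Variable and backslash substitution on one raw word, left to right."""
    out = []
    i = 0
    while i < len(word):
        ch = word[i]
        if ch == "\\" and i + 1 < len(word):
            nxt = word[i + 1]
            out.append(_BACKSLASH.get(nxt, nxt))   # "\$" yields a literal "$"
            i += 2
        elif ch == "$" and word.startswith("{", i + 1):
            close = word.find("}", i + 2)          # ${name}: any chars but }
            if close == -1:
                raise SyntaxError("missing close-brace for variable name")
            out.append(variables[word[i + 2 : close]])
            i = close + 1
        elif ch == "$" and (m := _VARNAME.match(word, i + 1)):
            out.append(variables[m.group()])       # unknown name -> KeyError
            i = m.end()
        else:
            out.append(ch)                         # includes a lone "$"
            i += 1
    return "".join(out)
```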
EXECUTE_COMMAND
At this point, each of the words is as cooked as it's going to be. Join each word's list of segments back together into a single string. Call the command associated with the first cooked word, passing the rest of the array as its parameters.
Save the return value (only the most recent one is kept).
While there are commands left, goto RUN_COMMAND.
Return the last return value saved. (XXX: what to return if no command was executed? The empty string?)
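A Python sketch of the execute loop, assuming the words are already fully cooked and that commands live in a plain name-to-function table (both hypothetical stand-ins for the interpreter's real structures). For the XXX case it returns the empty string, which is what Tcl itself returns for an empty script.

```python
from typing import Callable

CommandProc = Callable[[list[str]], str]


def execute(commands: list[list[str]], table: dict[str, CommandProc]) -> str:
    """Run cooked commands in order and return the last result."""
    result = ""                       # no command executed -> empty string
    for words in commands:            # RUN_COMMAND, in source order
        name, *args = words
        try:
            proc = table[name]
        except KeyError:
            raise NameError(f'invalid command name "{name}"') from None
        result = proc(args)           # keep only the most recent result
    return result
```

With table = {"concat": lambda args: " ".join(args), "upper": lambda args: args[0].upper()}, executing [["concat", "a", "b"], ["upper", "hi"]] returns "HI", the result of the last command.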