Name
Unisyn::Parse - Parse a Unisyn expression.
Synopsis
Parse the Unisyn expression:
my $expr = "𝗮𝑎𝑠𝑠𝑖𝑔𝑛⌊〈❨𝗯𝗽❩〉𝐩𝐥𝐮𝐬❪𝘀𝗰❫⌋⟢𝗮𝗮𝑎𝑠𝑠𝑖𝑔𝑛❬𝗯𝗯𝐩𝐥𝐮𝐬𝗰𝗰❭⟢";
using:
create (K(address, Rutf8 $expr))->print;
to get:
ok Assemble(debug => 0, eq => <<END);
Semicolon
Term
Assign: 𝑎𝑠𝑠𝑖𝑔𝑛
Term
Variable: 𝗮
Term
Brackets: ⌊⌋
Term
Term
Dyad: 𝐩𝐥𝐮𝐬
Term
Brackets: ❨❩
Term
Term
Brackets: ❬❭
Term
Term
Variable: 𝗯𝗽
Term
Brackets: ❰❱
Term
Term
Variable: 𝘀𝗰
Term
Assign: 𝑎𝑠𝑠𝑖𝑔𝑛
Term
Variable: 𝗮𝗮
Term
Brackets: ❴❵
Term
Term
Dyad: 𝐩𝐥𝐮𝐬
Term
Variable: 𝗯𝗯
Term
Variable: 𝗰𝗰
END
Description
Parse a Unisyn expression.
Version "20210915".
The following sections describe the methods in each functional area of this module. For an alphabetic listing of all methods by name see Index.
Create
Create a Unisyn parse of a utf8 string.
create($address, %options)
Create a new unisyn parse from a utf8 string.
Parameter Description
1 $address Address, held in a variable, of a zero terminated utf8 source string to parse
2 %options Parse options.
Example:
create (K(address, Rutf8 $Lex->{sampleText}{vav}))->print; # Create parse tree from source terminated with zero # 𝗘𝘅𝗮𝗺𝗽𝗹𝗲
ok Assemble(debug => 0, eq => <<END);
Assign: 𝑎
Term
Variable: 𝗮
Term
Variable: 𝗯
END
Parse
Parse Unisyn expressions
Traverse
Traverse the parse tree
traverseTermsAndCall($parse)
Traverse the terms in the parse tree in post order and call the operator subroutine associated with each term.
Parameter Description
1 $parse Parse tree
Example:
my $p = create (K(address, Rutf8 $Lex->{sampleText}{A}), operators => sub
{my ($parse) = @_;
my $assign = Subroutine
{PrintOutStringNL "call assign";
} [], name=>"UnisynParse::assign";
my $equals = Subroutine
{PrintOutStringNL "call equals";
} [], name=>"UnisynParse::equals";
my $o = $parse->operators; # Operator subroutines
$o->assign(asciiToAssignLatin("assign"), $assign);
$o->assign(asciiToAssignLatin("equals"), $equals);
});
$p->traverseTermsAndCall; # 𝗘𝘅𝗮𝗺𝗽𝗹𝗲
Assemble(debug => 0, eq => <<END)
call equals
END
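The post order traversal above can be pictured in plain Perl without generating any assembly. This is a minimal sketch that assumes a hash based tree; the tree layout and the names traversePostOrder and %operators are hypothetical and are not part of the module:

```perl
use strict;
use warnings;

# Hypothetical in-memory tree: each term has an operator name and sub terms.
my $tree =
 {op => 'assign', terms =>
   [{op => 'variable', terms => []},
    {op => 'equals',   terms =>
      [{op => 'variable', terms => []},
       {op => 'variable', terms => []}]}]};

my @calls;                                       # Record of the calls made
my %operators =                                  # Operator name to subroutine
 (assign => sub {push @calls, "call assign"},
  equals => sub {push @calls, "call equals"});

sub traversePostOrder {
  my ($term, $operators) = @_;
  traversePostOrder($_, $operators) for @{$term->{terms}};  # Sub terms first
  my $sub = $operators->{$term->{op}};           # Terms without a subroutine are skipped
  $sub->($term) if $sub;
}

traversePostOrder($tree, \%operators);
print "$_\n" for @calls;
```

Because sub terms are visited before their parent, the sketch prints "call equals" and then "call assign".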
Print a parse tree
print($parse)
Print a parse tree.
Parameter Description
1 $parse Parse tree
Example:
create (K(address, Rutf8 $Lex->{sampleText}{vav}))->print; # Create parse tree from source terminated with zero # 𝗘𝘅𝗮𝗺𝗽𝗹𝗲
ok Assemble(debug => 0, eq => <<END);
Assign: 𝑎
Term
Variable: 𝗮
Term
Variable: 𝗯
END
SubQuark
A set of quarks describing the method to be called for each lexical operator. These routines specialize the general purpose quark methods for use on parse methods.
Nasm::X86::Arena::DescribeSubQuarks($arena)
Return a descriptor for a set of sub quarks in the specified arena.
Parameter Description
1 $arena Arena descriptor
Nasm::X86::Arena::CreateSubQuarks($arena)
Create quarks in a specified arena.
Parameter Description
1 $arena Arena description, optionally an arena address
Unisyn::Parse::SubQuarks::reload($q, %options)
Reload the description of a set of sub quarks.
Parameter Description
1 $q Subquarks
2 %options {arena=>arena to use; tree => first tree block; array => first array block}
Unisyn::Parse::SubQuarks::put($q, $string, $sub)
Put a new subroutine definition into the sub quarks.
Parameter Description
1 $q Subquarks
2 $string String containing operator type and method name
3 $sub Variable offset to subroutine
Unisyn::Parse::SubQuarks::subFromQuark($q, $lexicals, $number)
Given the quark number for a lexical item and the quark set of lexical items get the offset of the associated method.
Parameter Description
1 $q Sub quarks
2 $lexicals Lexical item quarks
3 $number Lexical item quark
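The lookup from a lexical item quark to its method can be pictured with ordinary Perl hashes. This is a sketch only: the module keeps these structures in an arena, and the names newQuarks, quarkFromString, putSub and subFromQuarkSketch, as well as the "dyad plus" key format, are assumptions made for illustration:

```perl
use strict;
use warnings;

# Hypothetical quark set: every distinct string is given a small number.
sub newQuarks {
  return {numberFromString => {}, stringFromNumber => []};
}

sub quarkFromString {
  my ($q, $string) = @_;
  my $n = $q->{numberFromString}{$string};
  return $n if defined $n;                       # String already known
  push @{$q->{stringFromNumber}}, $string;
  return $q->{numberFromString}{$string} = $#{$q->{stringFromNumber}};
}

# Hypothetical sub quarks: a string naming the operator maps to a subroutine.
sub putSub {
  my ($subs, $string, $sub) = @_;
  $subs->{$string} = $sub;
}

sub subFromQuarkSketch {
  my ($subs, $lexicals, $number) = @_;
  my $string = $lexicals->{stringFromNumber}[$number];
  return defined $string ? $subs->{$string} : undef;
}

my $lexicals = newQuarks();                      # Lexical item quarks
my %subs;                                        # Stand-in for the sub quarks
putSub(\%subs, "dyad plus", sub {"called plus"});  # Key format is an assumption
my $n   = quarkFromString($lexicals, "dyad plus");
my $sub = subFromQuarkSketch(\%subs, $lexicals, $n);
print $sub->(), "\n";
```

The sketch returns a Perl code reference where the module returns an offset to the subroutine.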
Unisyn::Parse::SubQuarks::lexToString($q, $alphabet, $op)
Convert a lexical item to a string.
Parameter Description
1 $q Sub quarks
2 $alphabet The alphabet number
3 $op The operator name in that alphabet
Unisyn::Parse::SubQuarks::dyad($q, $text, $sub)
Define a method for a dyadic operator.
Parameter Description
1 $q Sub quarks
2 $text The name of the operator as a utf8 string
3 $sub Variable associated subroutine offset
Unisyn::Parse::SubQuarks::assign($q, $text, $sub)
Define a method for an assign operator.
Parameter Description
1 $q Sub quarks
2 $text The name of the operator as a utf8 string
3 $sub Variable associated subroutine offset
assignToShortString($short, $text)
Create a short string representing an assign operator and put it in the specified short string.
Parameter Description
1 $short The number of the short string
2 $text The text of the operator in the assign alphabet
Alphabets
Translate between alphabets
asciiToAssignLatin($in)
Translate ascii to the corresponding letters in the assign latin alphabet.
Parameter Description
1 $in A string of ascii
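Judging by the characters in the synopsis, the assign Latin alphabet appears to be the Unicode mathematical italic letters and the dyad Latin alphabet the mathematical bold letters. Under that assumption the translation can be sketched as a shift of code points. The helper asciiToMathLetters and the two wrappers are hypothetical and the module's own tables may differ:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module: shift ascii letters into a
# block of Unicode mathematical letters starting at the given code points.
sub asciiToMathLetters {
  my ($in, $upper, $lower, %exceptions) = @_;
  return join '', map {
    my $c = $_;
    exists $exceptions{$c} ? chr($exceptions{$c})
  : $c =~ /[A-Z]/          ? chr($upper + ord($c) - ord('A'))
  : $c =~ /[a-z]/          ? chr($lower + ord($c) - ord('a'))
  :                          $c;                 # Leave everything else alone
  } split //, $in;
}

# Mathematical italic: capital A is U+1D434 and small a is U+1D44E. Small h
# is at U+210E because its slot in the italic block is reserved.
sub sketchAssignLatin {
  return asciiToMathLetters($_[0], 0x1D434, 0x1D44E, h => 0x210E);
}

# Mathematical bold: capital A is U+1D400 and small a is U+1D41A.
sub sketchDyadLatin {
  return asciiToMathLetters($_[0], 0x1D400, 0x1D41A);
}

binmode STDOUT, ':encoding(UTF-8)';
print sketchAssignLatin("assign"), " ", sketchDyadLatin("plus"), "\n";
```

The printed operators should match the spellings of assign and plus used in the synopsis expression.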
asciiToAssignGreek($in)
Translate ascii to the corresponding letters in the assign greek alphabet.
Parameter Description
1 $in A string of ascii
asciiToDyadLatin($in)
Translate ascii to the corresponding letters in the dyad latin alphabet.
Parameter Description
1 $in A string of ascii
asciiToDyadGreek($in)
Translate ascii to the corresponding letters in the dyad greek alphabet.
Parameter Description
1 $in A string of ascii
asciiToPrefixLatin($in)
Translate ascii to the corresponding letters in the prefix latin alphabet.
Parameter Description
1 $in A string of ascii
asciiToPrefixGreek($in)
Translate ascii to the corresponding letters in the prefix greek alphabet.
Parameter Description
1 $in A string of ascii
asciiToSuffixLatin($in)
Translate ascii to the corresponding letters in the suffix latin alphabet.
Parameter Description
1 $in A string of ascii
asciiToSuffixGreek($in)
Translate ascii to the corresponding letters in the suffix greek alphabet.
Parameter Description
1 $in A string of ascii
asciiToVariableLatin($in)
Translate ascii to the corresponding letters in the variable latin alphabet.
Parameter Description
1 $in A string of ascii
asciiToVariableGreek($in)
Translate ascii to the corresponding letters in the variable greek alphabet.
Parameter Description
1 $in A string of ascii
asciiToEscaped($in)
Translate ascii to the corresponding letters in the escaped ascii alphabet.
Parameter Description
1 $in A string of ascii
semiColon()
Return the Unisyn semicolon character.
Hash Definitions
Unisyn::Parse Definition
Sub quarks
Output fields
address8
Address of source string as utf8
arena
Arena containing tree
fails
Number of failures encountered in this parse
operators
Methods implementing each lexical operator
parse
Offset to the head of the parse tree
quarks
Quarks representing the strings used in this parse
size8
Size of source string as utf8
source32
Source text as utf32
sourceLength32
Length of utf32 string
sourceSize32
Size of utf32 allocation
subQuarks
The quarks used to map a subroutine name to an offset
Private Methods
getAlpha($register, $address, $index)
Load the position of a lexical item in its alphabet from the current character.
Parameter Description
1 $register Register to load
2 $address Address of start of string
3 $index Index into string
getLexicalCode($register, $address, $index)
Load the lexical code of the current character in memory into the specified register.
Parameter Description
1 $register Register to load
2 $address Address of start of string
3 $index Index into string
putLexicalCode($register, $address, $index, $code)
Put the specified lexical code into the current character in memory.
Parameter Description
1 $register Register used to load code
2 $address Address of string
3 $index Index into string
4 $code Code to put
loadCurrentChar()
Load the details of the character currently being processed so that we have the index of the character in the upper half of the current character and the lexical type of the character in the lowest byte.
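The layout described above can be illustrated with ordinary integer arithmetic. This sketch assumes a Perl built with 64 bit integers and reads "upper half" as bits 32 to 63 of a 64 bit value; the function names and the lexical type value 0x06 are hypothetical:

```perl
use strict;
use warnings;

# Assumed 64 bit layout: index in bits 32..63, lexical type in bits 0..7.
sub packCurrentChar {
  my ($index, $type) = @_;
  return ($index << 32) | ($type & 0xFF);
}

sub currentCharIndex { return $_[0] >> 32  }     # Upper half
sub currentCharType  { return $_[0] & 0xFF }     # Lowest byte

my $char = packCurrentChar(7, 0x06);             # Hypothetical lexical type 0x06
printf "index=%d type=0x%02x\n", currentCharIndex($char), currentCharType($char);
```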
checkStackHas($depth)
Check that we have at least the specified number of elements on the stack.
Parameter Description
1 $depth Number of elements required on the stack
pushElement()
Push the current element on to the stack.
pushEmpty()
Push the empty element on to the stack.
lexicalNameFromLetter($l)
Lexical name for a lexical item described by its letter.
Parameter Description
1 $l Letter of the lexical item
lexicalNumberFromLetter($l)
Lexical number for a lexical item described by its letter.
Parameter Description
1 $l Letter of the lexical item
lexicalItemLength($source32, $offset)
Put the length of a lexical item into variable size.
Parameter Description
1 $source32 Address of utf32 source representation
2 $offset Offset to lexical item in utf32
new($depth, $description)
Create a new term in the parse tree rooted on the stack.
Parameter Description
1 $depth Stack depth to be converted
2 $description Text reason why we are creating a new term
error($message)
Write an error message and stop.
Parameter Description
1 $message Error message
testSet($set, $register)
Test a set of items, setting the zero flag if one matches, else clearing the zero flag.
Parameter Description
1 $set Set of lexical letters
2 $register Register to test
checkSet($set)
Check that one of a set of items is on the top of the stack or complain if it is not.
Parameter Description
1 $set Set of lexical letters
reduce($priority)
Convert the longest possible expression on top of the stack into a term at the specified priority.
Parameter Description
1 $priority Priority of the operators to reduce
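The idea of collapsing the top of a stack into a term can be sketched in plain Perl. This sketch deliberately ignores priorities and does not reproduce the module's rule set: the patterns, the letter 't' for a reduced term and the name reduceOnce are assumptions, while the other letters follow the accept_* methods listed below:

```perl
use strict;
use warnings;

# Each stack element is [letter, text]. The letter 't' marks a reduced term.
sub reduceOnce {
  my ($stack) = @_;
  my $letters = join '', map {$_->[0]} @$stack;

  if ($letters =~ /t[ad]t\z/) {                  # Term, operator, term
    my ($l, $o, $r) = splice @$stack, -3;
    push @$stack, ['t', sprintf("%s(%s, %s)", $o->[1], $l->[1], $r->[1])];
    return 1;
  }
  if ($letters =~ /btB\z/) {                     # Bracketed term
    my (undef, $t) = splice @$stack, -3;
    push @$stack, $t;
    return 1;
  }
  if ($letters =~ /pt\z/) {                      # Prefix operator, term
    my ($p, $t) = splice @$stack, -2;
    push @$stack, ['t', sprintf("%s(%s)", $p->[1], $t->[1])];
    return 1;
  }
  return 0;                                      # Nothing to reduce
}

my @stack = (['t', 'a'], ['a', 'assign'], ['t', 'b'], ['d', 'plus'], ['t', 'c']);
1 while reduceOnce(\@stack);                     # Reduce until nothing matches
print $stack[0][1], "\n";
```

Reducing from the top of the stack groups the dyad before the assign, so the single remaining term is assign(a, plus(b, c)).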
reduceMultiple($priority)
Reduce existing operators on the stack.
Parameter Description
1 $priority Priority of the operators to reduce
accept_a()
Assign.
accept_b()
Open.
accept_B()
Closing parenthesis.
accept_d()
Infix but not assign or semi-colon.
accept_p()
Prefix.
accept_q()
Post fix.
accept_s()
Semi colon.
accept_v()
Variable.
parseExpression()
Parse the string of classified lexical items addressed by register $start of length $length. The resulting parse tree (if any) is returned in r15.
MatchBrackets(@parameters)
Replace the low three bytes of a utf32 bracket character with 24 bits of offset to the matching opening or closing bracket. Opening brackets have even codes from 0x10 to 0x4e while the corresponding closing bracket has a code one higher.
Parameter Description
1 @parameters Parameters
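Bracket matching of this kind is usually done with a stack of unmatched opening brackets. The sketch below works on a Perl array rather than on memory and assumes that each 32 bit element holds the bracket code in its high byte and that the offset stored is the index of the matching element; both points, and the name matchBracketsSketch, are assumptions:

```perl
use strict;
use warnings;

# Opening brackets have even codes from 0x10 to 0x4e, the matching closing
# bracket has a code one higher.
sub isOpen  { my ($c) = @_; return $c >= 0x10 && $c <= 0x4e && $c % 2 == 0 }
sub isClose { my ($c) = @_; return $c >= 0x11 && $c <= 0x4f && $c % 2 == 1 }

sub matchBracketsSketch {
  my @s = @_;                                    # Copy of the classified source
  my @open;                                      # Indexes of unmatched opening brackets
  for my $i (0..$#s) {
    my $code = $s[$i] >> 24;                     # Assumed: code lives in the high byte
    if (isOpen($code)) {
      push @open, $i;
    }
    elsif (isClose($code)) {
      @open or die "Unmatched closing bracket at $i\n";
      my $o = pop @open;
      (($s[$o] >> 24) == $code - 1) or die "Mismatched brackets at $o and $i\n";
      $s[$o] = ($s[$o] & 0xFF000000) | $i;       # Opening bracket points at closing bracket
      $s[$i] = ($s[$i] & 0xFF000000) | $o;       # Closing bracket points at opening bracket
    }
  }
  @open and die "Unmatched opening bracket at $open[-1]\n";
  return @s;
}

# Two nested bracket pairs around two other items with hypothetical code 0x06
my @in  = ((0x10 << 24) | 0x230A, (0x06 << 24) | 0x61, (0x12 << 24) | 0x27E8,
           (0x06 << 24) | 0x62,   (0x13 << 24) | 0x27E9, (0x11 << 24) | 0x230B);
my @out = matchBracketsSketch(@in);
printf "%08x\n", $_ for @out;
```

After the call the low three bytes of the outer opening bracket at index 0 hold 5, the index of its closing bracket, while elements that are not brackets are unchanged.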
ClassifyNewLines(@parameters)
Scan input string looking for opportunities to convert new lines into semi colons.
Parameter Description
1 @parameters Parameters
ClassifyWhiteSpace(@parameters)
Classify white space per: "lib/Unisyn/whiteSpace/whiteSpaceClassification.pl".
Parameter Description
1 @parameters Parameters
parseUtf8($parse, @parameters)
Parse a unisyn expression encoded as utf8 and return the parse tree.
Parameter Description
1 $parse Parse
2 @parameters Parameters
printLexicalItem($parse, $source32, $offset, $size)
Print the utf8 string corresponding to a lexical item at a variable offset.
Parameter Description
1 $parse Parse tree
2 $source32 Address of utf32 source representation
3 $offset Offset to lexical item in utf32
4 $size Size in utf32 chars of item
showAlphabet($alphabet)
Show an alphabet.
Parameter Description
1 $alphabet Alphabet name
T($key, $expected, %options)
Parse some text and dump the results.
Parameter Description
1 $key Key of text to be parsed
2 $expected Expected result
3 %options Options
C($key, $expected, %options)
Parse some text and print the results.
Parameter Description
1 $key Key of text to be parsed
2 $expected Expected result
3 %options Options
Index
1 accept_a - Assign.
2 accept_B - Closing parenthesis.
3 accept_b - Open.
4 accept_d - Infix but not assign or semi-colon.
5 accept_p - Prefix.
6 accept_q - Post fix.
7 accept_s - Semi colon.
8 accept_v - Variable.
9 asciiToAssignGreek - Translate ascii to the corresponding letters in the assign greek alphabet.
10 asciiToAssignLatin - Translate ascii to the corresponding letters in the assign latin alphabet.
11 asciiToDyadGreek - Translate ascii to the corresponding letters in the dyad greek alphabet.
12 asciiToDyadLatin - Translate ascii to the corresponding letters in the dyad latin alphabet.
13 asciiToEscaped - Translate ascii to the corresponding letters in the escaped ascii alphabet.
14 asciiToPrefixGreek - Translate ascii to the corresponding letters in the prefix greek alphabet.
15 asciiToPrefixLatin - Translate ascii to the corresponding letters in the prefix latin alphabet.
16 asciiToSuffixGreek - Translate ascii to the corresponding letters in the suffix greek alphabet.
17 asciiToSuffixLatin - Translate ascii to the corresponding letters in the suffix latin alphabet.
18 asciiToVariableGreek - Translate ascii to the corresponding letters in the variable greek alphabet.
19 asciiToVariableLatin - Translate ascii to the corresponding letters in the variable latin alphabet.
20 assignToShortString - Create a short string representing an assign operator and put it in the specified short string.
21 C - Parse some text and print the results.
22 checkSet - Check that one of a set of items is on the top of the stack or complain if it is not.
23 checkStackHas - Check that we have at least the specified number of elements on the stack.
24 ClassifyNewLines - Scan input string looking for opportunities to convert new lines into semi colons.
25 ClassifyWhiteSpace - Classify white space per: "lib/Unisyn/whiteSpace/whiteSpaceClassification.pl".
26 create - Create a new unisyn parse from a utf8 string.
27 error - Write an error message and stop.
28 getAlpha - Load the position of a lexical item in its alphabet from the current character.
29 getLexicalCode - Load the lexical code of the current character in memory into the specified register.
30 lexicalItemLength - Put the length of a lexical item into variable size.
31 lexicalNameFromLetter - Lexical name for a lexical item described by its letter.
32 lexicalNumberFromLetter - Lexical number for a lexical item described by its letter.
33 loadCurrentChar - Load the details of the character currently being processed so that we have the index of the character in the upper half of the current character and the lexical type of the character in the lowest byte.
34 MatchBrackets - Replace the low three bytes of a utf32 bracket character with 24 bits of offset to the matching opening or closing bracket.
35 Nasm::X86::Arena::CreateSubQuarks - Create quarks in a specified arena.
36 Nasm::X86::Arena::DescribeSubQuarks - Return a descriptor for a set of sub quarks in the specified arena.
37 new - Create a new term in the parse tree rooted on the stack.
38 parseExpression - Parse the string of classified lexical items addressed by register $start of length $length.
39 parseUtf8 - Parse a unisyn expression encoded as utf8 and return the parse tree.
40 print - Print a parse tree.
41 printLexicalItem - Print the utf8 string corresponding to a lexical item at a variable offset.
42 pushElement - Push the current element on to the stack.
43 pushEmpty - Push the empty element on to the stack.
44 putLexicalCode - Put the specified lexical code into the current character in memory.
45 reduce - Convert the longest possible expression on top of the stack into a term at the specified priority.
46 reduceMultiple - Reduce existing operators on the stack.
47 semiColon - Return the Unisyn semicolon character.
48 showAlphabet - Show an alphabet.
49 T - Parse some text and dump the results.
50 testSet - Test a set of items, setting the zero flag if one matches, else clearing the zero flag.
51 traverseTermsAndCall - Traverse the terms in the parse tree in post order and call the operator subroutine associated with each term.
52 Unisyn::Parse::SubQuarks::assign - Define a method for an assign operator.
53 Unisyn::Parse::SubQuarks::dyad - Define a method for a dyadic operator.
54 Unisyn::Parse::SubQuarks::lexToString - Convert a lexical item to a string.
55 Unisyn::Parse::SubQuarks::put - Put a new subroutine definition into the sub quarks.
56 Unisyn::Parse::SubQuarks::reload - Reload the description of a set of sub quarks.
57 Unisyn::Parse::SubQuarks::subFromQuark - Given the quark number for a lexical item and the quark set of lexical items get the offset of the associated method.
Installation
This module is written in 100% Pure Perl and, thus, it is easy to read, comprehend, use, modify and install via cpan:
sudo cpan install Unisyn::Parse
Author
Copyright
Copyright (c) 2016-2021 Philip R Brenan.
This module is free software. It may be used, redistributed and/or modified under the same terms as Perl itself.