<<< Exclusive OR: operator_xor helper function | Lambda Home | Integer tree >>> |
Common Text Transformation Library http://cttl.sourceforge.net/
Many parser programs rely on syntax trees for internal representation of the parsed input. Discussion in this section covers sample code located in example/lambda/arithmetic_interpreter subdirectory of CTTL distribution. Formal API documentation for the arithmetic expression interpreter is generated by doxygen. Online versions of the sample files are also available:
The interpreter struct, defined in interpreter.h header, performs computation of arithmetic expressions involving floating-point numbers. For example,
me@CTTL3 ~/cttl300 $ ./arithmetic_interpreter.exe "(5.0 + 7.5)" result: 12.5 me@CTTL3 ~/cttl300 $
The interpreter's member function
double non_terminal( size_t id );
has access to the following data structures:
The function examines content of m_tree and resolves it as a linkage to terminal and non-terminal nodes, together formulating a tree. In this sample, physical nodes are bundles of contiguous elements in the m_tree vector. Each node is uniquely identified by its initial element's absolute offset from the beginning of the container. The program uses input variable "id" to keep track of nodes.
A non-terminal node for a binary operation has the following format:
m_tree[ id + 0 ]: opcode: op_add, op_subtract, op_multiply, op_divide m_tree[ id + 1 ]: left-operand-id m_tree[ id + 2 ]: right-operand-id
The opcodes op_add, op_subtract, op_multiply, and op_divide represent binary arithmetic operations performed on the left and right operands.
A non-terminal node for an unary operation has format:
m_tree[ id + 0 ]: opcode: op_negate m_tree[ id + 1 ]: operand-id
The op_negate is an unary operation, requiring one operand.
A terminal node represents the data,
m_tree[ id + 0 ]: opcode: op_terminal m_tree[ id + 1 ]: data idx - - - -> m_vector_doubles[ idx ]
Terminal nodes represent expression operands. Since expression operands are double-precision floating point values, the terminal node depends on indirect storage of its value. Data field named idx stores entry point into the data table interpreter::m_vector_doubles, which translates integer index to the corresponding double-precision value when the actual computation takes place.
For example, the initial evaluation of the arithmetic expression (5.0 + 7.5) yields a series of nodes and data values as follows:
// The data table: m_vector_doubles[ 0 ]: 5.0 m_vector_doubles[ 1 ]: 7.5 // The syntax tree: m_tree[ 0 ]: (node id: 5) // points to the root node of the tree // node id == 1 m_tree[ 1 ]: op_terminal m_tree[ 2 ]: 0 - - - - - - - - - - - -> m_vector_doubles[ 0 ] // node id == 3 m_tree[ 3 ]: op_terminal m_tree[ 4 ]: 1 - - - - - - - - - - - -> m_vector_doubles[ 1 ] // node id == 5 m_tree[ 5 ]: op_add m_tree[ 6 ]: (node id: 1) // left operand m_tree[ 7 ]: (node id: 3) // right operand
The interpreter receives parsed arithmetic expression in a structure that abstracts addition, subtraction, multiplication, division, and negation of the floating point numbers. The type of this structure is called integer tree.
Constructor of the interpreter struct remembers its tree and the value table in interpreter::m_tree and interpreter::m_vector_doubles data members, respectively.
Public function interpreter::non_terminal() is the entry point for the computation. It returns the result of the arithmetic expression. Its input parameter is the id of the root node, which specifies where the computation should begin. The implementation uses recursion:
For each non-terminal node the interpreter::non_terminal() invokes itself;
For every terminal the function simply returns the floating-point value.
How is the integer tree constructed and populated?
The lexer part of the program, lexer.h, is responsible for parsing the input expression grammar and generating the corresponding syntax tree.
The arithmetic expression is parsed by the struct named calc_lexer.
All temporary data is stored in the calc_lexer::m_stack_nodes container, which is simply the stack of integers.
The parser works as an assembly line for the resulting tree. For each successful evaluation of the grammar production rule that matches an arithmetic operation in the input, the program creates an appropriate node in the output syntax tree.
Physical writing of data is done by std::back_insert_iterator adaptor. For example, to add a new terminal node to the tree, the lexer executes a series of steps, accomplished in-line by the following lambda expression:
// lambda expression to create terminal node: *m_tree_inserter++ = const_scalar( op_terminal ), // code of operation *m_tree_inserter++ = ++( scalar( 0 )^vect_doubles^atof^edge_value ), m_stack_nodes = alias::size( scalar( &vect_parse_tree ) ), *m_stack_nodes -= terminal_size
where
++( scalar( 0 )^vect_doubles^atof^edge_value )
is a translator that takes a substring named edge_value, pointing to the numeric value, and converts it to the floating point number. Next, it inserts the number into the data table vect_doubles, returning an index to the corresponding position of the newly inserted value in the data table.
Although each element of the terminal node is already written, its connection in the syntax tree in not yet known. More specifically, the parent of the parsed terminal node is unknown by the time its grammar rule is fully evaluated and processed. Fortunately, finding a parent is not the responsibility of the terminal node's grammar. The parent linkage is deferred: at this time, the parser pushes the tail position of new node into the stack,
m_stack_nodes = alias::size( scalar( &vect_parse_tree ) ),
and then computes the terminal node id by subtracting its size from the tail position:
*m_stack_nodes -= terminal_size
At a later time, when addition operation is evaluated, the lexer already has two terminal nodes inserted into the tree, along with the corresponding node ids on top of the stack:
*m_tree_inserter++ = const_scalar( op_add ), // code of operation m_temp = *m_stack_nodes--, // the right operand is on top of the stack *m_tree_inserter++ = *m_stack_nodes--, // store left operand *m_tree_inserter++ = m_temp, // store right operand m_stack_nodes = alias::size( scalar( &vect_parse_tree ) ), *m_stack_nodes -= non_terminal_size
Similar to terminal node processing, the lexer resolves non-terminal node identifier by subtracting its size from the tail position, and pushes it on the stack. In the end, the last node id becomes the root node of the output syntax tree.
The main driver program, arithmetic_interpreter.cpp, gets user input from the command line and instantiates
Program passes these containers to the constructor of the lexer object.
After processing the input by invoking the lexer::grammar() entry point, the program passes the populated syntax tree and data to the interpreter. The root node is returned by the first element of the vector that represents the syntax tree.
Finally, the program displays the result of the calculation on the screen.
Copyright © 1997-2009 Igor Kholodov mailto:cttl@users.sourceforge.net.
Permission to copy and distribute this document is granted provided this copyright notice appears in all copies. This document is provided "as is" without express or implied warranty, and with no claim as to its suitability for any purpose.
<<< Exclusive OR: operator_xor helper function | Lambda Home | Integer tree >>> |