The CTTL implementation takes advantage of the assert() macro from the <cassert> header, providing an adequate number of assertions that help catch the most obvious bugs. To prevent the assertions from being compiled, define the NDEBUG macro before including any of the library headers:
#define NDEBUG
#include "cttl/cttl.h"
Assertions fall into the category of run-time diagnostics. In addition to assertions, the program can print a fair amount of run-time debugging information, commonly known as a trace. Traces provide significant assistance in understanding code and resolving hard issues. The result of the trace (usually text messages sent to the standard output) gives a "big picture" with enough detail to begin working on problems, find ways to optimize code, and simply verify that the code works as expected.
Tracing helps to debug grammars without conventional debuggers. CTTL traces are built into the library implementation classes; therefore, a program can turn on traces simply by defining one of the trace-level macros listed below. CTTL trace messages are sent to the standard output. Traces follow the trail of grammar evaluation with support for different levels of detail. The messages are globally serialized for all threads. The current implementation of the tracing facility has no special support for multi-threaded environments; consequently, at present, all tracing activities should be limited to grammar debugging on a single thread.
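For example, to turn on rule-level traces, a program can define the corresponding macro before including the library headers, in the same way NDEBUG is defined above (a minimal sketch; the header path is taken from the assertion example):

// Enable rule()-level tracing; the macro must be defined
// before any of the library headers are included.
#define CTTL_TRACE_RULES
#include "cttl/cttl.h"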
Trace generation naturally slows down execution, but that effect can be reduced by submitting smaller inputs to the parser. A different issue arises when the program creates large amounts of trace messages. For example, a programming language parser can easily produce hundreds of megabytes of traces. When the standard output is redirected, the operating system does a good job of writing the information into a text file, usually much faster than printing the same output to the console screen. In this way, creating a trace file is not a big challenge. On the other hand, reading and searching through the data can become a burden. Our recommendation is to split the information into roughly equal, more manageable chunks of text. The CTTL samples library includes a file splitter utility, named partition_file.cpp, which divides large text files into 100MB-sized pieces. This approach is often particularly productive, because the pertinent information usually appears close to the end of the trace output.
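The interface of partition_file.cpp is not reproduced here; the following is only a rough sketch of the splitting idea, with the chunk size, buffer size, and output file naming chosen purely for illustration:

#include <fstream>
#include <sstream>
#include <vector>

// Copy the input file in pieces of approximately chunk_bytes each,
// writing every piece to its own numbered output file
// (trace.txt.part0, trace.txt.part1, ...).
void split_file( const char* path, std::size_t chunk_bytes )
{
    std::ifstream in( path, std::ios::binary );
    std::vector< char > buf( 1024 * 1024 );      // 1MB copy buffer
    std::ofstream out;
    std::size_t part = 0;
    std::size_t written = chunk_bytes;           // forces the first file to open
    while ( in.read( &buf[ 0 ], buf.size() ), in.gcount() > 0 ) {
        if ( written >= chunk_bytes ) {          // start a new piece
            std::ostringstream name;
            name << path << ".part" << part++;
            if ( out.is_open() )
                out.close();
            out.open( name.str().c_str(), std::ios::binary );
            written = 0;
        }
        out.write( &buf[ 0 ], in.gcount() );
        written += static_cast< std::size_t >( in.gcount() );
    }
}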
The programmer may choose an appropriate level of tracing by defining one of the following macros:
Table D.1 - Trace level macros
Trace macro | description |
---|---|
CTTL_TRACE_TRIVIAL |
Turns on messages generated by
CTTL_TRACE_MESSAGE() macro.
|
CTTL_TRACE_RULES |
Turns on rule()-level tracing, which includes traces
produced by CTTL_RULE(), CTTL_MEMBER_RULE(),
CTTL_STATIC_RULE(), as well as CTTL_TRACE_MESSAGE().
|
CTTL_TRACE_EVERYTHING |
Turns on grammar expression tracing, in addition to the rule()-level
traces. This tracing mode generates most detailed level of
run time trace information, which includes rules, and both
terminal, and non-terminal symbol data.
|
CTTL_TRACE_SILENT |
Brute-force silencer of the standard output, useful when an enormous
flood of trace messages needs to be silenced. For example, the macro
CTTL_TRACE_SILENT( true ) turns all standard output to std::cout
off and is equivalent to

std::cout.flush();
std::cout.setstate( std::ios_base::badbit ); // silence
// ...

The macro CTTL_TRACE_SILENT( false ) turns the standard output back on and is equivalent to

std::cout.clear(); // output back to vocal
// ...

CTTL_TRACE_SILENT( true ) creates a temporary object on the stack; therefore, the silencer is effective only within the scope of the enclosing function. The destructor of the temporary object turns the output back on automatically:

void f()
{
    CTTL_TRACE_SILENT( true ); // silence inside f()
    // ...
}

void g()
{
    f();
    // output is back to vocal as g() continues
    // ...
}

The opposite is also true: CTTL_TRACE_SILENT( false ) also creates a temporary object on the stack to enable output in the current scope of the function. The object's destructor automatically turns the output off again:

void f()
{
    CTTL_TRACE_SILENT( false ); // output becomes vocal inside f()
    // ...
}

void g()
{
    CTTL_TRACE_SILENT( true ); // silence inside g()
    f();
    // silence inside g() continues...
}
|
In addition to built-in traces, rule()-level symbolic information can be added to the trace output by a set of trace macros:
Table D.2 - Grammar trace macros
Trace macro | description |
---|---|
CTTL_TRACE_MESSAGE( text_message ), where text_message is a null-terminated string. |
Sends user text_message to
the standard output.
|
CTTL_RULE( r ), where r is a non-static member function representing a grammar rule. |
The CTTL_RULE macro falls into the category of member
function adaptors.
The argument is the address of
a non-static member function that represents a grammar rule.
The CTTL_RULE macro is similar to the rule(*this,&C::r) adaptor, but requires only the address of the class member function. When this macro is used, a member function context is assumed, so that the this pointer is a valid expression. Depending on the level of tracing, CTTL_RULE produces a trace that shows the events of entering and leaving the encapsulated rule r. See the arithmetics_traced.cpp sample program for an example of use. |
CTTL_MEMBER_RULE( object, &r ), where r is a non-static member function representing a grammar rule, and object is a reference to an instance of the C++ class that defines r. |
CTTL_MEMBER_RULE is similar to CTTL_RULE,
but requires both an object reference and the class member function,
matching the
rule(object,&C::r) member function adaptor. |
CTTL_STATIC_RULE( r ), where r is a global or static member function representing a rule. |
The macro can be used in place of
rule(r) function adaptor. |
A modified version of the arithmetic expression parser, named arithmetics_traced.cpp, demonstrates the CTTL tracing facility. The traced version of the program uses the debug macros to invoke grammar rules.
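As a rough, hypothetical sketch of what such macro-based rule invocations can look like (the universe typedef, the rule signatures, and the placeholder rule bodies are assumptions made for illustration, not excerpts from arithmetics_traced.cpp):

// Hypothetical fragment: CTTL_RULE replaces rule( *this, &parser::rule_term )
// and adds entry/exit trace records for the encapsulated rule.
#include "cttl/cttl.h"

using namespace cttl;

struct parser {
    typedef const_edge<> universe_T;   // assumed universe type

    size_t rule_expr( universe_T& universe_ )
    {
        CTTL_TRACE_MESSAGE( "evaluating rule_expr" );  // user trace message
        return (
            CTTL_RULE( parser::rule_term )
            + *( symbol( '+' ) + CTTL_RULE( parser::rule_term ) )
        ).match( universe_ );
    }

    size_t rule_term( universe_T& universe_ )
    {
        // Placeholder body: match a non-empty run of digits.
        return entity( isdigit ).match( universe_ );
    }
};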
To begin, we compile the program without any trace-level macros defined. If the input expression is "2*-3", the program generates the following output:
Input: 2*-3
Output:
1 ;
2 PUSH 2
3 ;
4 PUSH 3
5 ;
6 POP R1
7 NEG R1 ; R1 = -(3)
8 PUSH R1
9 ;
10 POP R1
11 POP R2
12 MUL R2, R1 ; R2 = R2 *-3
13 PUSH R2
If an incorrect arithmetic expression, "2-*3", is entered, the output becomes
Input: 2-*3
Output:
1 ;
2 PUSH 2
*** syntax error ***
2-*3
  ^-- at position 2
*** error: parser terminated: ***
2-*3
 ^-- at position 1
To deal with the "unexpected" problem, we define the CTTL_TRACE_RULES macro, which turns on the rule-level traces. The information is captured at run time by the CTTL_RULE() and CTTL_MEMBER_RULE() macros:
Input: 2-*3
Output:
--------------------@2-*3-198->parser::rule_expr@0-4
--------------------@2-*3--85->parser::rule_term@0-4
--------------------@2-*3---60->parser::rule_factor@0-4
-----------------------@2----36->parser::numeric_literal@0-1
1 ;
2 PUSH 2
-----------------------@2++++36<-parser::numeric_literal@0-0 ''
---------------------@-*3+++60<-parser::rule_factor@0-1 '2'
---------------------@-*3++85<-parser::rule_term@0-1 '2'
----------------------@*3--95->parser::rule_term@2-4
----------------------@*3---60->parser::rule_factor@2-4
----------------------@*3----50->parser::parse_error@2-4
*** syntax error ***
2-*3
  ^-- at position 2
~~~~~~~~~~~~~~~~~~~~~~@*3~~~~50<-parser::parse_error@2
~~~~~~~~~~~~~~~~~~~~~~@*3~~~60<-parser::rule_factor@2
~~~~~~~~~~~~~~~~~~~~~~@*3~~95<-parser::rule_term@2
---------------------@-*3+198<-parser::rule_expr@0-1 '2'
*** error: parser terminated: ***
2-*3
 ^-- at position 1
Traces generated by CTTL_RULE(), CTTL_MEMBER_RULE(), and CTTL_STATIC_RULE() have the following format:
--------------------@2-*3-198->parser::rule_expr@0-4

Reading the trace line from left to right:

- The at sign '@' appears in front of the input text fragment.
- "2-*3" is the fragment of the user input text to be parsed.
- 198 is the line number in the C++ source file where the rule is being invoked.
- The arrow '->' indicates that the rule is being invoked.
- parser::rule_expr is the name of the rule captured by the CTTL_RULE(), CTTL_MEMBER_RULE(), and CTTL_STATIC_RULE() macros.
- The at sign '@' separates the rule name from the universe offset range.
- The offset range 0-4 corresponds to the upper and lower bounds of the universe.

When evaluation of the rule completes, the trace shows
---------------------@-*3+198<-parser::rule_expr@0-1 '2'

- The at sign '@' in front of the input fragment corresponds to the upper boundary of the universe after evaluation.
- The plus sign '+' indicates a successful result of the evaluation.
- The arrow '<-' indicates the return from the rule back to line 198.
- The single-quoted literal '2' corresponds to the matched fragment.
The CTTL_TRACE_EVERYTHING macro turns on traces of expression evaluation inside individual grammar rules, for example,
Input: 2-*3
Output:
--------------------@2-*3-198->parser::rule_expr@0-4
--------------------@2-*3? {; 0-4
--------------------@2-*3---85->parser::rule_term@0-4
--------------------@2-*3? {; 0-4
--------------------@2-*3-----60->parser::rule_factor@0-4
--------------------@2-*3? {| 0-4
--------------------@2-*3? {| 0-4
--------------------@2-*3? {| 0-4
--------------------@2-*3? {| 0-4
--------------------@2-*3? {& 0-4
-----------------------2@| $ 1-4
-----------------------@2-----------36->parser::numeric_literal@0-1
1 ;
2 PUSH 2
-----------------------@2+++++++++++36<-parser::numeric_literal@0-0 '' } } } } }
---------------------@-*3+++++60<-parser::rule_factor@0-1 '2'
---------------------@-*3? {* 1-4
---------------------@-*3? {| 1-4
---------------------@-*3? {& 1-4
---------------------@-*3? {; 1-4
~~~~~~~~~~~~~~~~~~~~~~~2@~ * 1-4 FAIL char }
~~~~~~~~~~~~~~~~~~~~~~~2@~ & 1-4 FAIL }
---------------------@-*3? {& 1-4
---------------------@-*3? {; 1-4
~~~~~~~~~~~~~~~~~~~~~~~2@~ / 1-4 FAIL char }
~~~~~~~~~~~~~~~~~~~~~~~2@~ & 1-4 FAIL }
~~~~~~~~~~~~~~~~~~~~~~~2@~ | 1-4 FAIL } } }
---------------------@-*3+++85<-parser::rule_term@0-1 '2'
---------------------@-*3? {* 1-4
---------------------@-*3? {| 1-4
---------------------@-*3? {& 1-4
---------------------@-*3? {; 1-4
~~~~~~~~~~~~~~~~~~~~~~~2@~ + 1-4 FAIL char }
~~~~~~~~~~~~~~~~~~~~~~~2@~ & 1-4 FAIL }
---------------------@-*3? {& 1-4
---------------------@-*3? {; 1-4
----------------------2-@| - 2-4 char
----------------------@*3-------95->parser::rule_term@2-4
----------------------@*3? {; 2-4
----------------------@*3---------60->parser::rule_factor@2-4
----------------------@*3? {| 2-4
----------------------@*3? {| 2-4
----------------------@*3? {| 2-4
----------------------@*3? {| 2-4
----------------------@*3? {& 2-4
~~~~~~~~~~~~~~~~~~~~~~2-@~ $ 2-4 FAIL
~~~~~~~~~~~~~~~~~~~~~~2-@~ & 2-4 FAIL }
----------------------@*3? {; 2-4
----------------------@*3? {; 2-4
~~~~~~~~~~~~~~~~~~~~~~2-@~ ( 2-4 FAIL char } }
~~~~~~~~~~~~~~~~~~~~~~2-@~ | 2-4 FAIL }
----------------------@*3? {; 2-4
~~~~~~~~~~~~~~~~~~~~~~2-@~ - 2-4 FAIL char }
~~~~~~~~~~~~~~~~~~~~~~2-@~ | 2-4 FAIL }
----------------------@*3? {; 2-4
~~~~~~~~~~~~~~~~~~~~~~2-@~ + 2-4 FAIL char }
~~~~~~~~~~~~~~~~~~~~~~2-@~ | 2-4 FAIL }
----------------------@*3-----------50->parser::parse_error@2-4
*** syntax error ***
2-*3
  ^-- at position 2
~~~~~~~~~~~~~~~~~~~~~~@*3~~~~~~~~~~~50<-parser::parse_error@2
~~~~~~~~~~~~~~~~~~~~~~2-@~ | 2-4 FAIL }
~~~~~~~~~~~~~~~~~~~~~~@*3~~~~~~~~~60<-parser::rule_factor@2 }
~~~~~~~~~~~~~~~~~~~~~~@*3~~~~~~~95<-parser::rule_term@2 }
~~~~~~~~~~~~~~~~~~~~~~~2@~ & 1-4 FAIL }
~~~~~~~~~~~~~~~~~~~~~~~2@~ | 1-4 FAIL } } }
---------------------@-*3+198<-parser::rule_expr@0-1 '2'
*** error: parser terminated: ***
2-*3
 ^-- at position 1
Individual trace lines contain trace symbols corresponding to terminal and non-terminal units of the grammar. For example, the following lines show the trace of the evaluation of two characters. Character 'a' matched at offset 1, but the evaluation of 'b' failed:
-----------------------a@| a 1-15 char
~~~~~~~~~~~~~~~~~~~~~~~a@~ b 1-15 FAIL char

Traces of non-terminal symbols appear enclosed in a pair of braces. For example, when the expression
symbol('a') + symbol('b')

is evaluated against the input text "abc", the trace output may look like
---------------------@abc?{; 0-3
-----------------------a@| a 1-3 char
----------------------ab@| b 2-3 char }

To distinguish sequences from Kleene plus operators, the sequence operator, '+', is replaced by a semicolon, ';'. The question mark, '?', in front of the sequence indicates that the result of the evaluation is not yet known. Pipe characters, '|', separate fragments of the input text from the symbol evaluation data.
The lines formed by tilde characters, '~', indicate failed symbols. In the following example, the grammar expression
symbol('a') ^ symbol('c')

fails against the input text "abc":
---------------------@abc?{^ 0-3
-----------------------a@| a 1-3 char
~~~~~~~~~~~~~~~~~~~~~~~a@~ c 1-3 FAIL char
~~~~~~~~~~~~~~~~~~~~~~~~@~ ^ 0-3 FAIL }

Lexeme symbol('a') successfully matches character 'a', but symbol('c') fails to match 'b'. The next line of the trace shows that the concatenation operator '^', which connects these two symbols, also fails.
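For orientation, the following is a minimal, hypothetical program that could produce traces of this kind. The const_edge<> universe construction and the match() call reflect common CTTL usage, but their details are assumptions here rather than quotations from the library reference:

// Hypothetical sketch: enable full expression tracing and evaluate a
// two-symbol sequence against "abc".
#define CTTL_TRACE_EVERYTHING
#include <string>
#include "cttl/cttl.h"

using namespace cttl;

int main()
{
    std::string inp = "abc";
    const_edge<> universe( inp );   // universe covering the whole input
    size_t result = ( symbol( 'a' ) + symbol( 'b' ) ).match( universe );
    ( void )result;  // success or failure is visible in the trace output
    return 0;
}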
The following table specifies various CTTL trace symbols:
Table D.3 - CTTL trace symbols
Trace symbol | description |
---|---|
{e |
Represents
edge expression adaptor
construct.
|
eN |
where N is a non-negative integer corresponding to
the node identity that represents the upper boundary
of the edge.
Displays the result of the
edge expression adaptor
evaluation.
|
{n |
Represents
node expression adaptor
construct.
|
nN |
where N is a non-negative integer
corresponding to the node identity.
Displays the result of the
node expression adaptor
evaluation.
|
i |
Kleene
list iteration messages, one of the following:
"kleene_list: iteration made no progress: bailing out" "kleene_list: epsilon match: bailing out" "kleene_list: user-specified match limit: bailing out" |
{! |
Unary search operator.
|
{!! |
Identifies repeatable search operator.
|
{* |
Identifies Kleene star operator.
|
{+ |
Identifies Kleene plus operator.
|
{E |
Uppercase 'E' represents
entity expression adaptor,
written as entity(R),
which enforces a non-empty match of
the underlying expression R.
|
{; |
Semicolon indicates relaxed
sequence
operator, R1 + R2. The semicolon is
chosen to distinguish the sequence from
Kleene plus
operators.
|
{| |
Pipe symbols represent
set union
and
POSIX set union
operators.
|
{& |
Specifies
set intersection
operator.
|
{" |
Identifies various
quote
symbols.
|
L |
Identifies diagnostic messages generated by
lexeme
terminal symbols, one of the following:
"empty universe" "lexeme match out-of-universe bounds" "lexeme match inside void space" "lexeme find: invalid universe" "lexeme find: out-of-universe bounds" "lexeme find: lower bound inside void region" "lexeme find: upper bound inside void region" |
0 |
Digit '0' represents
symbol(false)
and
begin(false)
lexemes.
|
1 |
Digit '1' represents
symbol(true)
and
begin(true)
lexemes.
|
S |
Uppercase 'S' represents
string set
lexeme.
|
< |
Indicates upper boundary of the character entity, specified by
begin(char),
begin(is...), and
begin(text)
lexemes.
|
> |
Indicates lower boundary of the character entity, specified by
end(char),
end(is...), and
end(text)
lexemes.
|
F |
Indicates first character of the character entity, specified by
first(is...)
and
first(text)
lexemes.
|
$ |
Indicates character entity corresponding to
entity(is...)
and
entity(text)
lexemes.
|
T |
Identifies
symbol(text)
lexeme,
specifying user-defined character entity.
|
A |
Identifies the begin() lexeme,
corresponding to the beginning of the user input.
|
Z |
Identifies
end()
lexeme, which corresponds to the lower boundary of the universe.
|
R |
Represents a grammar rule invoked by one of the
function adaptors.
Note that traces of grammar rule calls can be expanded by CTTL_RULE(),
CTTL_MEMBER_RULE(), and CTTL_STATIC_RULE() macros.
|
{u |
Trace produced by unary operator of lambda expression.
|
{b |
Trace produced by binary operator of lambda expression.
|
{d |
Trace produced by lambda compound set of deposit instructions.
|
Returning to the arithmetic expression parser example, and realizing that the problem was data-related, we change the input string back to the proper format, "2*-3", and rerun the program. To reduce the volume of output, we also scale the trace back to the CTTL_TRACE_RULES level (compare with the full version):
Input: 2*-3
Output:
--------------------@2*-3-198->parser::rule_expr@0-4
--------------------@2*-3--85->parser::rule_term@0-4
--------------------@2*-3---60->parser::rule_factor@0-4
-----------------------@2----36->parser::numeric_literal@0-1
1 ;
2 PUSH 2
-----------------------@2++++36<-parser::numeric_literal@0-0 ''
---------------------@*-3+++60<-parser::rule_factor@0-1 '2'
----------------------@-3---64->parser::rule_factor@2-4
-----------------------@3----43->parser::rule_factor@3-4
-----------------------@3-----36->parser::numeric_literal@3-4
3 ;
4 PUSH 3
-----------------------@3+++++36<-parser::numeric_literal@3-3 ''
------------------------@++++43<-parser::rule_factor@3-4 '3'
-----------------------@3----43->parser::unary_minus@3-4
5 ;
6 POP R1
7 NEG R1 ; R1 = -(3)
8 PUSH R1
-----------------------@3++++43<-parser::unary_minus@3-3 ''
------------------------@+++64<-parser::rule_factor@2-4 '-3'
---------------------@*-3---66->parser::multiply@1-4
9 ;
10 POP R1
11 POP R2
12 MUL R2, R1 ; R2 = R2 *-3
13 PUSH R2
---------------------@*-3+++66<-parser::multiply@1-1 ''
------------------------@++85<-parser::rule_term@0-4 '2*-3'
------------------------@+198<-parser::rule_expr@0-4 '2*-3'
To catch faulty inputs as early as possible, a good parser ought to employ assertive program design, based on a highly restrictive set of grammar rules. In general, the initial grammar should describe the input language unambiguously and accurately. Even so, the grammar often has to change before it can be used inside a program.
Because EBNF grammars are language-neutral, sets of production rules are often developed independently from the actual parser implementation. For heavy-duty applications, variances in grammar descriptions of the same input language may introduce a degree of ambiguity or inaccuracy. Many formal EBNF definitions are written for human audiences, creating the possibility of inconsistency, redundancy, and sometimes hard-to-detect typographical errors. When the original productions are rewritten by automated tools, certain rules can be misinterpreted and turn into less restrictive constructs than the initial grammar requires.
After initial inspection, left factoring and elimination of left recursion (also referred to as grammar transformations) are common steps in accommodating a new grammar. Soon after, a review of the various Kleene quantifiers should take place. For example, a Kleene-based production rule R may succeed on an empty string, but the parent rules that invoke R may not be willing to accept the empty match as a satisfactory outcome. Therefore, rules invoking R can use a more restrictive expression, such as entity(R), to protect themselves against empty matches.
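A hypothetical sketch of the idea follows; the rule names, universe typedef, rule signatures, and placeholder bodies are assumptions made only to illustrate the entity(R) guard:

#include "cttl/cttl.h"

using namespace cttl;

struct parser {
    typedef const_edge<> universe_T;   // assumed universe type

    // "R": a Kleene-star based rule, which can succeed on an empty match.
    size_t rule_items( universe_T& universe_ )
    {
        return ( *CTTL_RULE( parser::rule_item ) ).match( universe_ );
    }

    // A caller that requires real progress wraps the call in entity(),
    // which enforces a non-empty match of the underlying expression.
    size_t rule_caller( universe_T& universe_ )
    {
        return entity( CTTL_RULE( parser::rule_items ) ).match( universe_ );
    }

    size_t rule_item( universe_T& universe_ )
    {
        // Placeholder item: a non-empty run of alphabetic characters.
        return entity( isalpha ).match( universe_ );
    }
};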
While traces may help to explore the dynamics of grammar evaluation, understanding a large trace output can be an intimidating experience, particularly if the trace is both long and deeply nested.
Debugging with traces requires a disciplined approach of narrowing the search down to the places with the highest degree of relevance to the potential bug. Statements (sentences) of input that fail grammar evaluation yield clues about the problem. A handful of earlier, properly evaluated statements should also be identified and analyzed. Each iterative step may require restarting the program with a considerably smaller amount of input, sufficient to reproduce the bug. Eventually, the search area narrows to a particular grammar rule, lexeme, or operator directly associated with the problem.
In many cases it may be helpful to locate the traces where the lexer has made the best progress in successful evaluation of the input text. When working with absolute offsets, consider using the offset2line.cpp utility, or a similar tool, to translate an absolute offset into the actual line number in the input file.
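The idea behind such a translation is simple; the following is a minimal sketch (not the actual offset2line.cpp implementation), counting the newline characters that precede the offset:

#include <algorithm>
#include <string>

// Translate an absolute character offset into a 1-based line number by
// counting the newline characters that appear before the offset.
size_t offset_to_line( const std::string& text, size_t offset )
{
    if ( offset > text.size() )
        offset = text.size();
    return 1 + std::count( text.begin(), text.begin() + offset, '\n' );
}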
Don't be discouraged by the large amounts of trace information routinely generated by parser programs. Use tools to split the output files into smaller, manageable pieces. Most of the time the problem occurs near the end of the execution; therefore, the tail of the trace is a logical place to begin looking for clues.
Permission to copy, use, modify, sell and distribute this document is granted provided this copyright notice appears in all copies. This document is provided "as is" without express or implied warranty, and with no claim as to its suitability for any purpose.