<<< Parser design | Table Of Contents | Strict-mode grammar evaluation >>> |
Common Text Transformation Library http://cttl.sourceforge.net/
Grammar expressions describe syntax of the input language. The parser program uses grammar expressions to undertand input language constructs. Groupped together, closely related expressions formulate a complete set of grammar rules.
While grammars originate as pseudo-code, the C++ program needs a practical form of a grammar rule representation. By definition, EBNF grammar rules can invoke each other recursively. A set of recursive C++ functions becomes a natural choice for the principal description of the grammar rules. (The topics related to recursive grammars are discussed in the lexer design section of the documentation.)
If grammar expression R represents a unique grammar rule, it can be placed inside a C++ function as follows:
template< typename SubstrT > size_t r( SubstrT& U ) { return R.M( U ); }
Function r() evaluates grammar R, where
SubstrT is a typename of CTTL substring, passed to the processing function by mutable reference. (Function r() is not required to be a template function; it can be a member function of a template class, or simply a functon with hardwired type of the substring argument.)
U is a parseable substring, an instance of cttl::const_edge or cttl::edge.
R is a C++ expression, describing the grammar production rule.
M is grammar evaluation function: match() , find() , or bang_find().
Function r() encapsulates grammar expression R, along with the specified evaluation algorithm M. The function returns an unsigned integer that represents the position of matched symbols in the input string.
If grammar evaluation fails, r() returns the value std::string::npos.
A set of distinctive grammar production rules, organized into a set of C++ functions is illustrated by the rule_adaptor.cpp sample program, which parses fractional numbers. The grammar rule functions are named start(), digits(), and fract():
Keywords: rule_adaptor.cpp, const_edge, rule, fractional numbers
// sample code: rule_adaptor.cpp // demonstrates EBNF grammar rules #include "cttl/cttl.h" using namespace cttl; // EBNF grammar for fractional number can be written as: // // <start> := '-'* <fract> // <fract> := <digits> ('.' <digits>)* // <digits> := ( '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' )+ // // With cttl, these rules can be written as C++ functions: // size_t digits( const_edge<>& substr_ ) { return entity( isdigit ).match( substr_ ); } size_t fract( const_edge<>& substr_ ) { return ( rule( digits ) + ( '.' + rule( digits ) ) * 1 ).match( substr_ ); } size_t start( const_edge<>& substr_ ) { return ( *symbol( '-' ) + rule( fract ) ).match( substr_ ); } int main() { std::string inp = "123.45"; const_edge<> substring( inp ); size_t result = rule( start ).match( substring ); assert( result == 0 ); assert( substring.length() == 0 ); return 0; }
The production rule function invoke other rule functions using the rule() function adaptor.
To enable C++ function calls from CTTL grammar expressions, the library provides a family of function adaptors, similar to std::mem_fun() adaptors of the C++ standard library.
CTTL adaptor functions are named cttl::rule() and are defined as a set of overloaded C++ functions that take a pointer to a production rule function as an argument, and return an object that invokes the corresponding function during grammar evaluation.
The argument of cttl::rule() can be a global, static member, or a class member function. The sample program rule_formats.cpp presents allowable signatures of rule()-adaptable functions:
Keywords: rule_formats.cpp, rule, functor, function object, global function, member function, static member function
// rule_formats.cpp // Examples of function signatures that can be used // as wrapper functions for CTTL grammar rules. #include "cttl/cttl.h" using namespace cttl; // Global functions: size_t gfc( const_edge<>& ) { std::cout << "global function accepting const_edge\n"; return 0; } size_t gfm( edge<>& ) { std::cout << "global function accepting edge\n"; return 0; } // Member functions: struct Parser { static size_t sfc( const_edge<>& ) { std::cout << "static member accepting const_edge\n"; return 0; } static size_t sfm( edge<>& ) { std::cout << "static member accepting edge\n"; return 0; } size_t mfc( const_edge<>& ) { std::cout << "member function accepting const_edge\n"; return 0; } size_t mfcc( const_edge<>& ) const { std::cout << "const member accepting const_edge\n"; return 0; } size_t mfm( edge<>& ) { std::cout << "member function accepting edge\n"; return 0; } size_t mfmc( edge<>& ) const { std::cout << "const member accepting edge\n"; return 0; } };//struct Parser // Function objects (functors): struct Rc { size_t operator() ( const_edge<>& ) { std::cout << "functor accepting const_edge\n"; return 0; } }; struct Rcc { size_t operator() ( const_edge<>& ) const { std::cout << "const functor accepting const_edge\n"; return 0; } }; struct Re { size_t operator() ( edge<>& ) { std::cout << "functor accepting edge\n"; return 0; } }; struct Rec { size_t operator() ( edge<>& ) const { std::cout << "const functor accepting edge\n"; return 0; } }; int main() { std::string inp; const_edge<> const_substr( inp ); edge<> mute_substr( inp ); rule( gfc ).match( const_substr ); rule( gfm ).match( mute_substr ); rule( Parser::sfc ).match( const_substr ); rule( Parser::sfm ).match( mute_substr ); Parser parser; rule( parser, &Parser::mfc ).match( const_substr ); rule( parser, &Parser::mfcc ).match( const_substr ); rule( parser, &Parser::mfm ).match( mute_substr ); rule( parser, &Parser::mfmc ).match( mute_substr ); Rc rc; Rcc rcc; rule( rc ).match( const_substr ); rule( rcc ).match( const_substr ); Re re; Rec rec; rule( re ).match( mute_substr ); rule( rec ).match( mute_substr ); return 0; }
Simple trace messages, such as those produced from the rule_formats.cpp program, could quickly become programmer's liablility: there is no easy way to switch the traces on and off, or replace std::cin by other types of output streams. CTTL has a number of built-in debugging and tracing facilities, which can be controlled by the precompiler symbols, such as
#define CTTL_TRACE_RULES // Turns on rule()-level tracing
Important step towards enabling traceable grammars is replacing the rule() function adaptors by debugging macros CTTL_STATIC_RULE() and CTTL_MEMBER_RULE(). While tracing is disabled, the macros invoke corresponding rule() adaptors. When trace output is enabled, the macros capture additional information, such as name of the actual C++ function being invoked, and the source code location of the caller.
A modified version of the above sample, rule_traced.cpp, demonstrates trace-enabled program using CTTL_STATIC_RULE() and CTTL_MEMBER_RULE() function adaptors:
Keywords: rule_traced.cpp, CTTL_TRACE_MESSAGE, CTTL_STATIC_RULE, CTTL_MEMBER_RULE, CTTL_TRACE_RULES, traced function calls
// rule_traced.cpp // Examples of function signatures that can be used // as wrapper functions for CTTL grammar rules. #define CTTL_TRACE_RULES // Turns on rule()-level tracing #include "cttl/cttl.h" using namespace cttl; // Global functions: size_t gfc( const_edge<>& ) { CTTL_TRACE_MESSAGE( "global function accepting const_edge" ); return 0; } size_t gfm( edge<>& ) { CTTL_TRACE_MESSAGE( "global function accepting edge" ); return 0; } // Member functions: struct Parser { static size_t sfc( const_edge<>& ) { CTTL_TRACE_MESSAGE( "static member accepting const_edge" ); return 0; } static size_t sfm( edge<>& ) { CTTL_TRACE_MESSAGE( "static member accepting edge" ); return 0; } size_t mfc( const_edge<>& ) { CTTL_TRACE_MESSAGE( "member function accepting const_edge" ); return 0; } size_t mfcc( const_edge<>& ) const { CTTL_TRACE_MESSAGE( "const member accepting const_edge" ); return 0; } size_t mfm( edge<>& ) { CTTL_TRACE_MESSAGE( "member function accepting edge" ); return 0; } size_t mfmc( edge<>& ) const { CTTL_TRACE_MESSAGE( "const member accepting edge" ); return 0; } };//struct Parser // Function objects (functors): struct Rc { size_t operator() ( const_edge<>& ) { CTTL_TRACE_MESSAGE( "functor accepting const_edge" ); return 0; } }; struct Rcc { size_t operator() ( const_edge<>& ) const { CTTL_TRACE_MESSAGE( "const functor accepting const_edge" ); return 0; } }; struct Re { size_t operator() ( edge<>& ) { CTTL_TRACE_MESSAGE( "functor accepting edge" ); return 0; } }; struct Rec { size_t operator() ( edge<>& ) const { CTTL_TRACE_MESSAGE( "const functor accepting edge" ); return 0; } }; int main() { std::string inp; const_edge<> const_substr( inp ); edge<> mute_substr( inp ); CTTL_STATIC_RULE( gfc ).match( const_substr ); CTTL_STATIC_RULE( gfm ).match( mute_substr ); CTTL_STATIC_RULE( Parser::sfc ).match( const_substr ); CTTL_STATIC_RULE( Parser::sfm ).match( mute_substr ); Parser parser; CTTL_MEMBER_RULE( parser, &Parser::mfc ).match( const_substr ); CTTL_MEMBER_RULE( parser, &Parser::mfcc ).match( const_substr ); CTTL_MEMBER_RULE( parser, &Parser::mfm ).match( mute_substr ); CTTL_MEMBER_RULE( parser, &Parser::mfmc ).match( mute_substr ); Rc rc; Rcc rcc; CTTL_STATIC_RULE( rc ).match( const_substr ); CTTL_STATIC_RULE( rcc ).match( const_substr ); Re re; Rec rec; CTTL_STATIC_RULE( re ).match( mute_substr ); CTTL_STATIC_RULE( rec ).match( mute_substr ); return 0; }
If CTTL_TRACE_RULES is commented out,
//#define CTTL_TRACE_RULES // Turns on rule()-level tracing
the output is turned off completely, so the tracing code is no longer a part of the compiled program. There is no performance penalty for using macros instead of the cttl::rule() adaptors. For debugging reasons and the ability to tune-up grammars, the macro adaptor format should be a preferred way to invoke CTTL grammar rule functions in most programs.
Another CTTL sample, functor.cpp, demonstrates usage of a function object (see SGI's definition of functors ) in place of a grammar rule function. The advantage of using a function object is in its tread-safe memory allocation. Thus, a functor successfully represents grammar production rule with its own local state.
Keywords: functor.cpp, std::unary_function
// sample code: functor.cpp // demonstrates lexer function object invoked by the grammar rule. #include "cttl/cttl.h" using namespace cttl; class lexer : public std::unary_function< const_edge<>, size_t >{ char m_char; public: lexer( char char_ ) : m_char( char_ ) {} size_t operator() ( const_edge<>& edge_ ) const { return symbol( m_char ).match( edge_ ); } }; int main() { std::string inp = "ZZZA"; const_edge<> substring( inp ); char letter = 'Z'; // Construct grammar rule for a single character: lexer grammar( letter ); // The following statement combines grammar production // named "grammar" with single character 'A': size_t result = ( +rule( grammar ) + 'A' ).match( substring ); assert( result == 0 ); // assert match at the beginning of substring assert( substring.length() == 0 ); // make sure substring is fully consumed return 0; }
It is important to note that the function object representing CTTL grammar rule must be assignable, and, therefore, copy-constructible.
Copyright © 1997-2009 Igor Kholodov mailto:cttl@users.sourceforge.net.
Permission to copy and distribute this document is granted provided this copyright notice appears in all copies. This document is provided "as is" without express or implied warranty, and with no claim as to its suitability for any purpose.
<<< Parser design | Table Of Contents | Strict-mode grammar evaluation >>> |