<<< Strict-mode grammar evaluation | Table Of Contents | Text transformations >>> |
Common Text Transformation Library http://cttl.sourceforge.net/
During grammar expression evaluation, the tokens are either instantly discarded, or processed by the parser. The tasks include validation, storage, and other data-related procedures. Consequently, the parser receives and consumes the data generated by the lexer. Program functions, responsible for immediate processing steps during lexical analysis, are commonly known as a semantic actions.
A parser component of the program can be viewed as a collection of all semantic actions. When semantic action detects an error, it can signal the error condition back to the lexer. Based on the structure of the grammar rule, the lexer may choose to proceed with an alternative grammar evaluation path, or discontinue further input processing and return the error condition back to the caller.
In CTTL, semantic action format is similar to production rule function format: corresponding C++ functions use the same signature and return the evaluation result that has the same meaning to the lexer. An do-nothing version of an exemplary semantic action s() is
template< typename SubstrT > size_t s( SubstrT& U ) { return U.first.offset(); }
where
SubstrT is a typename of CTTL substring, passed to the processing function by mutable reference.
U is a parseable substring, an instance of cttl::const_edge or cttl::edge.
Semantic action s() is not required to be a C++ template function; it can be a member function of a template class, or simply a functon with a hardwired type of the substring argument.
Besides C++ functions that implement semantic actions, described in this section, CTTL supports other useful tools for constructing grammar-based parsers:
lambda expressions (in-line semantic actions),
higher-order functions (functional composition), and
closures (delayed function calls).
To make clean distinction between grammar rule functions and semantic actions, CTTL lexer and parser can be implemented in two separate C++ classes. Typically, the classes are related:
For example, program word_count.cpp includes semantic actions named parser::count_words() and parser::replace_words() in the parser class. The lexer class defines two grammar rule functions lexer::start() and lexer::word().
The program inputs a few "words", entered on the command-line, counts them, and replaces the words with the "<WORD/>" tag. The "words" can be any alphabetical sequences of characters, except "abc" and "xyz", which are designated to be the "keywords" of the input language.
Keywords: word_count.cpp, keywords, lexer, parser, production rule functions, CTTL_RULE, std::set, begin, string_array2string
// sample code: word_count.cpp // demonstrates stateful parser and lexer implementation. //#define NDEBUG // must appear before assert.h is included to stop assertions from being compiled //#define CTTL_TRACE_EVERYTHING #include <iostream> #include "cttl/cttl.h" using namespace cttl; template< typename SubstrT > struct base_parser { // parser defines two kinds of substrings: typedef SubstrT substr_T; typedef typename SubstrT::strict_edge_T strict_edge_T; // semantic actions: size_t count_words( strict_edge_T& ) const { return 0; } size_t replace_words( strict_edge_T& ) const { return 0; } }; template< typename SubstrT > struct parser : public base_parser< SubstrT > { // parser defines two kinds of substrings: typedef SubstrT substr_T; typedef typename SubstrT::strict_edge_T strict_edge_T; int count; parser( int count_ ) : count( count_ ) { } // semantic actions: size_t count_words( strict_edge_T& substr_ ) { ++count; return substr_.second.offset(); } size_t replace_words( strict_edge_T& substr_ ) { substr_ = "<WORD/>"; //substr_ = std::string( "<WORD/>" ); //substr_.text( "<WORD/>" ); return substr_.second.offset(); } }; template< typename ParserT > struct lexer : public ParserT { // lexer defines two kinds of substrings: typedef typename ParserT::substr_T substr_T; typedef typename substr_T::strict_edge_T strict_edge_T; // lexer static data: std::set< std::string > keywords; lexer( int count_ = 0 ) : ParserT( count_ ) { // populate list of keywords: keywords.insert( "abc" ); keywords.insert( "xyz" ); } // grammar rule definitions size_t start( substr_T& substr_ ) { return ( // at least one word should be present, // but not a keyword: +( // invoke grammar rule CTTL_RULE( lexer< ParserT >::word ) & // invoke semantic action CTTL_RULE( ParserT::count_words ) & // invoke another semantic action CTTL_RULE( ParserT::replace_words ) ) ).match( substr_ ) ; } size_t word( substr_T& substr_ ) const { // a word can anything made of alphabetic // characters, but not a keyword: return ( isalpha - begin( keywords ) ).match( substr_ ); } }; int main(int argc, char* argv[]) { if ( argc == 1 ) { std::cout << "Usage: on command the line, enter some words to count, for example," << std::endl << '\t' << argv[ 0 ] << " one two three" << std::endl ; return 1; } // construct input string from the command line arguments: std::string inp; string_array2string( inp, &argv[ 1 ], ' ' ); // construct substring to be parsed: //typedef const_edge< policy_space<> > substr_T; typedef edge< policy_space<> > substr_T; substr_T substring( inp ); // construct the parser: lexer< parser< substr_T > > word_parser; // count words: if ( word_parser.start( substring ) != std::string::npos ) { std::cout << "Word count: " << word_parser.count << std::endl; } else { std::cout << "*** parser failed ***" << std::endl; return 1; } std::cout << "Input: " << inp << std::endl; return 0; }
Semantic actions can detect and handle errors during lexical analysis of the input.
Two design solutions are possible for a lexer that wants to deal with input errors:
In the first case, the strategy is to fail as soon as possible and report the error condition back to the caller.
In the second case, the error recovery allows the parser to continue processing beyond the discovery of bad input. This can lead to a discovery of more errors, all of which can be reported by the initial pass through the user input.
The job of recovery from an error is to reach a position of a synchronizing boundary of the current grammar construct, such as a phrase, an expression, or a statement. Once recovered, the lexer continues with its regular path of the grammar evaluation.
Implementation of the error recovery can be tricky, involving specialized, "private" evaluation of some auxiliary grammar constructs.
Older compilers may have difficulty resolving a function template argument from the address of another member function template. To work around the problem, a pointer to the member function can be explicitly instantiated and then used as an argument for the grammar rule adaptor.
For example, program action_handler.cpp demonstrates a rule() adaptor that invokes semantic action, implemented as a member function template of the handler class:
Keywords: action_handler.cpp, C2784
// sample code: action_handler.cpp // demonstrates semantic action implemented as member function template. #include "cttl/cttl.h" using namespace cttl; class handler { //... public: template< typename SubstrT > size_t action( SubstrT& substr_ ) { //... return substr_.first.offset(); } }; int main(int argc, char* argv[]) { std::string inp; // construct input object typedef const_edge<> substr_T; // type of parseable substring substr_T substring( inp ); // construct substring handler sa_handler; // handler class implements // semantic actions // VC7.1 error C2784 workaround: typedef size_t ( handler::* action_T ) ( substr_T& ); size_t result = rule( sa_handler, action_T( &handler::action< substr_T > ) ).match( substring ); assert( result != std::string::npos ); // assert that match succeeded return 0; }
Copyright © 1997-2009 Igor Kholodov mailto:cttl@users.sourceforge.net.
Permission to copy and distribute this document is granted provided this copyright notice appears in all copies. This document is provided "as is" without express or implied warranty, and with no claim as to its suitability for any purpose.
<<< Strict-mode grammar evaluation | Table Of Contents | Text transformations >>> |