
grammar.cpp



The following program demonstrates a small CTTL parser. Because the sample parser is stateless, no object instance is required, and all member functions are declared static:

// sample code: grammar.cpp
// demonstrates stateless cttl lexer and parser

//#define NDEBUG    // define before assert.h to stop assertions from being compiled 
//#define CTTL_TRACE_EVERYTHING

#include <iostream>
#include "cttl/cttl.h"

using namespace cttl;

struct parser {

    static size_t start( const_edge<>& edge_ )
    {
        return (
            +(
                ( entity( isspace ) & rule( &parser::event_space ) )
                |
                ( entity( isalpha ) & rule( &parser::event_alpha ) )
                |
                ( entity( iscntrl ) & rule( &parser::event_cntrl ) )
                |
                ( entity( isdigit ) & rule( &parser::event_digit ) )
                |
                ( entity( ispunct ) & rule( &parser::event_punct ) )
            )

            ).match( edge_ );
    }

    static size_t event_alpha( const_edge<>& edge_ )
    {
        std::cout << "alpha;";
        return edge_.first.offset();
    }
    
    static size_t event_space( const_edge<>& edge_ )
    {
        std::cout << "space;";
        return edge_.first.offset();
    }

    static size_t event_cntrl( const_edge<>& edge_ )
    {
        std::cout << "cntrl;";
        return edge_.first.offset();
    }

    static size_t event_digit( const_edge<>& edge_ )
    {
        std::cout << "digits;";
        return edge_.first.offset();
    }

    static size_t event_punct( const_edge<>& edge_ )
    {
        std::cout << "punct;";
        return edge_.first.offset();
    }

};

int main(int argc, char* argv[])
{
    if ( argc == 1 ) {
        std::cout << "Enter some arguments to be parsed" << std::endl;
        return 1;
    }

    input<> inp( &argv[ 1 ], ' ' );
    const_edge<> universe( new_edge( inp ) );
    if ( parser::start( universe ) == std::string::npos ) {
        std::cout << "*** parser failed ***";
        return 1;
    }
    
    std::cout << std::endl;

    return 0;
}

In this example, the lexer component is compiled as a single start rule that contains an expression with a list of lexemes to match character entities, such as entity( isspace ):

    static size_t start( const_edge<>& edge_ )
    {
        return (
            +(
                ( entity( isspace ) & rule( parser::event_space ) )
                |
                ( entity( isalpha ) & rule( parser::event_alpha ) )
                |
                ( entity( iscntrl ) & rule( parser::event_cntrl ) )
                |
                ( entity( isdigit ) & rule( parser::event_digit ) )
                |
                ( entity( ispunct ) & rule( parser::event_punct ) )
            )

            ).match( edge_ );
    }
The Kleene plus operator, '+', in front of the expression modifies the grammar to match one or more occurrences of a character entity. For example, if we enter "hello, world!" as the input argument for our program, it produces the following output:

alpha;punct;space;alpha;punct;

Character entities inside the list are connected by the set-union operator, '|'. The operator selects the first match from a list of expressions, similar to if-else constructs in many programming languages.
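The combination of '+' and '|' described above can be sketched in plain C++ without CTTL. The helper classify_runs below is hypothetical and not a library function. It assumes, consistent with the sample output, that each entity consumes a whole run of characters of one class, and it tries the classes in the same order as parser::start:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Plain C++ sketch, no CTTL involved. "lexeme" and "classify_runs" are
// hypothetical names that only mimic the behavior described in the text.
struct lexeme {
    bool (*test)(unsigned char);   // character classification predicate
    const char* label;             // text the matching handler would print
};

std::string classify_runs(const std::string& text)
{
    // Same order as the alternatives in parser::start. Order matters:
    // like '|', the loop below takes the first alternative that matches.
    static const lexeme alternatives[] = {
        { [](unsigned char c) { return std::isspace(c) != 0; }, "space;"  },
        { [](unsigned char c) { return std::isalpha(c) != 0; }, "alpha;"  },
        { [](unsigned char c) { return std::iscntrl(c) != 0; }, "cntrl;"  },
        { [](unsigned char c) { return std::isdigit(c) != 0; }, "digits;" },
        { [](unsigned char c) { return std::ispunct(c) != 0; }, "punct;"  },
    };

    std::string out;
    std::size_t pos = 0;
    while (pos < text.size()) {            // repetition, as with '+'
        const lexeme* hit = nullptr;
        for (const lexeme& alt : alternatives) {
            if (alt.test(static_cast<unsigned char>(text[pos]))) {
                hit = &alt;                // first match wins
                break;
            }
        }
        if (hit == nullptr) {
            break;                         // no alternative matches: stop
        }
        // Consume the whole run of characters of the selected class.
        while (pos < text.size() &&
               hit->test(static_cast<unsigned char>(text[pos]))) {
            ++pos;
        }
        out += hit->label;
    }
    return out;
}
```

In the default C locale, classify_runs( "hello, world!" ) returns alpha;punct;space;alpha;punct;. A tab character is both a space and a control character, so the order of the alternatives decides that it is reported as space.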

Whether a character matches is determined by the ANSI C character classification routines, such as isalpha and isdigit. When an entity match is found, the corresponding parser event handler (semantic action) is invoked, for example:
	static size_t event_alpha( const_edge<>& edge_ )
	{
		std::cout << "alpha;";
		return edge_.first.offset();
	}

All parser functions receive a single argument, a reference to a cttl::edge<> or cttl::const_edge<> object, which specifies a substring of the user input. The parser function accepts this substring as its universe of discourse. On success, a semantic action function returns a valid offset within the user input. Otherwise, it returns std::string::npos, which signals an error condition to the CTTL lexer engine. In our example, semantic action functions such as parser::event_alpha() always succeed: they return the offset of the beginning of the universe.
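The return convention can be illustrated with a plain C++ sketch that does not use CTTL. The substring type below is a hypothetical stand-in for an edge, and on_any, on_keyword, and succeeded are illustrative names only:

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for an edge: a half-open range [first, second)
// of offsets into the user input. This is not CTTL's edge class.
struct substring {
    const std::string* text;
    std::size_t first;
    std::size_t second;
};

// Always succeeds, like event_alpha(): reports where its universe begins.
std::size_t on_any(const substring& universe)
{
    return universe.first;
}

// Succeeds only for one particular word; otherwise signals failure.
std::size_t on_keyword(const substring& universe)
{
    const std::string word =
        universe.text->substr(universe.first, universe.second - universe.first);
    return word == "hello" ? universe.first : std::string::npos;
}

// Offset 0 is a valid success value, so results must be compared against
// npos rather than tested as a boolean.
bool succeeded(std::size_t result)
{
    return result != std::string::npos;
}
```

A handler that matches at the very start of the input returns 0, which is why the sample main() compares the result of parser::start() with std::string::npos instead of testing it for truth.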

Handler functions such as event_alpha() are joined with the corresponding character entity expression by the set intersection operator, '&'. For example,

	entity( isalpha ) & rule( parser::event_alpha )
Because the '&' operator evaluates its operands from left to right, the expression on the left-hand side, entity( isalpha ), is evaluated first. If it succeeds, the implementation of the '&' operator restricts the universe (as specified by the edge_ reference) to the boundaries of the substring that matched the entity( isalpha ) expression. The modified universe is then passed to the right-hand side expression, rule( parser::event_alpha ), for evaluation. The rule() adaptor invokes the parser::event_alpha() function.
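The evaluation order described above can be sketched in plain C++ without CTTL. The substring type, the matcher alias, and the functions match_alpha_run and both are hypothetical names chosen for illustration. They only model the described behavior and are not the library's implementation:

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical stand-in for an edge: offsets [first, second) into the input.
struct substring {
    const std::string* text;
    std::size_t first;
    std::size_t second;
};

// A matcher returns a valid offset on success or std::string::npos on failure.
using matcher = std::function<std::size_t(substring&)>;

// Left-hand side: match a leading run of alphabetic characters. On success
// the universe is shrunk to the matched run.
std::size_t match_alpha_run(substring& universe)
{
    std::size_t end = universe.first;
    while (end < universe.second &&
           std::isalpha(static_cast<unsigned char>((*universe.text)[end]))) {
        ++end;
    }
    if (end == universe.first) {
        return std::string::npos;   // nothing matched
    }
    universe.second = end;          // restrict the universe to the match
    return universe.first;
}

// Sketch of the '&' step: evaluate lhs first, and only if it succeeds
// hand the restricted universe to rhs.
std::size_t both(const matcher& lhs, const matcher& rhs, substring universe)
{
    if (lhs(universe) == std::string::npos) {
        return std::string::npos;   // rhs is never invoked
    }
    return rhs(universe);
}
```

With the input "hello, world!" and a universe covering the whole string, a handler passed as rhs sees only "hello". If the universe starts at the comma instead, the left-hand side fails and the handler is not called.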



Copyright © 1997-2006 Igor Kholodov mailto:cttl@users.sourceforge.net.

Permission to copy, use, modify, sell and distribute this document is granted provided this copyright notice appears in all copies. This document is provided "as is" without express or implied warranty, and with no claim as to its suitability for any purpose.


Generated on Thu Nov 2 17:44:54 2006 for Common Text Transformation Library by  doxygen 1.3.9.1