Common Text Transformation Library http://cttl.sourceforge.net/
Category:
Format:
// Helper functions yielding instance-based lexeme implementation:
symbol( char const* )
symbol( wchar_t const* )
symbol( std::string const& )
symbol( std::wstring const& )
// Helper functions yielding reference-based lexeme implementation:
symbol( std::string const* )
symbol( std::wstring const* )
where the argument specifies the text to match.
Algorithm:
The terminal symbol lexeme matches the exact text specified by its argument.
If the parseable substring is empty, the terminal symbol lexeme always fails.
The empty string is a valid terminal symbol and yields an epsilon (zero-length) match:
#define CTTL_TRACE_EVERYTHING
#include <cassert>
#include "cttl/cttl.h"
using namespace cttl;
int main()
{
std::string inp = " XYZ ";
const_edge< policy_space<> > substring( inp );
node<> pos = substring.first;
size_t result = pos( symbol( "" ) ).match( substring );
assert( result != std::string::npos );
assert( pos.offset() == 1 );
return 0;
}
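The matching rules stated above can be modeled outside the library. The following is a minimal, self-contained sketch, not the CTTL implementation: the function name `match_symbol` and its convention of returning the offset where the matched text begins (or `std::string::npos` on failure) are assumptions made for illustration. It encodes three rules: exact text comparison, failure on an empty parseable substring, and an epsilon match for an empty symbol.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical model of the terminal symbol rule (not the CTTL source).
// Tries to match `text` at offset `first` of the parseable substring
// input[first, last). Returns `first` on success, std::string::npos on failure.
std::size_t match_symbol(const std::string& text,
                         const std::string& input,
                         std::size_t first,
                         std::size_t last)
{
    assert(last <= input.size());

    // An empty parseable substring always fails, even for an empty symbol.
    if (first >= last)
        return std::string::npos;

    // The symbol text must fit inside the substring.
    if (text.size() > last - first)
        return std::string::npos;

    // Exact, case-sensitive comparison. An empty symbol compares equal,
    // which yields an epsilon (zero-length) match at `first`.
    if (input.compare(first, text.size(), text) != 0)
        return std::string::npos;

    return first;
}
```

With the input " XYZ " and the substring starting at offset 1, `match_symbol("", inp, 1, inp.size())` succeeds at offset 1, mirroring the epsilon match in the example above, while the same call on an empty range fails.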
Usage notes:
An instance-based lexeme, such as symbol("ABC") in
size_t rule_abc( const_edge<>& substr_ )
{
return symbol( "ABC" ).match( substr_ );
}
creates a std::string (or std::wstring) member object that holds a private copy of the text to match. Consequently, a temporary object, such as std::string("ABC"), is constructed each time execution reaches the symbol("ABC") expression.
In many applications, the instance-based implementation can be inefficient. Alternatively, the pointer-to-string argument format can be used to specify the text to look for:
size_t rule_abc( const_edge<>& substr_ )
{
static const std::string ABC( "ABC" );
return symbol( &ABC ).match( substr_ );
}
In the latter version of rule_abc(), the reference-based implementation of the terminal symbol lexeme is compiled. It stores a reference to the specified string rather than a copy, so the string object must remain alive for as long as the lexeme is in use. Here, the static local ABC satisfies this requirement.
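The difference between the two implementations comes down to what the lexeme object stores. The sketch below is a simplified model, not the CTTL classes: `instance_symbol`, `reference_symbol`, and `abc_constant` are hypothetical names chosen for illustration. The first class copies the text into its own std::string every time it is constructed; the second keeps only a pointer, so the referenced string must outlive it.

```cpp
#include <cassert>
#include <string>

// Hypothetical instance-based lexeme: owns a private copy of the text,
// so constructing it from "ABC" builds a new std::string each time.
class instance_symbol {
public:
    explicit instance_symbol(const char* text) : text_(text) {}
    const std::string& text() const { return text_; }
private:
    std::string text_;
};

// Hypothetical reference-based lexeme: stores only a pointer.
// The caller must keep *text alive for as long as the lexeme is used.
class reference_symbol {
public:
    explicit reference_symbol(const std::string* text) : text_(text) {
        assert(text_ != nullptr);
    }
    const std::string& text() const { return *text_; }
private:
    const std::string* text_;
};

// The static string is constructed once, on the first call, and every
// reference_symbol built from its address shares that same object.
const std::string& abc_constant() {
    static const std::string ABC("ABC");
    return ABC;
}
```

Two `instance_symbol("ABC")` objects hold equal but distinct strings, whereas two `reference_symbol(&abc_constant())` objects refer to the very same string, which is the sharing the reference-based form relies on.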
Passing a pointer to a string when constructing the symbol lexeme makes it possible to reuse the same string object in different parts of the program. Constant strings can then be constructed just once (at program start-up, or on first use for function-local statics such as ABC above), which avoids the repeated construction cost of the instance-based form and yields a faster implementation.
Another benefit of the reference-based implementation is the ability to centralize all symbolic constants in one place, defining a common alphabet of the input language. Such a design encourages uniform sharing of the abstract symbols among multiple grammar rules.
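A shared alphabet can be as simple as a namespace of constant strings that every rule refers to by address. In the sketch below, the namespace `alphabet`, its members, the helper `starts_with_symbol`, and the two rule functions are all hypothetical; in an actual CTTL grammar, each rule would instead pass the address to the documented pointer overload, as in symbol( &alphabet::KW_BEGIN ).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical alphabet of the input language: each symbol is defined
// once, at namespace scope, and constructed once before it is first used.
namespace alphabet {
    const std::string KW_BEGIN("begin");
    const std::string KW_END("end");
    const std::string ASSIGN(":=");
}

// Hypothetical stand-in for a terminal symbol match: returns true if the
// symbol pointed to by `sym` occurs exactly at offset `pos` of `input`.
bool starts_with_symbol(const std::string& input, std::size_t pos,
                        const std::string* sym)
{
    assert(sym != nullptr);
    return pos <= input.size() && input.compare(pos, sym->size(), *sym) == 0;
}

// Two hypothetical rules that share the same symbol objects by address.
bool rule_block_start(const std::string& input) {
    return starts_with_symbol(input, 0, &alphabet::KW_BEGIN);
}

bool rule_block_end(const std::string& input) {
    return starts_with_symbol(input, 0, &alphabet::KW_END);
}
```

Because each rule names a symbol rather than spelling out its text, renaming a keyword means editing one definition in `alphabet` instead of every rule that uses it.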
Space sensitivity:
The space sensitivity of the terminal symbol lexeme is enabled:
#define CTTL_TRACE_EVERYTHING
#include <cassert>
#include "cttl/cttl.h"
using namespace cttl;
int main()
{
std::string inp = " XYZ ";
const_edge< policy_space<> > substring( inp );
const_edge<> token = substring;
static const std::string XY( "XY" );
size_t result = token( symbol( &XY ) ).match( substring );
assert( result != std::string::npos );
assert( token == "XY" );
return 0;
}
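In the example above, the match succeeds on " XYZ " because the leading blank is skipped before "XY" is compared. The following sketch models that skip-then-compare behavior under a stated assumption: white space is whatever std::isspace recognizes. The function `match_symbol_skip_space` is hypothetical and does not reproduce the actual rules of policy_space<>.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical model of a space-sensitive symbol match: advance past
// white space first, then require the exact symbol text at that point.
// Returns the offset where the symbol begins, or std::string::npos.
std::size_t match_symbol_skip_space(const std::string& text,
                                    const std::string& input,
                                    std::size_t pos)
{
    assert(pos <= input.size());

    // Skip leading white space, as a space policy would.
    while (pos < input.size() &&
           std::isspace(static_cast<unsigned char>(input[pos])))
        ++pos;

    // Nothing left to parse: the symbol fails, even if it is empty.
    if (pos == input.size())
        return std::string::npos;

    if (input.compare(pos, text.size(), text) != 0)
        return std::string::npos;

    return pos;
}
```

For " XYZ ", `match_symbol_skip_space("XY", inp, 0)` returns 1, the offset of "XY" after the skipped blank.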
Searchability:
Search grammar evaluation algorithms are enabled for the terminal symbol lexeme:
#define CTTL_TRACE_EVERYTHING
#include <cassert>
#include "cttl/cttl.h"
using namespace cttl;
int main()
{
std::string inp = "123 XYZ";
const_edge<> substring( inp );
const_edge<> token = substring;
size_t result = token( symbol( "XY" ) ).find( substring );
assert( result != std::string::npos );
assert( token == "XY" );
return 0;
}
Trace output format:
The trace symbol of the terminal symbol lexeme is uppercase 'T', annotated by the matched symbol text. The above example generates the following trace:
-----------------@123 XYZ?{e! 0-7 :3:1
------------------123 XY@| T 6-7 XY
----------------------@XY| e 4-6 }
CTTL tracing is not supported when the program is compiled in wide character mode.
Copyright © 1997-2009 Igor Kholodov mailto:cttl@users.sourceforge.net.
Permission to copy and distribute this document is granted provided this copyright notice appears in all copies. This document is provided "as is" without express or implied warranty, and with no claim as to its suitability for any purpose.