<<< Lexeme entity(text) | Table Of Contents | Lexeme end(text) >>> |
Common Text Transformation Library http://cttl.sourceforge.net/
Category:
Format:
// Helper functions yielding instance-based lexeme implementation:
begin( char const* )
begin( wchar_t const* )
begin( std::string const& )
begin( std::wstring const& )
// Helper functions yielding reference-based lexeme implementation:
begin( std::string const* )
begin( std::wstring const* )
where the argument specifies the character class to match.
Algorithm:
Positive lookahead assertion, based on a user-defined character class.
The lexeme matches any permutation of characters from the user-defined character set. For example, the lexeme begin("abc") matches string permutations
"a", "b", "c", "aa", "ab", "ac", "ad", ...
and backtracks to the upper boundary position of the matched symbol, which becomes the grammar evaluation result. The lexer backtracks the upper boundary of the parseable substring to that position.
The positive lookahead assertion implements an ungreedy algorithm: it evaluates just one character at the upper boundary of the parseable substring.
The lexeme does not change the boundaries of the parseable substring.
Successful grammar evaluation yields a zero-length symbol.
An empty character set creates a stop symbol (that is, the lexeme begin("") always fails):
#define CTTL_TRACE_EVERYTHING
#include <cassert>
#include "cttl/cttl.h"
using namespace cttl;
int main()
{
    std::string inp = "XYZ";
    const_edge<> substring( inp );
    // An empty character class can never match, so the result is npos.
    size_t result = begin( "" ).match( substring );
    assert( result == std::string::npos );
    return 0;
}
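The behavior described above can be modeled without CTTL. The following is a minimal sketch, not the library's implementation: the function name lookahead_begin and its parameters are hypothetical, and plain std::string offsets stand in for CTTL substrings and nodes. It inspects exactly one character, consumes nothing, and fails for an empty character class.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for begin(text); NOT the CTTL implementation.
// Returns `pos` when the character at `pos` belongs to `char_class`,
// or std::string::npos otherwise. Nothing is consumed, so a successful
// match corresponds to a zero-length symbol located at `pos`.
std::size_t lookahead_begin( std::string const& input,
                             std::size_t pos,
                             std::string const& char_class )
{
    if ( pos >= input.size() )
        return std::string::npos;   // no character left to inspect
    if ( char_class.find( input[ pos ] ) == std::string::npos )
        return std::string::npos;   // also covers the empty character class
    return pos;                     // match: the position is unchanged
}

void lookahead_begin_demo()
{
    std::string const inp = "XYZ";
    assert( lookahead_begin( inp, 0, "XY" ) == 0 );                  // 'X' is in the class
    assert( lookahead_begin( inp, 2, "XY" ) == std::string::npos );  // 'Z' is not
    assert( lookahead_begin( inp, 0, "" ) == std::string::npos );    // empty class never matches
}
```

Only the first character decides the outcome in this model, which is why the length of the text that follows the tested position does not matter.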
Usage notes:
Similar to the symbol(std::string*) lexeme, the implementation of the begin(std::string*) and begin(std::wstring*) lexemes is reference-based, storing only a reference to the string that specifies the character class to match. Other formats of the begin(text) lexeme are instance-based, storing a private copy of the character class string.
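The practical difference between the two storage strategies can be illustrated with plain C++. This is a sketch under the assumption that reference-based storage behaves like holding a pointer to the caller's string; the type names instance_based_class and reference_based_class are hypothetical and are not part of CTTL.

```cpp
#include <cassert>
#include <string>

// Hypothetical illustration types; the names are not part of CTTL.
struct instance_based_class
{
    std::string chars;              // private copy of the character class
    explicit instance_based_class( std::string const& s ) : chars( s ) {}
    bool contains( char c ) const { return chars.find( c ) != std::string::npos; }
};

struct reference_based_class
{
    std::string const* chars;       // refers to the caller's string; not owned
    explicit reference_based_class( std::string const* p ) : chars( p ) {}
    bool contains( char c ) const { return chars->find( c ) != std::string::npos; }
};

void storage_demo()
{
    std::string char_class = "XY";
    instance_based_class by_copy( char_class );
    reference_based_class by_ref( &char_class );

    char_class = "Z";   // change the class after both objects were built

    assert( by_copy.contains( 'X' ) );   // the copy still holds "XY"
    assert( !by_ref.contains( 'X' ) );   // the reference sees the new value "Z"
    assert( by_ref.contains( 'Z' ) );
}
```

Under this assumption, a string passed by pointer has to outlive every grammar expression that refers to it, whereas an instance-based lexeme has no such lifetime requirement.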
A lookahead lookup such as
begin( "+=" )
is not always a lookahead lookup for symbol "+=". The begin("+=") lexeme performs a multi-character entity lookup, matching permutations of characters '+' and '=', so it successfully matches strings
"+", "=", "+=", "=+", "++", "==", "++=", etc...
Therefore, if a grammar expression needs a lookahead lookup targeting the exact text "+=", the expression should change to
begin( symbol( "+=" ) )
The latter form uses the positive lookahead assertion for a particular input symbol.
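The distinction between the two forms can be sketched in plain C++ without CTTL. The helper names next_char_in_class and next_text_is are hypothetical: the first models a character-class lookahead such as begin( "+=" ), the second models a lookahead for the exact text, as in begin( symbol( "+=" ) ).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical model of begin( "+=" ): is the next character in the class?
bool next_char_in_class( std::string const& input,
                         std::size_t pos,
                         std::string const& char_class )
{
    return pos < input.size()
        && char_class.find( input[ pos ] ) != std::string::npos;
}

// Hypothetical model of begin( symbol( "+=" ) ): does the exact text follow?
bool next_text_is( std::string const& input,
                   std::size_t pos,
                   std::string const& text )
{
    return pos <= input.size()
        && input.compare( pos, text.size(), text ) == 0;
}

void lookahead_forms_demo()
{
    // Both forms accept input that starts with the exact operator.
    assert( next_char_in_class( "+= 1", 0, "+=" ) );
    assert( next_text_is( "+= 1", 0, "+=" ) );

    // Only the character-class form accepts the reversed characters.
    assert( next_char_in_class( "=+ 1", 0, "+=" ) );
    assert( !next_text_is( "=+ 1", 0, "+=" ) );
}
```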
Space sensitivity:
The space sensitivity of the positive lookahead assertion is enabled:
#define CTTL_TRACE_EVERYTHING
#include <cassert>
#include "cttl/cttl.h"
using namespace cttl;
int main()
{
    std::string inp = " XXYYZZ ";
    const_edge< policy_space<> > substring( inp );
    node<> pos = substring.first;
    // Reference-based lexeme: YX must outlive the grammar expression.
    static const std::string YX( "YX" );
    size_t result = pos( begin( &YX ) ).match( substring );
    assert( result != std::string::npos );
    assert( pos.offset() == 1 );
    return 0;
}
Searchability:
Search grammar evaluation algorithms are enabled for the positive lookahead assertion lexeme:
#define CTTL_TRACE_EVERYTHING
#include <cassert>
#include "cttl/cttl.h"
using namespace cttl;
int main()
{
    std::string inp = "123 XXYYZZ";
    // char positions: 01234567890
    const_edge<> substring( inp );
    node<> pos = substring.first;
    // find() searches forward for the first position where the lookahead succeeds.
    size_t result = pos( begin( "YX" ) ).find( substring );
    assert( result != std::string::npos );
    assert( pos.offset() == 4 );
    return 0;
}
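The search in the example above can be approximated with the standard library. This is a minimal sketch that ignores CTTL white space policies and node and edge types; the function name find_lookahead is hypothetical. It returns the first offset at or after a starting position where a character of the class occurs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical model of pos( begin( char_class ) ).find( substring ):
// scan forward from `from` and return the first offset whose character
// belongs to `char_class`, or std::string::npos when there is none.
std::size_t find_lookahead( std::string const& input,
                            std::size_t from,
                            std::string const& char_class )
{
    // find_first_of also returns npos for an empty character class.
    return input.find_first_of( char_class, from );
}

void find_lookahead_demo()
{
    std::string const inp = "123 XXYYZZ";
    assert( find_lookahead( inp, 0, "YX" ) == 4 );                  // first 'X'
    assert( find_lookahead( inp, 6, "YX" ) == 6 );                  // first 'Y'
    assert( find_lookahead( inp, 8, "YX" ) == std::string::npos );  // only 'Z' remains
}
```

The first assertion mirrors the offset of 4 checked in the CTTL example.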
Trace output format:
The trace symbol of the positive lookahead assertion lexeme is the opening angle bracket '<', annotated by the definition of its character class. The above example generates the following trace:
--------------@123 XXYYZZ?{n! 0-10 :2
--------------------123 @| < 4-10 YX
------------------------@| n 4-4 }
CTTL tracing is not supported when the program is compiled in wide character mode.
Copyright © 1997-2009 Igor Kholodov mailto:cttl@users.sourceforge.net.
Permission to copy and distribute this document is granted provided this copyright notice appears in all copies. This document is provided "as is" without express or implied warranty, and with no claim as to its suitability for any purpose.