Common Text Transformation Library http://cttl.sourceforge.net/
Grammar evaluation algorithm
size_t R::find( SubstrT& substr_ ); // search for terminal symbol
initiates the search for the nearest terminal symbol L in a non-terminal grammar expression R. The searchable range of characters is limited to the boundaries of the parseable substring substr_.
If the search succeeds,
The rest of the grammar expression R after the symbol L is processed using the match() evaluation.
The function returns a value of the unsigned integral type size_t, which specifies the absolute position of the nearest terminal symbol L.
The position of substr_.first is set to the first unparsed character (one position beyond the last character of the matched fragment of text).
If symbol L is not found in the specified range, the evaluation fails, and the function returns std::string::npos.
For example,
#include <iostream>
#define CTTL_TRACE_EVERYTHING // define to turn tracing on
#include "cttl/cttl.h"

using namespace cttl;

int main()
{
    std::string inp = "123 ABC 456 DEF GHI 789";
    const_edge< policy_space<> > substring( inp );
    const_edge<> token = substring;
    while (
        token(
            entity( isalpha )
            +
            entity( isdigit )
        ).find( substring )
        !=
        std::string::npos
    )
    {
        std::cout << "Found: " << token << std::endl;
    }
    return 0;
}
This sample grammar generates the following trace output:
-@123 ABC 456 DEF GHI 789?{e! 0-23 :3:1
-@123 ABC 456 DEF GHI 789? {;! 0-23
-----------------123 ABC@| $ 7-23 isalpha
-------------123 ABC 456@| $ 11-23 isdigit
}
-----------------@ABC 456| e 4-11
}
Found: ABC 456
------------@ DEF GHI 789?{e! 11-23 :3:1
------------@ DEF GHI 789? {;! 11-23
---------123 ABC 456 DEF@| $ 15-23 isalpha
~~~~~~~~123 ABC 456 DEF @~ $ 16-23 FAIL isdigit
}
~~~~~~~~~~~~~123 ABC 456@~ e3 11-23 FAIL
}
The program searches for a group of alphabetical characters followed by digits. Only one occurrence of this pattern is found. The search stops when the program finds the group of characters "DEF", which is not followed by any digits, as indicated by the "FAIL isdigit" trace message. (To ignore an unsuccessful result and automatically resume the search, use the bang_find() algorithm instead of find().)
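The behavior of the sample can be approximated with the standard library alone. The sketch below is not CTTL code and does not reflect how the library is implemented. The names Window, match_run, skip_space, and find_alpha_then_digits are hypothetical, whitespace skipping stands in for the policy_space<> policy, and leaving the window untouched on failure is an assumption of the sketch.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a parseable substring: the range [first, second) of *text.
struct Window {
    const std::string* text;
    std::size_t first;
    std::size_t second;
};

bool is_alpha(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; }
bool is_digit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
bool is_space(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }

// Match a non-empty run of characters satisfying pred, starting exactly at pos.
// Returns the position one past the run, or npos if the run is empty.
std::size_t match_run(const Window& w, std::size_t pos, bool (*pred)(char)) {
    std::size_t end = pos;
    while (end < w.second && pred((*w.text)[end])) ++end;
    return end > pos ? end : std::string::npos;
}

// Skip white space, imitating what a white space policy would do between symbols.
std::size_t skip_space(const Window& w, std::size_t pos) {
    while (pos < w.second && is_space((*w.text)[pos])) ++pos;
    return pos;
}

// Search for the nearest alphabetic run, then match digits after it.
// On success: returns the absolute position of the alphabetic run and moves
// w.first one position beyond the matched fragment.
// On failure: returns npos and leaves w unchanged (assumption of this sketch).
std::size_t find_alpha_then_digits(Window& w) {
    // Search phase: slide forward to the nearest alphabetic character.
    std::size_t start = w.first;
    while (start < w.second && !is_alpha((*w.text)[start])) ++start;
    if (start == w.second) return std::string::npos;

    // Match phase: everything after the located symbol must match in place.
    std::size_t alpha_end = match_run(w, start, is_alpha);
    std::size_t digits_begin = skip_space(w, alpha_end);
    std::size_t digits_end = match_run(w, digits_begin, is_digit);
    if (digits_end == std::string::npos) return std::string::npos;

    w.first = digits_end;
    return start;
}
```

Called repeatedly on a window over "123 ABC 456 DEF GHI 789", the first call locates "ABC 456" and advances the window. The second call locates "DEF", fails to match digits after it, and reports failure without retrying at "GHI", which mirrors the behavior seen in the trace.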
The search process is visualized by defining the trace-level macro
#define CTTL_TRACE_EVERYTHING
before the cttl/cttl.h header is included.
All searchable constructs in CTTL grammar adhere to the rule of search transitivity, which governs the distribution of search requests through the flow of the non-terminal grammar constructs. Such constructs include overloaded operators, Kleene quantifiers, grammar expression adaptors, and a few other lexer components.
In essence, the transitivity principle means that a successful find of the targeted terminal symbol triggers a transition from the "search" to the "match" grammar evaluation algorithm in the lexer's state. Beyond the first token, the remaining grammar is processed as "match". Compared to match, the search request is short-lived: it is consumed as soon as the target of the search is located.
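The idea of a search request that is consumed by the first terminal symbol can be illustrated with a small evaluator. This is a simplified model written against the standard library only, not CTTL's implementation. Mode, Terminal, and evaluate are hypothetical names, each terminal is modeled as a non-empty run of characters, and no white space policy is applied.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Evaluation state: Search may slide forward, Match must succeed in place.
enum class Mode { Search, Match };

// A terminal symbol is modeled as a character predicate.
using Terminal = bool (*)(char);

bool is_alpha(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; }
bool is_digit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }

// Evaluate a sequence of terminals against s, starting at pos in the given mode.
// Returns the position one past the last matched character, or npos on failure.
std::size_t evaluate(const std::string& s, std::size_t pos,
                     const std::vector<Terminal>& sequence, Mode mode) {
    for (Terminal t : sequence) {
        if (mode == Mode::Search) {
            // The search request lets this one terminal slide forward.
            while (pos < s.size() && !t(s[pos])) ++pos;
        }
        std::size_t end = pos;
        while (end < s.size() && t(s[end])) ++end;
        if (end == pos) return std::string::npos;  // terminal not present here
        pos = end;
        mode = Mode::Match;  // the search request is consumed by the first terminal
    }
    return pos;
}
```

With the sequence { is_alpha, is_digit }, evaluating "12ab34" in Search mode succeeds because the letters are located first and the digits then match in place. Evaluating "ab cd34" in Search mode fails, because the search request has already been consumed by "ab" and the following space cannot be matched as a digit.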
The concept of transitivity is closely related to the terminal symbol search operator, !R, which makes the transitivity rules externally visible in the grammar expression syntax. The operator adds the capability to execute searches in-line, at the level of sub-expressions. Refer to the terminal symbol search operator reference page for a complete list of transitive syntactical forms and information about their usage.
Copyright © 1997-2009 Igor Kholodov mailto:cttl@users.sourceforge.net.
Permission to copy and distribute this document is granted provided this copyright notice appears in all copies. This document is provided "as is" without express or implied warranty, and with no claim as to its suitability for any purpose.