<<< *R, zero or more matches | Table Of Contents | +R, one or more matches >>> |
Common Text Transformation Library http://cttl.sourceforge.net/
Category:
Format:
R * N
where
operand R is a valid CTTL grammar expression
N is a non-negative integer number.
Algorithm:
Overloaded binary Kleene star operator specifies from zero up to N matches of the grammar expression R.
Expression R*N implements ungreedy evaluation algorithm, never attempting more than N matches.
As soon as Nth occurrence of the match is found, the evaluation stops, yielding successful result. (Fewer than N matches also qualify as a good match.)
(If N is zero, the evaluation of R*0 is an equivalent of *R, a greedy repeat of zero or more matches.)
Expression R*N always succeeds, including outcomes when
The expression R fails;
The result of R is empty symbol (epsilon match);
The parseable substring is empty:
#define CTTL_TRACE_EVERYTHING #include "cttl/cttl.h" using namespace cttl; int main() { std::string inp; // empty string const_edge<> substring( inp ); size_t result = ( symbol()*1 ).match( substring ); assert( result != std::string::npos ); return 0; }
Usage notes:
Many regular expression engines recognize R? syntax, which specifies "zero or one matches of R". In CTTL, it is written as
R * 1
Since binary Kleene star succeeds when its operand R yields an epsilon match, to prevent an infinite loop, the empty match never repeats.
To exclude empty matches from the results of binary Kleene star evaluation, its operand R can be decorated by non-empty match validator, entity(R), which rejects empty matches of R:
entity( R ) * N
For example,
#define CTTL_TRACE_EVERYTHING #include "cttl/cttl.h" using namespace cttl; size_t rule_curious( const_edge<> substr_ ) { // Matches single character 'A' or an empty symbol: return ( 'A' | symbol( true ) ).match( substr_ ); } int main() { std::string inp = "XYZ"; const_edge<> substring( inp ); const_edge<> token = substring; size_t result = token( entity( first( islower ) | CTTL_STATIC_RULE( rule_curious ) )*1 ).match( substring ); assert( result != std::string::npos ); return 0; }
Despite the fact that entity(R)*N form seems more theoretical than practical, many EBNF grammar definitions imply that empty matches should be blocked in all non-terminal constructs. In such case entity(R)*N constructs may be the result of rendering by automated tools, designed to process EBNF notation. A more restrictive (and more realistic) approach of interpreting R's result may be
entity( R ) + N
form, which uses binary Kleene plus operator to avoid all empty matches.
Space sensitivity:
The space sensitivity of R*N is transparent. The expression R controls the space grammar evaluation:
#define CTTL_TRACE_EVERYTHING #include "cttl/cttl.h" using namespace cttl; int main() { std::string inp = " ABC DEF "; const_edge< policy_space<> > substring( inp ); const_edge<> token = substring; size_t result = token( entity( isupper )*1 ).match( substring ); assert( result != std::string::npos ); assert( token == "ABC" ); return 0; }
Searchability:
Search grammar evaluation algorithms
are enabled for binary Kleene star quantifier.
The terminal search and Kleene star operator formulate stationary relationship, which is determined by the order of operations:
!(R*3) // searches for "zero to three repeats of R" !R*3 // repeats zero to three "searches of R"
Clearly, the above two expressions are not equivalent. According to C++ operator precedence rules, the latter expression is evaluated as (!R)*3.
Only first occurence of the nearest terminal symbol in R is searched for; the second, third, and the rest of the terminals, as well as all subsequent matches of R, are processed by the match grammar evaluation algorithm. This is why the role of space policy is important in the following example:
#define CTTL_TRACE_EVERYTHING
#include "cttl/cttl.h"
using namespace cttl;
int main()
{
std::string inp = "ABC def ghi";
const_edge< policy_space<> > substring( inp );
const_edge<> token = substring;
size_t result = token(
entity( islower ) * 2
).find( substring );
assert( result != std::string::npos );
assert( token == "def ghi" );
return 0;
}
If policy policy_space<> wasn't used, the "ghi" symbol would not be a part of the matched token.
Trace output format:
The trace symbol of binary Kleene star quantifier is the asterisk sign '*', enclosed in a pair of symmetrical braces. The above example generates the following trace:
-------------@ABC def ghi?{e! 0-11 :3:1 -------------@ABC def ghi? {*! 0-11 -----------------ABC def@| $ 7-11 islower -------------ABC def ghi@| $ 11-11 islower -------------ABC def ghi@| i 11-11 kleene_list: user-specified match limit: bailing out } -----------------@def ghi| e 4-11 }
The exclamation mark '!' after the asterisk emphasizes that binary Kleene star operator was evaluated by the terminal symbol search grammar evaluation algorithm.
A "bailing out" message indicates that user-specified limit of 2 matches of entity(islower) lexeme was reached, triggering a break in the loop of Kleene star implementation.
Copyright © 1997-2009 Igor Kholodov mailto:cttl@users.sourceforge.net.
Permission to copy and distribute this document is granted provided this copyright notice appears in all copies. This document is provided "as is" without express or implied warranty, and with no claim as to its suitability for any purpose.
<<< *R, zero or more matches | Table Of Contents | +R, one or more matches >>> |