Common Text Transformation Library http://cttl.sourceforge.net/
Category:
Format:
R + std::make_pair( N, M )
where
operand R is a valid CTTL grammar expression
( N, M ) is a pair of ordered non-negative integer numbers, such that ( N <= M ), whose values determine the mode of quantifier evaluation as follows:
N | M | R + std::make_pair( N, M ) evaluation mode: |
---|---|---|
0 | 0 | Equivalent to greedy +R, "one or more matches". |
0 | M | Equivalent to ungreedy R+M, "one to M matches". |
1 | 1 | Equivalent to a single ungreedy match of R. |
N | npos | If ( M == std::string::npos ), the quantifier evaluation mode changes to R+pair(N,npos), "exactly N matches". |
N | M | For any pair of values, where ( N <= M ), the behavior is "from N up to M matches", as described in this section. If ( N > M ), the behavior is undefined. |
Algorithm:
The generic R+pair(N,M) Kleene quantifier implements a greedy evaluation algorithm: it attempts as many matches as possible, anywhere between N and M. At least N matches of the grammar expression R are required for this quantifier to succeed. Up to M matches are accepted, but no more.
R+pair(N,M) fails if fewer than N, or more than M, matches exist.
Expression R+pair(N,M) succeeds when the result of R is an empty symbol (epsilon match), including the case when the parseable substring is empty:
#define CTTL_TRACE_EVERYTHING
#include "cttl/cttl.h"
using namespace cttl;
int main()
{
    std::string inp;
    const_edge<> substring( inp );
    const_edge<> token = substring;
    size_t result = token(
        entity() + std::make_pair( 1, 1 )
    ).match( substring );
    assert( result != std::string::npos );
    assert( token == "" );
    return 0;
}
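The loop described under "Algorithm" can be modeled in plain C++ without CTTL. The following is a minimal sketch, not the library's implementation: the `Matcher` type, `repeat_between`, `match_upper`, and `match_epsilon` are hypothetical names introduced for illustration only. It assumes that an empty match ends the loop with success, as the text above states, and it simply stops after M matches; it does not try to model the "more than M matches exist" failure case.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical matcher: returns the number of characters matched at pos,
// 0 for an empty (epsilon) match, or std::string::npos on failure.
using Matcher = std::function<std::size_t(const std::string&, std::size_t)>;

// Greedy bounded repeat (illustrative model, not CTTL code).
// Returns the position after the last match, or std::string::npos
// if fewer than n matches were found.
std::size_t repeat_between(const Matcher& r, std::size_t n, std::size_t m,
                           const std::string& s, std::size_t pos) {
    std::size_t count = 0;
    while (count < m) {
        const std::size_t len = r(s, pos);
        if (len == std::string::npos) break;  // R failed, stop repeating
        if (len == 0) return pos;             // epsilon match: succeed, do not repeat
        pos += len;
        ++count;
    }
    return count >= n ? pos : std::string::npos;
}

// Example matcher (hypothetical): one uppercase letter.
std::size_t match_upper(const std::string& s, std::size_t pos) {
    return (pos < s.size() && std::isupper(static_cast<unsigned char>(s[pos])))
               ? 1 : std::string::npos;
}

// Example matcher (hypothetical): always an empty match.
std::size_t match_epsilon(const std::string&, std::size_t) { return 0; }
```

In this model, `repeat_between( match_upper, 2, 3, "ABCD", 0 )` consumes three characters and stops, while the same call on `"Ab"` fails because only one match is available.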
Usage notes:
If an empty symbol is found, the empty match is never repeated, and the evaluation loop completes. To exclude empty matches from the results of binary Kleene plus evaluation, decorate the operand R with the non-empty match validator entity(R), which rejects empty matches of R:
entity( R ) + std::make_pair( N, M )
For example,
#define CTTL_TRACE_EVERYTHING
#include "cttl/cttl.h"
using namespace cttl;
size_t rule_curious( const_edge<> substr_ )
{
    // Matches single character 'A' or an empty symbol:
    return ( 'A' | symbol( true ) ).match( substr_ );
}
int main()
{
    std::string inp = "XYZ";
    const_edge<> substring( inp );
    const_edge<> token = substring;
    size_t result = token(
        entity( first( islower ) | CTTL_STATIC_RULE( rule_curious ) )
        + std::make_pair( 1, 1 )
    ).match( substring );
    assert( result == std::string::npos );
    return 0;
}
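The effect of the non-empty match validator can also be pictured with a small stand-alone model. This is a sketch under stated assumptions, not CTTL code: `Matcher`, `non_empty`, and `match_curious` are hypothetical names, and `non_empty` only imitates the role that entity(R) plays in the example above by turning a zero-length match into a failure.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical matcher: returns the number of characters matched at pos,
// 0 for an empty match, or std::string::npos on failure.
using Matcher = std::function<std::size_t(const std::string&, std::size_t)>;

// Wraps a matcher so that a zero-length match is reported as a failure
// (an illustrative stand-in for a non-empty match validator).
Matcher non_empty(Matcher r) {
    return [r](const std::string& s, std::size_t pos) -> std::size_t {
        const std::size_t len = r(s, pos);
        return len == 0 ? std::string::npos : len;
    };
}

// Matches a single 'A', or otherwise an empty symbol; it never fails.
std::size_t match_curious(const std::string& s, std::size_t pos) {
    return (pos < s.size() && s[pos] == 'A') ? 1 : 0;
}
```

On the input "XYZ", `match_curious` alone reports an empty match, whereas `non_empty( match_curious )` reports a failure, which parallels the `result == std::string::npos` assertion in the CTTL example.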
Space sensitivity:
The space sensitivity of R+pair(N,M) is transparent. The expression R controls the space grammar evaluation:
#define CTTL_TRACE_EVERYTHING
#include "cttl/cttl.h"
using namespace cttl;
int main()
{
    std::string inp = " ABC DEF ";
    const_edge< policy_space<> > substring( inp );
    const_edge<> token = substring;
    size_t result = token(
        entity( isupper ) + std::make_pair( 2, 3 )
    ).match( substring );
    assert( result != std::string::npos );
    assert( token == "ABC DEF" );
    return 0;
}
Searchability:
Search grammar evaluation algorithms are enabled for the R+pair(N,M) Kleene plus quantifier.
The terminal search operator and the Kleene plus operator interact in a fixed way, which is determined by the order of operations:
!( R + std::make_pair( N, M ) ) // searches for "N to M repeats of R"
!R + std::make_pair( N, M ) // repeats "searches of R" N to M times
Clearly, the above two expressions are not equivalent. According to C++ operator precedence rules, the latter expression is evaluated as (!R)+std::make_pair( N, M ).
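The precedence rule can be checked with a toy expression type that only records how it was built. This sketch is independent of CTTL: `Expr` and its two operators are hypothetical stand-ins, chosen so that the grouping performed by the C++ compiler becomes visible as text.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a grammar expression; it stores a description only.
struct Expr {
    std::string text;
};

// Unary '!' plays the role of the search operator.
Expr operator!(const Expr& e) {
    return Expr{"!(" + e.text + ")"};
}

// Binary '+' with a pair plays the role of the bounded quantifier.
Expr operator+(const Expr& e, const std::pair<int, int>& q) {
    return Expr{"(" + e.text + ")+pair(" + std::to_string(q.first) + "," +
                std::to_string(q.second) + ")"};
}
```

With `Expr R{"R"}`, the expression `!R + std::make_pair( 2, 3 )` yields the text `(!(R))+pair(2,3)`, while `!( R + std::make_pair( 2, 3 ) )` yields `!((R)+pair(2,3))`, because unary `!` binds more tightly than binary `+`.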
Only the first occurrence of the nearest terminal symbol in R is searched for; the second, third, and remaining terminals, as well as all subsequent matches of R, are processed by the match grammar evaluation algorithm. This is why the space policy plays an important role in the following example:
#define CTTL_TRACE_EVERYTHING
#include "cttl/cttl.h"
using namespace cttl;
int main()
{
    std::string inp = "ABC def ghi";
    const_edge< policy_space<> > substring( inp );
    const_edge<> token = substring;
    size_t result = token(
        entity( islower ) + std::make_pair( 2, 3 )
    ).find( substring );
    assert( result != std::string::npos );
    assert( token == "def ghi" );
    return 0;
}
If the policy_space<> policy were not used, the "ghi" symbol would not be part of the matched token.
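The "search once, then match in place" behavior can be illustrated with a simplified stand-alone model. This is only a sketch under assumptions, not a description of CTTL internals: `lower_run` and `find_then_repeat` are hypothetical helpers, and the `skip_space` flag loosely imitates a white space policy by skipping spaces before the second and later matches.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Length of the run of lowercase letters starting at pos (0 if there is none).
std::size_t lower_run(const std::string& s, std::size_t pos) {
    std::size_t end = pos;
    while (end < s.size() && std::islower(static_cast<unsigned char>(s[end]))) ++end;
    return end - pos;
}

// Searches for the first lowercase run, then matches further runs in place.
// On success stores the covered text in token and returns true.
bool find_then_repeat(const std::string& s, std::size_t n, std::size_t m,
                      bool skip_space, std::string& token) {
    std::size_t start = 0;
    while (start < s.size() && lower_run(s, start) == 0) ++start;  // search phase
    std::size_t pos = start, count = 0;
    while (count < m) {                                            // match phase
        std::size_t p = pos;
        if (skip_space && count > 0)
            while (p < s.size() && std::isspace(static_cast<unsigned char>(s[p]))) ++p;
        const std::size_t len = lower_run(s, p);
        if (len == 0) break;  // no further match at this exact position
        pos = p + len;
        ++count;
    }
    if (count < n) return false;
    token = s.substr(start, pos - start);
    return true;
}
```

In this model, the input "ABC def ghi" with bounds 2 and 3 and `skip_space` set to true produces the token "def ghi". With `skip_space` set to false, the second match is attempted at the space character and does not succeed, so "ghi" never becomes part of the result.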
Trace output format:
The trace symbol of the binary Kleene plus quantifier is the plus sign '+', enclosed in a pair of matching braces. The above example generates the following trace:
-------------@ABC def ghi?{e! 0-11 :3:1
-------------@ABC def ghi? {+! 0-11
-----------------ABC def@| $ 7-11 islower
-------------ABC def ghi@| $ 11-11 islower
~~~~~~~~~~~~~ABC def ghi@~ L 11-11 FAIL empty substring
}
-----------------@def ghi| e 4-11
}
The exclamation mark '!' after the plus sign indicates that the binary Kleene plus operator was evaluated by the terminal symbol search grammar evaluation algorithm.
The "FAIL empty substring" message indicates that the Kleene loop stopped as soon as the entity(islower) lexeme had exhausted all input characters.
Copyright © 1997-2009 Igor Kholodov mailto:cttl@users.sourceforge.net.
Permission to copy and distribute this document is granted provided this copyright notice appears in all copies. This document is provided "as is" without express or implied warranty, and with no claim as to its suitability for any purpose.