<<< R1&R2, binary set intersection operator | Table Of Contents | R1||R2, binary POSIX union operator >>> |
Common Text Transformation Library http://cttl.sourceforge.net/
Category:
Format:
R1 | R2
where operands R1 and R2 are valid CTTL grammar expressions, representing two arbitrary sets of string matches.
Algorithm:
Overloaded binary bitwise OR operator specifies matches of either R1, or R2.
Set union construct (R1|R2) has left-to-right associativity:
The expression R1 is evaluated first. If R1 succeeds,
The result of the union is the result of expression R1;
The expression R2 is never evaluated, realizing a short-circuit type of evaluation.
If evaluation of R1 fails, the lexer backtracks and result of the union is the result of expression R2.
Result of (R1|R2) fits the description of the set union from the set theory:
CTTL implementation of the set union operator is based on the traditional Non-deterministic Finite Automaton (NFA) behavior:
The NFA may leave longer matches undiscovered, because the evaluation stops as soon as the first match is found. (Therefore, traditional NFA is ungreedy.)
Traditional NFA backtracks in the event of a non-match: if R1 fails, the upper boundary of parseable substring backtracks, giving R2 a chance to evaluate in equal manner.
Space sensitivity:
Searchability:
Search grammar evaluation algorithms
are enabled for (R1|R2) union. The terminal search is distributive: the expressions
!(R1 | R2) (!R1) | (!R2)
are equivalent.
The following sample demonstrates a terminal search using set union operator:
#define CTTL_TRACE_EVERYTHING
#include "cttl/cttl.h"
using namespace cttl;
int main()
{
std::string inp = "+123";
const_edge<> substring( inp );
const_edge<> token = substring;
size_t result = token(
entity( isalpha ) | entity( isdigit )
).find( substring );
assert( result != std::string::npos );
assert( token == "123" );
return 0;
}
Trace output format:
The trace symbol of (R1|R2) construct is the vertical bar (pipe sign) '|', enclosed in a pair of symmetrical braces. The above example generates the following trace:
--------------------@+123?{e! 0-4 :3:1 --------------------@+123? {|! 0-4 ~~~~~~~~~~~~~~~~~~~~~~~~@~ $ 0-4 FAIL isalpha --------------------+123@| $ 4-4 isdigit } ---------------------@123| e 1-4 }
The exclamation mark '!' after the pipe sign emphasizes that set union operator was evaluated by the terminal symbol search grammar evaluation algorithm.
Copyright © 1997-2009 Igor Kholodov mailto:cttl@users.sourceforge.net.
Permission to copy and distribute this document is granted provided this copyright notice appears in all copies. This document is provided "as is" without express or implied warranty, and with no claim as to its suitability for any purpose.
<<< R1&R2, binary set intersection operator | Table Of Contents | R1||R2, binary POSIX union operator >>> |