<<< R1|R2, binary set union operator | Table Of Contents | Utility functions, classes, and programs >>> |
Common Text Transformation Library http://cttl.sourceforge.net/
Category:
Overloaded operator: binary operator|| (C++ binary logical OR)
Evaluates matches of both R1 and R2, choosing the match in accordance with POSIX regex standard, guaranteeing the longest nearest match.
Token width: 0+ (derivative of R1 or R2)
Format:
R1 || R2
where operands R1 and R2 are valid CTTL grammar expressions, representing two arbitrary sets of string matches.
Algorithm:
Overloaded binary logical OR operator specifies matches of either R1 or R2.
Both expressions R1 and R2 are evaluated from left to right, but changing the order of POSIX union operands does not change the end result, establishing the property of commutativity.
The end result of POSIX union is determined as follows:
The result of POSIX union search !(R1||R2) is determined the same way, but the nearest string is given a priority over the longest match. If more than one match is found at the same proximity, the longest string becomes the end result of the search.
CTTL implementation of POSIX NFA (non-deterministic finite automaton) is slower than the traditional NFA-based set union algorithm (R1|R2), especially when expression contains multiple alternative branches:
R1 || R2 || R3 || ... || RN
When traditional NFA (R1|R2) finds a match, it causes an early exit from the algorithm. By contrast, POSIX NFA continues backtracking and enumerating all of possible matches, handling both sides of (R1||R2) equally.
Space sensitivity:
Searchability:
Search grammar evaluation algorithms
are enabled for (R1||R2) union. The terminal search is distributive: the expressions
!(R1 || R2) (!R1) || (!R2)
are equivalent.
The following sample grammar use POSIX union operator while searching for three choices of terminal symbols: "one", "two", and "three". The program finds a word located at the nearest offset from the beginning of the parseable substring:
#define CTTL_TRACE_EVERYTHING #include "cttl/cttl.h" using namespace cttl; int main() { std::string inp = "four two three one five"; const_edge<> substring( inp ); const_edge<> word = substring; size_t result = word( symbol( "one" ) || "two" || "three" ).find( substring ); assert( result != std::string::npos ); assert( word == "two" ); return 0; }
Trace output format:
The trace symbol of (R1||R2) construct is a vertical bar (pipe sign) '|', enclosed in a pair of symmetrical braces. The above example generates the following trace:
-@four two three one five?{e! 0-23 :3:1 -@four two three one five? {|! 0-23 -@four two three one five? {|! 0-23 ------four two three one@| T 18-23 one ----------------four two@| T 8-23 two } ----------four two three@| T 14-23 three } ---------------------@two| e 5-8 }
The exclamation marks '!' after the pipe emphasize that POSIX union operators were evaluated by the terminal symbol search grammar evaluation algorithm.
Copyright © 1997-2009 Igor Kholodov mailto:cttl@users.sourceforge.net.
Permission to copy and distribute this document is granted provided this copyright notice appears in all copies. This document is provided "as is" without express or implied warranty, and with no claim as to its suitability for any purpose.
<<< R1|R2, binary set union operator | Table Of Contents | Utility functions, classes, and programs >>> |