Common Text Transformation Library http://cttl.sourceforge.net/
Category:
Format:
R1 - R2
where operands R1 and R2 are valid CTTL grammar expressions, representing two arbitrary sets of strings.
Algorithm:
The overloaded binary minus operator specifies the matches of R1 that are not matches of R2.
The complement construct (R1-R2) evaluates its operands from left to right: R1 is evaluated first, and then R2. If the evaluation of R1 fails, R2 is never evaluated, which realizes short-circuit evaluation.
The parseable substring presented to the R2 operand for evaluation is restricted to the boundaries of the token that matched R1.
The lexer performs a negative lookahead assertion of the grammar expression R2:
the evaluation of the R2 operand does not consume any input characters. The implementation
The result of (R1-R2) fits the definition of the set complement from set theory:
if the set U contains all possible string matches of R1,
and the set A contains all strings matching R2,
then the expression (R1-R2) matches the set U-A, the complement of A relative to U (the absolute complement of A when U is taken as the universe), defined as the set of all strings of U that are not elements of A.
Usage notes:
Note that the two expressions
entity( isalnum ) - entity( isupper )
entity( isalnum ) + -entity( isupper )
are very different. The first expression finds an alphanumeric multi-character entity and then verifies that none of its characters are uppercase. The second expression makes a negative lookahead assertion, which rejects the match when an uppercase character follows the alphanumeric entity.
The purpose of the set complement (R1-R2) is to validate the symbols matched by R1 against the additional grammar of the R2 operand.
The negative lookahead assertion (R1 + -R2), on the other hand, verifies that a match of R2 does not follow the match of R1.
Space sensitivity:
The space sensitivity of the R1 operand is enabled.
The R2 operand is evaluated in strict grammar evaluation mode. The space sensitivity of R2 is controlled by the space policy object constructed along with the temporary copy of the parseable substring. This temporary copy is created prior to R2 evaluation as an instance of strict_edge_T, a member typedef of the CTTL substring classes.
Searchability:
Example:
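#define CTTL_TRACE_EVERYTHING
#include <cassert>
#include <string>
#include "cttl/cttl.h"

using namespace cttl;

int main()
{
    std::string inp = "XYZ";
    const_edge<> substring( inp );
    const_edge<> token = substring;

    // R1 = entity( isalpha ) matches "XYZ"; R2 = entity( islower ) fails on it,
    // so the complement succeeds and token receives the matched fragment.
    size_t result = token( entity( isalpha ) - entity( islower ) ).match( substring );
    assert( result != std::string::npos );
    assert( token == inp );
    return 0;
}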
Trace output format:
The trace symbol of the (R1-R2) construct is the minus sign '-', enclosed in a pair of matching braces. The example above generates the following trace:
---------------------@XYZ?{e 0-3 :3:1
---------------------@XYZ? {- 0-3
---------------------XYZ@| $ 3-3 isalpha
~~~~~~~~~~~~~~~~~~~~~~~~@~ $ 0-3 FAIL islower
}
---------------------@XYZ| e 0-3 }
Copyright © 1997-2009 Igor Kholodov mailto:cttl@users.sourceforge.net.
Permission to copy and distribute this document is granted provided this copyright notice appears in all copies. This document is provided "as is" without express or implied warranty, and with no claim as to its suitability for any purpose.