Common Text Transformation Library http://cttl.sourceforge.net/
Many input languages are based on the notion of a word, defined as a grouping of characters. Word boundaries are established by the membership of individual characters in a particular character class, such as alphabetic, punctuation, or digit, or in a combination of classes. A word can be one or more characters long, but regardless of its length, the scanner always treats it as an atomic entity. Atomic entities cannot be split by white space or other delimiters.
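The idea of a word as a maximal run of characters from one class can be sketched in plain C++. This is only an illustration and does not use CTTL; the names scan_word, is_alpha, and is_digit are hypothetical helpers introduced for this example.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical character-class predicates (thin wrappers over <cctype>).
inline int is_alpha(int c) { return std::isalpha(c); }
inline int is_digit(int c) { return std::isdigit(c); }

// Hypothetical helper, not a CTTL function: starting at `pos`, consume the
// longest run of characters accepted by `in_class` and return the position
// just past it. A result equal to `pos` means that no word starts there.
inline std::size_t scan_word(const std::string& text, std::size_t pos,
                             int (*in_class)(int)) {
    assert(pos <= text.size());
    while (pos < text.size() &&
           in_class(static_cast<unsigned char>(text[pos])) != 0) {
        ++pos;
    }
    return pos;
}
```

Because the scan stops only when the character class changes, the run is consumed as a whole, which is what makes the word atomic from the scanner's point of view.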
While individual words represent the terminal symbols of the input language, combinations of words form so-called nonterminal constructs. Not all combinations of words are legal; therefore, a set of formal rules, collectively referred to as a formal grammar, has to be set forth to exclude improper inputs from processing.
Each grammar rule describes a pattern of input symbols. For instance, the simplest pattern is a sequence of tokens:
entity( isalpha ) + entity( isdigit )
The sequence above is a nonterminal structure: an entity of alphabetic characters followed by an entity of digits.
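For comparison, the same pattern can be matched by hand without the library. The sketch below is an assumption-laden stand-in, not CTTL code: match_entity and match_alpha_then_digits are hypothetical names, and the sketch does no white space skipping between the two parts, whereas the library may treat white space differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a nonempty run of one character class.
// Returns the end of the run, or std::string::npos if the run is empty.
template <typename Pred>
std::size_t match_entity(const std::string& text, std::size_t pos, Pred in_class) {
    assert(pos <= text.size());
    auto first = text.begin() + static_cast<std::ptrdiff_t>(pos);
    auto last = std::find_if_not(first, text.end(), in_class);
    return last == first ? std::string::npos
                         : static_cast<std::size_t>(last - text.begin());
}

// Sequence: letters first, then digits. Either part failing fails the whole.
inline std::size_t match_alpha_then_digits(const std::string& text, std::size_t pos) {
    auto alpha = [](unsigned char c) { return std::isalpha(c) != 0; };
    auto digit = [](unsigned char c) { return std::isdigit(c) != 0; };
    std::size_t mid = match_entity(text, pos, alpha);
    if (mid == std::string::npos) return std::string::npos;
    return match_entity(text, mid, digit);
}
```

The hand-written version hard-codes the order of the two parts; a grammar expression states the same relation declaratively in a single line.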
Patterns represent combinations of input symbols. More importantly, patterns establish relations between distinct parts of the input. For example, groups of input symbols denoted as R, R1, R2, ..., and RN can form associations such as
This is not an exhaustive list of all possible associations between symbols. Other types of relations may exist. Altogether, logical connections between symbols and recursive grammar rules cause patterns to gradually grow in complexity.
Like oversized C++ logical expressions, complicated input patterns can be broken into smaller, manageable subpatterns. These can be further reorganized into a set of unique, well-documented grammar production rules. (See the lexer design documentation for detailed coverage of grammar transformations.)
In CTTL, input patterns are written as C++ expressions, which are commonly referred to as grammar expressions. Relations within patterns are specified by a set of overloaded C++ operators, whose operands are
lexeme functions, describing terminal symbols,
parenthesized subexpressions, representing subpatterns,
C++ function adaptors, invoking separately defined grammar production rules,
grammar expression adaptors, adding side effects to grammar expressions, and
quotes, describing grammar constructs with distinct opening and closing fragments.
An overloaded unary operator modifies the behavior of its operand. A binary operator chains two subexpressions together and specifies the relationship between them. For example, the pattern
+( symbol( '0' ) | symbol( '1' ) )
matches repetitions of zeroes and ones of any size, such as "0", "1", "00", "01", "10", "11", and so on.
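To show how overloaded operators can assemble such a pattern, here is a toy matcher written from scratch. It is not the CTTL implementation and shares no code with it: the Pattern struct, this symbol() function, and matches_all are hypothetical, and the sketch assumes that the repeated subexpression is a union of the two symbols.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

constexpr std::size_t npos = std::string::npos;

// Toy matcher, NOT the CTTL implementation: given text and a start position,
// return the end of the match, or npos on failure.
struct Pattern {
    std::function<std::size_t(const std::string&, std::size_t)> match;
};

// Hypothetical counterpart of a single-character terminal symbol.
inline Pattern symbol(char c) {
    return Pattern{[c](const std::string& s, std::size_t i) {
        return (i < s.size() && s[i] == c) ? i + 1 : npos;
    }};
}

// Binary operator| : try the left operand, then the right one.
inline Pattern operator|(Pattern a, Pattern b) {
    return Pattern{[a, b](const std::string& s, std::size_t i) {
        std::size_t r = a.match(s, i);
        return r != npos ? r : b.match(s, i);
    }};
}

// Unary operator+ : one or more greedy repetitions of the operand.
inline Pattern operator+(Pattern p) {
    return Pattern{[p](const std::string& s, std::size_t i) {
        std::size_t end = p.match(s, i);
        if (end == npos) return npos;
        for (;;) {
            std::size_t next = p.match(s, end);
            if (next == npos || next == end) return end;  // stop on failure or no progress
            end = next;
        }
    }};
}

// True if the pattern consumes the whole string.
inline bool matches_all(const Pattern& p, const std::string& s) {
    assert(p.match != nullptr);  // a default-constructed Pattern has no matcher
    return p.match(s, 0) == s.size();
}
```

With these definitions, the expression +( symbol( '0' ) | symbol( '1' ) ) compiles as ordinary C++ and yields a Pattern object; the operators only build the matcher, and matching happens later when match is called.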
Since the precedence of C++ operators is unaffected by overloading, special care has to be taken to enforce the proper order of evaluation, which generally requires parentheses to group subexpressions.
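The effect of fixed precedence can be made visible with a toy type whose operators merely record how the compiler grouped their operands. The Expr struct and check_grouping function are hypothetical and unrelated to CTTL types; only the C++ precedence rules they expose apply to any overloaded operator.

```cpp
#include <cassert>
#include <string>

// Toy expression node, NOT a CTTL type: each operator only records
// the grouping chosen by the compiler.
struct Expr {
    std::string text;
};

inline Expr operator+(const Expr& a, const Expr& b) { return {"(" + a.text + " + " + b.text + ")"}; }
inline Expr operator^(const Expr& a, const Expr& b) { return {"(" + a.text + " ^ " + b.text + ")"}; }
inline Expr operator|(const Expr& a, const Expr& b) { return {"(" + a.text + " | " + b.text + ")"}; }
inline Expr operator!(const Expr& a) { return {"!" + a.text}; }

// Checks the groupings that follow from the built-in precedence:
// unary operators bind tightest, then binary +, then ^, then |.
inline void check_grouping() {
    Expr a{"a"}, b{"b"}, c{"c"};
    assert((a + b | c).text == "((a + b) | c)");
    assert((a ^ b + c).text == "(a ^ (b + c))");    // + is applied before ^
    assert((!a + b).text == "(!a + b)");            // ! applies to a only
    assert((a + (b | c)).text == "(a + (b | c))");  // parentheses override
    assert(((a ^ b) + c).text == "((a ^ b) + c)");
}
```

The second and third cases are the ones most likely to surprise: without parentheses, a binary + groups its operands before ^ does, and a unary operator applies only to the operand immediately after it.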
The following overloaded CTTL operators are available:
Operator            Description

Search operators
!R                  terminal symbol search
!!R                 nonterminal symbol search

Kleene quantifiers
*R                  zero or more matches
R*N                 zero to N matches
+R                  one or more matches
R+N                 one to N matches
R+pair(N,M)         N to M matches
R+pair(N,npos)      exactly N matches

Sequence operators
R1+R2               sequence
R1^R2               concatenation

Assertions
-R                  negative lookahead assertion
lookbehind(R1,R2)   positive lookbehind assertion
begin(R)            positive lookahead assertion
entity(R)           nonempty match validator

Logical set operators
R1-R2               set complement
R1&R2               set intersection
R1|R2               set union
R1||R2              POSIX union
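The bounded quantifiers differ only in their lower and upper limits. A minimal sketch of that idea follows, assuming greedy matching over a single character class; match_repeated and is_digit are hypothetical helpers and not part of the library. Under these assumptions, "zero to N matches" corresponds to limits (0, N), "one to N matches" to (1, N), and "N to M matches" to (N, M).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical character-class predicate.
inline bool is_digit(unsigned char c) { return std::isdigit(c) != 0; }

// Hypothetical helper, not a CTTL function: greedily match between
// `min_count` and `max_count` consecutive characters accepted by `in_class`.
// Returns the end of the match, or std::string::npos if fewer than
// `min_count` characters matched.
template <typename Pred>
std::size_t match_repeated(const std::string& text, std::size_t pos,
                           Pred in_class, std::size_t min_count,
                           std::size_t max_count) {
    assert(pos <= text.size() && min_count <= max_count);
    std::size_t count = 0;
    while (count < max_count && pos < text.size() &&
           in_class(static_cast<unsigned char>(text[pos]))) {
        ++pos;
        ++count;
    }
    return count >= min_count ? pos : std::string::npos;
}
```

A lower limit of zero means the match succeeds even when nothing is consumed, which is the practical difference between the star and plus forms.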

Copyright © 1997-2009 Igor Kholodov, cttl@users.sourceforge.net.
Permission to copy and distribute this document is granted provided this copyright notice appears in all copies. This document is provided "as is" without express or implied warranty, and with no claim as to its suitability for any purpose.