Common Text Transformation Library http://cttl.sourceforge.net/
The following outline compares regex patterns to CTTL grammar constructs:
Anchors:
^ regex start of line: end('\n')
\A regex start of string: begin()
$ regex end of line: begin('\n')
\Z regex end of string: end()
\b (\<) regex word boundary: begin(CTTL_QWERTY_123_) (start of word; before the first character in a word.)
\b (\>) regex word boundary: end(CTTL_QWERTY_123_) (end of word; after the last character in a word.)
\B regex not word boundary: see space policy grammar. (not start of word.)
\B regex not word boundary: see space policy grammar. (not end of word.)
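The anchor rows describe positions rather than characters. The following C++ sketch does not use CTTL. It relies only on the standard library, and every helper name in it (is_line_start, is_word_start, count_word_starts and so on) is hypothetical. It tests the same positions the table names: a line starts just after '\n' and ends just before it, and a word starts where a word character follows a non-word position. count_regex_words counts the same words with std::regex for comparison.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Regex \w: a letter, a digit, or an underscore.
bool is_word_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Regex ^ (multi-line sense): the start of the text, or just after '\n'.
bool is_line_start(const std::string& s, std::size_t pos) {
    assert(pos <= s.size());
    return pos == 0 || s[pos - 1] == '\n';
}

// Regex $ (multi-line sense): the end of the text, or just before '\n'.
bool is_line_end(const std::string& s, std::size_t pos) {
    assert(pos <= s.size());
    return pos == s.size() || s[pos] == '\n';
}

// Regex \<: a word character follows and no word character precedes.
bool is_word_start(const std::string& s, std::size_t pos) {
    assert(pos <= s.size());
    const bool before = pos > 0 && is_word_char(s[pos - 1]);
    const bool after = pos < s.size() && is_word_char(s[pos]);
    return after && !before;
}

// Regex \>: a word character precedes and no word character follows.
bool is_word_end(const std::string& s, std::size_t pos) {
    assert(pos <= s.size());
    const bool before = pos > 0 && is_word_char(s[pos - 1]);
    const bool after = pos < s.size() && is_word_char(s[pos]);
    return before && !after;
}

// Count word starts by testing every position, including the end.
std::size_t count_word_starts(const std::string& s) {
    std::size_t n = 0;
    for (std::size_t pos = 0; pos <= s.size(); ++pos)
        if (is_word_start(s, pos)) ++n;
    return n;
}

// The same count with std::regex, using \b on both sides of \w+.
std::size_t count_regex_words(const std::string& s) {
    const std::regex word(R"(\b\w+\b)");
    return static_cast<std::size_t>(std::distance(
        std::sregex_iterator(s.begin(), s.end(), word), std::sregex_iterator()));
}
```

For example, in "ab\ncd" position 3 is a line start and position 2 is a line end.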
Character classes:
\c regex control character class: entity(iscntrl)
\s regex white space class: entity(isspace)
\S regex not white space class: - begin(isspace)
\d regex digit class: entity(isdigit)
\D regex not digit class: - begin(isdigit)
\w ([:word:]) regex word: literal() (digits, letters and underscore)
\W regex not word: +( -begin(CTTL_QWERTY_123_) ^ symbol() )
\xhh regex hexadecimal character hh: symbol(0xhh)
\0nnn regex octal character nnn: symbol(0nnn)
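A minimal C++ sketch of the \w and \W rows, assuming ASCII text and the default "C" locale. It does not call CTTL; the helpers word_run, non_word_run and regex_word_run are hypothetical. non_word_run loosely follows the shape of the \W expression above: at each step it checks that no word character starts at the current position and then consumes one character. The static_assert shows that a hex escape and an octal escape (which in C++ begins with the digit zero, never the letter O) can name the same character code.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>

// Regex \w: a letter, a digit, or an underscore.
bool is_word_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Length of the run of word characters starting at pos (regex \w+).
std::size_t word_run(const std::string& s, std::size_t pos) {
    assert(pos <= s.size());
    std::size_t end = pos;
    while (end < s.size() && is_word_char(s[end])) ++end;
    return end - pos;
}

// Length of the run of non-word characters starting at pos (regex \W+):
// the next character must not be a word character, then one is consumed.
std::size_t non_word_run(const std::string& s, std::size_t pos) {
    assert(pos <= s.size());
    std::size_t end = pos;
    while (end < s.size() && !is_word_char(s[end])) ++end;
    return end - pos;
}

// The word run measured by std::regex; match_continuous anchors the
// match at pos instead of searching further along the text.
std::size_t regex_word_run(const std::string& s, std::size_t pos) {
    assert(pos <= s.size());
    const std::regex word(R"(\w+)");
    std::smatch m;
    const auto first = s.begin() + static_cast<std::ptrdiff_t>(pos);
    if (std::regex_search(first, s.end(), m, word,
                          std::regex_constants::match_continuous))
        return static_cast<std::size_t>(m.length(0));
    return 0;
}

// Hex 41 and octal 101 are both the character code 65.
static_assert('\x41' == '\101', "hex and octal escapes for the same code");
```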
POSIX Character Classes:
[:class:] regex POSIX character classes: entity(is...) (where class is upper, lower, alpha, alnum, digit, xdigit, punct, blank, space, cntrl, graph, and print.)
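std::regex accepts the same POSIX class names inside a bracket expression, for example [[:digit:]]. This sketch, built around the hypothetical helper class_agrees, checks that such a class accepts exactly the 7-bit characters for which the corresponding <cctype> predicate returns true. It assumes the default "C" locale and does not use CTTL.

```cpp
#include <cassert>
#include <cctype>
#include <regex>
#include <string>

// True if the bracket expression [[:name:]] in std::regex accepts exactly
// the 7-bit characters for which pred (a <cctype>-style test) is true.
template <typename Pred>
bool class_agrees(const std::string& name, Pred pred) {
    const std::regex re("[[:" + name + ":]]");
    for (int c = 0; c < 128; ++c) {
        const std::string one(1, static_cast<char>(c));
        if (std::regex_match(one, re) != pred(c)) return false;
    }
    return true;
}
```

Passing a lambda such as [](int c) { return std::isdigit(c) != 0; } avoids taking the address of a standard library function.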
Assertions:
?= regex positive lookahead assertion: begin(R) expression assertion (something followed by something else.)
?! regex negative lookahead assertion: ( -R ) negative lookahead assertion (something not followed by something else.)
?<= regex positive lookbehind assertion: lookbehind(R1,R2) positive lookbehind assertion (something preceded by something else.)
?<! regex negative lookbehind assertion: lookbehind(-R1,R2) (something not preceded by something else.)
?() regex condition if-then: ( R1&R2 ) binary set intersection operator.
?()| regex condition if-then-else: ( R1|R2 ) binary set union operator.
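std::regex (ECMAScript grammar) supports the lookahead assertions (?=...) and (?!...) but has no lookbehind, so this C++ sketch checks lookahead with std::regex and emulates one positive lookbehind by hand. It does not use CTTL; all_matches and digits_preceded_by are hypothetical helpers. Note the \d inside the negative lookahead: with \d+(?!px) alone, backtracking would still match the "1" in "10px".

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Every non-overlapping match of pattern in text, left to right.
std::vector<std::string> all_matches(const std::string& text,
                                     const std::string& pattern) {
    const std::regex re(pattern);
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it)
        out.push_back(it->str());
    return out;
}

// std::regex has no lookbehind, so the preceding text is checked by hand:
// keep each maximal run of digits immediately preceded by prefix. With
// prefix "$" this finds what (?<=\$)\d+ would find.
std::vector<std::string> digits_preceded_by(const std::string& text,
                                            const std::string& prefix) {
    const std::regex digits(R"(\d+)");
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), digits), end;
         it != end; ++it) {
        const auto start = static_cast<std::size_t>(it->position(0));
        if (start >= prefix.size() &&
            text.compare(start - prefix.size(), prefix.size(), prefix) == 0)
            out.push_back(it->str());
    }
    return out;
}
```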
Quantifiers:
* regex zero or more: ( *R ) zero or more matches
*? regex zero or more, ungreedy: unsupported, use lookahead/lookbehind combined with (!R) and (!!R) search operators.
+ regex one or more: ( +R ) one or more matches
+? regex one or more, ungreedy: unsupported, use lookahead/lookbehind combined with (!R) and (!!R) search operators.
? regex zero or one: ( R*1 ) zero to one matches
?? regex zero or one, ungreedy: unsupported, use lookahead/lookbehind combined with (!R) and (!!R) search operators.
{N} regex exactly N matches: ( R+pair(N,N) ) exactly N matches
{N,} regex N or more matches: ( R+pair(N,npos) ) N or more matches
{N,M} regex N to M matches: ( R+pair(N,M) ) N to M matches
{3,5}? regex range quantifier, ungreedy: unsupported, use lookahead/lookbehind combined with (!R) and (!!R) search operators.
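Unlike CTTL as described above, std::regex does accept ungreedy quantifiers, which makes it easy to compare them with a search-based workaround. In this C++ sketch, first_tag_by_search (a hypothetical helper, not CTTL's (!R) operator) reproduces <.*?> by finding '<' and then searching forward for the first '>'. digits_between, also hypothetical, checks a bounded repetition {N,M} by hand.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>

// First match of pattern anywhere in text, or an empty string.
std::string first_match(const std::string& text, const std::string& pattern) {
    std::smatch m;
    if (std::regex_search(text, m, std::regex(pattern))) return m.str(0);
    return std::string();
}

// Ungreedy <.*?> without a lazy quantifier: find '<', then search forward
// for the first '>' and stop there. Assumes no '\n' in between, because
// '.' in std::regex does not match '\n'.
std::string first_tag_by_search(const std::string& text) {
    const std::size_t open = text.find('<');
    if (open == std::string::npos) return std::string();
    const std::size_t close = text.find('>', open + 1);
    if (close == std::string::npos) return std::string();
    return text.substr(open, close - open + 1);
}

// {N,M} for one character class, counted by hand: true if s consists of
// N to M digits.
bool digits_between(const std::string& s, std::size_t n, std::size_t m) {
    assert(n <= m);
    if (s.size() < n || s.size() > m) return false;
    for (char c : s)
        if (!std::isdigit(static_cast<unsigned char>(c))) return false;
    return true;
}
```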
Character ranges:
. regex dot match: symbol() (any character including new line '\n'.)
a|b regex a or b: ( a|b ) binary set union operator
(...) regex capturing group, the target of a backreference (also regex named capturing group): edge(R) substring expression adaptor
(?:...) regex passive group: use parenthesized C++ sub-expression
[abc] regex character range: begin("abc")
[^abc] regex not character range: -begin("abc")
[a-z] regex letters between a and z: entity("qwertyuiopasdfghjklzxcvbnm")
[0-7] regex digits between 0 and 7: entity("01234567")
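For the bracket rows, std::string::find_first_not_of and find_first_of give the same run lengths as [set]+ and [^set]+. This C++ sketch compares them with std::regex; leading_run, leading_run_not and regex_leading_run are hypothetical helpers, and CTTL is not used. The last two checks mark one difference from the symbol() row: by default the ECMAScript '.' in std::regex does not match '\n', while [\s\S] does.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Length of the leading run of characters drawn from set ([set]+).
std::size_t leading_run(const std::string& s, const std::string& set) {
    const std::size_t stop = s.find_first_not_of(set);
    return stop == std::string::npos ? s.size() : stop;
}

// Length of the leading run of characters not in set ([^set]+).
std::size_t leading_run_not(const std::string& s, const std::string& set) {
    const std::size_t stop = s.find_first_of(set);
    return stop == std::string::npos ? s.size() : stop;
}

// The same measurement with std::regex; bracket is a class such as
// "[0-7]", and match_continuous anchors the match at the start of s.
std::size_t regex_leading_run(const std::string& s, const std::string& bracket) {
    const std::regex re(bracket + "+");
    std::smatch m;
    if (std::regex_search(s, m, re, std::regex_constants::match_continuous))
        return static_cast<std::size_t>(m.length(0));
    return 0;
}
```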
Regex matching modes:
/i regex case insensitive match mode: see case-insensitive string
/x regex ignore whitespace between tokens mode: see predefined space policy classes
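A short C++ sketch of the two modes, without CTTL. std::regex::icase gives regex-style case-insensitive matching, and the hypothetical iequals compares ASCII strings the same way by hand. The hypothetical tokens helper skips white space in the input between tokens, which is assumed here to be the idea behind the space policy reference. Regex /x, by contrast, ignores white space inside the pattern.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Case-insensitive equality of two ASCII strings (the /i idea).
bool iequals(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i])))
            return false;
    return true;
}

// Split the input into tokens, discarding any white space between them.
std::vector<std::string> tokens(const std::string& s) {
    std::vector<std::string> out;
    std::size_t i = 0;
    while (i < s.size()) {
        while (i < s.size() && std::isspace(static_cast<unsigned char>(s[i]))) ++i;
        const std::size_t start = i;
        while (i < s.size() && !std::isspace(static_cast<unsigned char>(s[i]))) ++i;
        if (i > start) out.push_back(s.substr(start, i - start));
    }
    return out;
}
```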
Copyright © 1997-2009 Igor Kholodov mailto:cttl@users.sourceforge.net.
Permission to copy and distribute this document is granted provided this copyright notice appears in all copies. This document is provided "as is" without express or implied warranty, and with no claim as to its suitability for any purpose.