Common Text Transformation Library http://cttl.sourceforge.net/
The Kleene plus adaptor of an integral lambda expression L, also known as the Kleene plus modifier, connects with a CTTL grammar expression R through the following operators:
R + +L
R & +L
R | +L
where
+ is the CTTL binary sequence operator;
& is the CTTL binary set intersection operator;
and | is the CTTL binary set union operator.
Reversed connections are also acceptable:
+L + R
+L & R
+L | R
The Kleene plus adaptor (implemented by the overloaded unary plus operator+) interprets the unsigned result Q of the integral lambda expression L as the calculated length of an input symbol: it matches a sequence of up to Q characters, starting at the current position of the upper boundary of the parseable substring. The match is shorter if fewer than Q characters are available. If the parseable substring is empty, the result is an epsilon match.
For example,
#include <cassert>
#include <string>
#include "cttl/cttl.h"
#include "lambda/lambda.h"

using namespace cttl;

int main(/*int argc, char* argv[]*/)
{
    std::string inp = "abc";
    const_edge<> substring( inp );
    const_edge<> token = substring;
    token.first = 0;
    token.second = 0;
    size_t result = ( token( +scalar( inp.length() ) + end() ) ).match( substring );
    assert( result == 0 );
    assert( token.text() == "abc" );
    return 0;
}
Here, the sub-expression
+scalar( inp.length() )
is the Kleene plus adaptor, which matches the entire input sequence.
The Kleene plus adaptor of a lambda expression L constitutes a direct parser +L, which is governed by the following rules:
If ( Q != std::string::npos ), the direct parser +L consumes up to Q input characters, where Q is the result of the enclosed lambda expression.
If ( Q == std::string::npos ), then
(a) the grammar evaluation fails, and std::string::npos is returned to the caller:
#include <cassert>
#include <string>
#include "cttl/cttl.h"
#include "lambda/lambda.h"

using namespace cttl;

int main(/*int argc, char* argv[]*/)
{
    std::string inp = "abc";
    const_edge<> substring( inp );
    size_t result = ( entity() & +scalar( std::string::npos ) ).match( substring );
    assert( result == std::string::npos );
    return 0;
}
(b) When evaluation fails, the direct parser does not modify the state of the parseable substring. This guarantee does not extend to L itself, however: if lambda expression L updates the position of a parseable substring boundary, the change persists beyond the evaluation of +L.
The direct parser is ungreedy:
grammar evaluation succeeds even if fewer than Q characters are available in the parseable substring:
#include <cassert>
#include <string>
#include "cttl/cttl.h"
#include "lambda/lambda.h"

using namespace cttl;

int main(/*int argc, char* argv[]*/)
{
    std::string inp; // empty input string
    const_edge<> substring( inp );
    const_edge<> token = substring;
    size_t result = token( begin() + +scalar( 1 ) ).match( substring );
    assert( result != std::string::npos );
    assert( token == "" );
    return 0;
}
As shown, the direct parser accepts an empty token as a valid match. To exclude empty matches from the results of the Kleene plus adaptor, decorate the expression with the non-empty match validator entity():
#include <cassert>
#include <string>
#include "cttl/cttl.h"
#include "lambda/lambda.h"

using namespace cttl;

int main(/*int argc, char* argv[]*/)
{
    std::string inp; // empty input string
    const_edge<> substring( inp );
    size_t result = (
        entity( // exclude empty matches
            begin() + +scalar( 1 )
        )
    ).match( substring );
    assert( result == std::string::npos );
    return 0;
}
All lambda grammar adaptors are space-insensitive: the space policy is ignored, and grammar evaluation begins immediately at the position of the upper boundary of the parseable substring. For example,
#include <cassert>
#include <string>
#include "cttl/cttl.h"
#include "lambda/lambda.h"

using namespace cttl;

int main(/*int argc, char* argv[]*/)
{
    std::string inp = "a bc";
    const_edge< policy_space<> > substring( inp );
    const_edge<> token = substring;
    size_t result = token( symbol() + +scalar( 1 ) ).match( substring );
    assert( result != std::string::npos );
    assert( substring.first.offset() == 2 );
    assert( token == "a " );
    return 0;
}
The lexeme symbol() matches the first character of the input. The next character is matched by the sub-expression
+scalar( 1 )
Although the parseable substring is parameterized by the policy_space class, which requests automatic space processing, the sub-expression +scalar(1) nonetheless matches the space character in "a bc".
By design, space policy processing is disabled for all types of lambda connections with CTTL grammar; the direct parser +scalar(1) ignores the space policy entirely.
If +scalar(1) is replaced by another symbol() lexeme, the resulting match becomes "a b":
#include <cassert>
#include <string>
#include "cttl/cttl.h"
#include "lambda/lambda.h"

using namespace cttl;

int main(/*int argc, char* argv[]*/)
{
    std::string inp = "a bc";
    const_edge< policy_space<> > substring( inp );
    const_edge<> token = substring;
    size_t result = token(
        symbol() + symbol() // was +scalar( 1 )
    ).match( substring );
    assert( result != std::string::npos );
    assert( substring.first.offset() == 3 );
    assert( token == "a b" );
    return 0;
}
Because the symbol() lexeme is space-sensitive, the space is skipped, and the second symbol() matches the character 'b' in the input "a bc".
Since void regions are part of the space-sensitive substring implementation, all lambda grammar adaptors ignore void regions as well:
#define CTTL_TRACE_EVERYTHING
#include <cassert>
#include <string>
#include "cttl/cttl.h"
#include "lambda/lambda.h"

using namespace cttl;

int main(/*int argc, char* argv[]*/)
{
    std::string inp = "abc";
    policy_space< flag_follow_region > void_region;
    const_edge< policy_space< flag_follow_region > > substring( inp, void_region );
    substring.region_insert(); // entire input is void
    const_edge<> token = substring;
    size_t result = ( token( +scalar( inp.length() ) + end() ) ).match( substring );
    assert( result == 0 );
    assert( token.text() == "abc" );
    return 0;
}
The Kleene plus adaptor and the C++ unary plus operator+ share the same notational symbol, the plus sign '+'. When both are applied to a lambda expression, the C++ arithmetic unary plus preempts the lambda Kleene plus. In other words, in the expression below the inner plus sign is the overloaded C++ arithmetic unary plus, which binds to the operand first, and the outer plus sign is the Kleene plus:
  + +lambda_expr
//| |
//| `-- C++ arithmetic unary plus
//`---- Kleene plus
This interpretation is natural, since C++ applies prefix operators from right to left: the arithmetic plus first performs an arithmetic operation on its operand lambda_expr, and the Kleene plus then adapts the result of (+lambda_expr).
The following example instantiates a lambda stack primitive and pushes two values onto the stack. When the arithmetic unary plus operator is applied to the stack primitive, the expression returns the stack size, which is 2 in this case.
The Kleene plus adaptor converts the integral value 2 into a direct parser, which consumes two characters from the input sequence.
The direct parser is connected to the CTTL grammar expression by the sequence operator:
#define CTTL_TRACE_EVERYTHING
#include <cassert>
#include <string>
#include "cttl/cttl.h"
#include "lambda/lambda.h"

using namespace cttl;

int main(/*int argc, char* argv[]*/)
{
    std::string inp = "abcd";
    const_edge<> substring( inp );
    const_edge<> token = substring;
    cttl::lambda<>::stack Stack;
    // push values 11 and 22 on the stack:
    ( Stack = scalar(11)^scalar(22) ).evaluate();
    size_t result = token(
        symbol()
        +   // CTTL sequence operator
        +   // Kleene plus (direct parser)
        +   // arithmetic unary plus (returns stack size)
        Stack.make_reference()
    ).match( substring );
    assert( result != std::string::npos );
    assert( token == "abc" );
    return 0;
}
The expression Stack.make_reference() yields a reference to the stack. Without the call to make_reference(), a copy of the stack would be used in the grammar expression.
To avoid confusion among the multiple plus signs, the grammar expression can be fully parenthesized:
symbol()
+   // CTTL sequence operator
(
    +   // Kleene plus (direct parser)
    (
        // arithmetic unary plus (returns stack size)
        +Stack.make_reference()
    )
)
Copyright © 1997-2009 Igor Kholodov mailto:cttl@users.sourceforge.net.
Permission to copy and distribute this document is granted provided this copyright notice appears in all copies. This document is provided "as is" without express or implied warranty, and with no claim as to its suitability for any purpose.