Common Text Transformation Library: gumus.cpp

The gumus utility processes a small scripting language that transforms prefabricated units of free text into C++ stream output statements capable of reproducing the original text. For example, if file input.txt contains a single line of text, gumus converts it into a single output statement that prints that line.

Multiple lines of input are converted into multiple stream output statements, one per line:
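The core transformation can be approximated without CTTL. The sketch below is a hypothetical, simplified re-implementation for illustration only (escape_text, to_statement and to_statements are not part of gumus). It mirrors the defaults in gumus.cpp (two-space indentation, std::cout, the << operator and std::endl;) and escapes the same characters as its find_escape rule.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Escapes the characters matched by the find_escape rule of gumus.cpp:
// double quote, single quote and backslash get a leading backslash,
// and a tab character becomes the two characters \t.
std::string escape_text(std::string const& line)
{
    std::string out;
    for (char c : line) {
        if (c == '\t') {
            out += "\\t";
            continue;
        }
        if (c == '"' || c == '\'' || c == '\\')
            out += '\\';
        out += c;
    }
    return out;
}

// Converts one line of free text into one output statement, using the
// gumus defaults: indentation, "std::cout", "<< " and "std::endl;".
std::string to_statement(std::string const& line, int indent = 2)
{
    assert(indent >= 0);
    std::string stmt = std::string(indent, ' ') + "std::cout<< ";
    if (line.empty())
        return stmt + "std::endl;";  // an empty line needs only the suffix
    return stmt + "\"" + escape_text(line) + "\"<< std::endl;";
}

// Converts multi-line text into one statement per input line.
// Carriage returns are dropped first, as gumus::remove_cr does.
std::string to_statements(std::string const& text)
{
    std::istringstream in(text);
    std::string line;
    std::string result;
    while (std::getline(in, line)) {
        std::string clean;
        for (char c : line)
            if (c != '\r')
                clean += c;
        result += to_statement(clean) + '\n';
    }
    return result;
}
```

For example, to_statement("Hello, World!") yields `  std::cout<< "Hello, World!"<< std::endl;`, with the operator attached directly to the stream name, just as gumus.cpp concatenates its line prefix and output operator.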

Gumus scripts are simple to write and maintain. They are especially useful when computer programs need to be generated instead of being assembled by error-prone copy-and-paste operations. The CTTL lambda library, which is built from many template specializations and other similar pieces of repetitive code, contains a substantial amount of source code generated by gumus scripts. Whenever you need to write a series of programs composed of duplicated code, consider writing a program that generates them. In many cases, a gumus script is a handy tool for automating such mundane programming tasks.

dot C++ code

Besides plain text, a gumus input file may contain lines of C++ code, marked by one or more dots at the beginning of the line. For example:

After processing, the leading dots are gone: each one is replaced by a space, while the C++ code itself remains intact. Thus dots let lines of C++ code mix freely with units of free text inside a gumus script. If, instead of displaying the output on the screen, we save it into a temporary C++ header file,

we can now construct a small driver program that includes the generated header as follows:

When the driver program is compiled and run, it will produce the following output:
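Because gumus output consists of statements rather than declarations, the generated header has to be included inside a function body. A minimal sketch of such a driver follows. The header name output.h and the three-line script in the comment are hypothetical, and the statements are pasted inline (with the indentation gumus would produce for two dots) instead of being included, so that the example compiles on its own.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// In a real driver the body of print_script() would be just the include line:
//
//     void print_script()
//     {
//     #include "output.h"   // hypothetical name of the saved gumus output
//     }
//
// Here the statements gumus could emit for the hypothetical script
//
//     ..for ( int i = 0; i < 2; ++i ) {
//     Hello, World!
//     ..}
//
// are pasted in directly.
void print_script()
{
  for ( int i = 0; i < 2; ++i ) {
  std::cout<< "Hello, World!"<< std::endl;
  }
}

// Runs print_script() with std::cout temporarily redirected into a string,
// which is convenient for checking what the generated code prints.
std::string capture_script_output()
{
    std::ostringstream captured;
    std::streambuf* const original = std::cout.rdbuf(captured.rdbuf());
    print_script();
    std::cout.rdbuf(original);  // restore the real standard output
    return captured.str();
}
```

A complete driver then needs little more than `int main() { print_script(); }`.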

<<.expr.>>

The input script also supports output expressions of the form <<.expr.>>. For example, we could modify our input file as follows:

Then, if we rerun our script, recompile the driver program, and run it again, the output becomes:
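How an output expression ends up in the generated statement can be sketched in plain C++. The function splice_expressions is a hypothetical, simplified stand-in for the find_variable and event_variable rules of gumus.cpp: it skips escaping and only handles well-formed <<. and .>> pairs.

```cpp
#include <cassert>
#include <string>

// The text line is wrapped in quotes; each "<<." closes the string literal
// and each ".>>" reopens it, so the expression becomes a separate operand
// of operator<<. An unmatched "<<." is left as ordinary text.
std::string splice_expressions(std::string const& line)
{
    std::string out = "  std::cout<< \"";
    std::string::size_type pos = 0;
    for (;;) {
        std::string::size_type const open = line.find("<<.", pos);
        if (open == std::string::npos)
            break;
        std::string::size_type const close = line.find(".>>", open + 3);
        if (close == std::string::npos)
            break;
        out += line.substr(pos, open - pos);             // text before <<.
        out += "\"<< ";                                   // close the literal
        out += line.substr(open + 3, close - open - 3);  // the expression itself
        out += "<< \"";                                   // reopen the literal
        pos = close + 3;
    }
    out += line.substr(pos);                              // text after the last .>>
    out += "\"<< std::endl;";
    return out;
}
```

For instance, `Total: <<.n.>> items` becomes `  std::cout<< "Total: "<< n<< " items"<< std::endl;`. An expression at the very end of a line leaves a harmless empty literal `""` before the line suffix.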

Most gumus scripts are simple and use only single variables, rather than complex expressions, inside text output units. In any case, an output expression <<.expr.>> must not contain character or string literals: gumus escapes every quote and backslash on a text line, including those inside <<.expr.>>, so the literals end up decorated with backslashes in the generated code.
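The problem can be reproduced with a small sketch that escapes the whole text line first and splices expressions afterwards, so any literal inside an expression keeps the added backslashes. The functions escape_all and gumus_like_line are hypothetical, simplified illustrations in plain C++, not the CTTL-based code of gumus.cpp.

```cpp
#include <cassert>
#include <string>

// Step 1: escape ", ', \ and tab everywhere on the line, including the
// characters between <<. and .>>.
std::string escape_all(std::string const& line)
{
    std::string out;
    for (char c : line) {
        if (c == '\t') { out += "\\t"; continue; }
        if (c == '"' || c == '\'' || c == '\\')
            out += '\\';
        out += c;
    }
    return out;
}

// Step 2: splice <<.expr.>> into the statement. Because escaping has
// already happened, any literal inside expr now carries backslashes.
std::string gumus_like_line(std::string const& line)
{
    std::string const text = escape_all(line);
    std::string out = "  std::cout<< \"";
    std::string::size_type pos = 0;
    std::string::size_type open;
    while ((open = text.find("<<.", pos)) != std::string::npos) {
        std::string::size_type const close = text.find(".>>", open + 3);
        if (close == std::string::npos)
            break;
        out += text.substr(pos, open - pos) + "\"<< "
            + text.substr(open + 3, close - open - 3) + "<< \"";
        pos = close + 3;
    }
    return out + text.substr(pos) + "\"<< std::endl;";
}
```

For the line `Name: <<.std::string("Bob").>>` the spliced expression reads `std::string(\"Bob\")`, which is not valid C++ outside a string literal.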

The result no longer compiles as valid C++ code. To work around the problem, store the character in a variable, for example one declared on a dotted line, where gumus leaves literals unescaped, and use only that variable in the output expression:
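The workaround can be sketched using the optional second command-line argument, so the output can be checked in memory. Assuming the hypothetical two-line script in the comment is processed with str_ as the stream name, the body of emit_quote_demo is what gumus would be expected to generate: the dotted line is copied with its literal intact, and the text line refers only to the variable. The function names are illustrative.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical script, processed with str_ as the second argument:
//
//   ..char const quote = '"';
//   He said <<.quote.>>hello<<.quote.>>
//
// Two dots give two spaces of indentation; with a stream name given and no
// third argument, gumus ends each text statement with '\n';
void emit_quote_demo(std::ostream& str_)
{
  char const quote = '"';
  str_<< "He said "<< quote<< "hello"<< quote<< ""<< '\n';
}

// Collects the generated output in memory for inspection.
std::string quote_demo_text()
{
    std::ostringstream str_;
    emit_quote_demo(str_);
    return str_.str();
}
```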

dot indentation

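A dotted line may begin with several dots. During processing, each leading dot is replaced by a space, so the number of dots sets the indentation of that C++ line. The dot count also becomes the indentation of the text output statements that follow, until the next dotted line (before the first dotted line, text statements are indented by two spaces):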
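The bookkeeping for dotted lines can be sketched in isolation. process_script is a hypothetical simplification of the match_lines and event_code_line rules that ignores escaping and output expressions; it checks the string length before reading each character.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// - A line that starts with dots is C++ code: every leading dot becomes a
//   space, and the dot count becomes the current indentation level.
// - Any other line becomes an output statement indented by that level.
std::string process_script(std::string const& script)
{
    std::istringstream in(script);
    std::string line;
    std::string out;
    std::string::size_type indent = 2;  // gumus default before any dotted line
    while (std::getline(in, line)) {
        if (!line.empty() && line[0] == '.') {
            std::string::size_type dots = 0;
            while (dots < line.size() && line[dots] == '.') {
                line[dots] = ' ';
                ++dots;
            }
            indent = dots;
            out += line;
        } else if (line.empty()) {
            out += std::string(indent, ' ') + "std::cout<< std::endl;";
        } else {
            out += std::string(indent, ' ') + "std::cout<< \"" + line + "\"<< std::endl;";
        }
        out += '\n';
    }
    return out;
}
```

With a script such as `..for ( int i = 0; i < 3; ++i ) {`, `....int square = i * i;`, `Line`, `..}`, the text line inherits the four-space indentation of the dotted line just before it.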

tracking script source

When large sets of files are generated, it is important to keep track of where each file came from. The following script adds the current time, the file name, and the source line number to the output:

To make time functions available, we may need to modify our driver program as follows:
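One way to embed such tracking information is a helper that the driver defines and a dotted line calls. origin_comment is a hypothetical helper, not part of gumus or CTTL; it relies on std::time and std::ctime from <ctime>, which is the kind of header a driver has to include before time functions are available.

```cpp
#include <cassert>
#include <ctime>
#include <string>

// Builds a C++ comment recording where and when code was generated.
// A dotted script line could call it, e.g.
//   .std::cout << origin_comment( __FILE__, __LINE__ );
// gumus keeps one output line per script line, so __LINE__ inside the
// generated header should match the script line, while __FILE__ names the
// generated header rather than the script.
std::string origin_comment(char const* file, int line)
{
    std::time_t const now = std::time(nullptr);
    char const* stamp = std::ctime(&now);  // ends with '\n' on success
    return std::string("// generated from ") + file
        + ", line " + std::to_string(line)
        + ", " + (stamp ? stamp : "unknown time\n");
}
```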

gumus command line parameters

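Gumus accepts two optional command-line arguments after the script file name. The second argument replaces std::cout with the name of another output stream, such as str_; in that case the end-of-line suffix defaults to '\n'; instead of std::endl;. The third argument sets the end-of-line suffix explicitly and should include the terminating semicolon, because gumus appends it verbatim. The name of the output stream matters when the driver program accumulates generated text in memory rather than sending it to the standard output. This is especially helpful when the driver collects multiple outputs and saves them to different files.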

Finally, the driver program main.cpp uses a std::stringstream object to capture the output:
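A sketch of such a driver follows, with the generated headers replaced by two small functions so that it compiles on its own. The names emit_header, emit_footer, capture and save_text are hypothetical; the statement format matches what gumus emits when the stream name str_ is given without a third argument.

```cpp
#include <cassert>
#include <fstream>
#include <ostream>
#include <sstream>
#include <string>

// Stand-ins for two generated headers, each a sequence of str_ statements.
void emit_header(std::ostream& str_)
{
  str_<< "// file header"<< '\n';
}

void emit_footer(std::ostream& str_)
{
  str_<< "// end of file"<< '\n';
}

// Accumulates one generated unit in its own std::stringstream, so the
// driver can decide later where to store it.
std::string capture(void (*generate)(std::ostream&))
{
    std::stringstream str_;
    generate(str_);
    return str_.str();
}

// Writes accumulated text to a file; returns false if the write failed.
bool save_text(std::string const& path, std::string const& text)
{
    std::ofstream file(path.c_str(), std::ios::binary);
    file << text;
    return static_cast<bool>(file);
}
```

A driver could then call, for example, `save_text("header.h", capture(emit_header))` and `save_text("footer.h", capture(emit_footer))`, where both file names are hypothetical.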

gumus source code

// sample code: gumus.cpp
// Gumus script preprocessor utility.
// Usage: specify a gumus source file to convert to C++.
// syntax:
//   text
//   text <<.variable.>> <<.variable.>>...
//   text
// .C++ code
// .//C++ comment, etc.
// Note that dots specify the indentation level for the line output.

//#define NDEBUG    // define before assert.h to stop assertions from being compiled 
//#define CTTL_TRACE_EVERYTHING //define to turn tracing on
//#define CTTL_TRACE_RULES  //define to turn light tracing on
//#define GUMUS_TRACE_VARS  //define to debug gumus script variables

#include <cassert>
#include <iostream>
#include <string>
#include "cttl/cttl.h"
#include "utils/fileio.h"
#include "utils/itos.h"

using namespace cttl;

struct gumus {
    std::string line_prefix;        // std::cout or str_. To be used as LHS with output operator.
    std::string output_operator;    // either << for std::cout, or += for string output.
    std::string line_suffix;        // std::endl or '\n'. To be used as RHS with output operator.
    int indentation_level;

    gumus(
        std::string const& line_prefix_     /*= "std::cout"*/,
        std::string const& output_operator_ /*= "<<"*/,
        std::string const& line_suffix_     /*= "std::endl;"*/,
        int indentation_level_ = 2
        )
        :
        line_prefix( line_prefix_ ),
        output_operator( output_operator_ ),
        line_suffix( line_suffix_ ),
        indentation_level( indentation_level_ )
    {
    }

    void remove_cr( edge<>& edge_ )
    {
        edge_.push();
        while ( ( !symbol( '\r' ) ).match( edge_ ) != std::string::npos )
        {
            edge_.push();
            edge_.second.offset( edge_.first.offset() );
            edge_.first.offset( edge_.first.offset() - 1 );
            edge_.text( "" );
            edge_.pop();
        }
        edge_.pop();
    }

    size_t match_lines( edge<>& edge_ )
    {
        return (
            +(
                -end()
                +
                quote(
                    true
                    ,
                    (
                        begin( '.' )
                        +
                        CTTL_RULE( gumus::event_code_line )
                    )
                    |
                    (
                        CTTL_RULE( gumus::event_line )
                        &
                        *CTTL_RULE( gumus::find_escape )
                        &
                        *CTTL_RULE( gumus::find_variable )
                    )
                    ,
                    '\n' | end()
                )
            )
        ).match( edge_ );
    }

    size_t event_code_line( edge<>& edge_ )
    {
        for (
            indentation_level = 0
            ;
            // check the bounds before reading the character at that position
            ( indentation_level < edge_.length() )
            &&
            ( edge_.first[ indentation_level ] == '.' )
            ;
            ++indentation_level
            )
        {
            edge_.first[ indentation_level ] = ' ';
        }

        return edge_.second.offset();
    }

    size_t event_line( edge<>& edge_ )
    {
        // line decorations
        std::string indentation( indentation_level, ' ' );

        if ( edge_.length() ) {
            // non-empty line
            edge_.first.insert_go( indentation + line_prefix + output_operator + "\"" );
            edge_.second.insert_stay( "\"" + output_operator + line_suffix );
        } else {
            edge_.first.insert_go( indentation + line_prefix + output_operator + line_suffix );
        }

        return edge_.first.offset( edge_.second.offset() );
    }

    size_t find_escape( edge<>& edge_ )
    {
        return (
            (
                first( "\"\'\t\\" )
            )
            &
            CTTL_RULE( gumus::event_escape_char )

        ).find( edge_ );
    }

    size_t event_escape_char( edge<>& edge_ )
    {
        if ( edge_.first[ 0 ] == '\t' )
            edge_.text( "\\t" );
        else
            edge_.first.insert_go( "\\" );
        return edge_.second.offset();
    }

    size_t find_variable( edge<>& edge_ )
    {
        return (
            (
                symbol( "<<." )
                +
                !symbol( ".>>" )
            )
            &
            CTTL_RULE( gumus::event_variable )

        ).find( edge_ );
    }

    size_t event_variable( edge<>& edge_ )
    {
        // left connector
        edge_.first[ 0 ] = output_operator[ 0 ]; //'<' -or- ';'
        edge_.first[ 1 ] = output_operator[ 1 ]; //'<' -or- '+'
        edge_.first[ 2 ] = output_operator[ 2 ]; //' ' -or- '='

        edge_.first.insert_go( "\"" );

        edge_.second[ -3 ] = output_operator[ 0 ]; //'<' -or- ';'
        edge_.second[ -2 ] = output_operator[ 1 ]; //'<' -or- '+'
        edge_.second[ -1 ] = output_operator[ 2 ]; //' ' -or- '='

        edge_.second.insert_go( "\"" );
#ifdef GUMUS_TRACE_VARS
        edge_.second.insert_go( "/*" + edge_.text().substr( 3, edge_.length() - 7 ) + "*/" );
#endif // GUMUS_TRACE_VARS

        return 0;
    }

    bool parse( edge<>& universe_ )
    {
        remove_cr( universe_ );
        assert( universe_.length() == int( universe_.parent().length() ) );
        if ( match_lines( universe_ ) != std::string::npos )
            return true;
        return false;
    }
};

int main(int argc, char* argv[])
{
    std::string line_prefix( "std::cout" );
    std::string output_operator( "<< " );
    std::string line_suffix( "std::endl;" );

    if ( argc == 1 ) {
        std::cout
            << std::endl
            << "Usage: specify a gumus source file to convert to C++:"
            << std::endl
            << std::endl
            << '>' << argv[ 0 ] << " path/file.ext [str_] [end-of-line-suffix]"
            << std::endl
            << std::endl
            << "\t Second argument is optional. If specified, it becomes"
            << std::endl
            << "\t name of the stream output. The default is \'std::cout\'."
            << std::endl
            << std::endl
            << "\t Third argument is optional. Specifies"
            << std::endl
            << "\t end-of-line suffix. The default is \'std::endl;\'."
            << std::endl
            << "\t If second argument given, the default changes to"
            << std::endl
            << "\t more general \'\\n\'."
            << std::endl
            ;
        return 1;

    } else if ( argc == 3 ) {
        line_prefix = argv[ 2 ];
        line_suffix = "\'\\n\';";

    } else if ( argc == 4 ) {
        line_prefix = argv[ 2 ];
        line_suffix = argv[ 3 ];
    }

    input<> inp;
    file2string( argv[ 1 ], inp.text() );
    assert( inp.length() );
    edge<> universe( new_edge( inp ) );

    gumus parser( line_prefix, output_operator, line_suffix );
    if ( parser.parse( universe ) ) {
        std::cout << inp.text();
        return 0;
    }

    std::cout << "*** parser failed ***" << std::endl;
    return 1;
}

Permission to copy, use, modify, sell and distribute this document is granted provided this copyright notice appears in all copies. This document is provided "as is" without express or implied warranty, and with no claim as to its suitability for any purpose.