re2c

re2c is a tool for writing very fast and very flexible scanners. Unlike other such tools, re2c focuses on generating highly efficient code for regular expression matching, which allows a much broader range of use than traditional lexer generators offer. Last but not least, re2c generates warning-free code that is equal to hand-written code in terms of size, speed and quality.

For these reasons, the PHP team uses re2c in various places.

Marcus Börger (helly@users.sourceforge.net)

I very much welcome anyone who would like to contribute to the project, either as a developer with source code access or by simply sending patches, bug reports, or suggestions for improvement.

Dan Nuffer (nuffer@users.sourceforge.net)

Please use the SourceForge facilities to download re2c, report bugs, subscribe to the mailing list, etc.

The manual is available online.

re2c is hosted at SourceForge.net


Changelog

2014-07-29: 0.13.7.4

  • Enabled 'make docs' only if configured with '--enable-docs'
  • Disallowed use of yacc/byacc instead of bison to build the parser
  • Removed non-portable sed feature from the script that runs tests

2014-07-27: 0.13.7.3

  • Fixed CXX warning
  • Got rid of asciidoc build-time dependency

2014-07-27: 0.13.7.2

  • Included man page in dist; respect user's CXXFLAGS

2014-07-26: 0.13.7.1

  • Added missing files to tarball

2014-07-25: 0.13.7

2013-07-04: 0.13.6

2008-05-25: 0.13.5

2008-04-05: 0.13.4

2008-03-14: 0.13.3

2008-02-14: 0.13.2

2007-08-24: 0.13.1

2007-06-24: 0.13.0

2007-08-24: 0.12.3

2007-06-26: 0.12.2

2007-05-23: 0.12.1

2007-05-01: 0.12.0

re2c 0.12.0 has been tested with the following compilers:

2007-04-01: 0.11.3

2007-03-01: 0.11.2

2007-02-20: 0.11.1

2007-01-01: 0.11.0

2007-04-01: 0.10.8

2007-02-20: 0.10.7

2006-08-05: 0.10.6

2006-06-11: 0.10.5

2006-06-01: 0.10.4

2006-05-14: 0.10.3

2006-05-01: 0.10.2

re2c 0.10.2 has been tested with the following compilers:

2006-02-28: 0.10.1

re2c 0.10.1 has been tested with the following compilers:

2006-02-18: 0.10.0

2005-12-28: 0.9.12

2005-12-18: 0.9.11

2005-09-04: 0.9.10

2005-07-21: 0.9.9

2005-06-26: 0.9.8

2005-04-30: 0.9.7

2005-04-14: 0.9.6

2005-04-08: 0.9.5

2005-03-12: 0.9.4

2004-05-26: 0.9.3

2004-05-26: 0.9.2

2003-12-13: 0.9.1

2003-12-09: re2c adopted


    Version 0.9.1 README


    Originally written by Peter Bumbulis (peter@csg.uwaterloo.ca)
    Currently maintained by Brian Young (bayoung@acm.org)

    The re2c distribution can be found at:

    http://www.tildeslash.org/re2c/index.html

    The source distribution is available from:

    http://www.tildeslash.org/re2c/re2c-0.9.1.tar.gz

    This distribution is a cleaned-up version of the 0.5 release
    maintained by me (Brian Young). Several bugs were fixed, and the
    code was cleaned up for warning-free compilation. It has been
    developed and tested with egcs 1.0.2 and gcc 2.7.2.3 on Linux x86.
    Peter Bumbulis' original release can be found at:

    ftp://csg.uwaterloo.ca/pub/peter/re2c.0.5.tar.gz

    re2c is a great tool for writing fast and flexible lexers. It has
    served many people well for many years and it deserves to be
    maintained more actively. An re2c-based scanner is on the order of
    2-3 times faster than a flex-based scanner, and its input model is
    much more flexible.

    Patches and requests for features will be entertained. Areas of
    particular interest to me are porting (a Solaris and an NT
    version will be forthcoming) and wide character support. Note
    that the code is already quite portable and should be buildable
    on any platform with minor makefile changes.

    Version 0.5 Peter's original ANNOUNCE and README

    re2c is a tool for generating C-based recognizers from regular
    expressions. re2c-based scanners are efficient: for programming
    languages, given similar specifications, an re2c-based scanner is
    typically almost twice as fast as a flex-based scanner with little or no
    increase in size (possibly a decrease on CISC architectures). Indeed,
    re2c-based scanners are quite competitive with hand-crafted ones.

    Unlike flex, re2c does not generate complete scanners: the user must
    supply some interface code. While this code is not bulky (about 50-100
    lines for a flex-like scanner; see the man page and examples in the
    distribution) careful coding is required for efficiency (and
    correctness). One advantage of this arrangement is that the generated
    code is not tied to any particular input model. For example,
    re2c-generated code can be used to scan data from a null-byte
    terminated buffer as illustrated below.

    Given the following source

    #define NULL ((char*) 0)
    char *scan(char *p){
    char *q;
    #define YYCTYPE char
    #define YYCURSOR p
    #define YYLIMIT p
    #define YYMARKER q
    #define YYFILL(n)
    /*!re2c
    [0-9]+ {return YYCURSOR;}
    [\000-\377] {return NULL;}
    */
    }

    re2c will generate

    /* Generated by re2c on Sat Apr 16 11:40:58 1994 */
    #line 1 "simple.re"
    #define NULL ((char*) 0)
    char *scan(char *p){
    char *q;
    #define YYCTYPE char
    #define YYCURSOR p
    #define YYLIMIT p
    #define YYMARKER q
    #define YYFILL(n)
    {
    YYCTYPE yych;
    unsigned int yyaccept;
    goto yy0;
    yy1: ++YYCURSOR;
    yy0:
    if((YYLIMIT - YYCURSOR) < 2) YYFILL(2);
    yych = *YYCURSOR;
    if(yych <= '/') goto yy4;
    if(yych >= ':') goto yy4;
    yy2: yych = *++YYCURSOR;
    goto yy7;
    yy3:
    #line 10
    {return YYCURSOR;}
    yy4: yych = *++YYCURSOR;
    yy5:
    #line 11
    {return NULL;}
    yy6: ++YYCURSOR;
    if(YYLIMIT == YYCURSOR) YYFILL(1);
    yych = *YYCURSOR;
    yy7: if(yych <= '/') goto yy3;
    if(yych <= '9') goto yy6;
    goto yy3;
    }
    #line 12

    }

    Note that most compilers will perform dead-code elimination to remove
    all YYCURSOR, YYLIMIT comparisons.
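
To make the behaviour of the generated code concrete, the following is a hand-written sketch of the same recognizer, roughly what remains once the YYLIMIT comparisons and the empty YYFILL calls are eliminated. It is not re2c output: the function name scan_digits is hypothetical, and it assumes a null-byte terminated buffer, as in the example above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hand-written equivalent of scan() above (not re2c output).
 * Returns a pointer just past a leading run of decimal digits,
 * or NULL if the buffer does not start with a digit.
 * Assumes the buffer is terminated by a null byte. */
const char *scan_digits(const char *p)
{
    assert(p != NULL);                 /* caller must pass a valid buffer */
    if (*p < '0' || *p > '9')
        return NULL;                   /* second rule: any other character */
    do {
        ++p;                           /* consume one digit */
    } while (*p >= '0' && *p <= '9');  /* the null byte ends the loop */
    return p;                          /* first rule: [0-9]+ matched */
}
```

Called on the buffer "123abc", this sketch returns a pointer to the 'a'; called on "abc" or on an empty string, it returns NULL.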

    re2c was developed for a particular project (constructing a fast REXX
    scanner of all things!) and so while it has some rough edges, it should
    be quite usable. More information about re2c can be found in the
    (admittedly skimpy) man page; the algorithms and heuristics used are
    described in an upcoming LOPLAS article (included in the distribution).
    Probably the best way to find out more about re2c is to try the supplied
    examples. re2c is written in C++, and is currently being developed
    under Linux using gcc 2.5.8.

    Peter

    --

    re2c is distributed with no warranty whatever. The code is certain to
    contain errors. Neither the author nor any contributor takes
    responsibility for any consequences of its use.

    re2c is in the public domain. The data structures and algorithms used
    in re2c are all either taken from documents available to the general
    public or are inventions of the author. Programs generated by re2c may
    be distributed freely. re2c itself may be distributed freely, in source
    or binary, unchanged or modified. Distributors may charge whatever fees
    they can obtain for re2c.

    If you do make use of re2c, or incorporate it into a larger project,
    an acknowledgement somewhere (documentation, research report, etc.)
    would be appreciated.

    Please send bug reports and feedback (including suggestions for
    improving the distribution) to

    peter@csg.uwaterloo.ca

    Include a small example and the banner from parser.y with bug reports.