Tuesday, June 19, 2007

Fun with ANTLR v3

A colleague of mine puts it best: "I have a habit of chasing shiny objects." There are so many things in this industry I love to tinker with that I never actually finish any of them, other than a C++ parser I once dreamed of :).

The latest shiny object is the new ANTLR version 3, the latest release of this fancy parser generator. I've followed ANTLR with interest for a few years, and we almost used it for CDT's parser back in the 1.x days. At the time, though, we felt that hand coding was the only way to successfully deal with C++'s ambiguous statements (e.g. x * y; is that an expression, or a declaration of y as a pointer to type x?).
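
To make that ambiguity concrete: x * y; multiplies two values if x names a variable, but declares y as a pointer if x names a type. A hand-coded parser can settle it by consulting a symbol table at the moment of the decision. The following is a minimal sketch, assuming a hypothetical token list and a hypothetical typedef_names set standing in for a real symbol table; it is an illustration of the idea, not how CDT's parser is actually structured.

```python
def classify_statement(tokens, typedef_names):
    """Classify a statement shaped like [ID, '*', ID, ';'].

    `typedef_names` is a hypothetical stand-in for a symbol table holding
    every name currently declared as a type in the enclosing scope.
    """
    if len(tokens) != 4 or tokens[1] != "*" or tokens[3] != ";":
        raise ValueError("only handles statements shaped like 'a * b;'")
    first, second = tokens[0], tokens[2]
    if first in typedef_names:
        # 'first' names a type, so this declares 'second' as a pointer to it.
        return ("declaration", second, "pointer to " + first)
    # Otherwise it is an expression statement multiplying two values.
    return ("expression", "*", first, second)


# 'x' declared as a type, e.g. after 'typedef int x;'
print(classify_statement(["x", "*", "y", ";"], {"x"}))
# prints ('declaration', 'y', 'pointer to x')

# 'x' is an ordinary variable
print(classify_statement(["x", "*", "y", ";"], set()))
# prints ('expression', '*', 'x', 'y')
```

The point is that the same four tokens get two different parses, and only information gathered earlier in the file decides between them.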

The new version comes with an LL(*) parsing mechanism that uses unbounded lookahead (the *) to decide which alternative of a rule the input matches. If you aren't familiar with what all that means, Terence Parr, the author of ANTLR, has written a great book that explains it all. The book is especially valuable right now since there is very little other documentation. And it's only $24 for the "non-dead tree" PDF version, which is a nice way to help fund the work.
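
A rough illustration of why unbounded lookahead matters: take a toy rule whose two alternatives both begin with any number of identifiers, stat : ID+ ':' ... | ID+ ';'. No fixed LL(k) lookahead can choose between them, because the deciding token can sit arbitrarily far ahead. The sketch below hand-writes the kind of decision LL(*) makes, scanning forward past all the identifiers before committing. The rule, the (kind, text) token shape, and the function name are hypothetical; ANTLR v3's generated code encodes such decisions as lookahead DFAs and looks quite different.

```python
def predict_stat(tokens, start=0):
    """Choose an alternative for the toy rule
         stat : ID+ ':' ...   (alternative 1)
              | ID+ ';'       (alternative 2)
    by scanning ahead as far as needed, LL(*)-style.
    Tokens are (kind, text) pairs; only the 'ID', ':' and ';' kinds matter.
    """
    i = start
    # Skip any number of identifiers: this is the unbounded part.
    while i < len(tokens) and tokens[i][0] == "ID":
        i += 1
    if i == start or i == len(tokens):
        raise SyntaxError("expected ID+ followed by ':' or ';'")
    kind = tokens[i][0]
    if kind == ":":
        return 1
    if kind == ";":
        return 2
    raise SyntaxError("no viable alternative at token %r" % (tokens[i][1],))


ids = [("ID", "a%d" % n) for n in range(50)]
print(predict_stat(ids + [(":", ":")]))  # prints 1
print(predict_stat(ids + [(";", ";")]))  # prints 2
```

With 50 leading identifiers, an LL(k) parser would need k greater than 50 to make this choice; the LL(*) approach simply keeps looking until the alternatives diverge.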

This top-down approach is essentially how Johnny C and I wrote the CDT parsers. We started at the high-level concepts and broke them down into finer-grained detail until we got down to the individual tokens, creating an Abstract Syntax Tree (AST) along the way. You can base interpretation choices on where you are in the analysis and look ahead into the token stream as far as you need. It's a very powerful technique.
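
As a minimal sketch of that top-down style (not the CDT code itself), here is a recursive-descent parser for a hypothetical toy expression grammar. Each rule is a function that breaks its construct into finer-grained pieces and returns an AST node, here just nested tuples; the peek helper supplies the lookahead used to choose what to do next.

```python
import re


def tokenize(text):
    # Numbers, identifiers, and single-character operators/punctuation.
    return re.findall(r"\d+|[A-Za-z_]\w*|\S", text)


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        # One token of lookahead; None at end of input.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError("expected %r, found %r" % (tok, self.peek()))
        self.advance()

    def parse(self):
        node = self.expr()
        if self.peek() is not None:
            raise SyntaxError("unexpected %r" % (self.peek(),))
        return node

    # expr : term (('+' | '-') term)*
    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())
        return node

    # term : factor (('*' | '/') factor)*
    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    # factor : NUMBER | ID | '(' expr ')'
    def factor(self):
        tok = self.peek()
        if tok == "(":
            self.advance()
            node = self.expr()
            self.expect(")")
            return node
        if tok is not None and re.fullmatch(r"\d+", tok):
            self.advance()
            return int(tok)
        if tok is not None and re.fullmatch(r"[A-Za-z_]\w*", tok):
            self.advance()
            return tok
        raise SyntaxError("unexpected %r" % (tok,))


print(Parser("a + 2 * (b - 3)").parse())
# prints ('+', 'a', ('*', 2, ('-', 'b', 3)))
```

Precedence falls out of the rule nesting: term sits below expr, so * binds tighter than +, and the loops in expr and term make both operators left-associative.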

With ANTLR, however, you can specify the grammar at a higher level and have it generate a lot of the boilerplate code for you. It may be the one parser generator that can convince me it beats hand coding. To find out, I decided to try building a parser. Mike and the CDT guys at IBM are already working on a C parser using LPG, an LR parser generator, and extending it for UPC (Unified Parallel C). (LR parsing is bottom-up; while it's faster, it doesn't let you easily use context information when resolving rules, which is why I prefer LL.) Since I need to resolve some extensibility issues between GNU and MSVC, and since I have a lot of past experience there, I decided to try a C++ parser.
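
For contrast, here is a minimal sketch of the bottom-up (LR) style, using a hypothetical two-rule grammar and SLR(1) tables built by hand. The parser shifts tokens onto a stack and only reduces once it has seen a rule's entire right-hand side, so the tree is assembled from the leaves up. While shifting, the driver doesn't yet know which rule the tokens will end up in, which hints at why injecting context-dependent decisions is awkward in LR parsing. Generators such as LPG compute much larger tables like these from the grammar automatically.

```python
# Hypothetical toy grammar:
#   rule 1: E -> E '+' n
#   rule 2: E -> n
# SLR(1) tables built by hand for this grammar; '$' marks end of input.
ACTION = {
    (0, "n"): ("shift", 2),
    (1, "+"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "+"): ("reduce", 2),
    (2, "$"): ("reduce", 2),
    (3, "n"): ("shift", 4),
    (4, "+"): ("reduce", 1),
    (4, "$"): ("reduce", 1),
}
GOTO = {(0, "E"): 1}          # the only state with a transition on E
RULE_LENGTH = {1: 3, 2: 1}    # number of symbols on each right-hand side


def lr_parse(text):
    """Parse e.g. '1 + 2 + 3'; return (AST, list of rules reduced in order)."""
    words = text.split() + ["$"]
    states = [0]      # LR state stack
    values = []       # parallel stack of tokens and AST fragments
    reductions = []
    i = 0
    while True:
        word = words[i]
        kind = "n" if word.isdigit() else word
        action, arg = ACTION.get((states[-1], kind), ("error", None))
        if action == "shift":
            states.append(arg)
            values.append(int(word) if kind == "n" else word)
            i += 1
        elif action == "reduce":
            n = RULE_LENGTH[arg]
            children = values[-n:]
            del states[-n:]
            del values[-n:]
            # Rule 1 builds a '+' node; rule 2 just passes the number up.
            node = ("+", children[0], children[2]) if arg == 1 else children[0]
            values.append(node)
            states.append(GOTO[(states[-1], "E")])
            reductions.append(arg)
        elif action == "accept":
            return values[0], reductions
        else:
            raise SyntaxError("unexpected %r" % (word,))


print(lr_parse("1 + 2 + 3"))
# prints (('+', ('+', 1, 2), 3), [2, 1, 1])
```

The reduction list shows the bottom-up order: the first number is reduced to E before the parser has even looked at what follows it.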

Now, if ANTLR can handle that, I'm sold. It'll be an interesting journey, and it should let me try out some ideas for improving how we did some things in CDT's parser. This is just a prototype, though; I don't really plan on replacing CDT's parser with it. But it will help me learn ANTLR better and help me help others in the Eclipse community who want to use it. And who knows, another shiny object may fly by and send me off on a totally different tangent anyway...

10 comments:

  1. Hey there!

    I just posted an ANTLR 3 plugin for Eclipse. It only generates Java, but you could easily take the builder source and tweak it a little to generate C++ code.

    See http://javadude.com/tools/antlr3-eclipse

    I'm gonna post an update tonight, btw.

    -- Scott

  2. BTW, according to Scott's page, he's looking at porting ANTLRWorks, an IDE for ANTLR, to Eclipse. I'm really looking forward to that. I also wonder why, in this day and age and given the number of ANTLR users on Eclipse, this wasn't done in the first place.

  3. I wish I could find a job that paid me full-time to "chase shiny objects"

    :)

  4. Full time? If I could do this full time, maybe I'd actually finish something...

  5. More than a year later... I ask: how did this go? Did the project fly? If not, why not?

  6. Didn't get off the ground. The IBM guys had success with LPG with the help of some backtracking customizations they made. And they are looking at C++ now, with an eye on the next standard release, C++0x.

    I do know the Fortran guys with the Parallel Tools Project are using ANTLR. I still think it's superior technology, but it does have a bit of a learning curve.

  7. Thanks Doug. We are using LPG for the P8 project in Hursley. We use it to parse the PHP grammar so I was interested to see that you'd thought about moving from LPG to ANTLR.

  8. I haven't moved but the guys working on the parsers have moved ;).

    Unfortunately, I don't get time to play in that area anymore. But we have some pretty smart guys who do, so I don't really have to.
