Fundamentally, this paper describes a technique for writing recursive-descent parsers in C++ using operator overloading, so that the driving logic of the parser is written in a way that looks like the parsed grammar rules. As presented, I think the technique is not usable, but its flaws are not fundamental and it can be made usable.
My first comment is on usefulness. The author states "while C++ is certainly capable of handling recursion, the language is not really capable of handling recursive definitions such as the grammar in its present state". I am not sure I understand that sentence correctly, as there is no explanation of what is lacking to handle "recursive definitions such as the grammar in its present state". I think what is hinted at is that applying the given technique to a (directly or indirectly) recursive grammar would produce code with infinite recursion. While some uses of recursion can be removed by using closures (which the technique is able to handle by special-casing), not all can, and I don't think parsing techniques unable to handle recursion in a grammar are particularly useful. I don't know how this problem is avoided in functional programming languages (Haskell could perhaps handle it using lazy evaluation, but Common Lisp could not). Related to this, the author presents a second grammar as a rewriting of the original one with recursion removed. There is indeed no recursion in the second grammar, but the language it parses is quite different from that of the first grammar (and I don't think any rewriting could remove the recursion from that grammar while still parsing the same language).
My second comment is on error handling, reporting and recovery. There is no explanation of how errors in the parsed input are supposed to be reported, and as the technique uses backtracking, the naive approach (reporting an error at the point where it is detected) is not valid. The code presented simply consumes nothing from the input in the presence of an error, and nothing in the output indicates that an error occurred. There is also no error recovery strategy, but that's a minor point, as I know of no general parsing technique using backtracking that has an error recovery strategy.
My next comment is on performance. The author states that the code presented is not optimised for speed of execution, nor for space, but is not slow. How else would you describe a parser with exponential behaviour in the length of the input? Indeed, alternation is handled by backtracking, and backtracking without early termination on error detection can lead to exponential time behaviour even on simple grammars that an LL(1) parser could handle. A comparatively minor inefficiency is that the remaining input is copied around; in most grammars I know, tail rules consume a bounded (and often quite small) number of tokens, so this alone gives O(N²) behaviour.
The technique presented is interesting, and can be improved to the point of being usable. For example, an error state could be added to the result. It can be used as a criterion by the alternation operator (if several cases are not errors, that can either be considered an error or be handled with the list-of-results approach given in the paper; this is better than the "use the longest match" rule used first in the paper) and to stop parsing in the sequence operator. With that modification, first, one should be able to handle non-left-recursive grammars (the Dragon book gives an algorithm to remove any left recursion from a grammar) and second, the exponential behaviour is removed, leaving us with the quadratic one.
Removing the quadratic behaviour is easy: instead of copying a list around, use an iterator. After that, I expect the parser to have linear time behaviour for LL(k) grammars and to be able to handle any unambiguous grammar, or indeed any grammar with the list-of-results approach.
Now, how to report errors to the user in a useful way? As the method uses backtracking, it is not possible to report an error at the point where it is detected. A possibility is to add an error message and an error position to the error state. In case of error, alternation would build a special error message if the error is that an ambiguity was detected, or, when all possibilities fail, choose one of the error messages of the failing cases (the one which produced the longest parse before failing is the obvious candidate).
One would still miss an error recovery scheme (that is, how to continue parsing after reporting one error so that further errors can be reported), but, as I've never seen one presented for a parsing technique using backtracking, this is a very minor point.
Browsing boost.org, I found out that they have a parser framework. From the documentation, it looks like a polished variation on the same themes as Frank's paper, taking my comments into account. It surely needs to be looked at by anyone wanting to expand on Frank's paper for any purpose other than experimentation with C++ syntax or parser definition.
Overload Journal #57 - Oct 2003 - Letters to the Editor