So my copy of Beautiful Code showed up last week. Although I am one of the (many) authors and I have thus had access to the entire book online for some time, I do all of my pleasure reading in venues that need the printed page (e.g. the J Church) and have therefore waited for the printed copy to start reading.
Although I have only read the first twelve chapters or so, it’s already clear (and perhaps not at all surprising) that there are starkly different definitions of beauty here: the book’s greatest strength — and, frankly, its greatest weakness — is that the chapters are so incredibly varied. For one chapter, beauty is a small and titilating act of recursion; for the next, it’s that a massive and complicated integrated system could be delivered quickly and cheaply. (I might add that the definition of beauty in my own chapter draws something from both of these poles: that in software, the smallest and most devilish details can affect the system at the largest and most basic levels.)
If one can deal with the fact that the chapters are widely divergent, and that there is not even a token attempt to weave them together into a larger tapestry, this book (at least so far, anyway) is (if nothing else) exceptionally thought provoking; if Oprah were a code cranking propeller head, this would be the ideal choice for her book club.
Now in terms of some of my specific thoughts that have been provoked: as I mentioned, quite a few of my coauthors are enamored with the elegance of recursion. While I confess that I like writing a neatly recursive routine, I also find that I frequently end up having to unroll the recursion when I discover that I must deal with data structures that are bigger than I anticipated — and that my beautiful code is resulting (or can result) in a stack overflow. (Indeed, I spent several unpleasant days last week doing exactly this when I discovered that pathologically bad input could cause blown stacks in some software that I’m working on.)
To take a concrete example, Brian Kernighan has a great chapter in Beautiful Code about some tight, crisp code written by Rob Pike to perform basic globbing. And the code is indeed beautiful. But it’s also (at least in a way) busted: it overflows the stack on some categories of bad input. Admittedly, one is talking about very bad input here — strings that consist of hundreds of thousands of stars in this case — but this highlights exactly the problem I have with recursion: it leaves you with edge conditions that on the one hand really are edge conditions (deeply
pathological input), but with a failure mode (a stack overflow) that’s just too nasty to ignore.
Now, there are ways to deal with this. If one can stomach it, the simplest way to deal with this is to setup a sigaltstack and then siglongjmp out of a SIGSEGV/SIGBUS signal handler. You have to be very careful about doing this: the signal handler should look at the si_addr field in the siginfo and comparing it to the stack bounds to confirm that it’s a stack overflow, lest it end up siglongjmp‘ing out of a non-recursion induced SIGSEGV (which, needless to say, would make a bad problem much worse). While an alternative signal stack solution may sound hideous to some, at least the recursion doesn’t have to go under the knife in this approach. If having a SIGSEGV handler to catch this condition feels uncomfortably brittle (as well it might), or if one’s state cannot be neatly unwound after an arbitrary siglongjmp (as well it might not), the code will have to change: either a depth counter will have to be passed down and failure propagated when depth exceeds a reasonable maximum, or the recursion will have to be unrolled into iteration. For most aesthetic senses, none of these options is going to make the code more beautiful — but they will make it indisputably more correct.
I was actually curious about where exactly the Pike/Kernighan code would blow up, so I threw together a little program that uses sigaltstack along with sigsetjmp/siglongjmp to binary search to find the shortest input that induces the failure. My program, which (naturally) includes the Pike/Kernighan code, is here.
Here are the results of running my program on a variety of Solaris platforms, with each number denoting the maximum string length that can be processed by the Pike/Kernighan code without the possibility of stack overflow.
x86 | SPARC | |||
32-bit | 64-bit | 32-bit | 64-bit | |
Sun cc, unoptimized | 403265 | 187225 | 77649 | 38821 |
gcc, unoptimized | 327651 | 218429 | 69883 | 40315 |
Sun cc, optimized | 327651 | 327645 | 174723 | 95303 |
gcc, optimized | 582489 | 524227 | 149769 | 87367 |
As can be seen, there is a tremendous range here, even across just two different ISAs, two different data models and two different compilers: from 38,821 on 64-bit SPARC using Sun cc without optimization to 582,489 on 32-bit x86 using gcc with optimization — an order of magnitude difference. So while recursion is a beautiful technique, it is one that ends up with the ugliest of implicit dependencies: on the CPU architecture, on the data model and on the compiler. And while recursion is still beautiful to me personally, it will always be a beauty that is more superficial than profound…
22 Responses
This is the principal reason why I always shunned and will continue to shun recursion.
Coming from assembler background, it didn’t take very long for me to recognize the fallacy of recursion — problems can be solved “elegantly”, but, apart from designing a special harness to “babysit” the recursion loop, the whole thing can overflow the stack, which makes it inherently unreliable.
Elegance of code is more than just beauty: it need not necessarily be short/small code, but it should be as simple as possible, because, as I often teach, simple is robust; and only when one has robust building blocks as foundations, can one hope to build more complex robust solutions. Recursion, alas, does not belong into that category because of its conventional implementation.
Finally, I’ll leave with quoting a famous programmer:
“Programming perfection is not when there’s nothing more to add to a program, but when there is nothing left to take away.”
As my computer science teacher in school taught us:
“Use as much recursion as needed and as few as possible”
What about tail call elimination by the compiler? All the programmer needs to do is make sure the recursion is a tail call.
A nice example of this problem with recursion we found in Solaris NFS mount code some years back. rpcgen was used to create the XDR
encoding & decoding functions. When some poor customers crossed some random boundary of about 20,000 (sic) mounts, various related programs started falling over with extremely hard to debug stack overflows.
The solution, of course, was to use manually-written iterative XDR routines. Not much fun either, that 🙂
Rafael, tail call elimination obviously eliminates the stack overflow pathology but (1) it assumes that one has recursion which can be tail-call optimized (which is not true of the Pike/Kernighan code) and (2) it relies on a compiler optimization for correctness. I have been bit personally by relying on the compiler to tail-call optimize and then having situations where it didn’t (for whatever reason). In my experience, one must be very cautious when relying on not just correctness but optimality from a lower layer of software…
There are some peculiarities about Pike’s matching engine. One is that the given algorithm is non-greedy for the star quantifier – that is, a star will try to match as few characters as possible. This is the result of <code>matchstar</code> using a <code>do</code> loop to attempt to match the rest of the regex first before letting the quantifier consume an input character. This is significant insofar as that this non-greedy algorithm requires backtracking.
But if greediness is acceptable, then expressions with a grammar as simple as the one that this matcher accepts can be tested without any backtracking at all! In in this grammar, atoms can only ever consist of single characters, which means the length of an atom is constant. Therefore the regex itself as given is sufficient as a full state machine. (If quantifiers applied to groups of atoms, you would have to compile a state machine from the regex; this is what DFA matchers do. You still don’t need backtracking then, though; it is only actually necessary once you introduce backreferences, which make the expressions non-regular.) So you can write an engine that is driven by the input string, as opposed to having it driven by the regex à la Pike. Such an engine would simply use the position in the regex as its state, and would consumes the input string character by character, checking whether the current character statisfies the current state’s assertion.
This sounds very iterative. Where is the beauty in that?
Well, personally, I don’t find recursion all that beautiful, nor have I found it very natural (in general). This may seem surprising if you consider my propensity for functional programming; but to me, elegance is rather found in treating input as an immutable sequence to be <em>mapped</em> to another immutable sequence. Some other primitives are trivially derivable from this: filtering is simply the act of mapping a sequence to another sequence that omits some of the input elements but is otherwise identical; reducing maps a sequence of arbitrary length to a sequence of length one.
Matching an input against a regex, in these terms, means reducing an input given as a sequence of characters to a boolean.
Note if you write this purely functionally, recursion still enters the picture, because changing state when values are not mutable involves a call. However, this can be written purely tail-recursively.
(Disclaimer: this is just dashed off, so I might have made mixed up some classes of languages and how much computational complexity they entail.)
I agree that the recursion here <em>could</em> blow up, but really, who is going to pass a 300K regular expression?
In engineering, parts have tolerances. The standard for the space shuttle is different from the standard for a bike. We can look at Pike’s code as having a tolerance too.
In an application, I’d probably have a check before the algorithm that disallowed any regular expression larger than say 1024 characters, and told the user that they should create a different one. And, chances are, that check would never fire its action production.
I think we can get ourselves in trouble when we assume that functions have to work across all inputs. And, we’re seduced into that because so much of what we do is like mathematics. But, really, I think that we should think about tolerances. And, tolerances can be different depending upon the type of software we’re writing.
Michael, surely you see the security problem with user input having the ability to blow the stack?
If you overflow your stack, the OS terminates the process. There is no security problem.
Uh….what?
Recursion is almost *always* the right answer. That way generation of programmers won’t be trying to make sense of silly, state-dependent spaghetti.
Try using a language that does recursion right (ie tail call optimization guaranteed), like Scheme.
Try rereading SICP.
Aristoteles: interesting thoughts, and I hope that you will blog both your detailed thoughts on the Pike/Kernighan code, and more generally other thoughts from reading Beautiful Code. What makes the book interesting is that there are too many definitions of “beauty” to (in my opinion, anyway) fit any one person’s definition. That is, for some chapter or chapters, what the author finds beautiful, you will find downright fugly — and I would love to read your rants on what you deem to be the ugly ones.
Michael, as Geoff mentioned, if input is not checked (and I hasten to add that the length-of-death is much lower than 300K — as low as 37K) this is a potential security problem. Now, as I believe Josh is trying to say, this does not constitute a buffer overflow attack — an attacker could not use this vector to execute arbitrary code on the stack. But contrary to Josh’s assertion that there is “no security problem”, this does allow for a potential DoS attack.
That said, to me the issue is less about security and more about failure mode. Michael, while your assertion about tolerances is interesting, the physical world is ultimately a poor analogy for software: this is not physical failure, it is logical failure — and there’s quite a difference. To me, the better analogy is with mathematics: if you prove a theorem for all n less than 38,821, is that proof correct? It depends, of course, on how that proof is labelled: if one asserts that one has proved the theorem for all n when, in fact, it’s only proved for n less than 38,821, than the proof is incorrect. If, however, one accurately claims that one has only proved the theorem for n less than 38,821, then the proof is correct — but it is also clearly a proof of limited consequence, and certainly not one for Erdös’s Book. My argument is that it is better to correctly solve a smaller problem (i.e. limited input) than to incorrectly solve a more ambitious problem (i.e. arbitrary input).
Finally, as for the assertion that we “get ourselves in trouble when we assume that functions have to work across all inputs”, I could not disagree more — indeed, I believe that we get ourselves into trouble when we make implicit assumptions that our input will fit some (ill-defined) notion of “reasonable.” In my experience, software that makes such implicit assumptions becomes especially problematic as its used as foundation software, introducing new, cascading failure modes into the system. Certainly if one fancies writing software that will itself be used by higher layers of software, one must be paranoid about input and explicit about failure modes.
Klein, tail-call optimization won’t do the trick here — the recursion in the Pike/Kernighan code is not a tail-call, and no amount of compiler craftiness can eliminate it. Now, in a stack-based language, presumably much more sophisticated stack management is possible (including much crisper failure modes), but such management would require a lot of nasty trickery if one wanted to both (1) use the stack provided by the machine architecture and (2) provide multiple threads of control. (And considering that using the machine’s notion of the stack is very much necessary for performance, and that a compelling feature of stack-based languages is the ease with which they accommodate threads, it’s a fair assumption that one does indeed want both.)
Bryan, we can go back and forth about whether it’s a logical or a physical problem.. but those are just labels. To me, the crux of this is the fact that the algorithm has certain computational characteristics that place constraints on the environment. If the environment doesn’t meet them, we probably shouldn’t run it.
You’re proof analogy is fine as it stands, but I think we have to question the assumption. Nobody is making the assertion that the algorithm is correct for all N input sequences. We might be able to construct a proof, but that proof would say something about the algorithm on an ideal machine (infinite stack). It wouldn’t say anything about what you can expect in the face of physical (whoops, I said it) resource constraints.
I think that the biggest beef I have with your position is that I feel that there is no single standard for correctness or appropriateness of coding style in the industry. What you do in software that you release to thousands of users can be different from what you do in software you use in a tool for a small workgroup. Safety critical is another area, so is teaching software. As well, not all software is in the middle of an application stack. Not all software is distributed either. Some has trusted users.
FIT is a good example. It’s heavily recursive, and I imagine it could overflow the stack if the size of the tables it processes are too large. I don’t think it’s a serious problem, though. FIT is used in work groups where developers are always present. If it happened, the size of tables would be the obvious culprit. Would I want flight control software to be that trusting? No, but that’s a completely different domain. There is no “one size fits all” standard for code.
In terms of the machine, the stack overflow is not a physical problem — it is (quite literally) a virtual problem. (The problem is that you’ve run out of the VA range allocated for your stack, not that you’ve run out of physical memory.) And while I think there’s only one definition for correctness, I certainly acknowledge that different domains care less about correctness than other factors like the cost of development or the time it takes to bring software to market…
What am I missing, the code you provided looks like it can be tail call optimized.(match, matchstar, matchchere) All of the calls are in the tail position and no instructions are left to compute in the frame after the call.
While I do not have a reference nor the ability to proving it mathematical on the spot, an iterative algorithm can always be rewritten as a tail recursive algorithm.
matchstar’s call to matchhere is not a tail-call — it takes conditional action based on the return from matchhere.
I think a lot of people are missing the point. This is a very
concise and elegant solution to a simple pattern matcher. It will
work on all practical examples. If you want to treat some arbitrary
input from an untrusted user as a match pattern, it’s your own fault
if you don’t verify and place limits on that input first.
A robust regexp library would compile simple patterns like this to a
DFA. Or, more likely, include more features such as alternations
and remembering submatches, and compile some or all patterns to
NFA’s. This will be a lot more complicated. For something as
commonly used as regexps it may be worth having such a full library,
but on small and embedded systems, or for less common tasks for
which there isn’t likely to be a pre-existing library, it’s a great
advantage to be able to quickly put together simple, clean code like
this.
Regarding tail-call optimization, the reason the example has
non-tail calls is because it’s using backtracking, and it needs
somewhere to store that information. The stack is the easiest place
to do this, but you can always move the data into the heap (a
technique Scheme programmers are used to).
Below is a straightforward translation of the same code to use the
heap instead of the stack, and thus never blows out the stack when
compiled with gcc -O2. It’s basically the same thing, except you
pass around an extra “stack” parameter. matchstar is made simpler
by just pushing a backtrack point onto the stack and recursing (and
could just as easily do greedy matching).
For technical reasons, matchhere needs an unused extra “int c”
parameter to force tail-call optimizations. See
http://www.ddj.com/dept/cpp/184401756
for the explanation. This and explicit malloc/free make the code a
lot more clumsy than the equivalent in Scheme, where you have GC and
tail-calls are guaranteed to be optimized.
—
Alex
typedef struct _matchstack {
int c;
char *regexp;
char *text;
struct _matchstack *next;
} *matchstack;
int matchhere(int, char *, char *, matchstack);
int matchstar(int, char *, char *, matchstack);
int matchfail(matchstack);
matchstack matchpushstack(int, char *, char *, matchstack);
int
match(char *regexp, char *text)
{
if (regexp[0] == ‘^’)
return (matchhere(”, regexp + 1, text, NULL));
else
return (matchstar(‘.’, regexp, text, NULL));
}
int
matchhere(int c, char *regexp, char *text, matchstack stack)
{
if (regexp[0] == ”)
return (1);
if (regexp[1] == ‘*’)
return (matchstar(regexp[0], regexp + 2, text, stack));
if (regexp[0] == ‘$’ && regexp[1] == ”)
return (*text == ”);
if (*text != ” && (regexp[0] == ‘.’ || regexp[0] == *text))
return (matchhere(c, regexp + 1, text + 1, stack));
return (0);
}
int
matchstar(int c, char *regexp, char *text, matchstack stack)
{
if (*text != ” && (*text == c || c == ‘.’))
stack = matchpushstack(c, regexp, text+1, stack);
return matchhere(c, regexp, text, stack);
}
int
matchfail(matchstack stack)
{
int c;
char *regexp, *text;
matchstack next;
if (stack) {
c = stack->c;
regexp = stack->regexp;
text = stack->text;
next = stack->next;
free(stack);
return matchstar(c, regexp, text, next);
} else {
return (0);
}
}
matchstack
matchpushstack(int c, char *regexp, char *text, matchstack stack)
{
matchstack newstack = (matchstack) malloc(sizeof(struct _matchstack));
newstack->c = c;
newstack->regexp = regexp;
newstack->text = text;
newstack->next = stack;
return newstack;
}
Alex, I hasten to add that your new code — while fine in some respects — does not check or propagate the potential failure condition from malloc(3C). You may view such error handling as sullying up your code, and that was actually the original point of my blog entry: code that is clean but not robust may be attractive, but to me it will always lack something in terms of beauty.
Bryan,
There’s only a single malloc, which can easily be wrapped in an if (! … ) fatal(“out of memory);.
That one extra line doesn’t exactly sully up the code 🙂 More likely, you’d define a safemalloc
utility which did this automatically, and the
code is then no longer nor less clear.
Sure, in general robust, carefully tested code will be longer and more complex than simple code, but this is completely orthogonal to recursion. Recursion is a useful and powerful tool that can lead to cleaner code – especially if your language gives you a tail-call optimization guarantee.
I just threw up a 5 minute transformation of the code to show how you can use recursion without blowing the stack. It was a quick blog comment and I didn’t realize you were going to hold me accountable to write “industrial-strength” code. At any rate it’s no reason to start attacking me and calling me an amateur.
With no offense intended, it’s not as if making this algorithm iterative is terribly subtle — indeed, as I mentioned in the original post, I have had to unroll my own recursion many times over the years. The point was the failure semantics of the recursion, not the recursion per se — so fixing one failure mode while introducing another very much missed the point I was trying to make. But apologies if you felt that I put too sharp a point on my rebuttal.
So, are relational databases (or any other databases) a failure (“busted”) because they couldn’t handle the amount of data that Google deals with? And so Google had to build their own: BigTable.