As I have discussed before, I strongly believe that to understand systems, you must understand their pathologies — systems are most instructive when they fail. Unfortunately, we in computing systems do not have a strong history of studying pathology: despite the fact that failure in our domain can be every bit as expensive (if not more so) than in traditional engineering domains, our failures do not (usually) involve loss of life or physical property and there is thus little public demand for us to study them — and a tremendous industrial bias for us to forget them as much and as quickly as possible. The result is that our many failures go largely unstudied — and the rich veins of wisdom that these failures generate live on only in oral tradition passed down by the perps (occasionally) and the victims (more often).
A counterexample to this — and one of my favorite systems papers of all time — is Robert Colwell‘s brilliant Performance Effects of Architectural Complexity in the Intel 432. This paper, which dissects the abysmal performance of Intel’s infamous 432, practically drips with wisdom, and is just as relevant today as it was when the paper was originally published nearly twenty years ago.
For those who have never heard of the Intel 432, it was a microprocessor conceived of in the mid-1970s to be the dawn of a new era in computing, incorporating many of the latest notions of the day. But despite its lofty ambitions, the 432 was an unmitigated disaster both from an engineering perspective (the performance was absolutely atrocious) and from a commercial perspective (it did not sell — a fact presumably not unrelated to its terrible performance). To add insult to injury, the 432 became a sort of punching bag for researchers, becoming, as Colwell described, “the favorite target for whatever point a researcher wanted to make.”
But as Colwell et al. reveal, the truth behind the 432 is a little more complicated than trendy ideas gone awry; the microprocessor suffered from not only untested ideas, but also terrible execution. For example, one of the core ideas of the 432 is that it was a capability-based system, implemented with a rich hardware-based object model. This model had many ramifications for the hardware, but it also introduced a dangerous dependency on software: the hardware was implicitly dependent on system software (namely, the compiler) for efficient management of protected object contexts (“environments” in 432 parlance). As it happened, the needed compiler work was not done, and the Ada compiler as delivered was pessimal: every function was implemented in its own environment, meaning that every function was in its own context, and that every function call was therefore a context switch!. As Colwell explains, this software failing was the greatest single inhibitor to performance, costing some 25-35 percent on the benchmarks that he examined.
If the story ended there, the tale of the 432 would be plenty instructive — but the story takes another series of interesting twists: because the object model consumed a bunch of chip real estate (and presumably a proportional amount of brain power and department budget), other (more traditional) microprocessor features were either pruned or eliminated. The mortally wounded features included a data cache (!), an instruction cache (!!) and registers (!!!). Yes, you read correctly: this machine had no data cache, no instruction cache and no registers — it was exclusively memory-memory. And if that weren’t enough to assure awful performance: despite having 200 instructions (and about a zillion addressing modes), the 432 had no notion of immediate values other than 0 or 1. Stunningly, Intel designers believed that 0 and 1 “would cover nearly all the need for constants”, a conclusion that Colwell (generously) describes as “almost certainly in error.” The upshot of these decisions is that you have more code (because you have no immediates) accessing more memory (because you have no registers) that is dog-slow (because you have no data cache) that itself is not cached (because you have no instruction cache). Yee haw!
Colwell’s work builds to crescendo as it methodically takes apart each of these architectural issues — and then attempts to model what the microprocessor would look like were it properly implemented. The conclusion he comes to is the object model — long thought to be the 432’s singular flaw — was only one part of a more complicated picture, and that its performance was “dominated, in large part, by artifacts and not by concepts.” If there’s one imperfection with Colwell’s work, it’s that he doesn’t realize how convincingly he’s made the case that these artifacts were induced by a rigid and foolish adherence to the concepts.
So what is the relevance of Colwell’s paper now, 20 years later? One of the principal problems that Colwell describes is the disconnect between innovation at the hardware and software levels. This disconnect continues to be a theme, and can be seen in current controversies in networking (TOE or no?), in virtualization (just how much microprocessor support do we want/need — and at what price?), and (most clearly, in my opinion) in hardware transactional memory. Indeed, like an apparition from beyond the grave, the Intel 432 story should serve as a chilling warning to those working on transactional memory today: like the 432 object model, hardware transactional memory requires both novel microprocessor architecture and significant new system software. And like the 432 object model, hardware transactional memory has been touted more for its putative programmer productivity than for its potential performance gains. This is not to say that hardware transactional memory is not an appropriate direction for a microprocessor, just that its advocates should not so stubbornly adhere to their novelty that they lose sight of the larger system. To me, that is the lesson of the Intel 432 — and thanks to Colwell’s work, that lesson is available to all who wish to learn it.