July 2008 – The Observation Deck

DTrace and the Palisades Interstate Parkway

July 29, 2008
4 Comments

In general, I don’t believe in drawing attention to bugs in the software of others: any significant body of software is likely to have bugs, and I think one can too easily draw overly broad inferences by looking at software through the lens of its defects (a pathology that I have previously discussed at some length). However — and as you might imagine from the preamble — I’m about to make an exception to that gentlemanly rule…

I, along with zillions of others, read the breathless hype about the new would-be Google slayer, cuil. When a new search engine pops up, my egotistical reflex is to first search for “dtrace”, and the results of searching for “dtrace” on cuil were very, um, interesting. The search results themselves were fine; more creative were the images that cuil decided to associate with them. If you look at that screenshot, you will be able to find an image of a quilt, a strip mall, what could pass for a program from an August Wilson play, and — strangest of all — a sign that reads “Welcome to Palisades Interstate Parkway“. I can’t say for certain that I’ve never travelled on the Palisades Interstate Parkway, but with all deference to that 38.25-mile stretch of tarmac, I do believe that I can say that it played no role in DTrace — or DTrace in it. Indeed, I can say with absolute confidence that searching for “palisades interstate parkway dtrace” will, in short order, yield only this blog entry — provided, that is, that one doesn’t perform said search on cuil… 😉

Revisiting the Intel 432

July 18, 2008
5 Comments

As I have discussed before, I strongly believe that to understand systems, you must understand their pathologies — systems are most instructive when they fail. Unfortunately, we in computing systems do not have a strong history of studying pathology: despite the fact that failure in our domain can be every bit as expensive (if not more so) than in traditional engineering domains, our failures do not (usually) involve loss of life or physical property and there is thus little public demand for us to study them — and a tremendous industrial bias for us to forget them as much and as quickly as possible. The result is that our many failures go largely unstudied — and the rich veins of wisdom that these failures generate live on only in oral tradition passed down by the perps (occasionally) and the victims (more often).

A counterexample to this — and one of my favorite systems papers of all time — is Robert Colwell‘s brilliant Performance Effects of Architectural Complexity in the Intel 432. This paper, which dissects the abysmal performance of Intel’s infamous 432, practically drips with wisdom, and is just as relevant today as it was when the paper was originally published nearly twenty years ago.

For those who have never heard of the Intel 432, it was a microprocessor conceived of in the mid-1970s to be the dawn of a new era in computing, incorporating many of the latest notions of the day. But despite its lofty ambitions, the 432 was an unmitigated disaster both from an engineering perspective (the performance was absolutely atrocious) and from a commercial perspective (it did not sell — a fact presumably not unrelated to its terrible performance). To add insult to injury, the 432 became a sort of punching bag for researchers, becoming, as Colwell described, “the favorite target for whatever point a researcher wanted to make.”

But as Colwell et al. reveal, the truth behind the 432 is a little more complicated than trendy ideas gone awry; the microprocessor suffered from not only untested ideas, but also terrible execution. For example, one of the core ideas of the 432 is that it was a capability-based system, implemented with a rich hardware-based object model. This model had many ramifications for the hardware, but it also introduced a dangerous dependency on software: the hardware was implicitly dependent on system software (namely, the compiler) for efficient management of protected object contexts (“environments” in 432 parlance). As it happened, the needed compiler work was not done, and the Ada compiler as delivered was pessimal: every function was implemented in its own environment, meaning that every function was in its own context, and that every function call was therefore a context switch!. As Colwell explains, this software failing was the greatest single inhibitor to performance, costing some 25-35 percent on the benchmarks that he examined.

If the story ended there, the tale of the 432 would be plenty instructive — but the story takes another series of interesting twists: because the object model consumed a bunch of chip real estate (and presumably a proportional amount of brain power and department budget), other (more traditional) microprocessor features were either pruned or eliminated. The mortally wounded features included a data cache (!), an instruction cache (!!) and registers (!!!). Yes, you read correctly: this machine had no data cache, no instruction cache and no registers — it was exclusively memory-memory. And if that weren’t enough to assure awful performance: despite having 200 instructions (and about a zillion addressing modes), the 432 had no notion of immediate values other than 0 or 1. Stunningly, Intel designers believed that 0 and 1 “would cover nearly all the need for constants”, a conclusion that Colwell (generously) describes as “almost certainly in error.” The upshot of these decisions is that you have more code (because you have no immediates) accessing more memory (because you have no registers) that is dog-slow (because you have no data cache) that itself is not cached (because you have no instruction cache). Yee haw!

Colwell’s work builds to crescendo as it methodically takes apart each of these architectural issues — and then attempts to model what the microprocessor would look like were it properly implemented. The conclusion he comes to is the object model — long thought to be the 432’s singular flaw — was only one part of a more complicated picture, and that its performance was “dominated, in large part, by artifacts and not by concepts.” If there’s one imperfection with Colwell’s work, it’s that he doesn’t realize how convincingly he’s made the case that these artifacts were induced by a rigid and foolish adherence to the concepts.

So what is the relevance of Colwell’s paper now, 20 years later? One of the principal problems that Colwell describes is the disconnect between innovation at the hardware and software levels. This disconnect continues to be a theme, and can be seen in current controversies in networking (TOE or no?), in virtualization (just how much microprocessor support do we want/need — and at what price?), and (most clearly, in my opinion) in hardware transactional memory. Indeed, like an apparition from beyond the grave, the Intel 432 story should serve as a chilling warning to those working on transactional memory today: like the 432 object model, hardware transactional memory requires both novel microprocessor architecture and significant new system software. And like the 432 object model, hardware transactional memory has been touted more for its putative programmer productivity than for its potential performance gains. This is not to say that hardware transactional memory is not an appropriate direction for a microprocessor, just that its advocates should not so stubbornly adhere to their novelty that they lose sight of the larger system. To me, that is the lesson of the Intel 432 — and thanks to Colwell’s work, that lesson is available to all who wish to learn it.

Month: July 2008

Recent Posts

Archives

Archives