Concurrency’s Shysters

For as long as I’ve been in computing, the subject of concurrency has always induced a kind of thinking man’s hysteria. When I was coming up, the name of the apocalypse was symmetric multiprocessing – and its arrival was to be the Day of Reckoning for software. There seemed to be no end of doomsayers, even among those who putatively had the best understanding of concurrency. (Of note was a famous software engineer who – despite substantial experience in SMP systems at several different computer companies – confidently asserted to me in 1995 that it was “simply impossible” for an SMP kernel to “ever” scale beyond 8 CPUs. Needless to say, several of his past employers have since proved him wrong…)

There also seemed to be no end of concurrency hucksters and shysters, each eager to peddle their own quack cure for the miasma. Of these, the one that stuck in my craw was the two-level scheduling model, whereby many user-level threads are multiplexed on fewer kernel-level (schedulable) entities. (To paraphrase what has been oft said of little-known computer architectures, you haven’t heard of it for a reason.) The rationale for the model – that it allowed for cheaper synchronization and lightweight thread creation – seemed to me at the time to be long on assertions and short on data. So working with my undergraduate advisor, I developed a project to explore this model both quantitatively and dynamically, work that I undertook in the first half of my senior year. And early on in that work, it became clear – in part due to intractable attributes of the model itself – that two-level scheduling was delivering deeply suboptimal performance…
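
To make the distinction concrete, here is roughly what the two models looked like to a programmer through the Solaris native threads API of the day; the calls and flags are the real ones, but the example itself is only a sketch, not code from the thesis:

    #include <thread.h>     /* Solaris native threads API; link with -lthread */

    static void *
    worker(void *arg)
    {
            /* ... do some work ... */
            return (arg);
    }

    int
    main(void)
    {
            thread_t unbound, bound;

            /*
             * Under the two-level model, an unbound thread is a purely
             * user-level entity that libthread multiplexes onto some
             * smaller number of LWPs (the kernel-schedulable entities).
             */
            (void) thr_create(NULL, 0, worker, NULL, 0, &unbound);

            /*
             * A bound thread is wedded to its own LWP: the single-level
             * (1:1) model that Solaris 9 ultimately adopted for every
             * thread.
             */
            (void) thr_create(NULL, 0, worker, NULL, THR_BOUND, &bound);

            (void) thr_join(unbound, NULL, NULL);
            (void) thr_join(bound, NULL, NULL);
            return (0);
    }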

Several months after starting the investigation, I came to interview for a job at Sun with Jeff, and he (naturally) asked me to describe my undergraduate work. I wanted to be careful here: Sun was the major proponent of the two-level model, and while I felt that I had the hard data to assert that the model was essentially garbage, I also didn’t want to make a potential employer unnecessarily upset. So I stepped gingerly: “As you may know,” I began, “the two-level threading model is very… intricate.” “Intricate?!” Jeff exclaimed, “I’d say it’s completely busted!” (That may well have been the moment that I decided to come work with Jeff and for Sun: the fact that an engineer could speak so honestly spoke volumes for both the engineer and the company. And despite Sun’s faults, this engineering integrity remains at Sun’s core to this day – and remains a draw to so many of us who have stayed here through the ups and downs.) With that, the dam had burst: Jeff and I proceeded to gush about how flawed we each thought the model to be – and how dogmatic its presentation. So paradoxically, I ended up getting a job at Sun in part by telling them that their technology was unsound!

Back at school, I completed my thesis. Like much undergraduate work, it’s terribly crude in retrospect – but I stand behind its fundamental conclusion that the unintended consequences of the two-level scheduling model make it essentially impossible to achieve optimal performance. Upon arriving at Sun, I developed an early proof-of-concept of the (much simpler) single-level model. Roger Faulkner did the significant work of productizing this as an alternative threading model in Solaris 8 – and he eliminated the two-level model entirely in Solaris 9, thus ending that ill-begotten experiment somewhere shy of its tenth birthday. (Roger gave me the honor of approving his request to integrate this work, an honor that I accepted with gusto.)

So why this meandering walk through a regrettable misadventure in the history of software systems? Because over a decade later, concurrency is still being used to instill panic in the uninformed. This time, it is chip-level multiprocessing (CMP) instead of SMP that promises to be the End of Days – and the shysters have taken a new guise in the form of transactional memory. The proponents of this new magic tonic are in some ways darker than their forebears: it is no longer enough to warn of Judgement Day – they must also conjure up notions of Original Sin to motivate their perverted salvation. “The heart of the problem is, perhaps, that no one really knows how to organize and maintain large systems that rely on locking,” admonished Nir Shavit recently in CACM. (Which gives rise to the natural follow-up question: is the Solaris kernel not large, does it not rely on locking, or do we not know how to organize and maintain it? Or is it that we do not exist at all?) Shavit continues: “Locks are not modular and do not compose, and the association between locks and data is established mostly by convention.” Again, no data, no qualifiers, no study, no rationale, no evidence of experience trying to develop such systems – just a naked assertion used as a prop for a complicated and dubious solution. Are there elements of truth in Shavit’s claims? Of course: one can write sloppy, lock-based programs that become a galactic, unmaintainable mess. But does that mean that such monstrosities are inevitable? No, of course not.
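
Indeed, the “convention” that Shavit derides is, in disciplined hands, perfectly explicit. A minimal sketch of the idiom (the names here are hypothetical, invented solely for illustration): the declaration itself records which lock protects which data, and every update honors it.

    #include <pthread.h>
    #include <stdint.h>

    /*
     * The much-maligned "convention": the declaration states exactly
     * which lock protects which fields, and all manipulation of those
     * fields is done with that lock held.
     */
    typedef struct conn_stats {
            pthread_mutex_t cs_lock;        /* protects all fields below */
            uint64_t        cs_bytes_sent;
            uint64_t        cs_bytes_rcvd;
    } conn_stats_t;

    void
    conn_stats_init(conn_stats_t *cs)
    {
            (void) pthread_mutex_init(&cs->cs_lock, NULL);
            cs->cs_bytes_sent = 0;
            cs->cs_bytes_rcvd = 0;
    }

    void
    conn_stats_update(conn_stats_t *cs, uint64_t sent, uint64_t rcvd)
    {
            (void) pthread_mutex_lock(&cs->cs_lock);
            cs->cs_bytes_sent += sent;
            cs->cs_bytes_rcvd += rcvd;
            (void) pthread_mutex_unlock(&cs->cs_lock);
    }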

So fine, the problem statement is (deeply) flawed. Does that mean that the solution is invalid? Not necessarily – but experience has taught me to be wary of crooked problem statements. And in this case (perhaps not surprisingly) I take umbrage with the solution as well. Even if one assumes that writing a transaction is conceptually easier than acquiring a lock, and even if one further assumes that transaction-based pathologies like livelock are easier on the brain than lock-based pathologies like deadlock, there remains a fatal flaw with transactional memory: much system software can never be in a transaction because it does not merely operate on memory. That is, system software frequently takes action outside of its own memory, requesting services from software or hardware operating on a disjoint memory (the operating system kernel, an I/O device, a hypervisor, firmware, another process – or any of these on a remote machine). In much system software, the in-memory state that corresponds to these services is protected by a lock – and the manipulation of such state will never be representable in a transaction. So for me at least, transactional memory is an unacceptable solution to a non-problem.
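
To make that objection concrete, consider a sketch of the kind of critical section that is everywhere in system software; the names and structure here are hypothetical, invented only to illustrate the point:

    #include <pthread.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Hypothetical per-device state; illustrative only. */
    typedef struct dev_state {
            pthread_mutex_t ds_lock;        /* protects ds_busy, ds_pending */
            int             ds_fd;          /* handle on the device */
            int             ds_busy;        /* a command is outstanding */
            int             ds_pending;     /* commands queued behind it */
    } dev_state_t;

    int
    dev_submit(dev_state_t *ds, const void *cmd, size_t len)
    {
            ssize_t n;

            (void) pthread_mutex_lock(&ds->ds_lock);

            if (ds->ds_busy) {
                    ds->ds_pending++;
                    (void) pthread_mutex_unlock(&ds->ds_lock);
                    return (0);
            }

            ds->ds_busy = 1;

            /*
             * Here is the action that no memory transaction can undo: the
             * command leaves our memory entirely, bound for the kernel, a
             * device, a hypervisor or another machine. The in-memory
             * state (ds_busy, ds_pending) must remain consistent with
             * that external action; that is why it is protected by a
             * lock, and why this critical section can never be expressed
             * as a transaction over memory.
             */
            n = write(ds->ds_fd, cmd, len);

            if (n < 0)
                    ds->ds_busy = 0;

            (void) pthread_mutex_unlock(&ds->ds_lock);
            return (n < 0 ? -1 : 0);
    }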

As it turns out, I am not alone in my skepticism. When we on the Editorial Advisory Board of ACM Queue sought to put together an issue on concurrency, the consensus was twofold: to find someone who could provide what we felt was much-needed dissent on TM (and in particular on its most egregious outgrowth, software transactional memory), and to have someone speak from experience on the rise of CMP and what it would mean for practitioners.

For the first article, we were lucky enough to find Calin Cascaval and colleagues, who ended up writing a must-read article on STM in November’s CACM. Their conclusions are unavoidable: STM is a dog. (Or as Cascaval et al. more delicately put it: “Based on our results, we believe that the road for STM is quite challenging.”) Their work is quantitative and analytical, and (best of all, in my opinion) the authors never lose sight of the problem that transactional memory was meant to solve: to make parallel programming easier. This is important, because while many of the leaks in the TM vessel can ultimately be patched, the patches themselves add layer upon layer of complexity. Cascaval et al. conclude:

And because the argument for TM hinges upon its simplicity and productivity benefits, we are deeply skeptical of any proposed solutions to performance problems that require extra work by the programmer.

And while their language is tighter (and the subject of their work a weightier and more active research topic), the conclusions of Cascaval et al. are eerily similar to my final verdict on the two-level scheduling model, over a decade ago:

The dominating trait of the [two-level] scheduling model is its complexity. Unfortunately, virtually all of its complexity is exported to the programmer. The net result is that programmers must have a complete understanding of the model and the inner workings of its implementation in order to be able to successfully tap its strengths and avoid its pitfalls.

So TM advocates: if Roger Faulkner knocks on your software’s door bearing a scythe, you would be well-advised to not let him in…

For the second article, we brainstormed potential authors – but as we hit nothing but dry holes, I found myself coming to an inescapable conclusion: Jeff and I should write this, if nothing else as a professional service to prevent the latest concurrency hysteria from reaching epidemic proportions. The resulting article appears in full in the September issue of Queue, and is substantially excerpted in the November issue of CACM. Writing the article was a gratifying experience, and gave us the opportunity to write down much of what we learned the hard way in the 1990s. In particular, it was cathartic to explore the history of concurrency. Having been at concurrency’s epicenter for nearly a decade, I felt that the rise of CMP had recently been misrepresented as a failure of hardware creativity – and it was vindicating to read of CMP’s true origins in the original DEC Piranha paper: that given concurrent databases and operating systems, implementing multiple cores on the die was simply the best way to deliver OLTP performance. That is, it was the success of concurrent software – and not the failure of imagination on the part of hardware designers – that gave rise to the early CMP implementations. Hopefully practitioners will enjoy reading the article as much as we enjoyed writing it – and here’s hoping that we live to see a day when concurrency doesn’t attract so many schemers and dreamers!