The Observation Deck

Close this search box.

Month: November 2008

Part of the design center of Fishworks is to develop powerful infrastructure that is also easy to use, with the browser as the vector for that usability. So it’s been incredibly vindicating to hear some of the initial reaction to our first product, which uses words like “truly breathtaking” to describe the interface.

But as much as we believe in the browser as system interface, we also recognize that it cannot be the only modality for interacting with the system: there remain too many occasions when one needs the lightweight precision of a command line interface. These occasions may be due to usability concerns (starting a browser can be an unnecessary and unwelcome burden for a simple configuration change), but they may also arise because there is no alternative: one cannot use a browser to troubleshoot a machine that cannot communicate over the network or to automate interaction with the system. For various reasons, I came to be the one working on this problem at Fishworks — and my experiences (and misadventures) in solving it present several interesting object lessons in software engineering.

Before I get to those, a brief aside about our architecture: as will come to no surprise to anyone who has used our applaince and is familiar with browser-based technologies, our interface is AJAX-based. In developing an AJAX-based application, one needs to select a protocol for client-server communication, and for a variety of reasons, we selected XML-RPC — a simple XML-based remote procedure call protocol. XML-RPC has ample client support (client libraries are readily available for JavaScript, Perl, Python, etc.), and it was a snap to write a C-based XML-RPC server. This allowed us to cleanly separate our server logic (the “controller” in MVC parlance) from our client (the “view”), and (more importantly) it allowed us to easily develop an automated test suite to test server-side logic. Now, (very) early in our development I had written a simple Perl script to allow for us to manually test the server-side that I called “aksh” — the appliance kit shell. It provided a simple, captive, command-line interface with features like command history, and it gave developers (that is to say, us) the ability to manually tickle their server-side logic without having to write client-side code.

In part because I had written this primordial shell, the task of writing the command line interface fell to me. And when I first approached the problem it seemed natural to simply extend that little Perl script into something more elaborate. That is, I didn’t stop to think if Perl was still the right tool for what was a much larger job. In not stopping to reconsider, I was committed a classic software engineering blunder: the problem had changed (and in particular, it had grown quite a bit larger than I understood it to be), but I was still thinking in terms of my existing (outmoded) solution.

As I progressed through the project — as my shell surpassed 1,000 lines and made its way towards 10,000 — I was learning of a painful truth about Perl that many others had discovered before me: that it is an undesigned dung heap of a language entirely inappropriate for software in-the-large. As a coping mechanism, I began to vent my frustrations at the language with comments in the source, like this vitriolic gem around exceptions:

eval {
        # We need to install our own private __WARN__ handler that
        # calls die() to be able to catch any non-numeric exception
        # from the below coercion without inducing a tedious error
        # message as a side-effect.  And has it been said recently that
        # Perl is a trash heap of a language?  Indeed, it reminds one
        # of a reeking metropolis like Lagos or Nairobi:  having long
        # since outgrown its original design (such as there was ever
        # any design to begin with), it is now teeming but crippled --
        # sewage and vigilantes run amok.  And yet, still the masses
        # come.	 But not because it is Utopia.	No, they come only
        # because this dystopia is marginally better than the only
        # alternatives that they know...
        local $SIG{'__WARN__'} = sub { die(); };
        $val = substr($value, 0, length($value) - 1) + 0.0;

(In an attempt to prevent roaming gangs of glue-huffing Perl-coding teenagers from staging raids on my comments section: I don’t doubt that there’s a better way to do what I was trying to achieve above. But I would counter that there’s also a way to live like a king in Lagos or Nairobi — that doesn’t make it them tourist destinations.)

Most disconcertingly, the further I got into the project, the more the language became an impediment — exactly the wrong trajectory for an evolving software system. And so as I wrote more and more code — and wrestled more and more with the ill-suited environment — the feeling haunting me became unmistakable: this is the wrong path. There’s no worse feeling for a software engineer: knowing that you have made what is likely the wrong decision, but feeling that you can’t revisit that decision because of the time already spent down the wrong path. And so, further down the wrong path you go…

Meanwhile, as I was paying the price of my hasty decison, Eric — always looking for a way to better test our code — was experimenting writing a test harness in which he embedded SpiderMonkey and emulated a DOM layer. These experiments were a complete success: Eric found that embedding SpiderMonkey into a C program was a snap, and the end result allowed us to get automated test coverage over client JavaScript code that previously had to be tested by hand.

Given both Eric’s results and my increasing frustrations with Perl, an answer was becoming clear: I needed to rewrite the appliance shell as a JavaScript/C hybrid, with the bulk of the logic living in JavaScript and system interaction bits living in C. This would allow our two interface modalities (the shell and the web) to commonalize logic, and it would eradicate a complicated and burdensome language from our product. While this seemed like the right direction, I was wary of making another hasty decision. So I started down the new path by writing a library in C that could translate JavaScript objects into an XML-RPC request (and the response back into JavaScript objects). My thinking here was that if the JavaScript approach turned out to be the wrong approach for the shell, we could still use the library in Eric’s new testing harness to allow a wider range of testing. As an aside, this is a software engineering technique that I have learned over the years: when faced with a decision, determine if there are elements that are common to both paths, and implement them first, thereby deferring the decision. In my experience, making the decision after having tackled some of its elements greatly informs the decision — and because the work done was common, no time (or less time) was lost.

In this case, I had the XML-RPC client library rock-solid after about a week of work. The decision could be deferred no longer: it was time to rewrite Perl functionality in JavaScript — time that would indeed be wasted if the JavaScript path was a dead-end. So I decided that I would give myself a week. If, after that week, it wasn’t working out, at least I would know why, and I would be able to return to the squalor of Perl with fewer doubts.

As it turns out, after that week, it was clear that the JavaScript/C hybrid approach was the much better approach — Perl’s death warrant had been signed. And here we come to another lesson that I learned that merits an aside: in the development of DTrace, one regret was we did not start the development of our test suite earlier. We didn’t make that mistake with Fishworks: the first test was created just minutes after the first line of code. With my need to now rewrite the shell, this approach paid an unexpected dividend: because I had written many tests for the old, Perl-based shell, I had a ready-made test suite for my new work. Therefore, contrary to the impressions of some about test-driven development, the presence of tests actually accelerated the development of the new shell tremendously. And of course, once I integrated the new shell, I could say with confidence that it did not contain regressions over the old shell. (Indeed, the only user visible change was that it was faster. Much, much, much faster.)

While it was frustrating to think of the lost time (it ultimately took me six weeks to get the new JavaScript-based shell back to where the old Perl-based shell had been), it was a great relief to know that we had put the right architecture in place. And as often happens when the right software architecture is in place, the further I went down the path of the JavaScript/C hybrid, the more often I had the experience of new, interesting functionality simply falling out. In particular, it became clear that I could easily add a second JavaScript instance to the shell to allow for a scripting environment. This allows users to build full, programmatic flow control into their automation infrastructure without ever having to “screen scrape” output. For example, here’s a script to display the used and available space in each share on the appliance:

        projects = list();
        printf('%-40s %-10s %-10s\n', 'SHARE', 'USED', 'AVAILABLE');

        for (i = 0; i < projects.length; i++) {
                run('select ' + projects[i]);
                shares = list();
                for (j = 0; j < shares.length; j++) {
                        run('select ' + shares[j]);
                        share = projects[i] + '/' + shares[j];
                        used = run('get space_data').split(/\s+/)[3];
                        avail = run('get space_available').split(/\s+/)[3];
                        printf('%-40s %-10s %-10s\n', share, used, avail);
                        run('cd ..');
                run('cd ..');

If you saved the above to file named "space.aksh", you could run it this way:

% ssh root@myappliance < space.aksh
SHARE                                    USED       AVAILABLE
admin/accounts                           18K        248G
admin/exports                            18K        248G
admin/primary                            18K        248G
admin/traffic                            18K        248G
admin/workflow                           18K        248G
aleventhal/hw_eng                        18K        248G
bcantrill/analytx                        1.00G      248G
bgregg/dashbd                            18K        248G
bgregg/filesys01                         25.5K      100G
bpijewski/access_ctrl                    18K        248G

(You can also upload SSH keys to the appliance if you do not wish to be prompted for the password.)

As always, don't take our word for it -- download the appliance and check it out yourself! And if you have the appliance (virtual or otherwise), click on "HELP" and then type "scripting" into the search box to get full documentation on the appliance scripting environment!

In October 2005, longtime partner-in-crime Mike Shapiro and I were taking stock. Along with Adam Leventhal, we had just finished DTrace — and Mike had finished up another substantial body of work in FMA — and we were beginning to wonder about what was next. As we looked at Solaris 10, we saw an incredible building block — the best, we felt, ever made, with revolutionary technologies like ZFS, DTrace, FMA, SMF and so on. But we also saw something lacking: despite being a great foundation, the truth was that the technology wasn’t being used in many essential tasks in information infrastructure, from routing packets to storing blocks to making files available over the network. This last one especially grated: despite having invented network attached storage with NFS in 1983, and despite having the necessary components to efficiently serve files built into the system, and despite having exciting hardware like Thumper and despite having absolutely killer technologies like ZFS and DTrace, Sun had no share — none — of the NAS market.

As we reflected on why this was so — why, despite having so many of the necessary parts Sun had not been able to put together a compelling integrated product — we realized that part of the problem was organizational: if we wanted to go solve this problem, it was clear that we could not do it from the confines of a software organization. With this in mind, we requested a meeting with Greg Papadopoulos, Sun’s CTO, to brainstorm. Greg quickly agreed to a meeting, and Mike and I went to his office to chat. We described the problem that we wanted to solve: integrate Sun’s industry-leading components together and build on them to develop a killer NAS box — one with differentiators only made possible by our technology. Greg listened intently as we made our pitch, and then something unexpected happened — something that tells you a lot about Sun: Greg rose from his chair and exclaimed, “let’s do it!” Mike and I were caught a bit flat-footed; we had expected a much safer, more traditional answer — like “let’s commission a task force!” or something — and instead here was Greg jumping out in front: “Get me a presentation that gives some of the detail of what you want to do, and I’ll talk to Jonathan and Scott about it!”

Back in the hallway, Mike and I looked at each other, still somewhat in disbelief that Greg had been not just receptive, but so explicitly encouraging. Mike said to me exactly what I was thinking: “Well, I guess we’re doing this!”

With that, Mike and I pulled into a nearby conference room, and we sat down with a new focus. This was neither academic exercise nor idle chatter over drinks — we now needed to think about what specifically separated our building blocks from a NAS appliance. With that, we started writing missing technologies on the whiteboard, which soon became crowded with things like browser-based management, clustering, e-mail alerts, reports, integrated fault management, seamless upgrades and rollbacks, and so on. When the whiteboard was full and we took a look at all of it, the light went on: virtually none of this stuff was specific to NAS. At that instant, we realized that the NAS problem was but one example of a larger problem, and that the infrastructure to build fully-integrated, special-purpose systems was itself general-purpose across those special purposes!

We had a clear picture of what we wanted to go do. We put our thoughts into a presentation that we entitled “A Problem, An Opportunity & An Idea” (of which I have made available a redacted version) and sent that to Greg. A week or so later, we had a con-call with Greg, in which he gave us the news from Scott and Jonathan: they bought it. It was time to put together a business plan, build our team and get going.

Now Mike and I went into overdrive. First, we needed a name. I don’t know how long he had been thinking about it, or how it struck him, but Mike said that he was thinking of the name “Fishworks”, it not only being a distinct name that paid homage to a storied engineering tradition (and with an oblique Simpsons reference to boot), but one that also embedded an apt acronym: “FISH”, Mike explained, stood for “fully-integrated software and hardware” — which is exactly what we wanted to go build. I agreed that it captured us perfectly — and Fishworks was born.

We built our team — including Adam, Eric and Keith — and on February 15, 2006, we got to work. Over the next two and a half years, we went through many changes: our team grew to include Brendan, Greg, Cindi, Bill, Dave and Todd; our technological vision expanded as we saw the exciting potential of the flash revolution; and our product scope was broadened through hundreds of conversations with potential customers. But through these changes our fundamental vision remained intact: that we would build a general purpose appliance kit — and that we would use it to build a world-beating NAS appliance. Today, at long last, the first harvest from this long labor is available: the Sun Storage 7110, Sun Storage 7210 and Sun Storage 7410.

It is deeply satisfying to see these products come to market, especially because the differentiators that we so boldly predicted to Sun’s executives so long ago have not only come to fruition, they are also delivering on our promise to set the product apart in the marketplace. Of these, I am especially proud of our DTrace-based appliance analytics. With analytics, we sought to harness the great power of DTrace: its ability to answer ad hoc questions that are phrased in terms of the system’s abstractions instead of its implementation. We saw an acute need for this in network storage, where even market-leading products cannot answer the most basic of questions: “what am I serving and to whom?” The key, of course, was to capture the strength of DTrace visually — and the trick was to give up enough of the arbitrary latitude of DTrace to allow for strictly visual interactions without giving up so much as to unnecessarily limit the power of the facility.

I believe that the result — which you can sample in this screenshot — does more than simply strike the balance: we have come up with ways to visualize and interact with data that actually function as a force multiplier for the underlying instrumentation technology. So not only does analytics bring the power of DTrace to a much broader spectrum of technologists, it also — thanks to the wonders of the visual cortex — has much greater utility than just DTrace alone. (Or, as one hardened veteran of command line interfaces put it to me, “this is the one GUI that I actually want to use!”)

There is much to describe about analytics, and for those interested in a reasonably detailed guided tour of the facility, check out this presentation on analytics that I will be giving later this week at Sun’s Customer Engineering Conference in Las Vegas. While the screenshots in that presentation are illustrative, the power of analytics (like DTrace before it) is in actually seeing it for yourself, in real-time. You can get a flavor for that in this video, in which Mike and I demonstrate and discuss analytics. (That video is part of a larger Inside Fishworks series that takes you through many elements of our team and the product.) While the video is great, it still can’t compare to seeing analytics in your own environment — and for that, you should contact your Sun rep or Sun reseller and arrange to test drive an appliance yourself. Or if you’re the impatient show-me-now kind, download this VMware image that contains a full, working Sun Storage 7000 appliance, with 16 (virtual) disks. Configure the virtual appliance, add a few shares, access them via CIFS, WebDAV, NFS, whatever, and bust out some analytics!

For as long as I’ve been in computing, the subject of concurrency has always induced a kind of thinking man’s hysteria. When I was coming up, the name of the apocalypse was symmetric multiprocessing — and its arrival was to be the Day of Reckoning for software. There seemed to be no end of doomsayers, even among those who putatively had the best understanding of concurrency. (Of note was a famous software engineer who — despite substantial experience in SMP systems at several different computer companies — confidently asserted to me in 1995 that it was “simply impossible” for an SMP kernel to “ever” scale beyond 8 CPUs. Needless to say, several of his past employers have since proved him wrong…)

There also seemed to be no end of concurrency hucksters and shysters, each eager to peddle their own quack cure for the miasma. Of these, the one that stuck in my craw was the two-level scheduling model, whereby many user-level threads are multiplexed on fewer kernel-level (schedulable) entities. (To paraphrase what has been oft said of little-known computer architectures, you haven’t heard of it for a reason.) The rationale for the model — that it allowed for cheaper synchronization and lightweight thread creation — seemed to me at the time to be long on assertions and short on data. So working with my undergraduate advisor, I developed a project to explore this model both quantitatively and dynamically, work that I undertook in the first half of my senior year. And early on in that work, it became clear that — in part due to intractable attributes of the model — the two-level thread scheduling model was delivering deeply suboptimal performance…

Several months after starting the investigation, I came to interview for a job at Sun with Jeff, and he (naturally) asked me to describe my undergraduate work. I wanted to be careful here: Sun was the major proponent of the two-level model, and while I felt that I had the hard data to assert that the model was essentially garbage, I also didn’t want to make a potential employer unnecessarily upset. So I stepped gingerly: “As you may know,” I began, “the two-level threading model is very… intricate.” “Intricate?!” Jeff exclaimed, “I’d say it’s completely busted!” (That moment may have been the moment that I decided to come work with Jeff and for Sun: the fact that an engineer could speak so honestly spoke volumes for both the engineer and the company. And despite Sun’s faults, this engineering integrity remains at Sun’s core to this day — and remains a draw to so many of us who have stayed here through the ups and downs.) With that, the dam had burst: Jeff and I proceeded to gush about how flawed we each thought the model to be — and how dogmatic its presentation. So paradoxically, I ended up getting a job at Sun in part by telling them that their technology was unsound!

Back at school, I completed my thesis. Like much undergraduate work, it’s terribly crude in retrospect — but I stand behind its fundamental conclusion that the unintended consequences of the two-level scheduling model make it essentially impossible to achieve optimal performance. Upon arriving at Sun, I developed an early proof-of-concept of the (much simpler) single-level model. Roger Faulkner did the significant work of productizing this as an alternative threading model in Solaris 8 — and he eliminated the two-level scheduling model entirely in Solaris 9, thus ending the ill-begotten experiment of the two-level scheduling model somewhere shy of its tenth birthday. (Roger gave me the honor of approving his request to integrate this work, an honor that I accepted with gusto.)

So why this meandering walk through a regrettable misadventure in the history of software systems? Because over a decade later, concurrency is still being used to instill panic in the uninformed. This time, it is chip-level multiprocessing (CMP) instead of SMP that promises to be the End of Days — and the shysters have taken a new guise in the form of transactional memory. The proponents of this new magic tonic are in some ways darker than their forebears: it is no longer enough to warn of Judgement Day — they must also conjure up notions of Original Sin to motivate their perverted salvation. “The heart of the problem is, perhaps, that no one really knows how to organize and maintain large systems that rely on locking” admonished Nir Shavit recently in CACM. (Which gives rise to the natural follow-up question: is the Solaris kernel not large, does it not rely on locking or do we not know how to organize and maintain it? Or is that we do not exist at all?) Shavit continues: “Locks are not modular and do not compose, and the association between locks and data is established mostly by convention.” Again, no data, no qualifiers, no study, no rationale, no evidence of experience trying to develop such systems — just a naked assertion used as a prop for a complicated and dubious solution. Are there elements of truth in Shavit’s claims? Of course: one can write sloppy, lock-based programs that become a galactic, unmaintainable mess. But does it mean that such monstrosities are inevitable? No, of course not.

So fine, the problem statement is (deeply) flawed. Does that mean that the solution is invalid? Not necessarily — but experience has taught me to be wary of crooked problem statements. And in this case (perhaps not surprisingly) I take umbrage with the solution as well. Even if one assumes that writing a transaction is conceptually easier than acquiring a lock, and even if one further assumes that transaction-based pathologies like livelock are easier on the brain than lock-based pathologies like deadlock, there remains a fatal flaw with transactional memory: much system software can never be in a transaction because it does not merely operate on memory. That is, system software frequently takes action outside of its own memory, requesting services from software or hardware operating on a disjoint memory (the operating system kernel, an I/O device, a hypervisor, firmware, another process — or any of these on a remote machine). In much system software, the in-memory state that corresponds to these services is protected by a lock — and the manipulation of such state will never be representable in a transaction. So for me at least, transactional memory is an unacceptable solution to a non-problem.

As it turns out, I am not alone in my skepticism. When we on the Editorial Advisory Board of ACM Queue sought to put together an issue on concurrency, the consensus was twofold: to find someone who could provide what we felt was much-needed dissent on TM (and in particular on its most egregious outgrowth, software transactional memory), and to have someone speak from experience on the rise of CMP and what it would mean for practitioners.

For this first article, we were lucky enough to find Calin Cascaval and colleagues, who ended up writing a must-read article on STM in November’s CACM. Their conclusions are unavoidable: STM is a dog. (Or as Cascaval et al. more delicately put it: “Based on our results, we believe that the road for STM is quite challenging.”) Their work is quantitative and analytical and (best of all, in my opinion) the authors never lose sight of the problem that transactional memory was meant to solve: to make parallel programming easier. This is important, because while many of the leaks in the TM vessel can ultimately be patched, the patches themselves add layer upon layer of complexity. Cascaval et al. conclude:

And because the argument for TM hinges upon its simplicity and productivity benefits, we are deeply skeptical of any proposed solutions to performance problems that require extra work by the programmer.

And while their language is tighter (and the subject of their work a weightier and more active research topic), the conclusions of Cascaval et al. are eerily similar to my final verdict on the two-level scheduling model, over a decade ago:

The dominating trait of the [two-level] scheduling model is its complexity. Unfortunately, virtually all of its complexity is exported to the programmer. The net result is that programmers must have a complete understanding of the model the inner workings of its implementation in order to be able to successfully tap its strengths and avoid its pitfalls.

So TM advocates: if Roger Faulkner knocks on your software’s door bearing a scythe, you would be well-advised to not let him in…

For the second article, we brainstormed potential authors — but as we dug up nothing but dry holes, I found myself coming to an unescapable conclusion: Jeff and I should write this, if nothing else as a professional service to prevent the latest concurrency hysteria from reaching epidemic proportions. The resulting article appears in full in the September issue of Queue, and substantially excerpted in the November issue of CACM. Writing the article was a gratifying experience, and gave us the opportunity to write down much of what we learned the hard way in the 1990s. In particular, it was cathartic to explore the history of concurrency. Having been at concurrency’s epicenter for nearly a decade, I felt that the rise of CMP has been recently misrepresented as a failure of hardware creativity — and it was vindicating to read CMP’s true origins in the original DEC Piranha paper: that given concurrent databases and operating systems, implementing multiple cores on the die was simply the best way to deliver OLTP performance. That is, it was the success of concurrent software — and not the failure of imagination on the part of hardware designers — that gave rise to the early CMP implementations. Hopefully practitioners will enjoy reading the article as much as we enjoyed writing it — and here’s hoping that we live to see a day when concurrency doesn’t attract so many schemers and dreamers!

Recent Posts

November 26, 2023
November 18, 2023
November 27, 2022
October 11, 2020
July 31, 2019
December 16, 2018
September 18, 2018
December 21, 2016
September 30, 2016
September 26, 2016
September 13, 2016
July 29, 2016
December 17, 2015
September 16, 2015
January 6, 2015
November 10, 2013
September 3, 2013
June 7, 2012
September 15, 2011
August 15, 2011
March 9, 2011
September 24, 2010
August 11, 2010
July 30, 2010
July 25, 2010
March 10, 2010
November 26, 2009
February 19, 2009
February 2, 2009
November 10, 2008
November 3, 2008
September 3, 2008
July 18, 2008
June 30, 2008
May 31, 2008
March 16, 2008
December 18, 2007
December 5, 2007
November 11, 2007
November 8, 2007
September 6, 2007
August 21, 2007
August 2, 2007
July 11, 2007
May 20, 2007
March 19, 2007
October 12, 2006
August 17, 2006
August 7, 2006
May 1, 2006
December 13, 2005
November 16, 2005
September 13, 2005
September 9, 2005
August 21, 2005
August 16, 2005