The Observation Deck

Category: Fishworks

I recently came into a copy of Dave Hitz’s new book How to Castrate a Bull. A full review is to come, but I couldn’t wait to serve up one delicious bit of irony. Among the book’s many unintentionally fascinating artifacts is NetApp’s original business plan, dated January 16, 1992. In that plan, NetApp’s proposed differentiators are high availability, easy administration, high performance and low price — differentiators that are eerily mirrored by Fishworks’ proposed differentiators nearly fourteen years later. But the irony goes non-linear when Hitz discusses the “Competition” in that original business plan:

Sun Microsystems is the main supplier of NFS file servers. Sun sells over 2/3 of all NFS file servers. Our initial product will be positioned to cost significantly less than Sun’s lower-end server, with performance comparable to their high-end servers.

It is unlikely that Sun will be able to produce a server that performs as well, or costs as little, for several reasons:

  • Sun’s server hardware is inherently more expensive because it has lower production volumes than our components […]
  • The culture among software engineers at Sun places little value on performance.
  • The structure of Sun — with SunSoft doing NFS and UNIX, and SMCC [Sun Microsystems Computer Corporation] doing hardware — makes it difficult for Sun to produce products that provide creative software-hardware system solutions.
  • Sun’s distribution costs will likely remain high due to the level of technical support required to install and manage a Sun server.

While I had known that NetApp targeted Sun in its early days, I had no idea how explicit that attack had been. Now, it must be said that Hitz was right about Sun on all counts — and that NetApp thoroughly disrupted Sun with its products, ultimately coming to dominate the NAS market itself. But it is stunning the degree to which NetApp’s own business plan — nearly verbatim — is being used against it, not least by the very company that NetApp originally disrupted. (See Slide 5 of the Fishworks elevator pitch — and use your imagination.) Indeed, like NetApp in the 1990s, the Sun Storage 7000 Series is not disruptive by accident, and as I elaborate on in this presentation, we are very deliberately positioning the product to best harness the economic winds blowing so strongly in its favor.

NetApp’s success with their original business plan and our nascent success with Fishworks point to the most important lesson that the history of technology has to teach: economics always wins — a product or a technology or a company ultimately cannot prop up unsustainable economics. Perhaps unlike Hitz, however, I had to learn that lesson the hard way: in the post-bubble meltdown that brought Sun within an inch of its life. But then again, perhaps Hitz has yet to have his final lesson on the subject…

Today, Brendan made a very interesting discovery about the potential sources of disk latency in the datacenter. Here’s a video we made of Brendan explaining (and demonstrating) his discovery:



This may seem silly, but it’s not farfetched: Brendan actually made this discovery while exploring drive latency that he had seen in a lab machine due to a missing screw on a drive bracket. (!) Brendan has more details on the discovery, demonstrating how he used the Fishworks analytics to understand and visualize it.

If this has piqued your curiosity about the nature of disk mechanics, I encourage you to read Jon Elerath’s excellent ACM Queue article, Hard disk drives: the good, the bad and the ugly! As Jon notes, noise is a known cause of what is called a non-repeatable runout (NRRO) — though it’s unclear if Brendan’s shouting is exactly the kind of noise-induced NRRO that Jon had in mind…

Part of the design center of Fishworks is to develop powerful infrastructure that is also easy to use, with the browser as the vector for that usability. So it’s been incredibly vindicating to hear some of the initial reaction to our first product, which uses words like “truly breathtaking” to describe the interface.

But as much as we believe in the browser as system interface, we also recognize that it cannot be the only modality for interacting with the system: there remain too many occasions when one needs the lightweight precision of a command line interface. These occasions may be due to usability concerns (starting a browser can be an unnecessary and unwelcome burden for a simple configuration change), but they may also arise because there is no alternative: one cannot use a browser to troubleshoot a machine that cannot communicate over the network or to automate interaction with the system. For various reasons, I came to be the one working on this problem at Fishworks — and my experiences (and misadventures) in solving it present several interesting object lessons in software engineering.

Before I get to those, a brief aside about our architecture: as will come as no surprise to anyone who has used our appliance and is familiar with browser-based technologies, our interface is AJAX-based. In developing an AJAX-based application, one needs to select a protocol for client-server communication, and for a variety of reasons, we selected XML-RPC — a simple XML-based remote procedure call protocol. XML-RPC has ample client support (client libraries are readily available for JavaScript, Perl, Python, etc.), and it was a snap to write a C-based XML-RPC server. This allowed us to cleanly separate our server logic (the “controller” in MVC parlance) from our client (the “view”), and (more importantly) it allowed us to easily develop an automated test suite for the server-side logic. Now, (very) early in our development I had written a simple Perl script — which I called “aksh”, the appliance kit shell — to allow us to manually test the server side. It provided a simple, captive, command-line interface with features like command history, and it gave developers (that is to say, us) the ability to manually tickle their server-side logic without having to write client-side code.
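
For those who haven’t seen XML-RPC, it is about as simple as a remote procedure call protocol gets: the client POSTs a small XML document naming the method and its parameters, and the server replies with an XML document containing the result. As a rough sketch (this is not our actual client code; the endpoint and method name below are invented for illustration), a browser-side call looks something like this:

/*
 * A hypothetical XML-RPC invocation from the browser.  "/RPC2" is merely
 * the conventional XML-RPC endpoint and "appliance.listShares" is an
 * invented method name; neither is taken from the actual product.
 */
var req = new XMLHttpRequest();

req.open('POST', '/RPC2', false);
req.setRequestHeader('Content-Type', 'text/xml');
req.send('<?xml version="1.0"?>' +
    '<methodCall>' +
    '<methodName>appliance.listShares</methodName>' +
    '<params><param><value><string>pool-0</string></value></param></params>' +
    '</methodCall>');

/* The reply is XML as well; req.responseXML holds the parsed result. */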

In part because I had written this primordial shell, the task of writing the command line interface fell to me. And when I first approached the problem, it seemed natural to simply extend that little Perl script into something more elaborate. That is, I didn’t stop to think about whether Perl was still the right tool for what was a much larger job. In not stopping to reconsider, I committed a classic software engineering blunder: the problem had changed (and in particular, it had grown quite a bit larger than I understood it to be), but I was still thinking in terms of my existing (outmoded) solution.

As I progressed through the project — as my shell surpassed 1,000 lines and made its way towards 10,000 — I was learning a painful truth about Perl that many others had discovered before me: it is an undesigned dung heap of a language, entirely inappropriate for software in-the-large. As a coping mechanism, I began to vent my frustrations at the language with comments in the source, like this vitriolic gem around exceptions:

eval {
        #
        # We need to install our own private __WARN__ handler that
        # calls die() to be able to catch any non-numeric exception
        # from the below coercion without inducing a tedious error
        # message as a side-effect.  And has it been said recently that
        # Perl is a trash heap of a language?  Indeed, it reminds one
        # of a reeking metropolis like Lagos or Nairobi:  having long
        # since outgrown its original design (such as there was ever
        # any design to begin with), it is now teeming but crippled --
        # sewage and vigilantes run amok.  And yet, still the masses
        # come.	 But not because it is Utopia.	No, they come only
        # because this dystopia is marginally better than the only
        # alternatives that they know...
        #
        local $SIG{'__WARN__'} = sub { die(); };
        $val = substr($value, 0, length($value) - 1) + 0.0;
};

(In an attempt to prevent roaming gangs of glue-huffing Perl-coding teenagers from staging raids on my comments section: I don’t doubt that there’s a better way to do what I was trying to achieve above. But I would counter that there’s also a way to live like a king in Lagos or Nairobi — that doesn’t make them tourist destinations.)
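
For contrast, here is a sketch (emphatically not the shell’s actual code) of the same coercion in the JavaScript that ultimately replaced this: a failed numeric conversion simply yields NaN, with no warning handler required.

/*
 * A sketch of the equivalent coercion in JavaScript:  strip the trailing
 * character, as the Perl above does, and coerce what remains to a number.
 * If the coercion fails, the result is NaN rather than a warning.
 */
var val = +value.substring(0, value.length - 1);

if (isNaN(val))
        throw (new Error('"' + value + '" is not a valid number'));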

Most disconcertingly, the further I got into the project, the more the language became an impediment — exactly the wrong trajectory for an evolving software system. And so as I wrote more and more code — and wrestled more and more with the ill-suited environment — the feeling haunting me became unmistakable: this is the wrong path. There’s no worse feeling for a software engineer: knowing that you have made what is likely the wrong decision, but feeling that you can’t revisit that decision because of the time already spent down the wrong path. And so, further down the wrong path you go…

Meanwhile, as I was paying the price of my hasty decision, Eric — always looking for a way to better test our code — was experimenting with writing a test harness in which he embedded SpiderMonkey and emulated a DOM layer. These experiments were a complete success: Eric found that embedding SpiderMonkey into a C program was a snap, and the end result allowed us to get automated test coverage over client JavaScript code that previously had to be tested by hand.

Given both Eric’s results and my increasing frustrations with Perl, an answer was becoming clear: I needed to rewrite the appliance shell as a JavaScript/C hybrid, with the bulk of the logic living in JavaScript and system interaction bits living in C. This would allow our two interface modalities (the shell and the web) to commonalize logic, and it would eradicate a complicated and burdensome language from our product. While this seemed like the right direction, I was wary of making another hasty decision. So I started down the new path by writing a library in C that could translate JavaScript objects into an XML-RPC request (and the response back into JavaScript objects). My thinking here was that if the JavaScript approach turned out to be the wrong approach for the shell, we could still use the library in Eric’s new testing harness to allow a wider range of testing. As an aside, this is a software engineering technique that I have learned over the years: when faced with a decision, determine if there are elements that are common to both paths, and implement them first, thereby deferring the decision. In my experience, making the decision after having tackled some of its elements greatly informs the decision — and because the work done was common, no time (or less time) was lost.
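
To give a flavor of what that translation involves (the library itself was written in C; what follows is merely a sketch of the mapping, rendered in JavaScript for brevity), XML-RPC has only a handful of scalar types plus arrays and structs, so turning a JavaScript value into an XML-RPC <value> is a short recursion:

/*
 * A sketch of mapping a JavaScript value to an XML-RPC <value> element.
 * For brevity, XML entity escaping of strings is omitted and all numbers
 * are emitted as doubles; this is not the actual C library.
 */
function xmlrpcValue(v)
{
        if (typeof (v) == 'boolean')
                return ('<value><boolean>' + (v ? 1 : 0) + '</boolean></value>');

        if (typeof (v) == 'number')
                return ('<value><double>' + v + '</double></value>');

        if (typeof (v) == 'string')
                return ('<value><string>' + v + '</string></value>');

        if (v instanceof Array) {
                var i, data = '';

                for (i = 0; i < v.length; i++)
                        data += xmlrpcValue(v[i]);

                return ('<value><array><data>' + data + '</data></array></value>');
        }

        /*
         * Anything else is treated as an XML-RPC <struct>, with one
         * <member> per property.
         */
        var key, members = '';

        for (key in v) {
                members += '<member><name>' + key + '</name>' +
                    xmlrpcValue(v[key]) + '</member>';
        }

        return ('<value><struct>' + members + '</struct></value>');
}

Going the other direction, turning an XML-RPC response back into JavaScript objects, is essentially the mirror image of this walk.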

In this case, I had the XML-RPC client library rock-solid after about a week of work. The decision could be deferred no longer: it was time to rewrite Perl functionality in JavaScript — time that would indeed be wasted if the JavaScript path was a dead-end. So I decided that I would give myself a week. If, after that week, it wasn’t working out, at least I would know why, and I would be able to return to the squalor of Perl with fewer doubts.

As it turns out, after that week, it was clear that the JavaScript/C hybrid approach was the much better approach — Perl’s death warrant had been signed. And here we come to another lesson that I learned that merits an aside: in the development of DTrace, one regret was that we did not start the development of our test suite earlier. We didn’t make that mistake with Fishworks: the first test was created just minutes after the first line of code. With my need to now rewrite the shell, this approach paid an unexpected dividend: because I had written many tests for the old, Perl-based shell, I had a ready-made test suite for my new work. Therefore, contrary to the impressions of some about test-driven development, the presence of tests actually accelerated the development of the new shell tremendously. And of course, once I integrated the new shell, I could say with confidence that it did not contain regressions over the old shell. (Indeed, the only user-visible change was that it was faster. Much, much, much faster.)

While it was frustrating to think of the lost time (it ultimately took me six weeks to get the new JavaScript-based shell back to where the old Perl-based shell had been), it was a great relief to know that we had put the right architecture in place. And as often happens when the right software architecture is in place, the further I went down the path of the JavaScript/C hybrid, the more often I had the experience of new, interesting functionality simply falling out. In particular, it became clear that I could easily add a second JavaScript instance to the shell to allow for a scripting environment. This allows users to build full, programmatic flow control into their automation infrastructure without ever having to “screen scrape” output. For example, here’s a script to display the used and available space in each share on the appliance:

script
        run('shares');
        projects = list();
        printf('%-40s %-10s %-10s\n', 'SHARE', 'USED', 'AVAILABLE');

        for (i = 0; i < projects.length; i++) {
                run('select ' + projects[i]);
                shares = list();
                for (j = 0; j < shares.length; j++) {
                        run('select ' + shares[j]);
                        share = projects[i] + '/' + shares[j];
                        used = run('get space_data').split(/\s+/)[3];
                        avail = run('get space_available').split(/\s+/)[3];
                        printf('%-40s %-10s %-10s\n', share, used, avail);
                        run('cd ..');
                }
                run('cd ..');
        }

If you saved the above to a file named "space.aksh", you could run it this way:

% ssh root@myappliance < space.aksh
Password:
SHARE                                    USED       AVAILABLE
admin/accounts                           18K        248G
admin/exports                            18K        248G
admin/primary                            18K        248G
admin/traffic                            18K        248G
admin/workflow                           18K        248G
aleventhal/hw_eng                        18K        248G
bcantrill/analytx                        1.00G      248G
bgregg/dashbd                            18K        248G
bgregg/filesys01                         25.5K      100G
bpijewski/access_ctrl                    18K        248G
...

(You can also upload SSH keys to the appliance if you do not wish to be prompted for the password.)

As always, don't take our word for it -- download the appliance and check it out yourself! And if you have the appliance (virtual or otherwise), click on "HELP" and then type "scripting" into the search box to get full documentation on the appliance scripting environment!

In October 2005, longtime partner-in-crime Mike Shapiro and I were taking stock. Along with Adam Leventhal, we had just finished DTrace — and Mike had finished up another substantial body of work in FMA — and we were beginning to wonder about what was next. As we looked at Solaris 10, we saw an incredible building block — the best, we felt, ever made, with revolutionary technologies like ZFS, DTrace, FMA, SMF and so on. But we also saw something lacking: despite being a great foundation, the truth was that the technology wasn’t being used in many essential tasks in information infrastructure, from routing packets to storing blocks to making files available over the network. This last one especially grated: despite having invented network attached storage with NFS in 1983, and despite having the necessary components to efficiently serve files built into the system, and despite having exciting hardware like Thumper, and despite having absolutely killer technologies like ZFS and DTrace, Sun had no share — none — of the NAS market.

As we reflected on why this was so — why, despite having so many of the necessary parts, Sun had not been able to put together a compelling integrated product — we realized that part of the problem was organizational: if we wanted to go solve this problem, it was clear that we could not do it from the confines of a software organization. With this in mind, we requested a meeting with Greg Papadopoulos, Sun’s CTO, to brainstorm. Greg quickly agreed to a meeting, and Mike and I went to his office to chat. We described the problem that we wanted to solve: integrate Sun’s industry-leading components together and build on them to develop a killer NAS box — one with differentiators only made possible by our technology. Greg listened intently as we made our pitch, and then something unexpected happened — something that tells you a lot about Sun: Greg rose from his chair and exclaimed, “Let’s do it!” Mike and I were caught a bit flat-footed; we had expected a much safer, more traditional answer — like “let’s commission a task force!” or something — and instead here was Greg jumping out in front: “Get me a presentation that gives some of the detail of what you want to do, and I’ll talk to Jonathan and Scott about it!”

Back in the hallway, Mike and I looked at each other, still somewhat in disbelief that Greg had been not just receptive, but so explicitly encouraging. Mike said to me exactly what I was thinking: “Well, I guess we’re doing this!”

With that, Mike and I pulled into a nearby conference room, and we sat down with a new focus. This was neither an academic exercise nor idle chatter over drinks — we now needed to think about what specifically separated our building blocks from a NAS appliance. We started writing missing technologies on the whiteboard, which soon became crowded with things like browser-based management, clustering, e-mail alerts, reports, integrated fault management, seamless upgrades and rollbacks, and so on. When the whiteboard was full and we took a look at all of it, the light went on: virtually none of this stuff was specific to NAS. At that instant, we realized that the NAS problem was but one example of a larger problem, and that the infrastructure to build fully-integrated, special-purpose systems was itself general-purpose across those special purposes!

We had a clear picture of what we wanted to go do. We put our thoughts into a presentation that we entitled “A Problem, An Opportunity & An Idea” (of which I have made available a redacted version) and sent that to Greg. A week or so later, we had a con-call with Greg, in which he gave us the news from Scott and Jonathan: they bought it. It was time to put together a business plan, build our team and get going.

Now Mike and I went into overdrive. First, we needed a name. I don’t know how long he had been thinking about it, or how it struck him, but Mike said that he was thinking of the name “Fishworks”, it not only being a distinct name that paid homage to a storied engineering tradition (and with an oblique Simpsons reference to boot), but one that also embedded an apt acronym: “FISH”, Mike explained, stood for “fully-integrated software and hardware” — which is exactly what we wanted to go build. I agreed that it captured us perfectly — and Fishworks was born.

We built our team — including Adam, Eric and Keith — and on February 15, 2006, we got to work. Over the next two and a half years, we went through many changes: our team grew to include Brendan, Greg, Cindi, Bill, Dave and Todd; our technological vision expanded as we saw the exciting potential of the flash revolution; and our product scope was broadened through hundreds of conversations with potential customers. But through these changes our fundamental vision remained intact: that we would build a general purpose appliance kit — and that we would use it to build a world-beating NAS appliance. Today, at long last, the first harvest from this long labor is available: the Sun Storage 7110, Sun Storage 7210 and Sun Storage 7410.

It is deeply satisfying to see these products come to market, especially because the differentiators that we so boldly predicted to Sun’s executives so long ago have not only come to fruition, they are also delivering on our promise to set the product apart in the marketplace. Of these, I am especially proud of our DTrace-based appliance analytics. With analytics, we sought to harness the great power of DTrace: its ability to answer ad hoc questions that are phrased in terms of the system’s abstractions instead of its implementation. We saw an acute need for this in network storage, where even market-leading products cannot answer the most basic of questions: “what am I serving and to whom?” The key, of course, was to capture the strength of DTrace visually — and the trick was to give up enough of the arbitrary latitude of DTrace to allow for strictly visual interactions without giving up so much as to unnecessarily limit the power of the facility.

I believe that the result — which you can sample in this screenshot — does more than simply strike the balance: we have come up with ways to visualize and interact with data that actually function as a force multiplier for the underlying instrumentation technology. So not only does analytics bring the power of DTrace to a much broader spectrum of technologists, it also — thanks to the wonders of the visual cortex — has much greater utility than just DTrace alone. (Or, as one hardened veteran of command line interfaces put it to me, “this is the one GUI that I actually want to use!”)

There is much to describe about analytics, and for those interested in a reasonably detailed guided tour of the facility, check out this presentation on analytics that I will be giving later this week at Sun’s Customer Engineering Conference in Las Vegas. While the screenshots in that presentation are illustrative, the power of analytics (like DTrace before it) is in actually seeing it for yourself, in real-time. You can get a flavor for that in this video, in which Mike and I demonstrate and discuss analytics. (That video is part of a larger Inside Fishworks series that takes you through many elements of our team and the product.) While the video is great, it still can’t compare to seeing analytics in your own environment — and for that, you should contact your Sun rep or Sun reseller and arrange to test drive an appliance yourself. Or if you’re the impatient show-me-now kind, download this VMware image that contains a full, working Sun Storage 7000 appliance, with 16 (virtual) disks. Configure the virtual appliance, add a few shares, access them via CIFS, WebDAV, NFS, whatever, and bust out some analytics!
