The Observation Deck – Page 5 – Views on software from Bryan Cantrill's deck chair

Unikernels are unfit for production

January 22, 2016

Recently, I made the mistake of rhetorically asking if I needed to spell out why unikernels are unfit for production. The response was overwhelming: whether people feel that unikernels are wrong-headed and are looking for supporting detail or are unikernel proponents and want to know what the counter-arguments could possibly be, there is clearly a desire to hear the arguments against running unikernels in production.

So, what’s the problem with unikernels? Let’s get a definition first: a unikernel is an application that runs entirely in the microprocessor’s privileged mode. (The exact nomenclature varies; on x86 this would be running at Ring 0.) That is, in a unikernel there is no application at all in a traditional sense; instead, application functionality has been pulled into the operating system kernel. (The idea that there is “no OS” serves to mislead; it is not that there isn’t an operating system but rather that the application has taken on the hardware-interfacing responsibilities of the operating system — it is “all OS”, if a crude and anemic one.) Before we discuss the challenges with this, it’s worth first exploring the motivations for unikernels — if only because they are so thin…

The primary reason to implement functionality in the operating system kernel is for performance: by avoiding a context switch across the user-kernel boundary, operations that rely upon transit across that boundary can be made faster. In the case of unikernels, these arguments are specious on their face: between the complexity of modern platform runtimes and the performance of modern microprocessors, one does not typically find that applications are limited by user-kernel context switches. And as shaky as they may be, these arguments are further undermined by the fact that unikernels very much rely on hardware virtualization to achieve any multi-tenancy whatsoever. As I have expanded on in the past, virtualizing at the hardware layer carries with it an inexorable performance tax: by having the system that can actually see the hardware (namely, the hypervisor) isolated from the system that can actually see the app (the guest operating system) efficiencies are lost with respect to hardware utilization (e.g., of DRAM, NICs, CPUs, I/O) that no amount of willpower and brute force can make up. But it’s not worth dwelling on performance too much; let’s just say that the performance arguments to be made in favor of unikernels have some well-grounded counter-arguments and move on.

The other reason given by unikernel proponents is that unikernels are “more secure”, but it’s unclear what the intellectual foundation for this argument actually is. Yes, unikernels often run less software (and thus may have less attack surface) — but there is nothing about unikernels in principle that leads to less software. And yes, unikernels often run new or different software (and are therefore not vulnerable to the OpenSSL vuln-of-the-week) but this security-through-obscurity argument could be made for running any new, abstruse system. The security arguments also seem to whistle past the protection boundary that unikernels very much depend on: the protection boundary between guest OS’s afforded by the underlying hypervisor. Hypervisor vulnerabilities emphatically exist; one cannot play up Linux kernel vulnerabilities as a silent menace while simultaneously dismissing hypervisor vulnerabilities as imaginary. To the contrary, by depriving application developers of the tools of a user protection boundary, the principle of least privilege is violated: any vulnerability in an application tautologically roots the unikernel. In the world of container-based deployment, this takes a thorny problem — secret management — and makes it much nastier (and with much higher stakes). At best, unikernels amount to security theater, and at worst, a security nightmare.

The final reason often given by proponents of unikernels is that they are small — but again, there is nothing tautologically small about unikernels! Speaking personally, I have done kernel implementation on small kernels and big ones; you can certainly have lean systems without resorting to the equivalent of a gastric bypass with a unikernel! (I am personally a huge fan of Alpine Linux as a very lean user-land substrate for Linux apps and/or Docker containers.) And to the degree that unikernels don’t contain much code, it seems more by infancy (and, for the moment, irrelevancy) than by design. But it would be a mistake to measure the size of a unikernel only in terms of its code, and here too unikernel proponents ignore the details of the larger system: because a unikernel runs as a guest operating system, the DRAM allocated by the hypervisor for that guest is consumed in its entirety — even if the app itself isn’t making use of it. Because running out of memory remains one of the most pernicious of application failure modes (especially in dynamic environments), memory sizing tends to be overengineered in that requirements are often blindly doubled or otherwise slopped up. In the unikernel model, any such slop is lost — nothing else can use it because the hypervisor doesn’t know that it isn’t, in fact, in use. (This is in stark contrast to containers in which memory that isn’t used by applications is available to be used by other containers, or by the system itself.) So here again, the argument for unikernels becomes much more nuanced (if not rejected entirely) when the entire system is considered.
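To put rough numbers on the sizing argument, here is a minimal Python sketch. The figures and the helper name stranded_mib are invented for illustration, and the model is deliberately simplified: it assumes the hypervisor cannot reclaim anything it has allocated to a guest.

```python
# Hypothetical numbers, chosen only to illustrate the sizing argument.
def stranded_mib(allocated_mib, used_mib):
    """Memory reserved for each guest but not actually used by its app."""
    return sum(a - u for a, u in zip(allocated_mib, used_mib))

# Four apps, each sized at double its expected need "to be safe".
used = [300, 450, 200, 512]
allocated = [2 * u for u in used]

# Run as hardware-virtualized guests, this slop is invisible to the
# hypervisor (in this simplified model) and so cannot be used elsewhere.
print(stranded_mib(allocated, used), "MiB stranded")
```

In a container model the same doubled limits would act as ceilings rather than reservations, so the unused portion stays available to other containers or to the system itself.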

So those are the reasons for unikernels: perhaps performance, a little security theater, and a software crash diet. As tepid as they are, these reasons constitute the end of the good news from unikernels. Everything else from here on out is bad news: costs that must be borne to get to those advantages, however flimsy.

The disadvantages of unikernels start with the mechanics of an application itself. When the operating system boundary is obliterated, one may have eliminated the interface for an application to interact with the real world of the network or persistent storage — but one certainly hasn’t forsaken the need for such an interface! Some unikernels (like OSv and Rumprun) take the approach of implementing a “POSIX-like” interface to minimize disruption to applications. Good news: apps kinda work! Bad news: did we mention that they need to be ported? And here’s hoping that your app’s “POSIX-likeness” doesn’t extend to fusty old notions like creating a process: there are no processes in unikernels, so if your app depends on this (ubiquitous, four-decades-old) construct, you’re basically hosed. (Or worse than hosed.)
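For a sense of how ordinary that dependency is, here is a small Python sketch of an application shelling out to a child process. The child command is an arbitrary stand-in; the point is only that the pattern assumes the platform can create processes at all, which a unikernel by definition cannot.

```python
import subprocess
import sys

# Spawn a child process and collect its output. This presumes that the
# underlying system supports process creation, which is exactly the
# construct that a single-address-space unikernel lacks.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from a child process')"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```

Any app (or library buried beneath it) that does something like this would need to be reworked before it could be unikernel-borne.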

If this approach seems fringe, things get much further afield with language-specific unikernels like MirageOS that deeply embed a particular language runtime. On the one hand, allowing implementation only in a type-safe language allows for some of the acute reliability problems of unikernels to be circumvented. On the other hand, hope everything you need is in OCaml!

So there are some issues getting your app to work, but let’s say you’re past all this: either the POSIX surface exposed by your unikernel of choice is sufficient for your app (or platform), or it’s already written in OCaml or Erlang or Haskell or whatever. Should you have apps that can be unikernel-borne, you arrive at the most profound reason that unikernels are unfit for production — and the reason that (to me, anyway) strikes unikernels through the heart when it comes to deploying anything real in production: Unikernels are entirely undebuggable. There are no processes, so of course there is no ps, no htop, no strace — but there is also no netstat, no tcpdump, no ping! And these are just the crude, decades-old tools. There is certainly nothing modern like DTrace or MDB. From a debugging perspective, to say this is primitive understates it: this isn’t paleolithic — it is precambrian. As one who has spent my career developing production systems and the tooling to debug them, I find the implicit denial of debugging production systems to be galling, and symptomatic of a deeper malaise among unikernel proponents: total lack of operational empathy. Production problems are simply hand-waved away — services are just to be restarted when they misbehave. This attitude — even when merely implied — is infuriating to anyone who has ever been responsible for operating a system. (And lest you think I’m an outlier on this issue, listen to the applause in my DockerCon 2015 talk after I emphasized the need to debug systems rather than restart them.) And if it needs to be said, this attitude is angering because it is wrong: if a production app starts to misbehave because of a non-fatal condition like (say) listen drops, restarting the app is inducing disruption at the worst possible time (namely, when under high load) and doesn’t drive at all towards the root cause of the problem (an insufficient backlog).
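To make the listen-drop example concrete: the backlog in question is the queue of pending connections that the system will hold for a listening socket, and it is set when the app calls listen(). The following Python sketch is illustrative only; the helper name make_listener and the value 1024 are assumptions, and the operating system may silently cap whatever is requested (on Linux, for instance, via the net.core.somaxconn tunable).

```python
import socket

def make_listener(backlog, host="127.0.0.1", port=0):
    """Return a listening TCP socket with the requested accept backlog."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))  # port 0 lets the system pick a free port
    # If connections arrive faster than accept() drains them, the pending
    # queue fills and new connections are dropped: the fix is a larger
    # backlog (or faster accepts), not a restart under load.
    sock.listen(backlog)
    return sock

listener = make_listener(backlog=1024)
print("listening on port", listener.getsockname()[1])
listener.close()
```

Finding that the backlog is the culprit in the first place requires being able to observe the drops on the running system, which is precisely the kind of tooling at issue here.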

Now, could one implement production debugging tooling in unikernels? In a word, no: debugging tooling very often crosses the user-kernel boundary, and is most effective when leveraging the ad hoc queries that the command line provides. The organs that provide this kind of functionality have been deliberately removed from unikernels in the name of weight loss; any unikernel that provides sufficiently sophisticated debugging tooling to be used in production would be violating its own dogma. Unikernels are unfit for production not merely as implemented but as conceived: they cannot be understood when they misbehave in production — and by their own assertions, they never will be able to be.

All of this said, I do find some common ground with proponents of unikernels: I agree that the container revolution demands a much leaner, more secure and more efficient run-time than a shared Linux guest OS running on virtual hardware — and at Joyent, our focus over the past few years has been delivering exactly that with SmartOS and Triton. While we see a similar problem as unikernel proponents, our approach is fundamentally different: instead of giving up on the notion of secure containers running on a multi-tenant substrate, we took the already-secure substrate of zones and added to it the ability to natively execute Linux binaries. That is, we chose to leverage advances in operating systems rather than deny their existence, bringing to Linux and Docker not only secure on-the-metal containers, but also critical advances like ZFS, Crossbow and (yes) DTrace. This merits a final reemphasis: our focus on production systems is reflected in everything we do, but most especially in our extensive tooling for debugging production systems — and by bringing this tooling to the larger world of Linux containers, Triton has already allowed for production debugging that we never before would have thought possible!

In the fullness of time, I think that unikernels will be most productive as a negative result: they will primarily serve to demonstrate the impracticality of their approach for production systems. As such, they will join transactional memory and the M-to-N scheduling model as breathless systems software fads that fell victim to the merciless details of reality. But you needn’t take my word for it: as I intimated in my tweet, undebuggable production systems are their own punishment — just kindly inflict them upon yourself and not the rest of us!

Bringing clarity to containers

December 17, 2015

At the beginning of the year, I laid down a few predictions. While I refuse on principle to engage in Stephen O’Grady-style self-flagellation, I do think it’s worth revisiting the headliner prediction, namely that 2015 is the year of the container. I said at the time that it wasn’t particularly controversial, and I don’t think it’s controversial now: 2015 was the year of the container, and one need look no further than the explosion of container conferences with container camps and container summits and container cons.

My second prediction was marginally more subtle: that the impedance mismatch between containers in development and containers in production would be a wellspring of innovation. If anything, this understated the case: the wellspring turned out to be more like an open sluice, and 2015 saw the world flooded with multiple ways of doing seemingly everything when it comes to containers. That all of these technologies and frameworks are open source has served to accelerate them all, and mutations abound (Hypernetes, anyone?).

On the one hand this is great, as we all benefit by so many people exploring so many different ideas. But on the other hand, the flurry of choice can become a blizzard of confusion — especially when and where there is seemingly overlap between two technologies. (Or worse, when two overlapping and opinionated technologies disagree ardently on those opinions!) This slide from Karl Isenberg of Mesosphere at KubeCon last month captured it; the point is neither the specific technologies (as Karl noted, plenty are missing) nor the specific layers (many would likely quibble with some of the details of Karl’s taxonomy) but rather the explosion of abstraction (and concomitant confusion) in this domain.

One of the biggest challenges that we have in containers heading into 2016 is that this confusion now presents significant head winds for early-adopters and second-movers alike. This has become so acute that I posed a question to KubeCon attendees: are we at or near Peak Confusion in the container space? The conclusion among everyone I spoke with (vendors, developers, operators and others) was that we’re nowhere near Peak Confusion — with many even saying that confusion is still accelerating. (!) Even for those of us who have been in containers for years, this has been a little terrifying — and I can imagine for those entirely new to containers, it’s downright paralyzing.

So, what’s to be done? I think much of the responsibility lies with the industry: instead of viewing containers as new territory for conquest, we must take it upon ourselves to assure for users an interoperable and composable future — one in which technologies can differentiate themselves based on the qualities of their implementation rather than the voraciousness of their appetite. Lest this sound utopian, it is this same ethos that underlies our modern internet, as facilitated by the essential work of the Internet Engineering Task Force (IETF). Thanks to the IETF and its ethos of “rough consensus and running code” we ended up with the interoperable internet. (Indeed, this text itself was brought to you by RFC 791, RFC 793, RFC 1034, and RFC 2616 — among many, many others.)

As for an entity that can potentially serve an IETF-like role for container-based computing, I look with guarded optimism to today’s launch of the Cloud-native Computing Foundation. Joyent has been involved with the CNCF since its inception, and based on what we’ve seen so far, we see great promise for it in 2016 and beyond. We believe that by elucidating component boundaries and by fostering open source projects that share the values of interoperability and composability, the CNCF can combine the best attributes of both the IETF and the Apache Foundation: rough consensus and running, open source software that allows elastic, container-deployed, service-oriented infrastructure. If the CNCF can do this it will (we believe) serve a vital mission for practitioners: displace confusion with clarity — and therefore accelerate our collective cloud-native future!

Requests for discussion

September 16, 2015

One of the exciting challenges of being an all open source company is figuring out how to get design conversations out of the lunch time discussion and the private IRC/Jabber/Slack channels and into the broader community. There are many different approaches to this, and the most obvious one is to simply use whatever is used for issue tracking. Issue trackers don’t really fit the job, however: they don’t allow for threading; they don’t really allow for holistic discussion; they’re not easily connected with a single artifact in the repository, etc. In short, even on projects with modest activity, using issue tracking for design discussions causes the design discussions to be drowned out by the defects of the day — and on projects with more intense activity, it’s total mayhem.

So if issue tracking doesn’t fit, what’s the right way to have an open source design discussion? Back in the day at Sun, we had the Software Development Framework (SDF), which was a decidedly mixed bag. While it was putatively shrink-to-fit, in practice it felt too much like a bureaucratic hurdle with concomitant committees and votes and so on — and it rarely yielded productive design discussion. That said, we did like the artifacts that it produced, and even today in the illumos community we find that we go back to the Platform Software Architecture Review Committee (PSARC) archives to understand why things were done a particular way. (If you’re looking for some PSARC greatest hits, check out PSARC 2002/174 on zones, PSARC 2002/188 on least privilege or PSARC 2005/471 on branded zones.)

In my experience, the best part of the SDF was also the most elemental: it forced things to be written down in a forum reserved for architectural discussions, which alone forced some basic clarity on what was being built and why. At Joyent, we have wanted to capture this best element of the SDF without crippling ourselves with process — and in particular, we have wanted to allow engineers to write down their thinking while it is still nascent, such that it can be discussed when there is still time to meaningfully change it! This thinking, as it turns out, is remarkably close to the original design intent of the IETF’s Request for Comments, as expressed in RFC 3:

The content of a note may be any thought, suggestion, etc. related to the software or other aspect of the network. Notes are encouraged to be timely rather than polished. Philosophical positions without examples or other specifics, specific suggestions or implementation techniques without introductory or background explication, and explicit questions without any attempted answers are all acceptable. The minimum length for a note is one sentence.

These standards (or lack of them) are stated explicitly for two reasons. First, there is a tendency to view a written statement as ipso facto authoritative, and we hope to promote the exchange and discussion of considerably less than authoritative ideas. Second, there is a natural hesitancy to publish something unpolished, and we hope to ease this inhibition.

We aren’t the only ones to be inspired by the IETF’s venerable RFCs, and the language communities in particular seem to be good at this: Java has Java Specification Requests, Python has Python Enhancement Proposals, Perl has the (oddly named) Perl 6 apocalypses, and Rust has Rust RFCs. But the other systems software communities have been nowhere near as structured about their design discussions, and you are hard-pressed to find similar constructs for operating systems, databases, container management systems, etc.

Encouraged by what we’ve seen from the language communities, we wanted to introduce RFCs for the open source system software that we lead — but because we deal so frequently with RFCs in the IETF context, we wanted to avoid the term “RFC” itself: IETF RFCs tend to be much more formalized than the original spirit, and tend to describe an agreed-upon protocol rather than nascent ideas. So to avoid confusion with RFCs while still capturing some of what they were trying to solve, we have started a Requests for Discussion (RFD) repository for the open source projects that we lead. We will announce an RFD on the mailing list that serves the community (e.g., sdc-discuss) to host the actual discussion, with a link to the corresponding directory in the repo that will host artifacts from the discussion. We intend to kick off RFDs for the obvious things like adding new endpoints, adding new commands, adding new services, changing the behavior of endpoints and commands, etc. — but also for the less well-defined stuff that captures earlier thinking.

Finally, for the RFD that finally got us off the mark on doing this, see RFD 1: Triton Container Naming Service. Discussion very much welcome!

Software: Immaculate, fetid and grimy

September 3, 2015

Once, long ago, there was an engineer who broke the operating system particularly badly. Now, if you’ve implemented important software for any serious length of time, you’ve seriously screwed up at least once — but this was notable for a few reasons. First, the change that the engineer committed was egregiously broken: the machine that served as our building’s central NFS server wasn’t even up for 24 hours running the change before the operating system crashed — an outcome so bad that the commit was unceremoniously reverted (which we called a “backout”). Second, this wasn’t the first time that the engineer had been backed out; being backed out was serious, and that this had happened before was disconcerting. But most notable of all: instead of taking personal responsibility for it, the engineer had the audacity to blame the subsystem that had been the subject of the change. Now on the one hand, this wasn’t entirely wrong: the change had been complicated and the subsystem that was being modified was a bit of a mess — and it was arguably a preexisting issue that had merely been exposed by the change. But on the other hand, it was the change that exposed it: the subsystem might have been brittle with respect to such changes, but it had at least worked correctly prior to it. My conclusion was that the problem wasn’t the change per se, but rather the engineer’s decided lack of caution when modifying such a fragile subsystem. While the recklessness had become a troubling pattern for this particular engineer, it seemed that there was a more abstract issue: how does one safely make changes to a large, complicated, mature software system?

Hoping to channel my frustration into something positive, I wrote up an essay on the challenges of developing Solaris, and sent it out to everyone doing work on the operating system. The taxonomy it proposed turned out to be useful and embedded itself in our engineering culture — but the essay itself remained private (it pre-dated blogs.sun.com by several years). When we opened the operating system some years later, the essay was featured on opensolaris.org. But as that’s obviously been ripped down, and because the taxonomy seems to hold as much as ever, I think it’s worth reiterating; what follows is a polished (and lightly updated) version of the original essay.

In my experience, large software systems — be they proprietary or open source — have a complete range of software quality within their many subsystems.

Immaculate

Some subsystems you find are beautiful works of engineering — they are squeaky clean, well-designed and well-crafted. These subsystems are a joy to work in but (and here’s the catch) by virtue of being well-designed and well-implemented, they generally don’t need a whole lot of work. So you’ll get to use them, appreciate them, and be inspired by them — but you probably won’t spend much time modifying them. (And because these subsystems are such a pleasure to work in, you may find that the engineer who originally did the work is still active in some capacity — or that there is otherwise a long line of engineers eager to do any necessary work in such a rewarding environment.)

Fetid

Other subsystems are cobbled-together piles of junk — reeking garbage barges that have been around longer than anyone remembers, floating from one release to the next. These subsystems have little-to-no comments (or what comments they have are clearly wrong), are poorly designed, needlessly complex, badly implemented and virtually undebuggable. There are often parts that work by accident, and unused or little-used parts that simply never worked at all. They manage to survive for one or more of the following reasons:

They work just well enough to not justify the cost of either rewriting them or switching them out
The problem they solve isn’t important enough to justify the cost of rewriting them or switching them out
The problem they solve is so nasty that the cost of a rewrite or a switch is enormous — or at least that it dwarfs the cost of ongoing maintenance

If you find yourself having to do work in one of these subsystems, you must exercise extreme caution: you will need to write as many test cases as you can think of to beat the snot out of your modification, and you will need to perform extensive self-review. You can try asking around for assistance, but you’ll quickly discover that no one is around who understands the subsystem. Your code reviewers probably won’t be able to help much either — maybe you’ll find one or two people that have had the same misfortune that you find yourself experiencing, but it’s more likely that you will have to explain most aspects of the subsystem to your reviewers. You may discover as you work in the subsystem that maintaining it is simply untenable — and it may be time to consider rewriting the subsystem from scratch. (After all, most of the subsystems that are in the first category replaced subsystems that were in the second.) One should not come to this decision too quickly — rewriting a subsystem from scratch is enormously difficult and time-consuming. Still, don’t rule it out a priori.

Even if you decide not to rewrite such a subsystem, you should improve it while you’re there in manners that don’t introduce excessive risk. For example, if something took you a while to figure out, don’t hesitate to add a block comment to explain your discoveries. And if it was a pain in the ass to debug, you should add the debugging support that you found lacking. This will make it slightly easier on the next engineer — and it will make it easier on you when you need to debug your own modifications.

Grimy

Most subsystems, however, don’t actually fall neatly into either of these categories — they are somewhere in the middle. That is, they have parts that are well thought-out, or design elements that are sound, but they are also littered with implicit intradependencies within the subsystem or implicit interdependencies with other subsystems. They may have debugging support, but perhaps it is incomplete or out of date. Perhaps the subsystem effectively met its original design goals, but it has been extended to solve a new problem in a way that has left it brittle or overly complex. Many of these subsystems have been fixed to the point that they work reliably — but they are delicate and they must be modified with care.

The majority of work that you will do on existing code will be to subsystems in this last category. You must be very cautious when making changes to these subsystems. Sometimes these subsystems have local experts, but many changes will go beyond their expertise. (After all, part of the problem with these subsystems is that they often weren’t designed to accommodate the kind of change you might want to make.) You must extensively test your change to the subsystem. Run your change in every environment you can get your hands on, and don’t be content that the software seems to basically work — you must beat the hell out of it. Obviously, you should run any tests that might apply to the subsystem, but you must go further. Sometimes there is a stress test available that you may run, but this is not a substitute for writing your own tests. You should review your own changes extensively. If it’s multithreaded, are you obeying all of the locking rules? (What are the locking rules, anyway?) Are you building implicit new dependencies into the subsystem? Are you using interfaces in a new way that may present some new risk? Are the interfaces that the subsystem exports being changed in a way that violates an implicit assumption that one of the consumers was making? These are not questions with easy answers, and you’ll find that it will often be grueling work just to gain confidence that you are not breaking or being broken by anything else.

If you think you’re done, review your changes again. Then, print your changes out, take them to a place where you can concentrate, and review them yet again. And when you review your own code, review it not as someone who believes that the code is right, but as someone who is certain that the code is wrong: review the code as if written by an archrival who has dared you to find anything wrong with it. As you perform your self-review, look for novel angles from which to test your code. Then test and test and test.

It can all be summed up by asking yourself one question: have you reviewed and tested your change every way that you know how? You should not even contemplate pushing until your answer to this is an unequivocal YES. Remember: you are (or should be!) always empowered as an engineer to take more time to test your work. This is true of every engineering team that I have ever worked on or would ever work on, and it’s what makes companies worth working for: engineers that are empowered to do the Right Thing.

Production quality all the time

You should assume that once you push, the rest of the world will be running your code in production. If the software that you’re developing matters, downtime induced by it will be painful and expensive. But if the software matters so much, who would be so far out of their mind as to run your changes so shortly after they integrate? Because software isn’t (or shouldn’t be) fruit that needs to ripen as it makes its way to market — it should be correct when it’s integrated. And if we don’t demand production quality all the time, we are concerned that we will be gripped by the Quality Death Spiral. The Quality Death Spiral is much more expensive than a handful of outages, so it’s worth the risk — but you must do your part by delivering production quality all the time.

Does this mean that you should contemplate ritual suicide if you introduce a serious bug? Of course not — everyone who has made enough modifications to delicate, critical subsystems has introduced a change that has induced expensive downtime somewhere. We know that this will be so because writing system software is just so damned tricky and hard. Indeed, it is because of this truism that you must demand of yourself that you not integrate a change until you are out of ideas of how to test it. Because you will one day introduce a bug of such subtlety that it will seem that no one could have caught it.

And what do you do when that awful, black day arrives? Here’s a quick coping manual from those of us who have been there:

Don’t pretend it didn’t happen — you screwed up, but your mother still loves you
Don’t minimize the problem, shrug it off or otherwise make light of it — this is serious business, and your colleagues take it seriously
If someone spent time debugging your bug, thank them
If someone was inconvenienced by your bug, apologize to them
Take responsibility for your bug — don’t bother to blame other subsystems, the inherent complexity of the software, your code reviewers, your testers, the community, etc.
If it was caught before it was running in production, be thankful that a production user wasn’t affected by it

But most importantly, you must ask yourself: what could I have done differently? If you honestly don’t know, ask a fellow engineer to help you. We’ve all been there, and we want to make sure that you are able to learn from it. Once you have an answer, take solace in it; no matter how bad you feel for having introduced a problem, you can know that the experience has improved you as an engineer — and that’s the most anyone can ask for.

The foundation of cloud-native computing

July 21, 2015

The older I get, the more engineering values matter to me — and the more I seek out shared values in those with whom I endeavor to build things. For us at Joyent, those engineering values reflect that we operate the software we make: we believe that foundational systems must be designed to be robust and high-performing — and when they fail in this regard, it is incumbent upon the system itself to provide the tooling to diagnose the errant behavior. These values are not new (indeed, they are some of the oldest in computing), but there are times when they can feel endangered. It is our belief that the rise of cloud computing has — if anything — made the traditional values of systems software robustness more important. Recently, I’ve had the opportunity to get to know some of the Google engineers involved in the Kubernetes effort, and I have found that they broadly share Joyent’s engineering values — that they too seek to build a robust software substrate, as informed by their (substantial) experience operating systems at scale. Given our shared values, I was particularly pleased to learn of Google’s desire to create a new kind of foundation with their formation of the Cloud-native Computing Foundation. Today, I am excited to announce that Joyent is a charter member of the Cloud-native Computing Foundation, as it represents the values we sought to embody in the Triton stack — and I am honored to have been personally asked to serve on the foundation’s technical steering committee. We believe that we haven’t just joined a(nother) foundation, we have joined with those who share the mission that we have always had for ourselves: to help effect the next revolution in computing.

That I could possibly be so enthusiastic for a foundation merits further explanation, as I have historically been very forthright with my skepticism about foundations with respect to open source: three years ago, in a presentation on Corporate Open Source Anti-patterns (video), I described the insistence on giving newly-opened source code to a foundation as an anti-pattern, noting that giving up ownership also eschews leadership. I further cautioned that many underestimate the complexity and constraints of a 501(c)(3) — while overestimating the need for an explicitly non-profit organization’s involvement in a company’s open source efforts. While these statements about foundations were unequivocal, I also ended that presentation by saying that my observations shouldn’t be perceived as hard rules — and implied that the thinking may change over time as we continue to learn from our own experiences.

Three years after that presentation, I still broadly stand by my claims — but (as my enthusiasm for the Cloud Native Computing Foundation indicates) foundations are one area where my thinking has definitely shifted. In particular, in those rare instances when an open source technology reaches a level of ubiquity such as to sediment into collective bedrock, I believe that it actually does belong in a foundation. How do you know if your open source project is in this category? If multiple companies are betting their future on your open source project, congratulate yourself for laying down the bedrock upon which others are building — and then get it into a foundation to assure its future. This can be hard to internalize (after all, you have almost certainly put more resources into it than anyone else; why should you be expected to simply give that away?!), but the reality is that the commercial pressures that are now being exerted on your (incredibly popular!) technology will rip it apart if you don’t preserve its fate. This can be doubly frustrating when you feel you are acting in the community’s best interests, but as soon as that community includes rival commercial interests, only a foundation can provide the necessary (but not sufficient!) neutrality to assure the community that the technology’s future transcends the fate of any one company. Certainly, we learned all this the hard way with node.js — but the problem is in no way unique to node.js or to Joyent. Indeed, with open source now essentially a constraint on new infrastructure software, we can expect that this transition (from corporate-owned open source to foundation-owned open source) will happen with increasing frequency. (Should you find yourself at OSCON this week, this trend and its ramifications are the subject of my talk on Thursday.)

In this regard, the Docker world has been particularly interesting of late: the domain is entirely open source, with many companies (including Joyent!) betting their futures not just on Docker, but on the many other technologies in the ecosystem. With so much bedrock suddenly forming, foundations were practically preordained — so it was no surprise to see the announcement of the Open Container Project at DockerCon just a few weeks ago. We at Joyent applaud these developments (and we are a charter member of the OCP), but I confess that the sprouting of foundations has left me feeling somewhat underwhelmed: are we really to have a foundation for every GitHub repo that reaches a certain level of popularity? To be clear, I don’t object to the foundations in the abstract so much as the cacophony of their putative missions: having the mission of a foundation being merely to promote a particular technology feels like it’s aiming a bit low in Maslow’s hierarchy of needs. Now, one can certainly collect open source software into a foundation like the Apache Foundation — but as we move to a world where an increasing amount of software is open source, what becomes of their mission? Foundations that are amalgamations of otherwise unrelated software seem to me to run the risk of becoming open source orphanages: providing shelter and a modicum of structure, perhaps, but lacking a sense of collective purpose.

The promise of the Cloud-native Computing Foundation is that it offers a potential third model: while the foundation will serve as the new home for Kubernetes, it’s not limited to Kubernetes — nor is it an open source dumping ground. Rather, this foundation is dedicated to a particular ethos: the creation of the new kinds of application and (especially) service stacks that represent modern, server-side computing. That is, it is a foundation with a true mission: to advance key open source technologies that constitute modern, elastic computing. As such, it seeks to transcend any single technology — it has a raison d’être that runs deeper than mere self-preservation. I would like to think that this third path can serve as a model in the new, all-open world: foundations as entities that don’t let their corporate neutrality prevent them from being opinionated as to their mission, their constituent technologies or — importantly — their engineering values!

Triton: Docker and the "best of all worlds"

March 24, 2015

When Docker first rocketed into the nerdosphere in 2013, some wondered how we at Joyent felt about its popularity. Having run OS containers in multi-tenant production for nearly a decade (and being one of the most vocal proponents of OS-based virtualization), did we somehow resent the relatively younger Docker? Some were surprised to learn that (to the contrary!) we have been elated to see the rise of Docker: we share with Docker a vision for a containerized future, and we love that Docker has brought the technology to a much broader audience — and via an entirely different vector (namely, emphasizing developer agility instead of merely operational efficiency). Given our enthusiasm, you can imagine the question we posed to ourselves over a year ago: could we somehow combine the operational strength of SmartOS containers with the engaging developer experience of Docker? Importantly, we had no desire to develop a “better” Docker — we merely wanted to use SmartOS and SmartDataCenter as a substrate upon which to deploy Docker containers directly onto the metal. Doing this would leverage over a decade of deep operating systems engineering with technologies like Crossbow, ZFS, DTrace and (of course) Zones — and would deliver all of the operational advantages of pure OS-based virtualization to Docker containers: performance, elasticity, security and density.

That said, there was an obvious hurdle: while designed to be cross-platform, Docker is a Linux-borne technology — and the repository of Docker images is today a collection of Linux binaries. While SmartOS is Unix, it (somewhat infamously) isn’t Linux: applications need to be at least recompiled (if not ported) to work on SmartOS. Into this gap came a fortuitous accident: David Mackay, a member of the illumos community, attempted to revive LX-branded zones, an old Sun project that provided Linux emulation in a zone. While this project had been very promising when it was first done years ago, it had also been restricted to emulating a 2.4 Linux kernel for 32-bit binaries — and it was clear at the time that modernizing it was going to be significant work. As a result, the work sat unattended in the system for a while before being unceremoniously ripped out in 2010. It seemed clear that with the passage of time, this work would hardly be revivable: it had been so long, any resurrection was going to be tantamount to a rewrite.

But fortunately, David didn’t ask us our opinion before he attempted to revive it — he just did it. (As an aside: a tremendous advantage of open source is that the community can perform experiments that you might deem too risky or too expensive in terms of opportunity cost!) When David reported his results, we were taken aback: yes, this had the same limitations that it had always had (namely, 32-bit and lacking many modern Linux facilities), but given how many modern binaries still worked, it was also clear that this was a more viable path than we had thought. Energized by David’s results, Joyent’s Jerry Jelinek picked it up from there, reintegrating the Linux brand into SmartOS in March of last year. There was still much to do of course, but Jerry’s work was a start — and reflected the constraints we imposed on ourselves: do it all in the open; do it all on SmartOS master; develop general-purpose illumos facilities wherever possible; and aim to upstream it all when we were done.

Around this time, I met with Docker CTO Solomon Hykes to share our (new) vision. Honestly, I didn’t know what his reaction would be; I had great respect for what Docker had done and was doing, but didn’t know how he would react to a system bold enough to go its own way at such a fundamental level. Somewhat to my surprise, Solomon was incredibly supportive: not only was he aware of SmartOS, but he was also intimately familiar with zones — and he didn’t need to be convinced of the merits of our approach. Better, he asked a question near and dear to my heart: “Does this mean that I’ll be able to DTrace my Linux apps in a Docker container?” When I indicated that yes, that’s exactly what it would mean, he responded: “It will be the best of all worlds!” That Solomon (and by extension, Docker) was not merely willing but actually eager to see Docker on SmartOS was hugely inspirational to us, and we redoubled our efforts.

Back at Joyent, we worked assiduously under Jerry’s leadership over the spring and summer, and by the fall, we were ready for an attempt on the summit: 64-bit. Like other bringup work we’ve done, this work was terrifying in that we had very little forward visibility, and little ability to parallelize. As if he were Obi-Wan Kenobi meeting Darth Vader in the Death Star, Jerry had to face 64-bit — alone. Fortunately, Jerry didn’t suffer Ben Kenobi’s fate; by late October, he had 64-bit working! With the project significantly de-risked, everything kicked into high gear: Josh Wilsdon, Trent Mick and their team went to work understanding how to integrate SmartDataCenter with Docker; Josh Clulow, Patrick Mooney and I attacked some of the nasty LX-branded zone issues that remained; and Robert Mustacchi and Rob Gulewich worked towards completing their vision for network virtualization. Knowing what we were going to do — and how important open source is to modern infrastructure software in general and Docker in particular — we also took an important preparatory step: we open sourced SmartDataCenter and Manta.

Charged by having all of our work in the open and with a clear line of sight on what we wanted to deliver, progress was rapid. One major question: where to run the Docker daemon? In digging into Docker, we saw that much of what the actual daemon did would need to be significantly retooled to be phrased in terms of not only SmartOS but also SmartDataCenter. However, our excavations also unearthed a gem: the Docker Remote API. Discovering a robust API was a pleasant surprise, and it allowed us to take a different angle: instead of running a (heavily modified) Docker daemon, we could implement a new SDC service to provide a Docker Remote API endpoint. To Docker users, this would look and feel like Docker — and it would give us a foundation that we knew we could develop. At this point, we’re pretty good at developing SDC-based services (microservices FTW!), and progress on the service was quick. Yes, there were some thorny issues to resolve (and definitely note differences between our behavior and the stock Docker behavior!), but broadly speaking we have been able to get it to work without violating the principle of least surprise. And from a Docker developer perspective, having a Docker host that represents an entire datacenter — that is, a (seemingly) galactic Docker host — feels like an important step forward. (Many are as excited by this work as we are, but I think my favorite reaction is the back-handed compliment from Jeff Waugh of Canonical fame; somehow a compliment that is tied to an insult feels indisputably earnest.)

With everything coming together, and with new hardware being stood up for the new service, there was one important task left: we needed to name this thing. (Somehow, “SmartOS + LX-branded zones + SmartDataCenter + sdc-portolan + sdc-docker” was a bit of a mouthful.) As we thought about names, I turned back to Solomon’s words a year ago: if this represented the best of two different worlds, what mythical creatures were combinations of different animals? While this search yielded many fantastic concoctions (a favorite being Manticore — and definitely don’t mess with Typhon!), there was one that stood out: Triton, son of Poseidon. As half-human and half-fish and a god of the deep, Triton represents the combination of two similar but different worlds — and as a bonus, the name rolls off the tongue and fits nicely with the marine metaphor that Docker has pioneered.

So it gives me great pleasure to introduce Triton to the world — a piece of (open source!) engineering brought to you by a cast of thousands, over the course of decades. In a sentence (albeit a wordy one), Triton lets you run secure Linux containers directly on bare metal via an elastic Docker host that offers tightly integrated software-defined networking. The service is live, so if you want to check it out, sign up! If you’re looking for more technical details, check out both Casey’s blog entry and my Future of Docker in Production presentation. If you’d like it on-prem, get in touch. And if you’d prefer to DIY, start with sdc-docker. Finally, forgive me one shameless plug: if you happen to be in the New York City area in early April, be sure to join us at the Container Summit, where we’ll hear perspectives from analysts like Gartner, enterprise users of containers like Lucera and Walmart, and key Docker community members like Tutum, Shopify, and Docker themselves. Should make for an interesting afternoon!

Welcome to Triton — and to the best of all worlds!
