The Observation Deck

Category: Solaris

So IBM has been on the warpath recently against OpenSolaris, culminating with their accusation yesterday that OpenSolaris is a “facade.” This is so obviously untrue that it’s not even worth refuting in detail. In fact, being the father of a toddler, I would liken IBM’s latest outburst to something of a temper tantrum — and as with a screaming toddler, the best way to deal with this is to not reward the performance, but rather to offer some constructive alternatives. So, without further ado, here are my constructive suggestions to IBM:

  • Open source OS/2. Okay, given the tumultuous history of OS/2, this is almost certainly not possible from a legal perspective — but it would be a great open source contribution (some reasonably interesting technology went down with that particular ship), and many hobbyists would love you for it. Like I said, it’s probably not possible — but just to throw it out there.

  • Open source AIX. AIX is one of the true enterprise-class operating systems, one with a long history of running business-critical applications. As such, open sourcing AIX would be both a great contribution to open source and a huge win for AIX customers, if only so that they could look at the source code when chasing a problem that isn’t necessarily a bug in the operating system. (And I confess that on a personal level, I’m very curious to browse the source code of an operating system that was ported from PL/1.) However, as with OS/2, AIX’s history will likely make open sourcing it tough from a legal perspective: its Unix license dates from the Bad Old Days, and it would probably be time-consuming (and expensive) to unencumber the system to allow it to be open sourced.

Okay, those two are admittedly pretty tough for legal reasons. Here are some easier ones:

  • Support the port of OpenSolaris to POWER/PowerPC. Sun doesn’t sell POWER-based gear, so you would have the comfort of knowing that your efforts would in no way assist a Sun hardware sale, and your POWER customers would undoubtedly be excited to have another choice for their deployments.

  • Support the nascent effort to port OpenSolaris to the S/390. Hey, if Linux makes sense on an S/390, surely OpenSolaris with all of its goodness makes sense too, right? Again, customers love choice — and even an S/390 customer that has no intention of running OpenSolaris will love having the choice made available to them.

Okay, so those two are easier because the obstacles aren’t legal obstacles, but there are undoubtedly internal IBM cultural issues that make them effectively non-starters.

So here’s my final suggestion, and it’s an absolutely serious one. It’s also relatively easy, it clearly and immediately benefits IBM and IBM’s customers, and it doesn’t even involve giving up any IP:

  • Port DTrace to AIX. Your customers want it. Apple has shown that it can be done. We’ll help you do it. And you’ll get to participate in the DTrace community (and therefore the OpenSolaris community) in a way that doesn’t leave you feeling like you’ve been scalped by Scooter. Hell, you can even follow Apple’s lead with Xray and innovate on top of DTrace: from talking to your customers over the years, it’s clear that they love SMIT, so integrate a SMIT frontend with a DTrace backend! Your customers will love you for it, and the DTrace community will be excited to have yet another system on which they can use DTrace.

Now, IBM may respond to these alternatives just as a toddler sometimes responds to constructive alternatives (“No! No! NO! Mine! MINE! MIIIIIIIINE!”, etc.). But if cooler heads prevail at Big Blue, these suggestions (especially the last one) will be seen as a way to engage constructively, with clear benefits for IBM’s customers (and therefore for IBM). So to IBM I say what parents have said to screaming toddlers since time immemorial: we’re ready when you are.

From WWDC here in San Francisco: Apple has just announced support for DTrace in Leopard, the upcoming release of Mac OS X! Often (or even usually?) announcements at conferences are more vapor than ware. In this case, though, Apple is being quite modest: they have done a tremendous amount of engineering work to bring DTrace to their platform (including, it turns out, implementing DTrace’s FBT provider for PowerPC!), and they are using DTrace as part of the foundation for their new Xray performance tool. This is very exciting news, as it brings DTrace to a whole slew of new users. (And speaking personally, it will be a relief to finally have DTrace on the only computer under my roof that doesn’t run Solaris!) Having laid hands on DTrace on Mac OS X myself just a few hours ago, I can tell you that while it’s not yet a complete port, it’s certainly enough to be uniquely useful, and it was quite a thrill to see Objective-C frames in a ustack action! So kudos to the Apple engineering team working on the port: Steve Peters, James McIlree, Terry Lambert, Tom Duffy and Sean Callanan.

It’s been fun for us to work with the Apple team, and gratifying to see their results. And it isn’t just rewarding for us; the entire OpenSolaris community should feel proud about this development because it gives the lie to IBM’s nauseating assertion that we in the OpenSolaris community aren’t in the “spirit” of open source.

So to Apple users: welcome to DTrace! (And to DTrace users: welcome to Mac OS X!) Be sure to join us in the DTrace community. And if you’re at WWDC, we on Team DTrace will be on hand on Friday at the DTrace session, so swing by to see a demo of DTrace running on Mac OS X and to meet both the team at Apple that worked on the port and those of us at Sun who developed DTrace. And with a little luck, you might even hear Adam commemorate the occasion by singing his beautiful and enchanting ISA aria…

Update: For disbelievers, Adam posted photos — and Mike went into greater detail on the state of the Leopard DTrace port, and what it might mean for the direction of DTrace.

A while ago, I blogged about the possibility of a FreeBSD port of DTrace. For the past few months, John Birrell has been hard at work on the port, and has announced recently that he has much of the core DTrace functionality working. Over here at DTrace Central, we’ve been excitedly watching John’s progress for a little while, providing help and guidance where we can — albeit not always solicited 😉 — and have been very impressed with how far he’s come. And while John has quite a bit further to go before one could call it a complete port, what he has now is indisputably useful. If you run FreeBSD in production, you’re going to want John’s port as it stands today — and if you develop for the FreeBSD kernel (drivers or otherwise), you’re going to need it. (Once you’ve done kernel development with DTrace, there’s no going back.)

So this is obviously a win for FreeBSD users, who can now benefit from the leap in software observability that DTrace provides. It’s also clearly a win for DTrace users, because now you have another platform on which you can observe your production software — and a larger community with whom to share your experiences and thoughts. And finally, it’s a huge win for OpenSolaris users: the presence of a FreeBSD port of DTrace validates that OpenSolaris is an open, innovative platform (despite what some buttheads say) — one that will benefit from and contribute to the undeniable economics of open source.

So congrats to John! And to the FreeBSD folks: welcome to the DTrace community!

First I need to apologize for having been absent for so long — I am very much heads-down on a new project. (The details of which will need to wait for another day, I’m afraid — but suffice it to say that it, like just about everything else I’ve done at Sun, leverages much of what I’ve done before it.)

That said, I wanted to briefly emerge to discuss some recent work. A while ago, I blogged about DTrace and Ruby, using Rich Lowe’s prototype DTrace provider. This provider represents a quantum leap for Ruby observability, but it suffers from the fact that we must do work (in particular, we must get the class and method) even when disabled. This is undesirable (especially considering that the effect can be quite significant: up to 2X), and it runs against the DTrace ethos of zero disabled probe effect, but there has been no better solution. Now, however, thanks to Adam’s work on is-enabled probes, we can have a Ruby provider that has zero disabled probe effect. (Or essentially zero: I actually measured the probe effect at 0.2%, which is very much in the noise.) Having zero disabled probe effect allows us to deploy DTrace on Ruby in production, which in turn opens up a whole new domain for DTrace: Ruby on Rails. And as I was reminded by Jason Hoffman’s recent Scale with Rails presentation (in which he outlines why they picked Solaris generally, and ZFS in particular), this is a hugely important growth area for Solaris. So without further ado, here is a (reasonably) simple script that relies on some details of WEBrick and Rails to yield a system profile for Rails requests:

#pragma D option quiet

self string uri;

/* Remember the read(2) arguments while this thread has no request in flight. */
syscall::read:entry
/execname == "ruby" && self->uri == NULL/
{
        self->fd = arg0;
        self->buf = arg1;
        self->size = arg2;
}

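/*
 * If the data just read begins with "GET ", take the second
 * space-delimited token as the URI and start tracking the request.
 */
syscall::read:return
/self->uri == NULL && self->buf != NULL && strstr(this->str =
    copyinstr(self->buf, self->size), "GET ") == this->str/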
{
        this->head = strtok(this->str, " ");
        self->uri = this->head != NULL ? strtok(NULL, " ") : NULL;
        self->syscalls = 0;
        self->rbcalls = 0;
}

syscall::read:return
/self->buf != NULL/
{
        self->buf = NULL;
}

/* Count every system call made while a request is in flight. */
syscall:::entry
/self->uri != NULL/
{
        @syscalls[probefunc] = count();
}

/* Count Ruby classes and Class#method pairs; $1 is the ruby process ID. */
ruby$1:::function-entry
/self->uri != NULL/
{
        @rbclasses[this->class = copyinstr(arg0)] = count();
        this->sep = strjoin(this->class, "#");
        @rbmethods[strjoin(this->sep, copyinstr(arg1))] = count();
}

/* Count each SQL string handed to mysql_send_query() during a request. */
pid$1::mysql_send_query:entry
/self->uri != NULL/
{
        @queries[copyinstr(arg1)] = count();
}

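/*
 * A write beginning with "HTTP/1.1" on the request's descriptor is the
 * response: stop tracking the request and count it as serviced.
 */
syscall::write:entry
/self->uri != NULL && arg0 == self->fd && strstr(this->str =
    copyinstr(arg1, arg2), "HTTP/1.1") == this->str/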
{
        self->uri = NULL;
        ncalls++;
}

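/*
 * Divide the counts by the number of requests serviced and print the ten
 * largest of each; the query counts are left as totals.
 */
END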
{
        normalize(@syscalls, ncalls);
        trunc(@syscalls, 10);
        printf("Top ten system calls per URI serviced:\n");
        printf("---------------------------------------");
        printf("--------------------------------+------\n");
        printa("  %-68s | %@d\n", @syscalls);

        normalize(@rbclasses, ncalls);
        trunc(@rbclasses, 10);
        printf("\nTop ten Ruby classes called per URI serviced:\n");
        printf("---------------------------------------");
        printf("--------------------------------+------\n");
        printa("  %-68s | %@d\n", @rbclasses);

        normalize(@rbmethods, ncalls);
        trunc(@rbmethods, 10);
        printf("\nTop ten Ruby methods called per URI serviced:\n");
        printf("---------------------------------------");
        printf("--------------------------------+------\n");
        printa("  %-68s | %@d\n", @rbmethods);

        trunc(@queries, 10);
        printf("\nTop ten MySQL queries:\n");
        printf("---------------------------------------");
        printf("--------------------------------+------\n");
        printa("  %-68s | %@d\n", @queries);
}

Running the above while horsing around with the Depot application from Agile Web Development with Rails yields the following:

Top ten system calls per URI serviced:
-----------------------------------------------------------------------+------
  setcontext                                                           | 15
  fcntl                                                                | 16
  fstat64                                                              | 16
  open64                                                               | 21
  close                                                                | 25
  llseek                                                               | 27
  lwp_sigmask                                                          | 30
  read                                                                 | 62
  pollsys                                                              | 80
  stat64                                                               | 340

Top ten Ruby classes called per URI serviced:
-----------------------------------------------------------------------+------
  ActionController::CodeGeneration::Source                             | 89
  ActionController::CodeGeneration::CodeGenerator                      | 167
  Fixnum                                                               | 190
  Symbol                                                               | 456
  Class                                                                | 556
  Hash                                                                 | 1000
  String                                                               | 1322
  Array                                                                | 1903
  Object                                                               | 2364
  Module                                                               | 6525

Top ten Ruby methods called per URI serviced:
-----------------------------------------------------------------------+------
  Object#dup                                                           | 235
  String#==                                                            | 250
  Object#is_a?                                                         | 288
  Object#nil?                                                          | 316
  Hash#[]                                                              | 351
  Symbol#to_s                                                          | 368
  Object#send                                                          | 593
  Module#included_modules                                              | 1043
  Array#include?                                                       | 1127
  Module#==                                                            | 5058

Top ten MySQL queries:
-----------------------------------------------------------------------+------
  SELECT * FROM products  LIMIT 0, 10                                  | 2
  SELECT * FROM products WHERE (products.id = '7')  LIMIT 1            | 2
  SELECT count(*) AS count_all FROM products                           | 2
  SHOW FIELDS FROM products                                            | 5
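
The per-request numbers above come from the END clause, which divides each count by the number of requests serviced with normalize() and keeps the ten largest entries with trunc(). As a rough illustration of that arithmetic (in Python rather than D, with invented totals, and with helper names that only mimic the D functions and assume integer division), the reduction looks something like this:

```python
def normalize(agg, factor):
    """Divide every count by factor with integer division (assumed to mirror D's normalize())."""
    return {key: count // factor for key, count in agg.items()}


def trunc(agg, n):
    """Keep the n entries with the largest values, listed smallest first like the output above."""
    top = sorted(agg.items(), key=lambda kv: kv[1], reverse=True)[:n]
    return sorted(top, key=lambda kv: kv[1])


# Hypothetical raw totals gathered over four requests (numbers invented for illustration).
raw_syscalls = {"stat64": 1360, "pollsys": 320, "read": 250, "close": 100}
ncalls = 4

per_request = normalize(raw_syscalls, ncalls)
for name, count in trunc(per_request, 3):
    print(f"  {name:<68} | {count}")
```

Note that the script normalizes only the system call and Ruby aggregations; the MySQL query counts are printed as raw totals.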

While this gives us lots of questions we might want to answer (e.g., “why the hell are we doing 340 stats on every ‘effing request!”1), it might be a little easier to look at a view that lets us see requests and the database queries that they induce. Here, for example, is a similar script to do just that:

#pragma D option quiet

self string uri;
self string newuri;

/* Record when tracing began so that later events print as elapsed time. */
BEGIN
{
        start = timestamp;
}

syscall::read:entry
/execname == "ruby" && self->uri == NULL/
{
        self->fd = arg0;
        self->buf = arg1;
        self->size = arg2;
}

syscall::read:return
/self->uri == NULL && self->buf != NULL && (strstr(this->str =
    copyinstr(self->buf, self->size), "GET ") == this->str ||
    strstr(this->str, "POST ") == this->str)/
{
        this->head = strtok(this->str, " ");
        self->newuri = this->head != NULL ? strtok(NULL, " ") : NULL;
}

/* Adopt the newly parsed URI and log the start of the request. */
syscall::read:return
/self->newuri != NULL/
{
        self->uri = self->newuri;
        self->newuri = NULL;
        printf("%3d.%03d => %s\n", (timestamp - start) / 1000000000,
            ((timestamp - start) / 1000000) % 1000, self->uri);
}

/* Log each query as it is sent, saving its address for the return probe. */
pid$1::mysql_send_query:entry
/self->uri != NULL/
{
        printf("%3d.%03d   -> \"%s\"\n", (timestamp - start) / 1000000000,
            ((timestamp - start) / 1000000) % 1000,
            copyinstr(self->query = arg1));
}

pid$1::mysql_send_query:return
/self->query != NULL/
{
        printf("%3d.%03d   <- \"%s\"\n", (timestamp - start) / 1000000000,
             ((timestamp - start) / 1000000) % 1000,
             copyinstr(self->query));
        self->query = NULL;
}

syscall::read:return
/self->buf != NULL/
{
        self->buf = NULL;
}

syscall::write:entry
/self->uri != NULL && arg0 == self->fd && strstr(this->str =
    copyinstr(arg1, arg2), "HTTP/1.1") == this->str/
{
        printf("%3d.%03d <= %s\n", (timestamp - start) / 1000000000,
             ((timestamp - start) / 1000000) % 1000,
             self->uri);
        self->uri = NULL;
}

Running the above while clicking around with the Depot app:

# ./rsnoop.d `pgrep ruby`
  7.936 => /admin/edit/7
  7.955   -> "SELECT * FROM products WHERE (products.id = '7')  LIMIT 1"
  7.956   <- "SELECT * FROM products WHERE (products.id = '7')  LIMIT 1"
  7.957   -> "SHOW FIELDS FROM products"
  7.957   <- "SHOW FIELDS FROM products"
  7.971 <= /admin/edit/7
 20.881 => /admin/update/7
 20.952   -> "SELECT * FROM products WHERE (products.id = '7')  LIMIT 1"
 20.953   <- "SELECT * FROM products WHERE (products.id = '7')  LIMIT 1"
 20.953   -> "SHOW FIELDS FROM products"
 20.953   <- "SHOW FIELDS FROM products"
 20.954   -> "BEGIN"
 20.954   <- "BEGIN"
 20.955   -> "UPDATE products SET `title` = 'foo bar', `price` = 1.2, ...
 20.955   <- "UPDATE products SET `title` = 'foo bar', `price` = 1.2, ...
 20.989   -> "COMMIT"
 20.989   <- "COMMIT"
 21.001 <= /admin/update/7
 21.005 => /admin/show/7
 21.023   -> "SELECT * FROM products WHERE (products.id = '7')  LIMIT 1"
 21.023   <- "SELECT * FROM products WHERE (products.id = '7')  LIMIT 1"
 21.024   -> "SHOW FIELDS FROM products"
 21.024   <- "SHOW FIELDS FROM products"
 21.038 <= /admin/show/7
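
The left-hand column is the time since the script started, which the printf() calls derive from nanosecond timestamps using integer division and a modulus. A small sketch of that same arithmetic, in Python rather than D and with a made-up function name, might look like this:

```python
def format_elapsed(now_ns, start_ns):
    """Render a nanosecond interval as seconds.milliseconds, as "%3d.%03d" would."""
    elapsed = now_ns - start_ns
    seconds = elapsed // 1_000_000_000          # whole seconds
    millis = (elapsed // 1_000_000) % 1000      # milliseconds within that second
    return f"{seconds:3d}.{millis:03d}"


# Hypothetical timestamps, chosen to match the first and a later line of output.
start = 5_000_000_000
print(format_elapsed(start + 7_936_400_000, start))
print(format_elapsed(start + 20_881_000_000, start))
```

Anything finer than a millisecond is simply truncated, which is why a query and its return can show the same timestamp.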

I’m no Rails developer, but it seems like this might be useful… If you want to check this out for yourself, start by getting Rich’s prototype provider. (Using it, you can do everything I’ve done here, just with higher disabled probe effect.) Meanwhile, I’ll work with Rich to get the lower disabled probe effect version out shortly. Happy Railing!
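
To make the difference in disabled probe effect concrete, here is a toy simulation in Python. It is not the Ruby provider or any DTrace API: the Probe class, the frame dictionary and the lookup function are all invented for illustration. The point is only that checking whether a probe is enabled before computing its arguments means the costly lookup is skipped entirely when nobody is tracing:

```python
class Probe:
    """Stand-in for a statically defined probe; not a real DTrace interface."""

    def __init__(self):
        self.enabled = False
        self.fired = []

    def fire(self, *args):
        if self.enabled:
            self.fired.append(args)


counter = {"lookups": 0}


def class_and_method(frame):
    """Stand-in for the costly lookup of a frame's class and method names."""
    counter["lookups"] += 1
    return frame["class"], frame["method"]


def call_without_is_enabled(probe, frame):
    # The arguments are computed on every call, even when nobody is tracing.
    probe.fire(*class_and_method(frame))


def call_with_is_enabled(probe, frame):
    # The cheap check comes first; the lookup happens only while tracing.
    if probe.enabled:
        probe.fire(*class_and_method(frame))


probe = Probe()
frame = {"class": "Module", "method": "=="}

for _ in range(1000):
    call_without_is_enabled(probe, frame)
lookups_without = counter["lookups"]

counter["lookups"] = 0
for _ in range(1000):
    call_with_is_enabled(probe, frame)
lookups_with = counter["lookups"]

print(lookups_without, lookups_with)
```

With the probe disabled, the first variant still performs every lookup while the second performs none; once the probe is enabled, both fire with the same arguments.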


1 Or if you’re as savvy as the commenters on this blog entry, you might be saying to yourself, “why the hell is the ‘effing development version running in production?!”

There are two ways to get DTrace for another operating system: you can try porting DTrace to the other system, or you can — as Adam Leventhal describes — use the new BrandZ framework to get that other system running under Solaris. Adam describes applying DTrace to a Linux binary — top — and using DTrace to find a (pretty serious) Linux-specific performance problem. Pretty eff’in cool…

If you haven’t already seen it, ZFS is now available for download, marking a major milestone in the history of filesystems. Today is a way station down a long road: for as long as I have known Jeff Bonwick, he has wanted to solve the filesystem problem — and about five years ago, Jeff set out to do just that, starting (as Jeff is wont to do) from a blank sheet of paper. I vividly remember Jeff describing some of his nascent ideas on my whiteboard; the ideas were radical and revolutionary, their implications manifold. I remember thinking “he’s either missed something basic that somehow invalidates these ideas — or this is the most important development in storage since RAID.” As I recently recounted, Jeff is the reason that I came to Sun almost a decade ago — and in particular, I was drawn by Jeff’s contagious belief that nothing is impossible simply because it hasn’t been done before. So I knew better than to doubt him at the time — and I knew that the road ahead promised excitement if nothing else. Years after that moment, there is no other conclusion left to be had: ZFS is the most important revolution in storage software in two decades — and may be the most important idea since the filesystem itself. That may seem a heady claim, but keep reading…

To get an idea of what ZFS can do, first check out Dan Price’s awesome ZFS flash demo. Then join me on a tour of today’s ZFS blog entries, as ZFS developers and users inside Sun illustrate the power of ZFS: ease of administration, absolute reliability and rippin’ performance.

  • Administration. If you’re an administrator, start your ZFS blog tour with Martin Englund’s entry on ZFS from a sysadmin’s view. Martin walks you through the ease of setting up ZFS; there are no hidden wires, and it really is that easy! And if, as a self-defence mechanism, your brain refuses to let you recall the slog of traditional volume management, check out Tim Foster’s entry comparing ZFS management to Veritas management. (And have we mentioned the price?) For insight into the design principles that guided the development of the administration tools, check out Eric Schrock’s entry on the principles of the ZFS CLI. Eric’s entry and his design reflect the principles that we used in DTrace as well: make it simple to do simple things and make it possible to do complicated things. As you can imagine, this simplicity of management is winning fans both inside and outside of Sun. For some testimonials, check out Lin Ling’s entry on the love for ZFS, both Lin’s and our Beta customers’. As Lin’s entry implies, a common theme among ZFS users is “if I only had this when…”; check out James McPhearson’s entry wishing he had ZFS back in the day.

    And if you think that the management of ZFS couldn’t get any easier, check out Steve Talley‘s entry on managing ZFS from your browser. Steve’s work highlights the proper role for GUI admin tools in a system: they should make something that’s already simple even simpler. They should not be used to smear lipstick over a hideously over-complicated system — doing so leads to an unresolvable rift between what the tool is telling you the system looks like, and what the system actually looks like. Thanks to the simplicity of ZFS itself, there is no second guessing about what the GUI is actually doing under the hood — it’s all just gravy!

    Speaking of gravy, check out the confluence of ZFS with another revolutionary Solaris technology in Dan Price’s entry on ZFS and Zones: thanks to some great integration work, local zone administrators can have the full power of ZFS without compromising the security of the system!

    For details on particular features of ZFS, check out Mark Maybee’s entry on quotas and reservations in ZFS. Unlike in some other systems, quotas and reservations are first-class citizens in ZFS, not bolted-on afterthoughts. Die, /usr/sbin/quota, die! And for details on another feature of ZFS, check out Mark Shellenbaum’s entry on access control lists in ZFS, and Lisa Week’s entry describing why ZFS adopted the NFSv4 ACL model. Like quotas and reservations, ACLs were a part of the design of ZFS, not something that was carpet-bombed over the source after the fact.

  • Reliability. Unlike virtually every other filesystem that has come before it, ZFS is designed around unreliable hardware. This design center means that ZFS can detect (and correct!) errors that other filesystems just silently propagate to the user. To get a visceral feel for this, read Eric Lowe’s entry on ZFS saving the day. Reading this entry will send a chill up your spine: Eric had a data-corrupting hardware problem that he didn’t know he had until ZFS. How much data is being corrupted out there today because pre-ZFS filesystems are too trusting of faulty hardware? More to the point, how much of your data is being corrupted today? Yeah, scary, ain’t it? And not only can ZFS detect hardware errors, in a mirrored configuration it can correct them. Fortunately, you don’t have to have busted hardware to see this: look at Tim Cook’s entry demonstrating ZFS’s self-healing by using dd to simulate data corruption.

    But if problems like Eric’s are all over the place, how is anyone’s data ever correct? The answer is pretty simple, if expensive: you pay for reliability by buying over-priced hardware. That is, we’ve compensated for dumb software by having smart (and expensive) hardware. ZFS flips the economics on its head: smart software allows for stupid (and cheap) hardware, with ultra-high reliability. This is a profound shift; for more details on it check out Richard Elling’s entry on the reliability of ZFS.

    ZFS is reliable by its architecture, but what of the implementation? As Bill Moore writes, testing ZFS was every bit as important as writing it. And testing ZFS involved many people, as Jim Walker describes in his entry on the scope of the ZFS testing effort.

  • Performance. So fine: ZFS is a snap to administer, and it’s ultra-reliable, but at what performance cost? The short answer is: none, really. In fact, on many workloads, it rips. How can you have such features and still have great performance? Generally speaking, ZFS is able to deliver great performance because it has more context, a phenomenon that Bill Sommerfeld notes is a consequence of the end-to-end principle. To see how this unlocks performance, look at Bill Moore’s entry on I/O scheduling; as Bill describes (and as I can personally attest), ZFS is much smarter about how it uses I/O devices than previous filesystems. For another architectural feature for performance, look at Neil Perrin’s entry on the ZFS intent log, and chase it with Neelakanth Nadgir’s entry taking you through the ZIL code.

    If you’re looking for some performance numbers, check out Roch Bourbonnais’s entry comparing the performance of ZFS and UFS. Or let Eric Kustarz take you to school, as you go to Filesystems Performance 101: Disk Bandwidth, Filesystems Performance 102: Filesystem Bandwidth and finally graduate to Filesystems Performance 201: When ZFS Attacks!
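
The detect-and-correct behaviour described under Reliability can be pictured with a toy model. The sketch below is in Python and is emphatically not how ZFS is implemented: the Mirror class, its two in-memory "sides" and the use of SHA-256 are all assumptions made for illustration. It shows only the general idea of keeping a checksum apart from the data, verifying every read against it, and repairing a bad copy from a good one:

```python
import hashlib


def checksum(data: bytes) -> bytes:
    # SHA-256 is a stand-in here; it is chosen only because it is in the standard library.
    return hashlib.sha256(data).digest()


class Mirror:
    """Toy two-way mirror: each side holds a copy of every block."""

    def __init__(self):
        self.sides = [{}, {}]
        self.sums = {}  # block id -> expected checksum, kept apart from the data

    def write(self, block_id, data):
        for side in self.sides:
            side[block_id] = data
        self.sums[block_id] = checksum(data)

    def read(self, block_id):
        expected = self.sums[block_id]
        for side in self.sides:
            data = side[block_id]
            if checksum(data) == expected:
                # Found a good copy: rewrite any copy that fails verification.
                for other in self.sides:
                    if checksum(other[block_id]) != expected:
                        other[block_id] = data
                return data
        raise OSError("no copy of the block matches its checksum")


mirror = Mirror()
mirror.write(7, b"important data")
mirror.sides[0][7] = b"imp0rtant data"  # simulate silent corruption on one side

print(mirror.read(7))
print(mirror.sides[0][7] == mirror.sides[1][7])
```

A filesystem that trusted the first side would have returned the corrupted bytes without complaint; here the read returns the good copy and leaves both sides consistent again.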

So given that ZFS is all that, when can we start forgetting about every other on-disk filesystem? For that, we’ll need to be able to boot off ZFS. Bad news: this is hard. Good news: Tabriz Leman and the rest of the ZFS Boot team are making great progress, as Tabriz describes in her entry on booting ZFS. Once we can boot ZFS — that is, once we can assume ZFS — all sorts of cool things become possible, as Bart Smaalders brainstorms in his entry on the impact of ZFS on Solaris. As Bart says, this is just the beginning of the ZFS revolution…

Finally, this has been a long, hard slog for the ZFS team. Anyone who has worked through “crunch time” on a big project will see something of themselves in Noel Dellofano’s entry on the final push. And any parent can empathize with Sherry Moore’s entry congratulating the team, and looking forward to having her husband once again available to help with the kids. So congratulations to everyone on the ZFS team (and your families!), and for everyone else, welcome to ZFS!

