More blog sifting | The Observation Deck

If you didn’t see it, Liane Praza picked up where my sifting left off, adding a blog entry pointing to more Opening Day entries – this time in the categories of devices and device configuration, security, networking, and standards. But there are still a ton of entries to categorize, so picking up again in no particular order…

System calls. System calls are among the most fundamental mechanisms in operating systems: they are the mechanism by which untrusted, unprivileged software requests a service of trusted, privileged software. We are lucky to have two great entries describing the architecture-specific mechanisms of system calls in Solaris: check out Russ Blaine’s entry on system calls on x86, and Gavin Maltby’s entry on system calls on SPARC. Then, to understand the architecture-neutral aspects of system calls, head over to Eric Schrock’s entry on how to add a system call.

As a quick aside, that last entry is a great example of how we in Solaris Kernel Development are using blogs to write down information that (believe it or not) has just been an unspoken part of the craft before now. As Tim Bray observed, blogs have become a critical conduit of information for us – we believe that they are the most scalable way to get information from the people who have it to the people who need it. If (when?) you become an OpenSolaris developer, you can expect some friendly peer pressure to create a blog and join the party.

Build process and workspace management. We pride ourselves on a seamless build process, and a couple of entries have gone into various aspects of this in depth. To give you an idea of how seriously we take the build process – and why – check out Scott Rotondo’s entry on using lint to find security vulnerabilities. In particular, note what Scott said about adding a new lint option that generated 500 new warnings: “I needed to fix all of these before integrating my change to Makefile.master because we require the Solaris source to be lint-clean.” To which I add only, “dammit.” Next, head over to Jim Carlson’s entry describing the work he did to support non-root builds. Jim’s entry demonstrates how difficult it is to radically change the build process – and how he managed to pull it off. Finally, if you want to really let your makefile flag fly, check out Mark Nelson’s entry describing the build support for localized messages.

In terms of workspace management, you’ll want to check out Will Fiveash’s entry describing our workspace management tool, wx. For a long time, wx was a shell script in Bonwick’s home directory. It was incredibly useful, but it was also easy to accidentally blow your brains out. (As Bart is fond of saying, it was “all blade and no handle.”) Will’s rewrite made for a much safer, much more sophisticated wx – and it was a huge help to us in automating the final approach of the DTrace integration.

Debuggability. If you read just a couple of the Opening Day entries, you probably noticed a trend: many of the entries were about finding some nasty bug in the system. This is an accurate reflection of our ethos in developing Solaris: the operating system must be reliable above all else, and we view debugging the operating system as our primary responsibility. This responsibility runs deeper than just the act of debugging, because our needs so outstripped existing tools that we designed and built our own – most notably mdb and DTrace. Fortunately, we ship these tools to you, so you can use them on your own system and on your own applications.

There are many entries describing these tools and how they were used to tackle a problem. Fittingly, a good place to start is Mike Shapiro’s entry describing using mdb to debug a sendmail bug. This bug is described in 4278156, which has one of the greatest bug synopses of all time: “sendmail died in a two SIGALRM fire.” For more on the power of mdb, take a look at Eric Saxe’s entry on using mdb to debug a scheduling problem, Ashish Mehta’s entry on using mdb to debug a race condition, and Eric Kustarz’s entry demonstrating an mdb debugger command (“dcmd”) that he wrote to retrieve NFSv4 recovery messages postmortem. This last example is a particularly good one because this is exactly the kind of custom debugging infrastructure that mdb’s modular architecture makes easy to build. For a comprehensive example of how we have developed subsystem-specific debugging infrastructure, read Sasha Kolbasov’s entry on the mdb dcmds related to STREAMS. As Sasha mentions, the place to start for learning to write your own modules is the documentation – but you can get a flavor for it by reading Yu Xiangning’s entry on writing a module for kmdb. kmdb is the in-situ kernel debugger that implements mdb, and when you need it, nothing else will do – as Dan Mick describes in his entry on debugging with kmdb and moddebug. For more details on kmdb itself, check out Matt Simmons’ entry on kmdb’s design and implementation. To see how mdb can help debug your application, take a look at Will Fiveash’s notes on debugging application memory problems. Will mentions ::findleaks, a debugger command that I originally implemented for kernel crash dumps, and that Jonathan Adams subsequently ported to work on application core files, reworking it substantially in the process, as he mentions in his entry.

While mdb is the acme of postmortem debugging, if the manifestation of a bug is non-fatal, it’s often more effective to use DTrace to debug it. For an example of this, look at Bart Smaalders’ entry on using DTrace to debug jitter. It was gratifying to see Bart debug this problem using DTrace, because latency bubbles were actually one of the motivating pathologies behind DTrace. And finally, debuggability doesn’t end with tools; subsystems must be designed with debuggability in mind, as Stephen Hahn describes in his entry on designing libuutil for debuggability.

I think that about does it for today. As someone pointed out on Liane’s blog, we need a Wiki for this; we agree – it’s on the list of planned enhancements for opensolaris.org. Until then, stay tuned for more sifting…