Debuggability. If you read just a couple of the
Opening Day entries,
you probably noticed a trend: many of the entries were about finding
some nasty bug in the system.
This is an accurate reflection of our ethos in developing Solaris:
the operating system must be reliable above all else, and we view
debugging the operating system as our primary responsibility.
This responsibility runs deeper than just the act of debugging, because
our needs so outstripped existing tools that
we designed and built
our own — most notably
mdb
and DTrace.
Fortunately, we ship these tools to you, so you can use them on your
own system and on your own applications.
There are many entries describing these tools and how they were used
to tackle a problem.
Fittingly, a good place to start is
Mike Shapiro’s
entry describing using mdb to debug a sendmail bug. This bug is
described in
4278156,
which has one of the
greatest bug synopses of all time: “sendmail died in a two SIGALRM fire.”
1
For more on the power of mdb,
take a look at
Eric Saxe’s
entry on
using mdb to debug a scheduling problem,
Ashish Mehta’s entry
on
using
mdb to debug
a race condition, and
Eric Kustarz’s entry demonstrating an mdb debugger command (“dcmd”) that he wrote to
retrieve NFSv4 recovery messages postmortem.
This last example is a particularly good one
because this is exactly the kind of custom debugging
infrastructure that mdb’s modular architecture makes easy to build.
For a comprehensive example of how we have developed subsystem-specific
debugging infrastructure, read
Sasha Kolbasov’s
entry on the
mdb
dcmds related to STREAMS.
As Sasha mentions, the place to start for learning to write your
own modules is the
documentation —
but you can get a flavor for it by reading
Yu Xiangning’s
entry on writing a
module for kmdb.
kmdb is the in-situ kernel debugger that implements mdb's interface and functionality in a live kernel context, and when you
need it, nothing else will do — as
Dan Mick describes
in his entry on debugging with kmdb and moddebug.
For more details on kmdb itself,
check out
Matt Simmons’
entry on
kmdb’s design and implementation.
To see how mdb can help debug your application, take a look at
Will Fiveash’s notes
on debugging application memory problems. Will
mentions ::findleaks, a debugger command that I originally
implemented for kernel crash dumps, and that
Jonathan Adams
subsequently
ported to work on application core files and, as he mentions in
his entry,
reworked substantially in the process.
While mdb is the acme of postmortem debugging,
if the manifestation of a bug is non-fatal, it’s often more
effective
to use DTrace to debug it.
For an example of this,
look at
Bart Smaalders’
entry on using DTrace to debug jitter.
It was gratifying to see Bart debug this problem using DTrace, because
latency bubbles were actually one of the motivating pathologies behind
DTrace.
And finally, debuggability doesn’t end with tools; subsystems must be
designed with
debuggability in mind, as
Stephen Hahn
describes in his entry on
designing libuutil for debuggability.