June 2005 – The Observation Deck

Still more blog sifting…

June 25, 2005

Even though the launch of
OpenSolaris was well over a week ago,
and even though the
Opening Day entries
have now been sifted through in
five different blog entries
(here,
here,
here,
here and
here),
there are still
some great uncategorized entries. Without further ado…

Zones. Zones are among the most talked-about
new features in Solaris, and the engineers on the team have developed
some highly-readable Opening Day entries describing the implementation.
Start with
David Comay’s
entry taking you on a
tour of zone state transitions, and then check out
Dan Price’s
entry to extend your tour to the
innards of the zones console. Dan’s tour is notable because he takes
you by an
ASCII
art comment of his that is one of the more elaborate in
Solaris. (One of these days, I’ll take you on a tour
of my favorite ASCII art comments in the source base — but today we have
too much to see, so everyone back in the car!) Wrap up your tour of
zones with
John Beck’s entry
describing adding command-line editing and history to zonecfg. This entry has applicability beyond zones: it’s a useful how-to for
adding command-line editing and history to any C-based or C++-based application.

Booting. The most exciting project to integrate into Solaris
since Solaris 10
shipped is surely the rearchitecture of the booting system to use GRUB,
and
Shudong Zhou’s
entry describing testing the code at Fry’s is a must-read.
For more details,
check out
Jan Setje-Eilers’
entry describing the new boot architecture.
Look for more from these two — using GRUB allows for many new
possibilities, and Shudong and Jan have much to blog about.

Solaris Volume Manager.
Despite the fact that it could save them gobs in licensing fees to
certain third parties, many seem to not realize
that we bundle an
industrial-strength volume manager
with Solaris. Hell-bent on describing what they’ve been working on for
so long, the Solaris Volume Manager team was out in force on Opening Day.
If you have any storage responsibilities in your shop and you’re not
running SVM, you should carefully read these entries; SVM may well save
you a bundle — making you such a hero that you’ll be able to tell the
suits exactly what they can do with their
TPS report.
- Andre Molyneux on RAID 0+1 vs. RAID 1+0 and again on
  the role of timing in testing RAID 5 in SVM.
- Jerry Jelinek on
  improving SVM response to disk failure.
- Sanjay Nadkarni on
  resync regions and optimized resyncs — a feature which
  VxVM users will recognize as the SVM equivalent of dirty region logging (DRL).
- Susan Kamm-Worrell on
  multi-owner disksets, an SVM feature that allows multiple nodes
  to simultaneously access volumes.
- Steve Peng on
  disk relocation in SVM.
- Tony Nguyen
  on the SVM default interlace and resync buffer values.

Miscellany. There are a handful of great entries that don’t fit neatly
into any one category — or are so specific as to be their own
category. Be sure not to miss these:
- Jeff Bonwick
  revealing the
  origins of the slab allocator.
- Phil Harman
  on getting getenv to scale. Because so much Solaris
  scalability work happened many years ago, Phil’s is the only entry
  describing the specifics of getting a subsystem to scale with CPUs;
  Phil’s entry should be considered a must-read for anyone working on
  scalability.
- Peter Memishian
  on
  using doors as a synchronization primitive. Peter’s work is
  a good example of why — contrary to the
  beliefs
  of some pinheads — developing a user-level service provider can be
  quite a bit more challenging than developing in the kernel.
- Tim Marsland continuing
  his magnum opus on the implementation of Solaris 10 on x64 with
  Part 4: Userland.
- Ienup Sung
  on efficiently handling illegal UTF-8 byte sequences.
- Chandan describing
  the implementation of the OpenSolaris Source Browser.
- Cyndi Eastham
  on developing libavl.
- Chris Beal
  describing the implementation of signal delivery.
- Dave Miner
  leading a tour of the DHCP server.
- Dave Powell on
  the
  implementation of pseudo-filesystems — and why /system is the new home for all such filesystems.
- Jonathan Adams on
  macros and powers of two — and the simple pleasures of
  bit-twiddling.
- Tom Erickson
  on the implementation of libdtrace. libdtrace is still a private
  interface (we still have some sanding and polishing to do), but Tom’s
  entry is a must-read
  for anyone considering writing their own DTrace consumer.
- Keith M Wesolowski
  on getting SPARC inlines to work with GCC.
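
For a taste of the bit-twiddling that the entry on macros and powers of two is about, here is a minimal sketch in C of the general technique. The macro names are hypothetical and chosen for illustration; they are not claimed to match the definitions in the Solaris headers or in Jonathan's entry, and the 8K size is just an example value.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper macros (illustrative names, not the Solaris ones).
 * All of them assume that "align" is a power of two and that "x" and
 * "align" have the same unsigned type; mixing widths can silently
 * truncate the mask.
 */

/* Nonzero if x is a power of two: such values have exactly one bit set. */
#define	IS_POW2(x)		((x) != 0 && (((x) & ((x) - 1)) == 0))

/* Round x down to a multiple of align by clearing the low bits. */
#define	ALIGN_DOWN(x, align)	((x) & ~((align) - 1))

/* Round x up to a multiple of align. */
#define	ALIGN_UP(x, align)	(((x) + ((align) - 1)) & ~((align) - 1))

/* Offset of x within its align-sized block. */
#define	PHASE(x, align)		((x) & ((align) - 1))

/* Print the three derived values for one example address. */
static void
pow2_demo(void)
{
	uint64_t addr = 0x12345;
	uint64_t size = 0x2000;		/* 8K, an illustrative value */

	assert(IS_POW2(size));
	(void) printf("down=%#llx up=%#llx phase=%#llx\n",
	    (unsigned long long)ALIGN_DOWN(addr, size),
	    (unsigned long long)ALIGN_UP(addr, size),
	    (unsigned long long)PHASE(addr, size));
}
```

The appeal of the approach is that each operation is a single mask or add-and-mask, with no division or branching, which is why it only works when the alignment is a power of two.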

Phew! I think that about does it. When we first tried to encourage
Solaris engineers to blog on Opening Day, I thought we were going to have
a hard time convincing engineers to blog — I knew that providing in-depth,
technical
content takes a lot of time, and I knew that everyone had other priorities.
So when we were planning the launch and talking
about the possibility of dealing with a massive amount of Opening Day content,
my response was “hurt me
with that problem.” Well, as it turns out, most engineers didn’t need
much convincing — many provided rich, deep content — and I was
indeed hurt with that problem! While it was time consuming to sift
through them, hopefully
you’ve enjoyed reading these entries as much as I have. And let it be
said once more: welcome to OpenSolaris!

Technorati tags:
OpenSolaris
Solaris
DTrace

Yet more blog sifting…

June 20, 2005

Despite there being now four blog entries to sift through the
Opening Day entries
(one from me,
one from Liane,
one from Claire
and another from me),
there are still some great entries that have gone uncategorized. Many
of these entries fall into the category of “debugging war stories” —
engineers
describing a particularly tough or interesting bug that they nailed.
The
proliferation of these kinds of stories among the Opening Day entries
is revealing: in
some organizations,
you’re a hero among engineers if you come up with a whizzy new app
(even if the implementation is a cesspool),
while in
others you might gain respect among engineers by getting a
big performance win
(even if it makes things a bit flakey),
but in Solaris
development, nothing gains the respect of peers like debugging some
diabolical bug that has plagued the operating system for years.
So if you’re new to
OpenSolaris
and you’re looking to make a name for yourself, a good place to start is
in the never-ending search for elusive bugs.
To get a flavor for the thrill of the hunt, check out:

- George Shepherd on
  using DTrace to debug a nasty STREAMS bug.
- Jim Carlson on
  debugging a hang in ldtermclose.
  You can infer how much
  we obsess about debugging from the first line of Jim’s entry: “Every
  once in a while, a bug sticks with you long enough that you remember the
  ID number without having to think about it.”
- Adam Leventhal
  on debugging a cross call hang.
- Chris Gerhard on a
  two line fix in init.
- Sarah Jelinek on
  debugging a UFS file truncation bug. For me personally,
  it’s always very gratifying to see a bug like Sarah’s, where
  DTrace
  was guided by expert hands to help out on a tough problem.
- Sherry Moore on
  debugging a bug at the murky interface between compiler and
  operating system.
- Saurabh Mishra on debugging a memory ordering problem.
- Alok Aggarwal
  on debugging an NFSv4 problem on SPARC.
- Narayana Kadoor
  on debugging a logic error in dynamic intimate
  shared memory (DISM).
- Surya Prakki on
  debugging kernel memory corruption, surely the most
  pernicious of software pathologies.
- Raja Gopal Andra
  on debugging an application bug.
- Martin Englund
  on tracking down a bug in the audit daemon.
- Peter Harvey on
  increasing UNIX group membership. In this entry, Peter
  doesn’t dwell on the bug (which is clear in this case) but rather
  the complexities of fixing it. It’s a good example of how something that
  appears simple can be stubbornly complicated.
- Prabahar Jeyaram on
  debugging a nasty panic in the UFS lockfs protocol.

And here are a few more on the simple (but thorough) satisfaction from
fixing old bugs:

- Paul Roberts on
  an ancient bug in xargs.
- Stacey Marshall
  on a four year old bug in the name service switch. Stacey’s entry touches
  on the larger satisfaction of going from a bug in a strange subsystem
  to understanding the code, developing the right fix, verifying the fix
  and then writing the test case to be sure. The dedication to this kind
  of craftsmanship, taking the time to do it the Right Way even for small
  stuff,
  is what has drawn many of us to Solaris; it is, as Stacey says, “what I
  love about this job.”
- Darren Moffat on a nine year old bug in usermod.

That about does it for today. There are still many more entries to
categorize, but fortunately, I think I can see the light at the end of the
tunnel — or is that just the approach of death from the exhaustion of
sifting through
all of
this content?

Technorati tags:
OpenSolaris
Solaris
DTrace

More blog sifting

June 17, 2005

If you didn’t see it,
Liane Praza
picked up where
my sifting
left off, adding
a blog entry pointing to
more Opening Day entries — this
time
in the categories of
devices and device configuration, security, networking,
and standards. But there are still a ton of entries to
categorize, so picking up again in no particular order…

System calls.
System calls are among the most fundamental mechanisms in operating systems:
they are the mechanism by which untrusted, unprivileged software requests
a service of trusted, privileged software. We are lucky to have two
great entries describing the architecture-specific mechanisms of
system calls in Solaris:
check out
Russ Blaine’s entry
on
system calls on x86, and
Gavin Maltby’s
entry on
system
calls on SPARC. Then, to understand the architecture-neutral aspects of
system calls, head over to
Eric Schrock’s
entry on
how to add a system call.

As a quick aside, that
last entry is a great example of how we in Solaris Kernel Development
are using blogs to write
down information that (believe it or not) has just been an unspoken part
of the craft before now. As
Tim Bray observed,
blogs have become a critical conduit of information for us — we believe
that they are the most scalable way to get information from the
people who have it to the people who need it. If (when?) you become
an OpenSolaris developer,
you can expect some friendly peer pressure to create a blog and
join the party.
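
To make the system call boundary described above a little more concrete, here is a minimal user-level sketch in C. It is not taken from any of the linked entries. It assumes a platform whose C library provides the generic syscall() wrapper and the SYS_getpid constant (as glibc does); the header that declares syscall() and its exact return type vary between systems, so treat those details as assumptions.

```c
#ifndef _GNU_SOURCE
#define	_GNU_SOURCE	/* assumption: lets glibc declare syscall() */
#endif
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Ask the kernel for our process ID by system call number, bypassing the
 * usual getpid() library wrapper. The architecture-specific trap into the
 * kernel happens inside syscall().
 */
static long
getpid_by_number(void)
{
	return ((long)syscall(SYS_getpid));
}

/* Returns 1 if the raw system call and the libc wrapper agree. */
static int
wrapper_matches_raw(void)
{
	return (getpid_by_number() == (long)getpid());
}
```

Both paths end up requesting the same service from the kernel; the wrapper simply hides the system call number and the trap mechanism from the caller.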

Build process and workspace management.
We pride ourselves on a seamless build process,
and a couple of entries have gone into various aspects of this in depth.
To give you an idea of how seriously we take the build process — and
why — check out
Scott Rotondo’s
entry on using lint to find security vulnerabilities.
In particular, note what Scott says when he added a new lint option that
generated
500 new warnings: “I needed to fix all of these before integrating
my change to
Makefile.master because we require the Solaris source to be
lint-clean.” To which I add only, “dammit.”
Next, head over to
Jim Carlson’s
entry describing the work he did to support
non-root builds. Jim’s entry demonstrates how difficult it is to
radically change the build process — and how he managed to pull it off.
Finally, if you want to really let your makefile flag fly,
check out
Mark Nelson’s
entry describing the build support for localized messages.

In terms of workspace management, you’ll want to check out
Will Fiveash’s
entry describing our workspace management tool, wx. For a long
time, wx
was a shell script in
Bonwick’s home directory.
It was incredibly useful, but it was also easy to accidentally blow your
brains out.
(As
Bart is fond of saying, it
was “all blade and no handle.”) Will’s rewrite made for a much
safer, much more sophisticated wx — and it was a huge help to
us in automating the final approach of the
DTrace integration.

Debuggability. If you read just a couple of the
Opening Day entries,
you probably noticed a trend: many of the entries were about finding
some nasty bug in the system.
This is an accurate reflection of our ethos in developing Solaris:
the operating system must be reliable above all else, and we view
debugging the operating system as our primary responsibility.
This responsibility runs deeper than just the act of debugging, because
our needs so outstripped existing tools that
we designed and built
our own, most notably
mdb
and DTrace.
Fortunately, we ship these tools to you, so you can use them on your
own system and on your own applications.

There are many entries describing these tools and how they were used
to tackle a problem.
Fittingly, a good place to start is
Mike Shapiro’s
entry describing using mdb to debug a sendmail bug. This bug is
described in
4278156,
which has one of the
greatest bug synopses of all time: “sendmail died in a two SIGALRM fire.”
For more on the power of mdb,
take a look at
Eric Saxe’s
entry on
using mdb to debug a scheduling problem,
Ashish Mehta’s entry
on
using
mdb to debug
a race condition, and
Eric Kustarz’s entry demonstrating an mdb debugger command (“dcmd”) that he wrote to
retrieve NFSv4 recovery messages postmortem.
This last example is a particularly good one
because this is exactly the kind of custom debugging
infrastructure that mdb’s modular architecture makes easy to build.
For a comprehensive example of how we have developed subsystem-specific
debugging infrastructure, read
Sasha Kolbasov’s
entry on the
mdb
dcmds related to STREAMS.
As Sasha mentions, the place to start for learning to write your
own modules is the
documentation —
but you can get a flavor for it by reading
Yu Xiangning’s
entry on writing
a module for kmdb.
kmdb is the in-situ kernel debugger that implements mdb, and when you
need it, nothing else will do — as
Dan Mick describes
in his entry on debugging with kmdb and moddebug.
For more details on kmdb itself,
check out
Matt Simmons’
entry on
kmdb’s design and implementation.
To see how mdb can help debug your application, take a look at
Will Fiveash’s notes
on debugging application memory problems. Will
mentions ::findleaks, a debugger command that I originally
implemented for kernel crash dumps, and that
Jonathan Adams
subsequently
ported to work on application core files and (as he mentions in
his entry)
reworked substantially in the process.

While mdb is the acme of postmortem debugging,
if the manifestation of a bug is non-fatal, it’s often more
effective
to use DTrace to debug it.
For an example of this,
look at
Bart Smaalders’
entry on using DTrace to debug jitter.
It was gratifying to see Bart debug this problem using DTrace, because
latency bubbles were actually one of the motivating pathologies behind
DTrace.
And finally, debuggability doesn’t end with tools; subsystems must be
designed with
debuggability in mind, as
Stephen Hahn
describes in his entry on
designing libuutil for debuggability.

I think that about does it for today. As someone pointed out on Liane’s
blog, we need a Wiki for this; we agree — it’s on the list of planned
enhancements for
opensolaris.org. Until then,
stay tuned for more sifting…

Technorati tags:
OpenSolaris
Solaris
DTrace
mdb

DTrace and OpenSolaris at BayLISA

June 16, 2005

For those of you in the Bay Area, I will be giving a talk and demo on DTrace at tonight’s
BayLISA meeting in Cupertino.
Starts at 7:30p, here are the directions. I will be demo’ing DTrace tonight on my laptop running OpenSolaris bits that I downloaded and built myself from outside of Sun’s internal network. So if you’re interested in DTrace, OpenSolaris or both (and you’re in the Bay Area), you might want to check it out…

Technorati tags:
OpenSolaris
Solaris
DTrace

Sifting through the blogs…

June 15, 2005

Yesterday was Opening Day for
OpenSolaris,
and
we welcomed OpenSolaris with
hundreds
of blog entries
describing
various aspects of
the implementation.
The breadth and depth of our blogging
will hopefully
put to rest any notion that open sourcing Solaris isn’t a grass-roots
effort: if nothing else, it should be clear that we in the trenches
are very excited to finally be able to talk about the system
that we have poured so much of our lives into — and to welcome
new would-be contributors into the fold.

In our excitement, we may have overwhelmed a tad:
there was so much content yesterday, that it would have been impossible
for anyone to keep up — we blogged over 200,000 words (over 800 pages!)
yesterday alone.
So over the next few days, I want to highlight some entries that you
might have missed, broken down by subject area. In no particular order…

Fault management. Fault management in Solaris 10 has been completely
revolutionized by the new predictive self-healing feature pioneered
by my longtime co-conspirator
Mike Shapiro. There are
two must-read entries in this area:
Andy Rudoff’s entry
providing a
predictive self-healing overview, and
Dilpreet Bindra’s
entry going into more depth on PCI error handling. (If for nothing
else,
read Dilpreet’s entry for his Reading of the Vows between OpenSolaris and the
Community.)

Virtual memory.
The virtual memory system is core to any modern operating system, and
there are several interesting entries here.
Start with
Eric Lowe’s
extensive entry
describing page fault handling. As Eric rightly points out,
page fault handling is the epicenter of the VM system; one can learn a
tremendous amount about the system just by following page fault processing —
and Eric is a great guide on this journey.
Once you’ve read Eric’s entry,
check out Michael Corcoran’s
entry on page coalescing,
a technique to assure availability of
large-sized pages — which are in turn necessary to increase TLB reach.
And discussion of page_t’s naturally
brings you to
Rick Mesta’s
entry describing a
big performance win by
prefetching these structures during boot.
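
The link between large pages and TLB reach mentioned above is simple arithmetic, and a small worked sketch in C may help. The entry count and page sizes used in the usage note are illustrative assumptions, not the parameters of any particular CPU, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * TLB reach: the amount of memory that can be mapped at once without
 * taking a TLB miss, that is, the number of TLB entries times the page
 * size that each entry maps.
 */
static uint64_t
tlb_reach(uint64_t entries, uint64_t pagesize)
{
	return (entries * pagesize);
}

/*
 * Number of contiguous small pages that must be free (or freed up by
 * coalescing) to build a single large page.
 */
static uint64_t
small_pages_per_large(uint64_t large, uint64_t small)
{
	assert(small != 0 && large % small == 0);
	return (large / small);
}
```

With a hypothetical 64-entry TLB, 8K pages give a reach of 512K while 4M pages give 256M; the price is that each 4M page needs 512 physically contiguous 8K pages, which is why availability of large pages has to be actively assured.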

A less-discussed aspect of virtual memory is the virtual memory layout
of the kernel itself. To learn about some of the complexities of this,
check out
Kit Chow’s entry
on address space limitations on 32-bit kernels.
The limitation that Kit describes is one of the nasty gotchas of running
32-bit x86 in flat mode. As Kit mentions, the best workaround is to run
a 64-bit kernel — but if you’re stuck with a 32-bit x86 chip, you’ll want
to read Kit’s suggestions carefully. Kit’s entry is a good segue to
Prakash Sangappa’s
entry describing his work on
dynamic segkp for 32-bit x86 systems. Prakash’s work was critical
for getting some more breathing space on 32-bit x86 systems — saving hundreds
of megabytes of precious VA. Of course, the ultimate breathing space is
that afforded by 64 bits of VA — and in this vein check out
Nils Nieuwejaar’s
entry on the kernel address space layout on x64. Both
Prakash and Nils
quote one of those comments in the kernel source code that you really need to
know about if you’re going to do serious kernel development: the comment
describing the address space layout in
i86pc/os/startup.c and
sun4/os/startup.c.
This comment is one of the canonical ASCII-art comments (more on these
eventually), and I usually find these comments in startup.c by
searching forward for “----”.

Linking and Loading. One of the most polished subsystems in Solaris
is the linker and loader — the craftsmanship of the engineers that have
built it has been an ongoing inspiration for many of us in Solaris
development. To learn more about the linker,
start with
Rod Evans’ entry
taking you on
a source tour of the link-editors, and then head over to
Mike Walker’s
entry describing library bindings.
As long as you’re checking out
the linker, be sure to look at past entries like
Rod’s entry
tracing of a
link-edit.
As you can imagine, because the
dynamic linker is invoked whenever a dynamically-linked binary is executed,
it’s a natural place to improve performance — especially with
complicated programs like Mozilla or StarOffice that are linked
to hundreds (!) of shared objects. We’ve certainly found some big wins
in the linker over the years, but we’ve also discovered that it’s difficult
to help megaprograms without hurting nanoprograms — and vice versa.
For an interesting description of this tradeoff, check out
David Safford’s
entry on dynamic
linker performance. If nothing else, you’ll see from David’s work
the research element of operating system development: we often aren’t
assured of success when we endeavor to improve the system.

Scheduling. CPU scheduling is one of the most basic properties
of a multitasking operating system. Despite being an old problem,
we find ourselves constantly improving and extending this subsystem.
To learn about CPU scheduling, start with
Bill Kucharski’s
entry describing
the
architecture-specific elements of context switching. Then head
over to
Gavin Maltby’s
entry describing
the
short-term prevention of thread migration. (Before Gavin introduced
this facility, the only way to prevent migration was to prevent kernel
preemption — an overly blunt mechanism that led to
a
really nasty latency bubble that I debugged many years ago.)

If you’re going to understand thread dispatching, you’ll need to understand
the way thread state is manipulated — and for that you’ll want to look at
Saurabh Mishra’s
entry describing
thread
locks. Thread locks are different from normal synchronization primitives,
as you can infer from
my own entry describing
a
bug in user-level
priority inheritance —
which is a good segue to a more general problem when dealing with
thread control: how does one change the scheduling properties of a
running thread?
For an idea of how tricky this can be,
check out
Andrei Dorofeev’s
entry describing
binding
processes to resource pools.
Andrei’s problem was even more challenging than traditional thread
manipulation, as he needed to
change the scheduling properties of a group of threads atomically.
If for no other reason,
you should read Andrei’s entry to learn of the
“curse of disp.c.”
Speaking of the cursed, wrap up your tour of scheduling
with
Eric Saxe’s entry describing
debugging
a wedged kernel — you’ll see from Eric’s odyssey
that scheduling problems can require a lot of brain-bending (and patience) to debug!

Okay, I think that’s enough for today — and yet it
barely scratches the surface! I didn’t even touch on gigantic
topics with many Opening Day entries
like security, networking, I/O, filesystems, performance,
service management, observability, etc. etc. Stay tuned — or check out
the
Opening Day entries for yourself…

Technorati tags:
OpenSolaris
Solaris
