Sifting through the blogs...
Yesterday was Opening Day for OpenSolaris, and we welcomed it with hundreds of blog entries describing various aspects of the implementation. The breadth and depth of our blogging will hopefully put to rest any notion that open sourcing Solaris isn’t a grass-roots effort: if nothing else, it should be clear that we in the trenches are very excited to finally be able to talk about the system that we have poured so much of our lives into – and to welcome new would-be contributors into the fold.
In our excitement, we may have overwhelmed a tad: there was so much content that it would have been impossible for anyone to keep up – we blogged over 200,000 words (over 800 pages!) yesterday alone. So over the next few days, I want to highlight some entries that you might have missed, broken down by subject area. In no particular order…
- Fault management. Fault management in Solaris 10 has been completely revolutionized by the new predictive self-healing feature pioneered by my longtime co-conspirator Mike Shapiro. There are two must-read entries in this area: Andy Rudoff’s entry providing a predictive self-healing overview, and Dilpreet Bindra’s entry going into more depth on PCI error handling. (If for nothing else, read Dilpreet’s entry for his Reading of the Vows between OpenSolaris and the Community.)
- Virtual memory. The virtual memory system is core to any modern operating system, and there are several interesting entries here. Start with Eric Lowe’s extensive entry describing page fault handling. As Eric rightly points out, page fault handling is the epicenter of the VM system; one can learn a tremendous amount about the system just by following page fault processing – and Eric is a great guide on this journey. Once you’ve read Eric’s entry, check out Michael Corcoran’s entry on page coalescing, a technique to ensure the availability of large pages – which are in turn necessary to increase TLB reach. And discussion of page_t’s naturally brings you to Rick Mesta’s entry describing a big performance win by prefetching these structures during boot.
A less-discussed aspect of virtual memory is the virtual memory layout of the kernel itself. To learn about some of the complexities of this, check out Kit Chow’s entry on address space limitations on 32-bit kernels. The limitation that Kit describes is one of the nasty gotchas of running 32-bit x86 in flat mode. As Kit mentions, the best workaround is to run a 64-bit kernel – but if you’re stuck with a 32-bit x86 chip, you’ll want to read Kit’s suggestions carefully. Kit’s entry is a good segue to Prakash Sangappa’s entry describing his work on dynamic segkp for 32-bit x86 systems. Prakash’s work was critical for getting some more breathing space on 32-bit x86 systems – saving hundreds of megabytes of precious VA. Of course, the ultimate breathing space is that afforded by 64 bits of VA – and in this vein check out Nils Nieuwejaar’s entry on the kernel address space layout on x64. Both Prakash and Nils quote one of those comments in the kernel source code that you really need to know about if you’re going to do serious kernel development: the comment describing the address space layout in i86pc/os/startup.c and sun4/os/startup.c. This comment is one of the canonical ASCII-art comments (more on these eventually), and I usually find these comments in startup.c by searching forward for “----”.
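If you want to watch the fault machinery from the outside before diving into the kernel side of it, here is a minimal user-level sketch – my own, not taken from any of the entries above – that touches an anonymous mapping one page at a time and uses getrusage(3C) to report the minor faults taken, each one corresponding to a trip through the fault path that Eric describes.

/*
 * Minimal sketch: each first touch of a page in an anonymous mapping
 * takes a minor page fault, which the kernel's fault path resolves.
 */
#include <sys/mman.h>
#include <sys/resource.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
	size_t npages = 1024, i;
	struct rusage before, after;
	char *buf;

	buf = mmap(NULL, npages * pagesize, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANON, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return (1);
	}

	(void) getrusage(RUSAGE_SELF, &before);
	for (i = 0; i < npages; i++)
		buf[i * pagesize] = 1;	/* first touch faults the page in */
	(void) getrusage(RUSAGE_SELF, &after);

	(void) printf("minor faults while touching %u pages: %ld\n",
	    (unsigned int)npages, after.ru_minflt - before.ru_minflt);
	return (0);
}

Compile and run it, and the fault count should come out roughly equal to the number of pages touched.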
- Linking and Loading. One of the most polished subsystems in Solaris is the linker and loader – the craftsmanship of the engineers who have built it has been an ongoing inspiration for many of us in Solaris development. To learn more about the linker, start with Rod Evans’ entry taking you on a source tour of the link-editors, and then head over to Mike Walker’s entry describing library bindings. As long as you’re checking out the linker, be sure to look at past entries like Rod’s entry tracing a link-edit. As you can imagine, because the dynamic linker is invoked whenever a dynamically linked binary is executed, it’s a natural place to improve performance – especially with complicated programs like Mozilla or StarOffice that are linked to hundreds (!) of shared objects. We’ve certainly found some big wins in the linker over the years, but we’ve also discovered that it’s difficult to help megaprograms without hurting nanoprograms – and vice versa. For an interesting description of this tradeoff, check out David Safford’s entry on dynamic linker performance. If nothing else, you’ll see from David’s work the research element of operating system development: we often aren’t assured of success when we endeavor to improve the system.
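To get a feel for how many objects the runtime linker is juggling in a given process, here is a small sketch of my own (hedged: it leans on the Solaris-specific dlinfo(3C) RTLD_DI_LINKMAP request) that walks the link-map of the running process and prints every object that ld.so.1 has loaded.

/*
 * Hedged sketch: walk the runtime linker's link-map for this process
 * and print every loaded object.  Uses dlinfo(3C) with RTLD_DI_LINKMAP;
 * the Link_map type comes from <link.h>.
 */
#include <dlfcn.h>
#include <link.h>
#include <stdio.h>

int
main(void)
{
	Link_map *lmp;
	int n = 0;

	if (dlinfo(RTLD_SELF, RTLD_DI_LINKMAP, &lmp) == -1) {
		(void) fprintf(stderr, "dlinfo: %s\n", dlerror());
		return (1);
	}

	while (lmp->l_prev != NULL)	/* rewind to the head of the list */
		lmp = lmp->l_prev;

	for (; lmp != NULL; lmp = lmp->l_next)
		(void) printf("%2d  %s\n", n++,
		    lmp->l_name[0] != '\0' ? lmp->l_name : "(main object)");

	return (0);
}

For far more detail on what the runtime linker does along the way, its LD_DEBUG environment variable (try LD_DEBUG=bindings) will narrate the whole process for you.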
- Scheduling. CPU scheduling is one of the most basic properties of a multitasking operating system. Despite being an old problem, we find ourselves constantly improving and extending this subsystem. To learn about CPU scheduling, start with Bill Kucharski’s entry describing the architecture-specific elements of context switching. Then head over to Gavin Maltby’s entry describing the short-term prevention of thread migration. (Before Gavin introduced this facility, the only way to prevent migration was to prevent kernel preemption – an overly blunt mechanism that led to a really nasty latency bubble that I debugged many years ago.)
If you’re going to understand thread dispatching, you’ll need to understand the way thread state is manipulated – and for that you’ll want to look at Saurabh Mishra’s entry describing thread locks. Thread locks are different from normal synchronization primitives, as you can infer from my own entry describing a bug in user-level priority inheritance – which is a good segue to a more general problem when dealing with thread control: how does one change the scheduling properties of a running thread? For an idea of how tricky this can be, check out Andrei Dorofeev’s entry describing binding processes to resource pools. Andrei’s problem was even more challenging than traditional thread manipulation, as he needed to change the scheduling properties of a group of threads atomically. If for no other reason, you should read Andrei’s entry to learn of the “curse of disp.c.” Speaking of the cursed, wrap up your tour of scheduling with Eric Saxe’s entry describing debugging a wedged kernel – you’ll see from Eric’s odyssey that scheduling problems can require a lot of brain-bending (and patience) to debug!
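As a user-level footnote to Gavin’s entry: outside the kernel, the only way to keep a thread from wandering between CPUs is the comparatively heavyweight processor_bind(2). A small sketch of my own (hedged: it assumes CPU 0 is online) shows what that looks like.

/*
 * Hedged sketch: bind the calling LWP to CPU 0 so it cannot migrate, do
 * the migration-sensitive work, then clear the binding.  This is the
 * heavyweight, user-level analogue of the short-term, kernel-only
 * facility that Gavin describes.
 */
#include <sys/types.h>
#include <sys/processor.h>
#include <sys/procset.h>
#include <stdio.h>

int
main(void)
{
	processorid_t obind;

	if (processor_bind(P_LWPID, P_MYID, 0, &obind) != 0) {
		perror("processor_bind");
		return (1);
	}
	(void) printf("bound to CPU 0 (previous binding: %d)\n", (int)obind);

	/* ... work that must not migrate ... */

	(void) processor_bind(P_LWPID, P_MYID, PBIND_NONE, NULL);
	return (0);
}

The binding persists until it is explicitly cleared – exactly the kind of long-lived hammer that a short-term, in-kernel mechanism lets you avoid.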
Okay, I think that’s enough for today – and yet it barely scratches the surface! I didn’t even touch on gigantic topics with many Opening Day entries, like security, networking, I/O, filesystems, performance, service management, observability, etc. etc. Stay tuned – or check out the Opening Day entries for yourself…