Search
Close this search box.

Month: September 2018

My blog post on falling in love with Rust got quite a bit of attention — with many being surprised by what had surprised me as well: the high performance of my naive Rust versus my (putatively less naive?) C. However, others viewed it as irresponsible to report these performance differences, believing that these results would be blown out of proportion or worse. The concern is not entirely misplaced: system benchmarking is one of those areas where — in Jonathan Swift’s words from three centuries ago — “falsehood flies, and the truth comes limping after it.”

There are myriad reasons why benchmarking is so vulnerable to leaving truth behind. First, it’s deceptively hard to quantify the performance of a system simply because the results are so difficult to verify: the numbers we get must be validated (or rejected) according to the degree that they comport with our expectations. As a result, if our expectations are incorrect, the results can be wildly wrong. To see this vividly, please watch (or rewatch!) Brendan Gregg’s excellent (and hilarious) lightning talk on benchmarking gone wrong. Brendan recounts his experience dealing with a particularly flawed approach, and it’s a talk that I always show anyone who is endeavoring to benchmark the system: it shows how easy it is to get it totally wrong — and how important it is to rigorously validate results.

Second, even if one gets an entirely correct result, it’s really only correct within the context of the system. As we succumb to the temptation of applying a result more universally than this context merits — as the asterisks and the qualifiers on a performance number are quietly amputated — a staid truth is transmogrified into a flying falsehood. Worse, some of that context may have been implicit in that the wrong thing may have been benchmarked: in trying to benchmark one aspect of the system, one may inadvertently merely quantify an otherwise hidden bottleneck.

So take all of this as disclaimer: I am not trying to draw large conclusions about “C vs. Rust” here. To the contrary, I think that it is a reasonable assumption that, for any task, a lower-level language can always be made to outperform a higher-level one. But with that said, a pesky fact remains: I reimplemented a body of C software in Rust, and it performed better for the same task; what’s going on? And is there anything broader we can say about these results?

To explore this, I ran some statemap rendering tests on SmartOS on a single-socket Haswell server (Xeon E3-1270 v3) running at 3.50GHz. The C version was compiled with GCC 7.3.0 with -O2 level optimizations; the Rust version was compiled with 1.29.0 with --release. All of the tests were run bound to a processor set containing a single core; all were bound to one logical CPU within that core, with the other logical CPU forced to be idle. cpustat was used to gather CPU performance counter data, with one number denoting one run with pic0 programmed to that CPU performance counter. The input file (~30MB compressed) contains 3.5M state changes, and in the default config will generate a ~6MB SVG.

Here are the results for a subset of the counters relating to the cache performance:

Counter statemap-gcc statemap-rust
cpu_clk_unhalted.thread_p 32,166,437,125 23,127,271,226 -28.1%
inst_retired.any_p 49,110,875,829 48,752,136,699 -0.7%
cpu_clk_unhalted.ref_p 918,870,673 660,493,684 -28.1%
mem_uops_retired.stlb_miss_loads 8,651,386 2,353,178 -72.8%
mem_uops_retired.stlb_miss_stores 268,802 1,000,684 272.3%
mem_uops_retired.lock_loads 7,791,528 51,737 -99.3%
mem_uops_retired.split_loads 107,969 52,745,125 48752.1%
mem_uops_retired.split_stores 196,934 41,814,301 21132.6%
mem_uops_retired.all_loads 11,977,544,999 9,035,048,050 -24.6%
mem_uops_retired.all_stores 3,911,589,945 6,627,038,769 69.4%
mem_load_uops_retired.l1_hit 9,337,365,435 8,756,546,174 -6.2%
mem_load_uops_retired.l2_hit 1,205,703,362 70,967,580 -94.1%
mem_load_uops_retired.l3_hit 66,771,301 33,323,740 -50.1%
mem_load_uops_retired.l1_miss 1,276,311,911 105,524,579 -91.7%
mem_load_uops_retired.l2_miss 69,671,774 34,616,966 -50.3%
mem_load_uops_retired.l3_miss 2,544,750 1,364,435 -46.4%
mem_load_uops_retired.hit_lfb 1,393,631,815 157,897,686 -88.7%
mem_load_uops_l3_hit_retired.xsnp_miss 435 526 20.9%
mem_load_uops_l3_hit_retired.xsnp_hit 1,269 740 -41.7%
mem_load_uops_l3_hit_retired.xsnp_hitm 820 517 -37.0%
mem_load_uops_l3_hit_retired.xsnp_none 67,846,758 33,376,449 -50.8%
mem_load_uops_l3_miss_retired.local_dram 2,543,699 1,301,381 -48.8%

 

So the Rust version is issuing a remarkably similar number of instructions (within less than one percent!), but with a decidedly different mix: just three quarters of the loads of the C version and (interestingly) many more stores. The cycles per instruction (CPI) drops from 0.65 to 0.47, indicating much better memory behavior — and indeed the L1 misses, L2 misses and L3 misses are all way down. The L1 hits as an absolute number are actually quite high relative to the loads, giving Rust a 96.9% L1 hit rate versus the C version’s 77.9% hit rate. Rust also lives much better in the L2, where it has half the L2 misses of the C version.

Okay, so Rust has better memory behavior than C? Well, not so fast. In terms of what this thing is actually doing: the core of statemap processing is coalescing a large number of state transitions in the raw data into a smaller number of rectangles for the resulting SVG. When presented with a new state transition, it picks the “best” two adjacent rectangles to coalesce based on a variety of properties. As a result, this code spends all of its time constantly updating an efficient data structure to be able to make this decision. For the C version, this is a binary search tree (an AVL tree), but Rust (interestingly) doesn’t offer a binary search tree — and it is instead implemented with a BTreeSet, which implements a B-tree. B-trees are common when dealing with on-disk state, where the cost of loading a node contained in a disk block is much, much less than the cost of searching that node for a desired datum, but they are less common as a replacement for an in-memory BST. Rust makes the (compelling) argument that, given the modern memory hierarchy, the cost of getting a line from memory is far greater than the cost of reading it out of a cache — and B-trees make sense as a replacement for BSTs, albeit with a much smaller value for B. (Cache lines are somewhere between 64 and 512 bytes; disk blocks start at 512 bytes and can be much larger.)

Could the performance difference that we’re seeing simply be Rust’s data structure being — per its design goals — more cache efficient? To explore this a little, I varied the value of the number of rectangles in the statemap, as this will affect both the size of the tree (more rectangles will be a larger tree, leading to a bigger working set) and the number of deletions (more rectangles will result in fewer deletions, leading to less compute time).

The results were pretty interesting:

A couple of things to note here: first, there are 3.5M state transitions in the input data; as soon as the number of rectangles exceeds the number of states, there is no reason for any coalescing, and some operations (namely, deleting from the tree of rectangles) go away entirely. So that explains the flatline at roughly 3.5M rectangles.

Also not surprisingly, the worst performance for both approaches occurs when the number of rectangles is set at more or less half the number of state transitions: the tree is huge (and therefore has relatively poorer cache performance for either approach) and each new state requires a deletion (so the computational cost is also high).

So far, this seems consistent with the BTreeSet simply being a more efficient data structure. But what is up with that lumpy Rust performance?! In particular there are some strange spikes; e.g., zooming in on the rectangle range up to 100,000 rectangles:

Just from eyeballing it, they seem to appear at roughly logarithmic frequency with respect to the number of rectangles. My first thought was perhaps some strange interference relationship with respect to the B-tree and the cache size or stride, but this is definitely a domain where an ounce of data is worth much more than a pound of hypotheses!

Fortunately, because Rust is static (and we have things like, say, symbols and stack traces!), we can actually just use DTrace to explore this. Take this simple D script, rustprof.d:

#pragma D option quiet

profile-4987hz
/pid == $target && arg1 != 0/
{
        @[usym(arg1)] = count();
}

END
{
        trunc(@, 10);
        printa("%10@d %A\n", @);
}

I ran this against two runs: one at a peak (e.g., 770,000 rectangle) and then another at the adjacent trough (e.g., 840,000 rectangles), demangling the resulting names by sending the the output through rustfilt. Results for 770,000 rectangles:

# dtrace -s ./rustprof.d -c "./statemap --dry-run -c 770000 ./pg-zfs.out" | rustfilt
3943472 records processed, 769999 rectangles
      1043 statemap`<alloc::collections::btree::map::BTreeMap<K, V>>::remove
      1180 statemap`<std::collections::hash::map::DefaultHasher as core::hash::Hasher>::finish
      1208 libc.so.1`memmove
      1253 statemap`<serde_json::read::StrRead<'a> as serde_json::read::Read<'a>>::parse_str
      1320 statemap`<std::collections::hash::map::HashMap<K, V, S>>::remove
      1695 libc.so.1`memcpy
      2558 statemap`statemap::statemap::Statemap::ingest
      4123 statemap`<std::collections::hash::map::HashMap<K, V, S>>::insert
      4503 statemap`<std::collections::hash::map::HashMap<K, V, S>>::get
     26640 statemap`alloc::collections::btree::search::search_tree

And now the same thing, but against the adjacent valley of better performance at 840,000 rectangles:

# dtrace -s ./rustprof.d -c "./statemap --dry-run -c 840000 ./pg-zfs.out" | rustfilt
3943472 records processed, 839999 rectangles
       971 statemap`<std::collections::hash::map::DefaultHasher as core::hash::Hasher>::write
      1071 statemap`<alloc::collections::btree::map::BTreeMap<K, V>>::remove
      1158 statemap`<std::collections::hash::map::DefaultHasher as core::hash::Hasher>::finish
      1228 libc.so.1`memmove
      1348 statemap`<serde_json::read::StrRead<'a> as serde_json::read::Read<'a>>::parse_str
      1628 libc.so.1`memcpy
      2524 statemap`statemap::statemap::Statemap::ingest
      2948 statemap`<std::collections::hash::map::HashMap<K, V, S>>::insert
      4125 statemap`<std::collections::hash::map::HashMap<K, V, S>>::get
     26359 statemap`alloc::collections::btree::search::search_tree

The samples in btree::search::search_tree are roughly the same — but the poorly performing one has many more samples in HashMap<K, V, S>::insert (4123 vs. 2948). What is going on? The HashMap implementation in Rust uses Robin Hood hashing and linear probing — which means that hash maps must be resized when they hit a certain load factor. (By default, the hash map load factor is 90.9%.) And note that I am using hash maps to effectively implement a doubly linked list: I will have a number of hash maps that — between them — will contain the specified number of rectangles. Given that we only see this at particular sizes (and given that the distance between peaks increases exponentially with respect to the number of rectangles), it seems entirely plausible that at some numbers of rectangles, the hash maps will grow large enough to induce quite a bit more probing, but not quite large enough to be resized.

To explore this hypothesis, it would be great to vary the hash map load factor, but unfortunately the load factor isn’t currently dynamic. Even then, we could explore this by using with_capacity to preallocate our hash maps, but the statemap code doesn’t necessarily know how much to preallocate because the rectangles themselves are spread across many hash maps.

Another option is to replace our use of HashMap with a different data structure — and in particular, we can use a BTreeMap in its place. If the load factor isn’t the issue (that is, if there is something else going on for which the additional compute time in HashMap<K, V, S>::insert is merely symptomatic), we would expect a BTreeMap-based implementation to have a similar issue at the same points.

With Rust, conducting this experiment is absurdly easy:

diff --git a/src/statemap.rs b/src/statemap.rs
index a44dc73..5b7073d 100644
--- a/src/statemap.rs
+++ b/src/statemap.rs
@@ -109,7 +109,7 @@ struct StatemapEntity {
     last: Option,                      // last start time
     start: Option,                     // current start time
     state: Option,                     // current state
-    rects: HashMap<u64, RefCell>, // rectangles for this entity
+    rects: BTreeMap<u64, RefCell>, // rectangles for this entity
 }

 #[derive(Debug)]
@@ -151,6 +151,7 @@ use std::str;
 use std::error::Error;
 use std::fmt;
 use std::collections::HashMap;
+use std::collections::BTreeMap;
 use std::collections::BTreeSet;
 use std::str::FromStr;
 use std::cell::RefCell;
@@ -306,7 +307,7 @@ impl StatemapEntity {
             description: None,
             last: None,
             state: None,
-            rects: HashMap::new(),
+            rects: BTreeMap::new(),
             id: id,
         }
     }

That’s it: because the two (by convention) have the same interface, there is nothing else that needs to be done! And the results, with the new implementation in light blue:

Our lumps are gone! In general, the BTreeMap-based implementation performs a little worse than the HashMap-based implementation, but without as much variance. Which isn’t to say that this is devoid of strange artifacts! It’s especially interesting to look at the variation at lower levels of rectangles, when the two implementations seem to alternate in the pole position:

I don’t know what happens to the BTreeMap-based implementation at about ~2,350 rectangles (where it degrades by nearly 10% but then recovers when the number of rectangles hits ~2,700 or so), but at this point, the effects are only academic for my purposes: for statemaps, the default number of rectangles is 25,000. That said, I’m sure that digging there would yield interesting discoveries!

So, where does all of this leave us? Certainly, Rust’s foundational data structures perform very well. Indeed, it might be tempting to conclude that, because a significant fraction of the delta here is the difference in data structures (i.e., BST vs. B-tree), the difference in language (i.e., C vs. Rust) doesn’t matter at all.

But that would be overlooking something important: part of the reason that using a BST (and in particular, an AVL tree) was easy for me is because we have an AVL tree implementation built as an intrusive data structure. This is a pattern we use a bunch in C: the data structure is embedded in a larger, containing structure — and it is the caller’s responsibility to allocate, free and lock this structure. That is, implementing a library as an intrusive data structure completely sidesteps both allocation and locking. This allows for an entirely robust arbitrarily embeddable library, and it also makes it really easy for a single data structure to be in many different data structures simultaneously. For example, take ZFS’s zio structure, in which a single contiguous chunk of memory is on (at least) two different lists and three different AVL trees! (And if that leaves you wondering how anything could possibly be so complicated, see George Wilson’s recent talk explaining the ZIO pipeline.)

Implementing a B-tree this way, however, would be a mess. The value of a B-tree is in the contiguity of nodes — that is, it is the allocation that is a core part of the win of the data structure. I’m sure it isn’t impossible to implement an intrusive B-tree in C, but it would require so much more caller cooperation (and therefore a more complicated and more error-prone interface) that I do imagine that it would have you questioning life choices quite a bit along the way. (After all, a B-tree is a win — but it’s a constant-time win.)

Contrast this to Rust: intrusive data structures are possible in Rust, but they are essentially an anti-pattern. Rust really, really wants you to have complete orthogonality of purpose in your software. This leads you to having multiple disjoint data structures with very clear trees of ownership — where before you might have had a single more complicated data structure with graphs of multiple ownership. This clear separation of concerns in turn allows for these implementations to be both broadly used and carefully optimized. For an in-depth example of the artful implementation that Rust allows, see Alexis Beingessner’s excellent blog entry on the BTreeMap implementation.

All of this adds up to the existential win of Rust: powerful abstractions without sacrificing performance. Does this mean that Rust will always outperform C? No, of course not. But it does mean that you shouldn’t be surprised when it does — and that if you care about performance and you are implementing new software, it is probably past time to give Rust a very serious look!

Update: Several have asked if Clang would result in materially better performance; my apologies for not having mentioned that when I did my initial analysis, I had included Clang and knew that (at the default rectangles of 25,000), it improved things a little but not enough to approach the performance of the Rust implementation. But for completeness sake, I cranked the Clang-compiled binary at the same rectangle points:

Despite its improvement over GCC, I don’t think that the Clang results invalidate any of my analysis — but apologies again for not including them in the original post!

Let me preface this with an apology: this is a technology love story, and as such, it’s long, rambling, sentimental and personal. Also befitting a love story, it has a When Harry Met Sally feel to it, in that its origins are inauspicious…

First encounters

Over a decade ago, I worked on a technology to which a competitor paid the highest possible compliment: they tried to implement their own knockoff. Because this was done in the open (and because it is uniquely mesmerizing to watch one’s own work mimicked), I spent way too much time following their mailing list and tracking their progress (and yes, taking an especially shameful delight in their occasional feuds). On their team, there was one technologist who was clearly exceptionally capable — and I confess to being relieved when he chose to leave the team relatively early in the project’s life. This was all in 2005; for years for me, Rust was “that thing that Graydon disappeared to go work on.” From the description as I read it at the time, Graydon’s new project seemed outrageously ambitious — and I assumed that little would ever come of it, though certainly not for lack of ability or effort…

Fast forward eight years to 2013 or so. Impressively, Graydon’s Rust was not only still alive, but it had gathered a community and was getting quite a bit of attention — enough to merit a serious look. There seemed to be some very intriguing ideas, but any budding interest that I might have had frankly withered when I learned that Rust had adopted the M:N threading model — including its more baroque consequences like segmented stacks. In my experience, every system that has adopted the M:N model has lived to regret it — and it was unfortunate to have a promising new system appear to be ignorant of the scarred shoulders that it could otherwise stand upon. For me, the implications were larger than this single decision: I was concerned that it may be indicative of a deeper malaise that would make Rust a poor fit for the infrastructure software that I like to write. So while impressed that Rust’s ambitious vision was coming to any sort of fruition at all, I decided that Rust wasn’t for me personally — and I didn’t think much more about it…

Some time later, a truly amazing thing happened: Rust ripped it out. Rust’s reasoning for removing segmented stacks is a concise but thorough damnation; their rationale for removing M:N is clear-eyed, thoughtful and reflective — but also unequivocal in its resolve. Suddenly, Rust became very interesting: all systems make mistakes, but few muster the courage to rectify them; on that basis alone, Rust became a project worthy of close attention.

So several years later, in 2015, it was with great interest that I learned that Adam started experimenting with Rust. On first read of Adam’s blog entry, I assumed he would end what appeared to be excruciating pain by deleting the Rust compiler from his computer (if not by moving to a commune in Vermont) — but Adam surprised me when he ended up being very positive about Rust, despite his rough experiences. In particular, Adam hailed the important new ideas like the ownership model — and explicitly hoped that his experience would serve as a warning to others to approach the language in a different way.

In the years since, Rust continued to mature and my curiosity (and I daresay, that of many software engineers) has steadily intensified: the more I have discovered, the more intrigued I have become. This interest has coincided with my personal quest to find a programming language for the back half of my career: as I mentioned in my Node Summit 2017 talk on platform as a reflection of values, I have been searching for a language that reflects my personal engineering values around robustness and performance. These values reflect a deeper sense within me: that software can be permanent — that software’s unique duality as both information and machine afford a timeless perfection and utility that stand apart from other human endeavor. In this regard, I have believed (and continue to believe) that we are living in a Golden Age of software, one that will produce artifacts that will endure for generations. Of course, it can be hard to hold such heady thoughts when we seem to be up to our armpits in vendored flotsam, flooded by sloppy abstractions hastily implemented. Among current languages, only Rust seems to share this aspiration for permanence, with a perspective that is decidedly larger than itself.

Taking the plunge

So I have been actively looking for an opportunity to dive into Rust in earnest, and earlier this year, one presented itself: for a while, I have been working on a new mechanism for system visualization that I dubbed statemaps. The software for rendering statemaps needs to inhale a data stream, coalesce it down to a reasonable size, and render it as a dynamic image that can be manipulated by the user. This originally started off as being written in node.js, but performance became a problem (especially for larger data sets) and I did what we at Joyent have done in such situations: I rewrote the hot loop in C, and then dropped that into a node.js add-on (allowing the SVG-rendering code to remain in JavaScript). This was fine, but painful: the C was straightforward, but the glue code to bridge into node.js was every bit as capricious, tedious, and error-prone as it has always been. Given the performance constraint, the desire for the power of a higher level language, and the experimental nature of the software, statemaps made for an excellent candidate to reimplement in Rust; my intensifying curiosity could finally be sated!

As I set out, I had the advantage of having watched (if from afar) many others have their first encounters with Rust. And if those years of being a Rust looky-loo taught me anything, it’s that the early days can be like the first days of snowboarding or windsurfing: lots of painful falling down! So I took deliberate approach with Rust: rather than do what one is wont to do when learning a new language and tinker a program into existence, I really sat down to learn Rust. This is frankly my bias anyway (I always look for the first principles of a creation, as explained by its creators), but with Rust, I went further: not only did I buy the canonical reference (The Rust Programming Language by Steve Klabnik, Carol Nichols and community contributors), I also bought an O’Reilly book with a bit more narrative (Programming Rust by Jim Blandy and Jason Orendorff). And with this latter book, I did something that I haven’t done since cribbing BASIC programs from Enter magazine back in the day: I typed in the example program in the introductory chapters. I found this to be very valuable: it got the fingers and the brain warmed up while still absorbing Rust’s new ideas — and debugging my inevitable transcription errors allowed me to get some understanding of what it was that I was typing. At the end was something that actually did something, and (importantly), by working with a program that was already correct, I was able to painlessly feel some of the tremendous promise of Rust.

Encouraged by these early (if gentle) experiences, I dove into my statemap rewrite. It took a little while (and yes, I had some altercations with the borrow checker!), but I’m almost shocked about how happy I am with the rewrite of statemaps in Rust. Because I know that many are in the shoes I occupied just a short while ago (namely, intensely wondering about Rust, but also wary of its learning curve — and concerned about the investment of time and energy that climbing it will necessitate), I would like to expand on some of the things that I love about Rust other than the ownership model. This isn’t because I don’t love the ownership model (I absolutely do) or that the ownership model isn’t core to Rust (it is rightfully thought of as Rust’s epicenter), but because I think its sheer magnitude sometimes dwarfs other attributes of Rust — attributes that I find very compelling! In a way, I am writing this for my past self — because if I have one regret about Rust, it’s that I didn’t see beyond the ownership model to learn it earlier.

I will discuss these attributes in roughly the order I discovered them with the (obvious?) caveat that this shouldn’t be considered authoritative; I’m still very much new to Rust, and my apologies in advance for any technical details that I get wrong!

1. Rust’s error handling is beautiful

The first thing that really struck me about Rust was its beautiful error handling — but to appreciate why it so resonated with me requires some additional context. Despite its obvious importance, error handling is something we haven’t really gotten right in systems software. For example, as Dave Pacheo observed with respect to node.js, we often conflate different kinds of errors — namely, programmatic errors (i.e., my program is broken because of a logic error) with operational errors (i.e., an error condition external to my program has occurred and it affects my operation). In C, this conflation is unusual, but you see it with the infamous SIGSEGV signal handler that has been known to sneak into more than one undergraduate project moments before a deadline to deal with an otherwise undebuggable condition. In the Java world, this is slightly more common with the (frowned upon) behavior of catching java.lang.NullPointerException or otherwise trying to drive on in light of clearly broken logic. And in the JavaScript world, this conflation is commonplace — and underlies one of the most serious objections to promises.

Beyond the ontological confusion, error handling suffers from an infamous mechanical problem: for a function that may return a value but may also fail, how is the caller to delineate the two conditions? (This is known as the semipredicate problem after a Lisp construct that suffers from it.) C handles this as it handles so many things: by leaving it to the programmer to figure out their own (bad) convention. Some use sentinel values (e.g., Linux system calls cleave the return space in two and use negative values to denote the error condition); some return defined values on success and failure and then set an orthogonal error code; and of course, some just silently eat errors entirely (or even worse).

C++ and Java (and many other languages before them) tried to solve this with the notion of exceptions. I do not like exceptions: for reasons not dissimilar to Dijkstra’s in his famous admonition against “goto”, I consider exceptions harmful. While they are perhaps convenient from a function signature perspective, exceptions allow errors to wait in ambush, deep in the tall grass of implicit dependencies. When the error strikes, higher-level software may well not know what hit it, let alone from whom — and suddenly an operational error has become a programmatic one. (Java tries to mitigate this sneak attack with checked exceptions, but while well-intentioned, they have serious flaws in practice.) In this regard, exceptions are a concrete example of trading the speed of developing software with its long-term operability. One of our deepest, most fundamental problems as a craft is that we have enshrined “velocity” above all else, willfully blinding ourselves to the long-term consequences of gimcrack software. Exceptions optimize for the developer by allowing them to pretend that errors are someone else’s problem — or perhaps that they just won’t happen at all.

Fortunately, exceptions aren’t the only way to solve this, and other languages take other approaches. Closure-heavy languages like JavaScript afford environments like node.js the luxury of passing an error as an argument — but this argument can be ignored or otherwise abused (and it’s untyped regardless), making this solution far from perfect. And Go uses its support for multiple return values to (by convention) return both a result and an error value. While this approach is certainly an improvement over C, it is also noisy, repetitive and error-prone.

By contrast, Rust takes an approach that is unique among systems-oriented languages: leveraging first algebraic data types — whereby a thing can be exactly one of an enumerated list of types and the programmer is required to be explicit about its type to manipulate it — and then combining it with its support for parameterized types. Together, this allows functions to return one thing that’s one of two types: one type that denotes success and one that denotes failure. The caller can then pattern match on the type of what has been returned: if it’s of the success type, it can get at the underlying thing (by unwrapping it), and if it’s of the error type, it can get at the underlying error and either handle it, propagate it, or improve upon it (by adding additional context) and propagating it. What it cannot do (or at least, cannot do implicitly) is simply ignore it: it has to deal with it explicitly, one way or the other. (For all of the details, see Recoverable Errors with Result.)

To make this concrete, in Rust you end up with code that looks like this:

fn do_it(filename: &str) -> Result {
    let stat = match fs::metadata(filename) {
        Ok(result) => { result },
        Err(err) => { return Err(err); }
    };

    let file = match File::open(filename) {
        Ok(result) => { result },
        Err(err) => { return Err(err); }
    };

    /* ... */

    Ok(())
}

Already, this is pretty good: it’s cleaner and more robust than multiple return values, return sentinels and exceptions — in part because the type system helps you get this correct. But it’s also verbose, so Rust takes it one step further by introducing the propagation operator: if your function returns a Result, when you call a function that itself returns a Result, you can append a question mark on the call to the function denoting that upon Ok, the result should be unwrapped and the expression becomes the unwrapped thing — and upon Err the error should be returned (and therefore propagated). This is easier seen than explained! Using the propagation operator turns our above example into this:

fn do_it_better(filename: &str) -> Result {
    let stat = fs::metadata(filename)?;
    let file = File::open(filename)?;

    /* ... */

    Ok(())
}

This, to me, is beautiful: it is robust; it is readable; it is not magic. And it is safe in that the compiler helps us arrive at this and then prevents us from straying from it.

Platforms reflect their values, and I daresay the propagation operator is an embodiment of Rust’s: balancing elegance and expressiveness with robustness and performance. This balance is reflected in a mantra that one hears frequently in the Rust community: “we can have nice things.” Which is to say: while historically some of these values were in tension (i.e., making software more expressive might implicitly be making it less robust or more poorly performing), through innovation Rust is finding solutions that don’t compromise one of these values for the sake of the other.

2. The macros are incredible

When I was first learning C, I was (rightly) warned against using the C preprocessor. But like many of the things that we are cautioned about in our youth, this warning was one that the wise give to the enthusiastic to prevent injury; the truth is far more subtle. And indeed, as I came of age as a C programmer, I not only came to use the preprocessor, but to rely upon it. Yes, it needed to be used carefully — but in the right hands it could generate cleaner, better code. (Indeed, the preprocessor is very core to the way we implement DTrace’s statically defined tracing.) So if anything, my problems with the preprocessor were not its dangers so much as its many limitations: because it is, in fact, a preprocessor and not built into the language, there were all sorts of things that it would never be able to do — like access the abstract syntax tree.

With Rust, I have been delighted by its support for hygienic macros. This not only solves the many safety problems with preprocessor-based macros, it allows them to be outrageously powerful: with access to the AST, macros are afforded an almost limitless expansion of the syntax — but invoked with an indicator (a trailing bang) that makes it clear to the programmer when they are using a macro. For example, one of the fully worked examples in Programming Rust is a json! macro that allows for JSON to be easy declared in Rust. This gets to the ergonomics of Rust, and there are many macros (e.g., format!, vec!, etc.) that make Rust more pleasant to use.

Another advantage of macros: they are so flexible and powerful that they allow for effective experimentation. For example, the propagation operator that I love so much actually started life as a try! macro; that this macro was being used ubiquitously (and successfully) allowed a language-based solution to be considered. Languages can be (and have been!) ruined by too much experimentation happening in the language rather than in how it’s used; through its rich macros, it seems that Rust can enable the core of the language to remain smaller — and to make sure that when it expands, it is for the right reasons and in the right way.

3. format! is a pleasure

Okay, this is a small one but it’s (another) one of those little pleasantries that has made Rust really enjoyable. Many (most? all?) languages have an approximation or equivalent of the venerable sprintf, whereby variable input is formatted according to a format string. Rust’s variant of this is the format! macro (which is in turn invoked by println!, panic!, etc.), and (in keeping with one of the broader themes of Rust) it feels like it has learned from much that came before it. It is type-safe (of course) but it is also clean in that the {} format specifier can be used on any type that implements the Display trait. I also love that the {:?} format specifier denotes that the argument’s Debug trait implementation should be invoked to print debug output. More generally, all of the format specifiers map to particular traits, allowing for an elegant approach to an historically grotty problem. There are a bunch of other niceties, and it’s all a concrete example of how Rust uses macros to deliver nice things without sullying syntax or otherwise special-casing. None of the formatting capabilities are unique to Rust, but that’s the point: in this (small) domain (as in many) Rust feels like a distillation of the best work that came before it. As anyone who has had to endure one of my talks can attest, I believe that appreciating history is essential both to understand our present and to map our future. Rust seems to have that perspective in the best ways: it is reverential of the past without being incarcerated by it.

4. include_str! is a godsend

One of the filthy aspects of the statemap code is that it is effectively encapsulating another program — a JavaScript program that lives in the SVG to allow for the interactivity of the statemap. This code lives in its own file, which the statemap code should pass through to the generated SVG. In the node.js/C hybrid, I am forced to locate the file in the filesystem — which is annoying because it has to be delivered along with the binary and located, etc. Now Rust — like many languages (including ES6) — has support for raw-string literals. As an aside, it’s interesting to see the discussion leading up to its addition, and in particular, how a group of people really looked at every language that does this to see what should be mimicked versus what could be improved upon. I really like the syntax that Rust converged on: r followed by one or more octothorpes followed by a quote to begin a raw string literal, and a quote followed by a matching number of octothorpes followed to end a literal, e.g.:

    let str = r##""What a curious feeling!" said Alice"##;

This alone would have allowed me to do what I want, but still a tad gross in that it’s a bunch of JavaScript living inside a raw literal in a .rs file. Enter include_str!, which allows me to tell the compiler to find the specified file in the filesystem during compilation, and statically drop it into a string variable that I can manipulate:

        ...
        /*
         * Now drop in our in-SVG code.
         */
        let lib = include_str!("statemap-svg.js");
        ...

So nice! Over the years I have wanted this many times over for my C, and it’s another one of those little (but significant!) things that make Rust so refreshing.

5. Serde is stunningly good

Serde is a Rust crate that allows for serialization and deserialization, and it’s just exceptionally good. It uses macros (and, in particular, Rust’s procedural macros) to generate structure-specific routines for serialization and deserialization. As a result, Serde requires remarkably little programmer lift to use and performs eye-wateringly well — a concrete embodiment of Rust’s repeated defiance of the conventional wisdom that programmers must choose between abstractions and performance!

For example, in the statemap implementation, the input is concatenated JSON that begins with a metadata payload. To read this payload in Rust, I define the structure, and denote that I wish to derive the Deserialize trait as implemented by Serde:

#[derive(Deserialize, Debug)]
#[allow(non_snake_case)]
struct StatemapInputMetadata {
    start: Vec<u64>,
    title: String,
    host: Option<String>,
    entityKind: Option<String>,
    states: HashMap<String, StatemapInputState>,
}

Then, to actually parse it:

     let metadata: StatemapInputMetadata = serde_json::from_str(payload)?;

That’s… it. Thanks to the magic of the propagation operator, the errors are properly handled and propagated — and it has handled tedious, error-prone things for me like the optionality of certain members (itself beautifully expressed via Rust’s ubiquitous Option type). With this one line of code, I now (robustly) have a StatemapInputMetadata instance that I can use and operate upon — and this performs incredibly well on top of it all. In this regard, Serde represents the best of software: it is a sophisticated, intricate implementation making available elegant, robust, high-performing abstractions; as legendary White Sox play-by-play announcer Hawk Harrelson might say, MERCY!

6. I love tuples

In my C, I have been known to declare anonymous structures in functions. More generally, in any strongly typed language, there are plenty of times when you don’t want to have to fill out paperwork to be able to structure your data: you just want a tad more structure for a small job. For this, Rust borrows an age-old construct from ML in tuples. Tuples are expressed as a parenthetical list, and they basically work as you expect them to work in that they are static in size and type, and you can index into any member. For example, in some test code that needs to make sure that names for colors are correctly interpreted, I have this:

        let colors = vec![
            ("aliceblue", (240, 248, 255)),
            ("antiquewhite", (250, 235, 215)),
            ("aqua", (0, 255, 255)),
            ("aquamarine", (127, 255, 212)),
            ("azure", (240, 255, 255)),
            /* ... */
        ];

Then colors[2].0 (say) which will be the string “aqua”; (colors[1].1).2 will be the integer 215. Don’t let the absence of a type declaration in the above deceive you: tuples are strongly typed, it’s just that Rust is inferring the type for me. So if I accidentally try to (say) add an element to the above vector that contains a tuple of mismatched signature (e.g., the tuple “((188, 143, 143), ("rosybrown"))“, which has the order reversed), Rust will give me a compile-time error.

The full integration of tuples makes them a joy to use. For example, if a function returns a tuple, you can easily assign its constituent parts to disjoint variables, e.g.:

fn get_coord() -> (u32, u32) {
   (1, 2)
}

fn do_some_work() {
    let (x, y) = get_coord();
    /* x has the value 1, y has the value 2 */
}

Great stuff!

7. The integrated testing is terrific

One of my regrets on DTrace is that we didn’t start on the DTrace test suite at the same time we started the project. And even after we starting building it (too late, but blessedly before we shipped it), it still lived away from the source for several years. And even now, it’s a bit of a pain to run — you really need to know it’s there.

This represents everything that’s wrong with testing in C: because it requires bespoke machinery, too many people don’t bother — even when they know better! Viz.: in the original statemap implementation, there is zero testing code — and not because I don’t believe in it, but just because it was too much work for something relatively small. Yes, there are plenty of testing frameworks for C and C++, but in my experience, the integrated frameworks are too constrictive — and again, not worth it for a smaller project.

With the rise of test-driven development, many languages have taken a more integrated approach to testing. For example, Go has a rightfully lauded testing framework, Python has unittest, etc. Rust takes a highly integrated approach that combines the best of all worlds: test code lives alongside the code that it’s testing — but without having to make the code bend to a heavyweight framework. The workhorses here are conditional compilation and Cargo, which together make it so easy to write tests and run them that I found myself doing true test-driven development with statemaps — namely writing the tests as I develop the code.

8. The community is amazing

In my experience, the best communities are ones that are inclusive in their membership but resolute in their shared values. When communities aren’t inclusive, they stagnate, or rot (or worse); when communities don’t share values, they feud and fracture. This can be a very tricky balance, especially when so many open source projects start out as the work of a single individual: it’s very hard for a community not to reflect the idiosyncrasies of its founder. This is important because in the open source era, community is critical: one is selecting a community as much as one is selecting a technology, as each informs the future of the other. One factor that I value a bit less is strictly size: some of my favorite communities are small ones — and some of my least favorite are huge.

For purposes of a community, Rust has a luxury of clearly articulated, broadly shared values that are featured prominently and reiterated frequently. If you head to the Rust website this is the first sentence you’ll read:

Rust is a systems programming language that runs blazingly fast, prevents segfaults, and guarantees thread safety.

That gets right to it: it says that as a community, we value performance and robustness — and we believe that we shouldn’t have to choose between these two. (And we have seen that this isn’t mere rhetoric, as so many Rust decisions show that these values are truly the lodestar of the project.)

And with respect to inclusiveness, it is revealing that you will likely read that statement of values in your native tongue, as the Rust web page has been translated into thirteen languages. Just the fact that it has been translated into so many languages makes Rust nearly unique among its peers. But perhaps more interesting is where this globally inclusive view likely finds its roots: among the sites of its peers, only Ruby is similarly localized. Given that several prominent Rustaceans like Steve Klabnik and Carol Nichols came from the Ruby community, it would not be unreasonable to guess that they brought this globally inclusive view with them. This kind of inclusion is one that one sees again and again in the Rust community: different perspectives from different languages and different backgrounds. Those who come to Rust bring with them their experiences — good and bad — from the old country, and the result is a melting pot of ideas. This is an inclusiveness that runs deep: by welcoming such disparate perspectives into a community and then uniting them with shared values and a common purpose, Rust achieves a rich and productive heterogeneity of thought. That is, because the community agrees about the big things (namely, its fundamental values), it has room to constructively disagree (that is, achieve consensus) on the smaller ones.

Which isn’t to say this is easy! Check out Ashley Williams in the opening keynote from RustConf 2018 for how exhausting it can be to hash through these smaller differences in practice. Rust has taken a harder path than the “traditional” BDFL model, but it’s a qualitatively better one — and I believe that many of the things that I love about Rust are a reflection of (and a tribute to) its robust community.

9. The performance rips

Finally, we come to the last thing I discovered in my Rust odyssey — but in many ways, the most important one. As I described in an internal presentation, I had experienced some frustrations trying to implement in Rust the same structure I had had in C. So I mentally gave up on performance, resolving to just get something working first, and then optimize it later.

I did get it working, and was able to benchmark it, but to give some some context for the numbers, here is the time to generate a statemap in the old (slow) pure node.js implementation for a modest trace (229M, ~3.9M state transitions) on my 2.9 GHz Core i7 laptop:

% time ./statemap-js/bin/statemap ./pg-zfs.out > js.svg

real	1m23.092s
user	1m21.106s
sys	0m1.871s

This is bad — and larger input will cause it to just run out of memory. And here’s the version as reimplemented as a C/node.js hybrid:

% time ./statemap-c/bin/statemap ./pg-zfs.out > c.svg

real	0m11.800s
user	0m11.414s
sys	0m0.330s

This was (as designed) a 10X improvement in performance, and represents speed-of-light numbers in that this seems to be an optimal implementation. Because I had written my Rust naively (and my C carefully), my hope was that the Rust would be no more than 20% slower — but I was braced for pretty much anything. Or at least, I thought I was; I was actually genuinely taken aback by the results:

$ time ./statemap.rs/target/release/statemap ./pg-zfs.out > rs.svg
3943472 records processed, 24999 rectangles

real	0m8.072s
user	0m7.828s
sys	0m0.186s

Yes, you read that correctly: my naive Rust was ~32% faster than my carefully implemented C. This blew me away, and in the time since, I have spent some time on a real lab machine running SmartOS (where I have reproduced these results and been able to study them a bit). My findings are going to have to wait for another blog entry, but suffice it to say that despite executing a shockingly similar number of instructions, the Rust implementation has a different load/store mix (it is much more store-heavy than C) — and is much better behaved with respect to the cache. Given the degree that Rust passes by value, this makes some sense, but much more study is merited.

It’s also worth mentioning that there are some easy wins that will make the Rust implementation even faster: after I had publicized the fact that I had a Rust implementation of statemaps working, I was delighted when David Tolnay, one of the authors of Serde, took the time to make some excellent suggestions for improvement. For a newcomer like me, it’s a great feeling to have someone with such deep expertise as David’s take an interest in helping me make my software perform even better — and it is revealing as to the core values of the community.

Rust’s shockingly good performance — and the community’s desire to make it even better — fundamentally changed my disposition towards it: instead of seeing Rust as a language to augment C and replace dynamic languages, I’m looking at it as a language to replace both C and dynamic languages in all but the very lowest layers of the stack. C — like assembly — will continue to have a very important place for me, but it’s hard to not see that place as getting much smaller relative to the barnstorming performance of Rust!

Beyond the first impressions

I wouldn’t want to imply that this is an exhaustive list of everything that I have fallen in love with about Rust. That list is much longer would include at least the ownership model; the trait system; Cargo; the type inference system. And I feel like I have just scratched the surface; I haven’t waded into known strengths of Rust like the FFI and the concurrency model! (Despite having written plenty of multithreaded code in my life, I haven’t so much as created a thread in Rust!)

Building a future

I can say with confidence that my future is in Rust. As I have spent my career doing OS kernel development, a natural question would be: do I intend to rewrite the OS kernel in Rust? In a word, no. To understand my reluctance, take some of my most recent experience: this blog entry was delayed because I needed to debug (and fix) a nasty problem with our implementation of the Linux ABI. As it turns out, Linux and SmartOS make slightly different guarantees with respect to the interaction of vfork and signals, and our code was fatally failing on a condition that should be impossible. Any old Unix hand (or quick study!) will tell you that vfork and signal disposition are each semantic superfund sites in their own right — and that their horrific (and ill-defined) confluence can only be unimaginably toxic. But the real problem is that actual software implicitly depends on these semantics — and any operating system that is going to want to run existing software will itself have to mimic them. You don’t want to write this code, because no one wants to write this code.

Now, one option (which I honor!) is to rewrite the OS from scratch, as if legacy applications essentially didn’t exist. While there is a tremendous amount of good that can come out of this (and it can find many use cases), it’s not a fit for me personally.

So while I may not want to rewrite the OS kernel in Rust, I do think that Rust is an excellent fit for much of the broader system. For example, at the recent OpenZFS Developers Summit, Matt Ahrens and I were noodling the notion of user-level components for ZFS in Rust. Specifically: zdb is badly in need of a rewrite — and Rust would make an excellent candidate for it. There are many such examples spread throughout ZFS and the broader the system, including a few in kernel. Might we want to have a device driver model that allows for Rust drivers? Maybe! (And certainly, it’s technically possible.) In any case, you can count on a lot more Rust from me and into the indefinite future — whether in the OS, near the OS, or above the OS.

Taking your own plunge

I wrote all of this up in part to not only explain why I took the plunge, but to encourage others to take their own. If you were as I was and are contemplating diving into Rust, a couple of pieces of advice, for whatever they’re worth:

  • I would recommend getting both The Rust Programming Language and Programming Rust. They are each excellent in their own right, and different enough to merit owning both. I also found it very valuable to have two different sources on subjects that were particularly thorny.
  • Understand ownership before you start to write code. The more you understand ownership in the abstract, the less you’ll have to learn at the merciless hands of compiler error messages.
  • Get in the habit of running rustc on short programs. Cargo is terrific, but I personally have found it very valuable to write short Rust programs to understand a particular idea — especially when you want to understand optional or new features of the compiler. (Roll on, non-lexical lifetimes!)
  • Be careful about porting something to Rust as a first project — or otherwise implementing something you’ve implemented before. Now, obviously, this is exactly what I did, and it can certainly be incredibly valuable to be able to compare an implementation in Rust to an implementation in another language — but it can also cut against you: the fact that I had implemented statemaps in C sent me down some paths that were right for C but wrong for Rust; I made much better progress when I rethought the implementation of my problem the way Rust wanted me to think about it.
  • Check out the New Rustacean podcast by Chris Krycho. I have really enjoyed Chris’s podcasts, and have been working my way through them when commuting or doing household chores. I particularly enjoyed his interview with Sean Griffen and his interview with Carol Nichols.
  • Check out rustlings. I learned about this a little too late for me; I wish I had known about it earlier! I did work through the Rust koans, which I enjoyed and would recommend for the first few hours with Rust.

I’m sure that there’s a bunch of stuff that I missed; if there’s a particular resource that you found useful when learning Rust, message me or leave a comment here and I’ll add it.

Let me close by offering a sincere thanks to those in the Rust community who have been working so long to develop such a terrific piece of software — and especially those who have worked so patiently to explain their work to us newcomers. You should be proud of what you’ve accomplished, both in terms of a revolutionary technology and a welcoming community — thank you for inspiring so many of us about what infrastructure software can become, and I look forward to many years of implementing in Rust!

Recent Posts

September 2, 2024
November 18, 2023
November 27, 2022
October 11, 2020
July 31, 2019
December 16, 2018
September 18, 2018
December 21, 2016
September 30, 2016
September 26, 2016
September 13, 2016
July 29, 2016
December 17, 2015
September 16, 2015
January 6, 2015
November 10, 2013
September 3, 2013
June 7, 2012
September 15, 2011
August 15, 2011
March 9, 2011
September 24, 2010
August 11, 2010
July 30, 2010
July 25, 2010
March 10, 2010
November 26, 2009
February 19, 2009
February 2, 2009
November 10, 2008
November 3, 2008
September 3, 2008
July 18, 2008
June 30, 2008
May 31, 2008
March 16, 2008
December 18, 2007
December 5, 2007
November 11, 2007
November 8, 2007
September 6, 2007
August 21, 2007
August 2, 2007
July 11, 2007
May 20, 2007
March 19, 2007
October 12, 2006
August 17, 2006
August 7, 2006
May 1, 2006
December 13, 2005
November 16, 2005
September 13, 2005
September 9, 2005
August 21, 2005

Archives

Archives