REVIEW - Systems Performance 2e - Enterprise and the Cloud

Title:

Systems Performance 2e - Enterprise and the Cloud

Author:

Brendan Gregg

ISBN:

9780136820154

Publisher:

Pearson (2020)

Pages:

928

Reviewer:

Paul Floyd

Reviewed:

July 2021

Rating:

★★★★☆

Highly recommended

Brendan Gregg has now authored or co-authored five books covering systems performance, starting with Solaris and DTrace, then the two editions of this book and BPF (which I reviewed here https://accu.org/bookreviews/2021/floyd_1962/). Gregg worked at Sun/Oracle on Solaris performance, then, with many other ex-Sun employees, joined Joyent, a cloud server company using a Solaris-derived OS. After Joyent, he went to work for Netflix, switching to Linux performance work in the process. As you can tell, that’s a lot of performance experience, in particular related to tools (he developed the DTraceToolkit), filesystems (ZFS) and networking. All that to say that this book was written by a leading expert in the field.

The book is all about measuring systems performance. It is targeted at the Linux platform (the 1st edition also had significant Solaris coverage which has mostly been removed). If you are interested in systems performance then be clear that this covers only Linux. Nothing here will help you with Windows, and only a fairly small amount will be of use on macOS and other unix-like systems.

The chapters in the book fall into four sections (not explicitly described as such in the book).

  1. Introduction (chapters 1 and 2)
  2. How hardware and operating systems work, and the measurement tools that exist (chapters 3 and 4)
  3. How to observe (or measure) the workings of the hardware and software (one chapter for each component that can be measured, e.g. memory, disks)
  4. Benchmarking and deeper details on the measurement tools.

I found the first hundred pages or so of the book a bit slow going. There was quite a lot of basic introduction. I would imagine that most people reading this book will already be familiar with these details. It’s not all for novices though. Chapter 2, ‘Methodologies’, sets the tone for much of the specific coverage of performance throughout the rest of the book. A little niggle here: there’s a short section on queuing theory that I thought was a bit outside the scope of the book, and perhaps some references would have been better for those wishing to read up on the theory.

For me, the good stuff started with chapters 5 (Applications), 6 (CPUs) and 7 (Memory). These are the sorts of things that I’m often looking at in my day job so no surprise it’s where I put the most Post-Its for future reference. A slight frustration here is that there is a lot of work being done in this field so features are getting added quite quickly. For instance, there is the option to produce flame graphs directly rather than extracting them from ‘perf’ results using scripts. This is fine as long as you use the latest and greatest kernel, which is not my case at work – we are often 5–10 years behind the leading-edge kernels.

The next few chapters in the specific observations part of the book (on File Systems, Disks, Networks and Cloud Computing) were all interesting as they are topics that I’ve only touched on briefly in my career. In particular, there are good explanations of the distinction between File System and Disk, and I learned a lot about Cloud Computing.

Chapter 13 (Benchmarking) is a bit less hands on. This chapter describes how benchmarks get run, how they ought to be run and how you can use performance tools to check the data provided by the benchmarks.

The last few chapters cover specifics of the measurement tools already covered in the book: perf, Ftrace, and BPF. The Ftrace chapter did feel a bit ‘by kernel developers for kernel developers’, and whilst I didn’t feel that it’s particularly relevant to application developers, it’s good to know that it is there.

The last chapter is a short case study analysing a real problem.

The book is well structured, managing to both contain a lot of background reference material and a significant amount of practical cookbook-style recipes. When I read books like this, I always wonder if they will stand the test of time. Looking back a bit at Gregg’s previous works, clearly the first one covering Solaris is of little use these days. DTrace is still going strong on macOS and the BSDs and I still refer to that book occasionally (and as a side note, I would like to see updates to Gregg’s DTrace work since I use both FreeBSD and macOS at home, but I suppose that he doesn’t work on the FreeBSD bit of Netflix and so has now moved on to Linux only). Performance tools have changed quite a bit on Linux over the years (SystemTap came and has more or less gone), and perf and BPF seem to be the generic and programmable performance measurement tools of choice. The details may change, but I think that this book will still be useful in 5–10 years’ time.