Today I'm going to talk about how long it takes to boot up and shut
down virtual machines.
+ SLIDE: Intel talk
+
+Originally I had planned to give this talk at the KVM Forum in August,
+but Intel submitted a similar talk, and as I'll discuss in a minute,
+they have had a large team working on this for over two years so they
+are much further ahead. So these are the details of the Intel talk and
+you should definitely go and see that if you're at KVM Forum in
+August, or watch the online video afterwards.
+
+ - - - -
+
It's "common knowledge" (in quotes) that full virtualization is slow
and heavyweight, whereas containers (which are just a chroot, really)
are fast and lightweight.
Here is a slide which I clipped from a random presentation on the
internet. The times are nonsense and not backed up by any evidence in
the rest of the presentation, but it's typical of the sort of
-information circulating.
+incorrect information circulating.
SLIDE: Clear containers logo
It should be possible to bring boot times down to around 500-600ms
without any gross hacks and without an excessive amount of effort.
-The first step to curing the problem is measuring it.
+The first step to curing a problem is measuring the problem.
SLIDE: qemu -kernel boot
timestamped messages using string matching and regular expressions, to
produce boot benchmarks and charts.
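
To give a flavour of the string-matching idea, here is a minimal sketch in Python. It is not the real tool (links to those are in the paper). The log format, the stage names, the marker patterns and the sample lines are all assumptions invented for illustration.

```python
import re

# Assumed log format: "<milliseconds> <message>", one event per line.
LINE_RE = re.compile(r"^(\d+(?:\.\d+)?)\s+(.*)$")

# Hypothetical stage markers. A real tool would use patterns that match
# the actual BIOS, kernel and userspace messages.
STAGES = [
    ("bios", re.compile(r"SeaBIOS")),
    ("kernel", re.compile(r"Linux version")),
    ("userspace", re.compile(r"init started")),
]


def stage_starts(lines):
    """Return {stage: first timestamp (ms) at which its marker appears}."""
    starts = {}
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # ignore lines that carry no timestamp
        t, msg = float(m.group(1)), m.group(2)
        for name, pattern in STAGES:
            if name not in starts and pattern.search(msg):
                starts[name] = t
    return starts


def stage_durations(starts, end_ms):
    """Each stage lasts until the next one starts; the last ends at end_ms."""
    ordered = sorted(starts.items(), key=lambda kv: kv[1])
    durations = {}
    for i, (name, t) in enumerate(ordered):
        nxt = ordered[i + 1][1] if i + 1 < len(ordered) else end_ms
        durations[name] = nxt - t
    return durations


# Made-up sample log, purely for demonstration.
sample = [
    "0.0 SeaBIOS (version made-up)",
    "150.0 Linux version made-up",
    "garbage line without timestamp",
    "900.0 init started",
]
starts = stage_starts(sample)
print(stage_durations(starts, end_ms=1200.0))
```

The durations are derived only from the first match of each marker, so the sketch says how long each stage took but nothing about what happened inside it.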
- SLIDE: boot-benchmark
-
-The boot-benchmark tool produces simple timings averaged over 10 runs.
-
SLIDE: boot-analysis 1
The more complex boot-analysis tool produces boot charts showing
... and which activities took the longest time.
-The tools are based on the libguestfs framework and are surprisingly
-accessible. With a recent Linux distro anyone should be able to run
-them. You can download these screenshots and get links to the tools
-in the PDF paper which accompanies this talk.
+ SLIDE: boot-benchmark
+
+The boot-benchmark tool produces simple timings averaged over 10 runs.
+
+The tools are based on the libguestfs framework and are quite easy to
+run. With a recent Linux distro anyone should be able to run them.
+You can download these screenshots and get links to the tools in the
+PDF paper which accompanies this talk.
SLIDE: progress graph
Here's an interesting graph of my progress on this problem over time,
versus the appliance boot time up the left in milliseconds. I spent
the first couple of weeks in March exploring different ways to trace
-QEMU, and finally writing those tools. Once I had to tools giving me
+QEMU, and finally writing those tools. Once I had tools giving me
visibility into what was really going on, I got the time down from
3500ms to 1200ms in the space of a few days.
-It's worth noting that the libguestfs appliance had been booting in 3
-to 4 seconds for literally half a decade before this.
+It's worth noting that the libguestfs appliance had been taking 3 or 4
+seconds to start for literally half a decade before this.
But there's also a long tail with diminishing returns.
+It would be tempting at this point for me to describe every single
+problem and fix. But that would be boring and if you're really
+interested in that, it's all described in the paper that accompanies
+this talk. Instead, I wanted to try to classify the delays according
+to their root cause. So we should be able to see, I hope, that some
+problems are easily fixed, whereas others are "institutionally" or
+"organizationally" very difficult to deal with.
+
Let's have a look at a few delays which are avoidable. I like to
classify these according to their root cause.
The largest single saving was realizing that we shouldn't print out
all the kernel debug messages to the serial console unless we're
-operating in debug mode. In non-debug mode, the messages are just
-thrown away. The solution was to add a statement saying "if we're not
-in verbose mode, add the quiet option to the kernel command line", and
-that saved 1000 milliseconds.
+operating in debug mode. In non-debug mode, the messages fed to the
+serial port are thrown away. The solution was to add a statement
+saying "if we're not in debug mode, add the quiet option to the kernel
+command line", and that saved 1000 milliseconds.
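
As a sketch of how small that fix is, here is the same logic in Python. This is not the actual libguestfs code. The function name, the `debug` flag and the example base arguments are assumptions. The only real thing here is `quiet`, which is a standard Linux kernel parameter.

```python
def kernel_cmdline(base_args, debug):
    """Build the guest kernel command line as a single string.

    base_args: list of kernel parameters (hypothetical example values).
    debug: True when the caller wants to see kernel messages.
    """
    args = list(base_args)  # copy so the caller's list is not modified
    if not debug:
        # "quiet" suppresses most kernel boot messages on the console,
        # so nothing is pushed through the slow serial port just to be
        # thrown away.
        args.append("quiet")
    return " ".join(args)


print(kernel_cmdline(["console=ttyS0"], debug=False))
print(kernel_cmdline(["console=ttyS0"], debug=True))
```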
SLIDE: stupidity
SLIDE: Conclusions
-
-
+ SLIDE: Conclusions 2
And finally a plea from the heart ...