diff --git a/2016-eng-talk/talk.txt b/2016-eng-talk/talk.txt
index 0df4ff8..769693d 100644
--- a/2016-eng-talk/talk.txt
+++ b/2016-eng-talk/talk.txt
@@ -7,6 +7,17 @@ virtual machines.
 Today I'm going to talk about how long it takes to boot up and shut
 down virtual machines.
 
+	SLIDE: Intel talk
+
+Originally I had planned to give this talk at the KVM Forum in August,
+but Intel submitted a similar talk, and as I'll discuss in a minute,
+they have had a large team working on this for over two years, so they
+are much further ahead.  These are the details of the Intel talk, and
+you should definitely go and see it if you're at KVM Forum in August,
+or watch the video online afterwards.
+
+	- - - -
+
 It's "common knowledge" (in quotes) that full virtualization is slow
 and heavyweight, whereas containers (which are just a chroot, really)
 are fast and lightweight.
@@ -16,7 +27,7 @@ are fast and lightweight.
 Here is a slide which I clipped from a random presentation on the
 internet.  The times are nonsense and not backed up by any evidence in
 the rest of the presentation, but it's typical of the sort of
-information circulating.
+incorrect information circulating.
 
 	SLIDE: Clear containers logo
 
@@ -53,7 +64,7 @@ kernel?
 It should be possible to bring boot times down to around 500-600ms
 without any gross hacks and without an excessive amount of effort.
 
-The first step to curing the problem is measuring it.
+The first step to curing a problem is measuring the problem.
 
 	SLIDE: qemu -kernel boot
 
@@ -73,10 +84,6 @@ sequence, and then write a couple of programs to analyze these
 timestamped messages using string matching and regular expressions,
 to produce boot benchmarks and charts.
 
-	SLIDE: boot-benchmark
-
-The boot-benchmark tool produces simple timings averaged over 10 runs.
-
 	SLIDE: boot-analysis 1
 
 The more complex boot-analysis tool produces boot charts showing
@@ -86,25 +93,37 @@ each stage of booting ...
 
 ... and which activities took the longest time.
 
-The tools are based on the libguestfs framework and are surprisingly
-accessible.  With a recent Linux distro anyone should be able to run
-them.  You can download these screenshots and get links to the tools
-in the PDF paper which accompanies this talk.
+	SLIDE: boot-benchmark
+
+The boot-benchmark tool produces simple timings averaged over 10 runs.
+
+The tools are based on the libguestfs framework and are quite easy to
+run.  With a recent Linux distro anyone should be able to run them.
+You can download these screenshots and get links to the tools in the
+PDF paper which accompanies this talk.
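+
+To give you a flavour of how simple these tools really are, here is a
+rough sketch in Python of the kind of analysis they do.  It is not
+the real boot-analysis code, and the event patterns are invented for
+illustration, but the principle is the same: match the timestamped
+console messages with regular expressions and subtract timestamps.
+
+    import re
+
+    # Each captured console line carries a timestamp, for example:
+    #   [    0.412345] Linux version 4.5.0 ...
+    LINE_RE = re.compile(r'^\[\s*(\d+\.\d+)\]\s*(.*)$')
+
+    # Boot events of interest (hypothetical patterns).
+    EVENTS = {
+        'kernel start':    re.compile(r'Linux version'),
+        'userspace start': re.compile(r'supermin: .* starting'),
+        'guest ready':     re.compile(r'guestfsd .* ready'),
+    }
+
+    def analyze(log_lines):
+        times = {}
+        for line in log_lines:
+            m = LINE_RE.match(line)
+            if not m:
+                continue
+            stamp, msg = float(m.group(1)), m.group(2)
+            for event, pattern in EVENTS.items():
+                if event not in times and pattern.search(msg):
+                    times[event] = stamp
+        # Report each event and the delay since the previous event.
+        prev = 0.0
+        for event, stamp in sorted(times.items(), key=lambda e: e[1]):
+            print('%-16s %8.1f ms  (+%.1f ms)'
+                  % (event, stamp * 1000.0, (stamp - prev) * 1000.0))
+            prev = stamp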
 
 	SLIDE: progress graph
 
 Here's an interesting graph of my progress on this problem over time,
 versus the appliance boot time up the left in milliseconds.  I spent
 the first couple of weeks in March exploring different ways to trace
-QEMU, and finally writing those tools.  Once I had to tools giving me
+QEMU, and finally writing those tools.  Once I had tools giving me
 visibility into what was really going on, I got the time down from
 3500ms to 1200ms in the space of a few days.
 
-It's worth noting that the libguestfs appliance had been booting in 3
-to 4 seconds for literally half a decade before this.
+It's worth noting that the libguestfs appliance had been taking 3 or 4
+seconds to start for literally half a decade before this.
 
 But there's also a long tail with diminishing returns.
 
+It would be tempting at this point for me to describe every single
+problem and fix.  But that would be boring, and if you're really
+interested, it's all described in the paper that accompanies this
+talk.  Instead, I wanted to try to classify the delays according to
+their root cause.  So we should be able to see, I hope, that some
+problems are easily fixed, whereas others are "institutionally" or
+"organizationally" very difficult to deal with.
+
 Let's have a look at a few delays which are avoidable.  I like to
 classify these according to their root cause.
 
@@ -143,10 +162,10 @@ happening in future.
 
 The largest single saving was realizing that we shouldn't print out
 all the kernel debug messages to the serial console unless we're
-operating in debug mode.  In non-debug mode, the messages are just
-thrown away.  The solution was to add a statement saying "if we're not
-in verbose mode, add the quiet option to the kernel command line", and
-that saved 1000 milliseconds.
+operating in debug mode.  In non-debug mode, the messages fed to the
+serial port are thrown away.  The solution was to add a statement
+saying "if we're not in debug mode, add the quiet option to the kernel
+command line", and that saved 1000 milliseconds.  (There is a small
+sketch of this fix at the end of this file.)
 
 	SLIDE: stupidity
 
@@ -216,8 +235,7 @@ external shell scripts that udev runs, or who knows.
 
 	SLIDE: Conclusions
 
-
-
+	SLIDE: Conclusions 2
 
 And finally a plea from the heart ...
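
	- - - -

For reference, here is roughly the shape of the "quiet" fix mentioned
above, sketched in Python purely for illustration.  The real change
lives in the libguestfs appliance code, and the option list here is
invented:

    def kernel_command_line(debug=False):
        # Options the appliance always needs (illustrative list only).
        args = ['panic=1', 'console=ttyS0', 'selinux=0']
        if not debug:
            # Don't let the kernel print every boot message to the
            # slow, emulated serial console; suppressing that output
            # is what saved around 1000 milliseconds.
            args.append('quiet')
        return ' '.join(args)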