From: Richard W.M. Jones
Date: Tue, 31 May 2016 18:17:51 +0000 (+0100)
Subject: Initial run of slides.
X-Git-Url: http://git.annexia.org/?a=commitdiff_plain;h=c19d7100832dae3be2fe0722e25ee862f325502d;hp=21f411f1dbc236a38dffe62b053d8eec80256d48;p=libguestfs-talks.git

Initial run of slides.
---

diff --git a/2016-eng-talk/.gitignore b/2016-eng-talk/.gitignore
new file mode 100644
index 0000000..09b1a98
--- /dev/null
+++ b/2016-eng-talk/.gitignore
@@ -0,0 +1,9 @@
+/.~lock.*#
+/paper.aux
+/paper.dvi
+/paper.fdb_latexmk
+/paper.fls
+/paper.log
+/paper.out
+/paper.pdf
+/progress.pdf
diff --git a/2016-kvm-forum/Makefile b/2016-eng-talk/Makefile
similarity index 100%
rename from 2016-kvm-forum/Makefile
rename to 2016-eng-talk/Makefile
diff --git a/2016-kvm-forum/NOTES.txt b/2016-eng-talk/NOTES.txt
similarity index 100%
rename from 2016-kvm-forum/NOTES.txt
rename to 2016-eng-talk/NOTES.txt
diff --git a/2016-eng-talk/README.txt b/2016-eng-talk/README.txt
new file mode 100644
index 0000000..b2e99e6
--- /dev/null
+++ b/2016-eng-talk/README.txt
@@ -0,0 +1 @@
+This is a talk I gave to a Red Hat Engineering team on 6th June 2016.
diff --git a/2016-kvm-forum/analysis-run.txt b/2016-eng-talk/analysis-run.txt
similarity index 100%
rename from 2016-kvm-forum/analysis-run.txt
rename to 2016-eng-talk/analysis-run.txt
diff --git a/2016-kvm-forum/boot-analysis-screenshot-2.png b/2016-eng-talk/boot-analysis-screenshot-2.png
similarity index 100%
rename from 2016-kvm-forum/boot-analysis-screenshot-2.png
rename to 2016-eng-talk/boot-analysis-screenshot-2.png
diff --git a/2016-kvm-forum/boot-analysis-screenshot-2.xcf b/2016-eng-talk/boot-analysis-screenshot-2.xcf
similarity index 100%
rename from 2016-kvm-forum/boot-analysis-screenshot-2.xcf
rename to 2016-eng-talk/boot-analysis-screenshot-2.xcf
diff --git a/2016-kvm-forum/boot-analysis-screenshot.png b/2016-eng-talk/boot-analysis-screenshot.png
similarity index 100%
rename from 2016-kvm-forum/boot-analysis-screenshot.png
rename to 2016-eng-talk/boot-analysis-screenshot.png
diff --git a/2016-kvm-forum/boot-analysis-screenshot.xcf b/2016-eng-talk/boot-analysis-screenshot.xcf
similarity index 100%
rename from 2016-kvm-forum/boot-analysis-screenshot.xcf
rename to 2016-eng-talk/boot-analysis-screenshot.xcf
diff --git a/2016-kvm-forum/kernel-config-minimal b/2016-eng-talk/kernel-config-minimal
similarity index 100%
rename from 2016-kvm-forum/kernel-config-minimal
rename to 2016-eng-talk/kernel-config-minimal
diff --git a/2016-kvm-forum/paper.tex b/2016-eng-talk/paper.tex
similarity index 100%
rename from 2016-kvm-forum/paper.tex
rename to 2016-eng-talk/paper.tex
diff --git a/2016-kvm-forum/progress.data b/2016-eng-talk/progress.data
similarity index 100%
rename from 2016-kvm-forum/progress.data
rename to 2016-eng-talk/progress.data
diff --git a/2016-kvm-forum/progress.plot b/2016-eng-talk/progress.plot
similarity index 100%
rename from 2016-kvm-forum/progress.plot
rename to 2016-eng-talk/progress.plot
diff --git a/2016-eng-talk/progress.png b/2016-eng-talk/progress.png
new file mode 100644
index 0000000..afc06a7
Binary files /dev/null and b/2016-eng-talk/progress.png differ
diff --git a/2016-eng-talk/slides.odp b/2016-eng-talk/slides.odp
new file mode 100644
index 0000000..0fb738d
Binary files /dev/null and b/2016-eng-talk/slides.odp differ
diff --git a/2016-eng-talk/talk.txt b/2016-eng-talk/talk.txt
new file mode 100644
index 0000000..0df4ff8
--- /dev/null
+++ b/2016-eng-talk/talk.txt
@@ -0,0 +1,225 @@
+Good afternoon everyone.
+
+My name is Richard Jones and I work for Red Hat on a suite of tools we
+call the "virt tools" and libguestfs for manipulating disk images and
+virtual machines.
+
+Today I'm going to talk about how long it takes to boot up and shut
+down virtual machines.
+
+It's "common knowledge" (in quotes) that full virtualization is slow
+and heavyweight, whereas containers (which are just a chroot, really)
+are fast and lightweight.
+
+ SLIDE: Start and stop times
+
+Here is a slide which I clipped from a random presentation on the
+internet. The times are nonsense and not backed up by any evidence in
+the rest of the presentation, but it's typical of the sort of
+information circulating.
+
+ SLIDE: Clear containers logo
+
+This all changed when Intel announced their project called "Clear
+Containers". They had a downloadable demonstration which showed a
+full VM booting to a login prompt in 150ms and using about 20MB of
+memory.
+
+Today I'll try to persuade you that:
+
+ SLIDE: Performance brings security
+
+performance on a par with Intel Clear Containers brings security, and
+new opportunities (new places) where we can use full virtualization.
+If we can wrap Docker containers in VMs, we can make them secure. But
+we can only do that if the overhead is low enough that we don't lose
+the density and performance advantages of containers. And there are
+new areas where high-performance full virtualization makes sense,
+particularly sandboxing individual applications, and desktop
+environments similar to Qubes.
+
+Intel's Clear Containers demo is an existence proof that it can be
+done.
+
+However ... there are shortcomings at the moment. The original demo
+used kvmtool, not QEMU, and it used a heavily customized Linux
+kernel. Can we do the same thing with QEMU and a stock Linux
+distribution kernel?
+
+ SLIDE: No
+
+ SLIDE: No, but we can do quite well
+
+It should be possible to bring boot times down to around 500-600ms
+without any gross hacks and without an excessive amount of effort.
+
+The first step to curing the problem is measuring it.
+
+ SLIDE: qemu -kernel boot
+
+This is how a Linux appliance boots. I should say at this point that
+I'm only considering ordinary QEMU with the "-kernel" option, and only
+on x86-64 hardware.
+
+Some of the things on this slide you might not think of as part of the
+boot process, but they are all overhead as far as wrapping a container
+in a VM is concerned. The important thing is that not everything here
+is "inside" QEMU or the guest, and so QEMU-based or guest-based
+tracing tools do not help.
+
+What I did eventually was to connect a serial port to QEMU and
+timestamp all the messages printed by various stages of the boot
+sequence, and then write a couple of programs to analyze these
+timestamped messages using string matching and regular expressions, to
+produce boot benchmarks and charts.
+
+ SLIDE: boot-benchmark
+
+The boot-benchmark tool produces simple timings averaged over 10 runs.
+
+ SLIDE: boot-analysis 1
+
+The more complex boot-analysis tool produces boot charts showing each
+stage of booting ...
+
+ SLIDE: boot-analysis 2
+
+... and which activities took the longest time.
+
+The tools are based on the libguestfs framework and are surprisingly
+accessible. With a recent Linux distro anyone should be able to run
+them. You can download these screenshots and get links to the tools
+in the PDF paper which accompanies this talk.
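+
+To give you a feel for the technique, here is a toy sketch of the
+timestamping idea in Python. It is not the real boot-benchmark or
+boot-analysis code, and the stage names and regexps below are made up
+for illustration:
+
+    #!/usr/bin/env python
+    # Toy version of the analysis: read the guest's serial console
+    # output on stdin, stamp each line with a relative time, and
+    # match it against a few known boot messages.  The stage list
+    # here is illustrative, not the real set of patterns.
+    import re, sys, time
+
+    STAGES = [
+        ("kernel starts", re.compile(r"Linux version")),
+        ("/init running", re.compile(r"supermin:")),
+        ("udev settled",  re.compile(r"udev")),
+    ]
+
+    start = time.time()
+    for line in sys.stdin:
+        ms = (time.time() - start) * 1000.0
+        for name, pattern in STAGES:
+            if pattern.search(line):
+                print("%8.1f ms  %s" % (ms, name))
+
+You would pipe the guest's serial console into its stdin, for example
+with "qemu ... -serial stdio"; the real tools are rather more careful
+about how they capture the timestamps.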
+
+ SLIDE: progress graph
+
+Here's an interesting graph of my progress on this problem over time,
+with the appliance boot time in milliseconds up the left-hand side. I
+spent the first couple of weeks in March exploring different ways to
+trace QEMU, and finally writing those tools. Once I had the tools
+giving me visibility into what was really going on, I got the time
+down from 3500ms to 1200ms in the space of a few days.
+
+It's worth noting that the libguestfs appliance had been booting in 3
+to 4 seconds for literally half a decade before this.
+
+But there's also a long tail with diminishing returns.
+
+Let's have a look at a few delays which are avoidable. I like to
+classify these according to their root cause.
+
+ SLIDE: 16550A UART probing
+
+When the Linux kernel boots it spends 25ms probing the serial port to
+check it's really a working 16550A. Hypervisors never export broken
+serial ports to their guests, so this is useless. The kernel
+maintainers' argument is that you might pass through a real serial
+port to a guest, so just checking that the kernel is running under KVM
+isn't sufficient to bypass this test. I haven't managed to get a good
+solution to this, but it'll probably involve some kind of ACPI
+description to say "yes, really, this is a working UART from KVM".
+
+ SLIDE: misaligned goals
+
+My root cause for that problem is, I think, misaligned goals. The
+goals of booting appliances very fast don't match the correctness
+goals of the kernel.
+
+ SLIDE: hwclock
+
+We ran hwclock in the init script. The tools told us this was taking
+300ms. It's not even necessary, since we always have kvmclock
+available in the appliance.
+
+ SLIDE: carelessness
+
+I'm going to put that one down to carelessness. Because I didn't have
+the tools to analyze the boot process before, commands crept into the
+init script which looked innocent but actually took a huge amount of
+time to run. With better tools we should be able to stop this
+happening in future.
+
+ SLIDE: kernel debug messages
+
+The largest single saving was realizing that we shouldn't print out
+all the kernel debug messages to the serial console unless we're
+operating in debug mode. In non-debug mode the messages are just
+thrown away. The solution was to add a statement saying "if we're not
+in verbose mode, add the quiet option to the kernel command line", and
+that alone saved 1000 milliseconds.
+
+ SLIDE: stupidity
+
+Let's now look at examples of problems that cannot be solved so
+easily.
+
+ SLIDE: ELF loader
+
+QEMU in Fedora is linked against 170 libraries, and just running "qemu
+-version" on my laptop takes about 60ms. We can't reduce the number
+of libraries very easily, especially when we want to probe the
+features of QEMU, which necessarily means loading all the libraries.
+And we can't simplify the ELF loader either, because it implements a
+complex standard with lots of obscure features like "symbol
+interposition".
+
+ SLIDE: standards
+
+Standards mean there are some things you just have to work around; in
+the case of QEMU probing, by aggressive caching.
+
+ SLIDE: initcalls
+
+The kernel performs a large number of "initcalls" -- initializer
+functions. It runs them serially on a single processor, and every
+subsystem which is enabled seems to have an initcall. Individual
+initcalls are often very short, but the sheer number of them is a
+problem.
+
+One solution would be parallelization, but this has been rejected
+upstream.
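+
+As an aside, you can measure this yourself: the kernel's
+"initcall_debug" parameter makes it log a line as each initcall
+returns, and a few lines of script will rank them. The sketch below
+is not part of the libguestfs tools, and the regexp is my assumption
+about the usual message format, so check it against your own dmesg
+output:
+
+    #!/usr/bin/env python
+    # Rank initcalls by elapsed time, from the dmesg output of a
+    # kernel booted with initcall_debug, which logs lines like:
+    #   initcall foo_init+0x0/0x63 returned 0 after 11718 usecs
+    import re, sys
+
+    rx = re.compile(r"initcall (\S+) returned \S+ after (\d+) usecs")
+    timings = []
+    for line in sys.stdin:
+        m = rx.search(line)
+        if m:
+            timings.append((int(m.group(2)), m.group(1)))
+
+    for usecs, fn in sorted(timings, reverse=True)[:20]:
+        print("%8d usecs  %s" % (usecs, fn))
+
+Feeding it a captured boot log shows exactly the pattern I described:
+a long tail of individually small initcalls.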
+
+Another solution would be a custom kernel build which chops out any
+subsystem that doesn't apply to our special-case virtual machine. In
+fact I tried this approach, and it's possible to reduce the time spent
+in initcalls before reaching userspace by 60%.
+
+ SLIDE: maintainability
+
+BUT for distro kernels we usually want a single kernel image that can
+be used everywhere: bare metal and virtual machines alike. That's
+because building and maintaining multiple kernel images, fixing bugs
+in them, and tracking CVEs across them is too difficult. Experience
+with Xen and with ARM taught us that.
+
+So Intel's solution, which was to build a custom cut-down kernel, is
+not one that most distros will be able to use.
+
+I have more numbers to support this in the accompanying paper.
+
+Last one ...
+
+ SLIDE: udev
+
+At the moment, after the kernel initcalls, udev is the slowest
+component: it takes 130ms to populate /dev.
+
+ SLIDE: ?
+
+The cause here may be the inflexible and indivisible configuration of
+udev. We cannot, and don't want to, maintain our own simplified
+configuration, and it's impossible to split up the existing config.
+But maybe that's not the problem. It might be the kernel sending udev
+events serially, or udev processing them slowly, or the number of
+external shell scripts that udev runs, or who knows.
+
+ SLIDE: Conclusions
+
+
+
+
+And finally a plea from the heart ...
+
+ SLIDE: Please think
+
diff --git a/2016-kvm-forum/.gitignore b/2016-kvm-forum/.gitignore
deleted file mode 100644
index 28ae85e..0000000
--- a/2016-kvm-forum/.gitignore
+++ /dev/null
@@ -1,8 +0,0 @@
-paper.aux
-paper.dvi
-paper.fdb_latexmk
-paper.fls
-paper.log
-paper.out
-paper.pdf
-progress.pdf