X-Git-Url: http://git.annexia.org/?a=blobdiff_plain;f=2021-pipelines%2Fnotes.txt;h=17a2a35e1376f5d88b031f16425324cdf8ae7468;hb=HEAD;hp=a4038e876ea5f559830f7f4cbe3f455b0acb6bd2;hpb=3c7194838c82b65c37e4035cee6f29c2efaebd32;p=libguestfs-talks.git

diff --git a/2021-pipelines/notes.txt b/2021-pipelines/notes.txt
index a4038e8..17a2a35 100644
--- a/2021-pipelines/notes.txt
+++ b/2021-pipelines/notes.txt
@@ -5,17 +5,26 @@ February 15th 2021
 
 Introduction
 ----------------------------------------------------------------------
 
-Today I'm going to talk about a topic which seems very obvious to me.
-But I work in the business of moving disk images between systems, and
-it may not be obvious to other people.
+Virt-v2v is a project for "lifting and shifting" workloads from
+proprietary VMware systems to open source management platforms like
+RHV/oVirt, OpenStack and CNV/KubeVirt. To do this we have to copy
+vast amounts of data quickly, modifying it in flight.
 
-INTRO INTRO INTRO
-INVOLVE PIPELINES DIAGRAM
+Nearly everything we have to copy is a virtual machine disk image of
+some kind, and there are particular techniques you can use to copy
+these very efficiently:
 
-The bad:
-  - creating unbounded temporary files
-  - slow copies
+  - without copying zeroes or deleted data
+  - without making temporary copies
+
+  - modifying the contents in flight
+
+  - without touching the originals
+
+To those working in the virtualization space, all the techniques I'm
+going to describe will be quite well-known and obvious. But I see
+other projects making the same mistakes over and over.
 
 
 Simple copying
@@ -47,7 +56,7 @@ COMMAND:
 
   qemu-system-x86_64 -machine accel=kvm:tcg -cpu host -m 2048 -display none \
-	-drive file=ssh://kool/mnt/scratch/fedora-33.img,format=raw,if=virtio \
+	-drive file=ssh://kool/mnt/scratch/pipes/fedora-33.img,format=raw,if=virtio \
 	-serial stdio
 
 
@@ -59,15 +68,15 @@ DIAGRAM:
 
       ssh
      file -------> snapshot ------> qemu
 
-That command opens the remote file for writes. If we want to prevent
-modifications to the remote file, we can place a snapshot into the
-path. A snapshot in this case is a qcow2 file with the backing file
-set to the SSH URL. Any modifications we make are saved into the
-snapshot.
+The command I just showed you opened the remote file for writes. If
+we want to prevent modifications to the remote file, we can place a
+snapshot into the path. A snapshot in this case is a qcow2 file with
+the backing file set to the SSH URL. Any modifications we make are
+saved into the snapshot. The original disk is untouched.
 
 COMMAND:
 
-  qemu-img create -f qcow2 -b ssh://kool/mnt/scratch/fedora-33.img snapshot.qcow2
+  qemu-img create -f qcow2 -b ssh://kool/mnt/scratch/pipes/fedora-33.img snapshot.qcow2
 
   qemu-system-x86_64 -machine accel=kvm:tcg -cpu host -m 2048 -display none \
 	-drive file=snapshot.qcow2,format=qcow2,if=virtio \
@@ -81,11 +90,12 @@ Instead of booting the disk, let's make a full local copy:
 
 COMMAND:
 
-  qemu-img convert -f qcow2 snapshot.qcow2 -O raw disk.img -p
+  qemu-img create -f qcow2 -b ssh://kool/mnt/scratch/pipes/fedora-33.img snapshot.qcow2
+  qemu-img convert -f qcow2 snapshot.qcow2 -O raw local.img -p
 
 
-Sparsification
+Disk images
 ----------------------------------------------------------------------
 
 DIAGRAM:
@@ -96,9 +106,8 @@ Now let's take a side-step to talk about what's inside disk images.
 
 Firstly disk images are often HUGE.
A BluRay movie is 50 gigabytes, but that's really nothing compared to the larger disk images that we -move about when we do "lift and shift" of workloads from foreign -hypervisors to KVM. Those can be hundreds of gigabytes or terabytes, -and we move hundreds of them in a single batch. +move about when we move to KVM. Those can be hundreds of gigabytes or +terabytes, and we move hundreds of them in a single batch. But the good news is that these disk images are often quite sparse. They may contain much less actual data than the virtual size. @@ -109,10 +118,298 @@ long time accumulate lots of deleted files and other stuff that isn't needed by the operating system but also isn't zeroes. -We can cope with both of these things. The technique -is called "sparsification". +Disk metadata +---------------------------------------------------------------------- + +DIAGRAM: + + [ XXXXXXXXXX .... DDDDDDDD XXXXXXXXXXXX .......... ] + < allocated > < allocated > < hole > + < hole > + +What a lot of people don't know about disk images is there's another +part to them - the metadata. This records which parts of the disk +image are allocated, and while parts are "holes". + +Because less-experienced system administrators don't know about this, +the metadata often gets lost when files are copied around. + +For long-running virtual machines, deleted data may often still be +allocated (although this depends on how the VM is set up). + +Some tools you can use to study the metadata of files: + + ls -lsh + filefrag + qemu-img map + nbdinfo --map + + +Sparsification +---------------------------------------------------------------------- + +DIAGRAM: + + [ XXXXXXXXXX .... DDDDDDDD XXXXXXXXXXXX .......... ] + < allocated > < allocated > < hole > + < hole > + + | + v + + [ XXXXXXXXXX .... DDDDDDDD XXXXXXXXXXXX .......... ] + < allocated > < allocated >< hole > + < hole > + + +We can cope with both of these things. The technique is called +"sparsification". Some tools you can use to sparsify a disk are: + + fstrim + virt-sparsify --in-place + +Sparsification part 2 +---------------------------------------------------------------------- + + ssh + file -------> snapshot <------ virt-sparsify + ------> qemu-img convert + ^ + | + zero clusters are saved in here + +I'm going to take the same scenario as before, but use +sparsification before doing the copy. + +(Run these commands and show the output and ls of the snapshot) - qemu-img create -f qcow2 -b ssh://kool/mnt/scratch/fedora-33.img snapshot.qcow2 + + +Benchmark A +---------------------------------------------------------------------- + +Now you might think this is all a bit obscure, but is it any good? +In this first benchmark, I've compared copying a disk in several +different ways to see which is fastest. All of the copying happens +between two idle machines, over a slow network. + +The full methodology is in the background notes that accompany this +talk, which I'll link at the end. + + scp scp remote:fedora-33.img local.img + + ssh + sparsify file -> qcow2 snapshot <- virt-sparsify + -> qemu-img convert + + without (as above but without sparsifying) + sparsify + + ssh + nbdcopy file -> nbdkit cow filter <- virt-sparsify + -> nbdcopy + +Which do you think will be faster? + + +Benchmark A results +---------------------------------------------------------------------- + +(Same slides with timings added) + +Lock contention in the cow filter is thought to be the +reason for the poor performance of nbdkit + nbdcopy. 
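
For reference, here is roughly what that third pipeline looks like as
commands. This is a sketch rather than the exact benchmark script
(the real scripts are in the background notes): the host and path
reuse the earlier ssh examples, nbdkit stays in the foreground so the
next two commands run from a second shell, and I'm assuming
virt-sparsify will accept an NBD URI for the disk here.

  nbdkit -f --exit-with-parent --filter=cow \
         ssh host=kool path=/mnt/scratch/pipes/fedora-33.img

  virt-sparsify --inplace nbd://localhost
  nbdcopy nbd://localhost local.img

The writes made while sparsifying land in the cow filter's temporary
overlay, so the remote file is never modified - the same trick as the
qcow2 snapshot, done inside nbdkit.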
+ + +Opening OVA files +---------------------------------------------------------------------- + +DIAGRAM: + + guest.ova -----> tar-filter <- virt-sparsify + -> qemu-img convert + + guest.ova------------+ + | guest.ovf | + | disk1.raw|vmdk | + +--------------------+ + + tar file = header - file - header - file - ... + +This technique isn't just useful for remote files. Another trick we +use in virt-v2v is using an nbdkit filter to unpack VMware's OVA files +without any copies. OVA files are really uncompressed tar files. The +disk inside can be in a variety of formats, often raw or VMDK. + +We can ask the 'tar' command to give us the offset and size of the +disk image within the file and simply read it out of the file +directly. + + +Benchmark B +---------------------------------------------------------------------- + + cp test.ova test2.ova + + tar xf test.ova fedora-33.img + + + nbdkit -> tar filter <- sparsify + -> qemu-img convert + + nbdkit -f --exit-with-parent --filter=tar file test.ova tar-entry=fedora-33.img + qemu-img create -f qcow2 -b nbd:localhost:10809 snapshot.qcow2 virt-sparsify --inplace snapshot.qcow2 - ls -l snapshot.qcow2 + qemu-img convert -f qcow2 snapshot.qcow2 -O raw local.img + +Which is faster? + +Benchmark B results +---------------------------------------------------------------------- + +(Same as above, with results) + +The results are interesting, but if you remember what we said about +the disk format and sparsification then it shouldn't be surprising. + +The copy and tar commands have to churn through the entire +disk image - zeroes and deleted files. + +With nbdkit, sparsification and qemu-img convert we only copy a +fraction of the data. + +Note the two methods do NOT produce bit-for-bit equivalent outputs. +Q: Is this a problem? +A: No different from if the owner of the VM had run "fstrim". + + +Modifications +---------------------------------------------------------------------- + +Virt-v2v doesn't only make efficient copies, it also modifies the disk +image in flight. Some kinds of modifications that are made: + + - installing virtio drivers + + - removing VMware tools + + - modifying the bootloader + + - rebuilding initramfs + + - changing device names in /etc files + + - changing the Windows registry + + - (and much more) + +These are significant modifications, and they happen entirely during +the transfer, without touching the source and without making large +temporary copies. + +I'm not going to talk about this in great detail because it's a very +complex topic. Instead I will show you a simple demonstration of a +similar technique. 
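
As a small taste of how such a modification can be scripted:
virt-customize can edit files inside a disk image without booting it,
and because it is pointed at the qcow2 snapshot the original disk is
untouched. This is only an illustration - it is not the code virt-v2v
actually runs, and real device renaming is more careful than a blind
substitution.

  virt-customize -a snapshot.qcow2 \
      --edit '/etc/fstab: s,/dev/sd,/dev/vd,g'

The demonstration below chains this kind of in-flight change together
with nbdkit and qemu-img.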
+ +DIAGRAM: + + (Screenshot from https://alt.fedoraproject.org/cloud/) + + HTTPS + -----> nbdkit-curl-plugin --> xz filter --> qcow2 snapshot + <-- sparsify + <-- deactivate cloud-init + <-- write a file + --> qemu-img convert + +DEMO: + + nbdkit curl https://download.fedoraproject.org/pub/fedora/linux/releases/33/Cloud/x86_64/images/Fedora-Cloud-Base-33-1.2.x86_64.raw.xz --filter=xz + qemu-img create -f qcow2 -b nbd://localhost -F raw snapshot.qcow2 + virt-sparsify --inplace snapshot.qcow2 + virt-customize -a snapshot.qcow2 \ + --run-command 'systemctl disable cloud-init' \ + --write /hello:HELLO + ls -lsh snapshot.qcow2 + qemu-img convert -f qcow2 snapshot.qcow2 -O raw local.img -p + guestfish --ro -a local.img -i ll / + + +Complete virt-v2v pipelines +---------------------------------------------------------------------- + +DIAGRAM: + + proprietary + transport + VMware -----> nbdkit ----> nbdkit ----> qcow2 + ESXi vddk rate snapshot + plugin filter + + qcow2 <---- sparsify + snapshot <---- install drivers + -----> qemu-img convert + + nbd HTTPS + qemu-img convert ----> nbdkit -----> imageio + python + plugin + +Discuss: + + - separate input and output sides + + - NBD used extensively + + - very efficient and no large temporary copies + + - virt-v2v may be on a separate machine + + - rate filter + + - many other tricks used + + + +Conclusions +---------------------------------------------------------------------- + +Disk image pipelines: + + - efficient + + - flexible + + - avoid local copies + + - avoid copying zeroes/sparseness/deleted data + + - sparsification + + - modifications in flight + + +Future work / other topics +---------------------------------------------------------------------- + +nbdcopy vs qemu-img convert + +copy-on-read, bounded caches + +block size adjustment + +reading from containers + +stop using gzip! + + +References +---------------------------------------------------------------------- + +http://git.annexia.org/?p=libguestfs-talks.git;a=tree;f=2021-pipelines + +https://gitlab.com/nbdkit + +https://libguestfs.org +https://libguestfs.org/virt-v2v.1.html
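

A note about "stop using gzip!"
----------------------------------------------------------------------

A gzip stream can only be decompressed from the beginning, so a
gzip-compressed disk image cannot be read at a random offset without
unpacking the whole thing first. xz can compress in independent
blocks, which is what the nbdkit xz filter relies on to serve the
compressed image directly. If you distribute compressed images,
something like the command below keeps them seekable; the 16MiB block
size is only a suggested trade-off between compression ratio and
random access, and disk.img is a placeholder.

  xz --best --block-size=16MiB disk.img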