X-Git-Url: http://git.annexia.org/?a=blobdiff_plain;f=2021-pipelines%2Fnotes.txt;h=17a2a35e1376f5d88b031f16425324cdf8ae7468;hb=HEAD;hp=a4038e876ea5f559830f7f4cbe3f455b0acb6bd2;hpb=3c7194838c82b65c37e4035cee6f29c2efaebd32;p=libguestfs-talks.git

diff --git a/2021-pipelines/notes.txt b/2021-pipelines/notes.txt
index a4038e8..17a2a35 100644
--- a/2021-pipelines/notes.txt
+++ b/2021-pipelines/notes.txt
@@ -5,17 +5,26 @@ February 15th 2021
 
 Introduction
 ----------------------------------------------------------------------
 
-Today I'm going to talk about a topic which seems very obvious to me.
-But I work in the business of moving disk images between systems, and
-it may not be obvious to other people.
+Virt-v2v is a project for "lifting and shifting" workloads from
+proprietary VMware systems to open source management platforms like
+RHV/oVirt, OpenStack and CNV/KubeVirt. To do this we have to copy
+vast amounts of data quickly, modifying it in flight.
 
-INTRO INTRO INTRO
-INVOLVE PIPELINES DIAGRAM
+Nearly everything we have to copy is a virtual machine disk image of
+some kind, and there are particular techniques you can use to copy
+these very efficiently:
 
-The bad:
-  - creating unbounded temporary files
-  - slow copies
+  - without copying zeroes or deleted data
+  - without making temporary copies
+
+  - modifying the contents in flight
+
+  - without touching the originals
+
+To those working in the virtualization space, all the techniques I'm
+going to describe will be quite well-known and obvious. But I see
+other projects making the same mistakes over and over.
 
 
 Simple copying
@@ -47,7 +56,7 @@ COMMAND:
 
   qemu-system-x86_64 -machine accel=kvm:tcg -cpu host -m 2048 -display none \
-	-drive file=ssh://kool/mnt/scratch/fedora-33.img,format=raw,if=virtio \
+	-drive file=ssh://kool/mnt/scratch/pipes/fedora-33.img,format=raw,if=virtio \
 	-serial stdio
 
 
@@ -59,15 +68,15 @@ DIAGRAM:
 
       ssh
      file -------> snapshot ------> qemu
 
-That command opens the remote file for writes. If we want to prevent
-modifications to the remote file, we can place a snapshot into the
-path. A snapshot in this case is a qcow2 file with the backing file
-set to the SSH URL. Any modifications we make are saved into the
-snapshot.
+The command I just showed you opened the remote file for writes. If
+we want to prevent modifications to the remote file, we can place a
+snapshot into the path. A snapshot in this case is a qcow2 file with
+the backing file set to the SSH URL. Any modifications we make are
+saved into the snapshot. The original disk is untouched.
 
 COMMAND:
 
-  qemu-img create -f qcow2 -b ssh://kool/mnt/scratch/fedora-33.img snapshot.qcow2
+  qemu-img create -f qcow2 -b ssh://kool/mnt/scratch/pipes/fedora-33.img snapshot.qcow2
 
   qemu-system-x86_64 -machine accel=kvm:tcg -cpu host -m 2048 -display none \
 	-drive file=snapshot.qcow2,format=qcow2,if=virtio \
@@ -81,11 +90,12 @@ Instead of booting the disk, let's make a full local copy:
 
 COMMAND:
 
-  qemu-img convert -f qcow2 snapshot.qcow2 -O raw disk.img -p
+  qemu-img create -f qcow2 -b ssh://kool/mnt/scratch/pipes/fedora-33.img snapshot.qcow2
+  qemu-img convert -f qcow2 snapshot.qcow2 -O raw local.img -p
 
 
-Sparsification
+Disk images
 ----------------------------------------------------------------------
 
 DIAGRAM:
@@ -96,9 +106,8 @@ Now let's take a side-step to talk about what's inside disk images.
 
 Firstly disk images are often HUGE.
A BluRay movie is 50 gigabytes, but that's really nothing compared to the larger disk images that we -move about when we do "lift and shift" of workloads from foreign -hypervisors to KVM. Those can be hundreds of gigabytes or terabytes, -and we move hundreds of them in a single batch. +move about when we move to KVM. Those can be hundreds of gigabytes or +terabytes, and we move hundreds of them in a single batch. But the good news is that these disk images are often quite sparse. They may contain much less actual data than the virtual size. @@ -109,10 +118,298 @@ long time accumulate lots of deleted files and other stuff that isn't needed by the operating system but also isn't zeroes. -We can cope with both of these things. The technique -is called "sparsification". +Disk metadata +---------------------------------------------------------------------- + +DIAGRAM: + + [ XXXXXXXXXX .... DDDDDDDD XXXXXXXXXXXX .......... ] + < allocated > < allocated > < hole > + < hole > + +What a lot of people don't know about disk images is there's another +part to them - the metadata. This records which parts of the disk +image are allocated, and while parts are "holes". + +Because less-experienced system administrators don't know about this, +the metadata often gets lost when files are copied around. + +For long-running virtual machines, deleted data may often still be +allocated (although this depends on how the VM is set up). + +Some tools you can use to study the metadata of files: + + ls -lsh + filefrag + qemu-img map + nbdinfo --map + + +Sparsification +---------------------------------------------------------------------- + +DIAGRAM: + + [ XXXXXXXXXX .... DDDDDDDD XXXXXXXXXXXX .......... ] + < allocated > < allocated > < hole > + < hole > + + | + v + + [ XXXXXXXXXX .... DDDDDDDD XXXXXXXXXXXX .......... ] + < allocated > < allocated >< hole > + < hole > + + +We can cope with both of these things. The technique is called +"sparsification". Some tools you can use to sparsify a disk are: + + fstrim + virt-sparsify --in-place + +Sparsification part 2 +---------------------------------------------------------------------- + + ssh + file -------> snapshot <------ virt-sparsify + ------> qemu-img convert + ^ + | + zero clusters are saved in here + +I'm going to take the same scenario as before, but use +sparsification before doing the copy. + +(Run these commands and show the output and ls of the snapshot) - qemu-img create -f qcow2 -b ssh://kool/mnt/scratch/fedora-33.img snapshot.qcow2 + + +Benchmark A +---------------------------------------------------------------------- + +Now you might think this is all a bit obscure, but is it any good? +In this first benchmark, I've compared copying a disk in several +different ways to see which is fastest. All of the copying happens +between two idle machines, over a slow network. + +The full methodology is in the background notes that accompany this +talk, which I'll link at the end. + + scp scp remote:fedora-33.img local.img + + ssh + sparsify file -> qcow2 snapshot <- virt-sparsify + -> qemu-img convert + + without (as above but without sparsifying) + sparsify + + ssh + nbdcopy file -> nbdkit cow filter <- virt-sparsify + -> nbdcopy + +Which do you think will be faster? + + +Benchmark A results +---------------------------------------------------------------------- + +(Same slides with timings added) + +Lock contention in the cow filter is thought to be the +reason for the poor performance of nbdkit + nbdcopy. 
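
For reference, here is roughly what that third pipeline looks like as
commands. This is a sketch rather than the exact benchmark script
(the real scripts are in the background notes): the host and path
reuse the earlier ssh examples, nbdkit stays in the foreground so the
next two commands run from a second shell, and I'm assuming
virt-sparsify will accept an NBD URI for the disk here.

  nbdkit -f --exit-with-parent --filter=cow \
         ssh host=kool path=/mnt/scratch/pipes/fedora-33.img

  virt-sparsify --inplace nbd://localhost
  nbdcopy nbd://localhost local.img

The writes made while sparsifying land in the cow filter's temporary
overlay, so the remote file is never modified - the same trick as the
qcow2 snapshot, done inside nbdkit.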
+ + +Opening OVA files +---------------------------------------------------------------------- + +DIAGRAM: + + guest.ova -----> tar-filter <- virt-sparsify + -> qemu-img convert + + guest.ova------------+ + | guest.ovf | + | disk1.raw|vmdk | + +--------------------+ + + tar file = header - file - header - file - ... + +This technique isn't just useful for remote files. Another trick we +use in virt-v2v is using an nbdkit filter to unpack VMware's OVA files +without any copies. OVA files are really uncompressed tar files. The +disk inside can be in a variety of formats, often raw or VMDK. + +We can ask the 'tar' command to give us the offset and size of the +disk image within the file and simply read it out of the file +directly. + + +Benchmark B +---------------------------------------------------------------------- + + cp test.ova test2.ova + + tar xf test.ova fedora-33.img + + + nbdkit -> tar filter <- sparsify + -> qemu-img convert + + nbdkit -f --exit-with-parent --filter=tar file test.ova tar-entry=fedora-33.img + qemu-img create -f qcow2 -b nbd:localhost:10809 snapshot.qcow2 virt-sparsify --inplace snapshot.qcow2 - ls -l snapshot.qcow2 + qemu-img convert -f qcow2 snapshot.qcow2 -O raw local.img + +Which is faster? + +Benchmark B results +---------------------------------------------------------------------- + +(Same as above, with results) + +The results are interesting, but if you remember what we said about +the disk format and sparsification then it shouldn't be surprising. + +The copy and tar commands have to churn through the entire +disk image - zeroes and deleted files. + +With nbdkit, sparsification and qemu-img convert we only copy a +fraction of the data. + +Note the two methods do NOT produce bit-for-bit equivalent outputs. +Q: Is this a problem? +A: No different from if the owner of the VM had run "fstrim". + + +Modifications +---------------------------------------------------------------------- + +Virt-v2v doesn't only make efficient copies, it also modifies the disk +image in flight. Some kinds of modifications that are made: + + - installing virtio drivers + + - removing VMware tools + + - modifying the bootloader + + - rebuilding initramfs + + - changing device names in /etc files + + - changing the Windows registry + + - (and much more) + +These are significant modifications, and they happen entirely during +the transfer, without touching the source and without making large +temporary copies. + +I'm not going to talk about this in great detail because it's a very +complex topic. Instead I will show you a simple demonstration of a +similar technique. 
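
As a small taste of how such a modification can be scripted:
virt-customize can edit files inside a disk image without booting it,
and because it is pointed at the qcow2 snapshot the original disk is
untouched. This is only an illustration - it is not the code virt-v2v
actually runs, and real device renaming is more careful than a blind
substitution.

  virt-customize -a snapshot.qcow2 \
      --edit '/etc/fstab: s,/dev/sd,/dev/vd,g'

The demonstration below chains this kind of in-flight change together
with nbdkit and qemu-img.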
+ +DIAGRAM: + + (Screenshot from https://alt.fedoraproject.org/cloud/) + + HTTPS + -----> nbdkit-curl-plugin --> xz filter --> qcow2 snapshot + <-- sparsify + <-- deactivate cloud-init + <-- write a file + --> qemu-img convert + +DEMO: + + nbdkit curl https://download.fedoraproject.org/pub/fedora/linux/releases/33/Cloud/x86_64/images/Fedora-Cloud-Base-33-1.2.x86_64.raw.xz --filter=xz + qemu-img create -f qcow2 -b nbd://localhost -F raw snapshot.qcow2 + virt-sparsify --inplace snapshot.qcow2 + virt-customize -a snapshot.qcow2 \ + --run-command 'systemctl disable cloud-init' \ + --write /hello:HELLO + ls -lsh snapshot.qcow2 + qemu-img convert -f qcow2 snapshot.qcow2 -O raw local.img -p + guestfish --ro -a local.img -i ll / + + +Complete virt-v2v pipelines +---------------------------------------------------------------------- + +DIAGRAM: + + proprietary + transport + VMware -----> nbdkit ----> nbdkit ----> qcow2 + ESXi vddk rate snapshot + plugin filter + + qcow2 <---- sparsify + snapshot <---- install drivers + -----> qemu-img convert + + nbd HTTPS + qemu-img convert ----> nbdkit -----> imageio + python + plugin + +Discuss: + + - separate input and output sides + + - NBD used extensively + + - very efficient and no large temporary copies + + - virt-v2v may be on a separate machine + + - rate filter + + - many other tricks used + + + +Conclusions +---------------------------------------------------------------------- + +Disk image pipelines: + + - efficient + + - flexible + + - avoid local copies + + - avoid copying zeroes/sparseness/deleted data + + - sparsification + + - modifications in flight + + +Future work / other topics +---------------------------------------------------------------------- + +nbdcopy vs qemu-img convert + +copy-on-read, bounded caches + +block size adjustment + +reading from containers + +stop using gzip! + + +References +---------------------------------------------------------------------- + +http://git.annexia.org/?p=libguestfs-talks.git;a=tree;f=2021-pipelines + +https://gitlab.com/nbdkit + +https://libguestfs.org +https://libguestfs.org/virt-v2v.1.html
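

A note about "stop using gzip!"
----------------------------------------------------------------------

A gzip stream can only be decompressed from the beginning, so a
gzip-compressed disk image cannot be read at a random offset without
unpacking the whole thing first. xz can compress in independent
blocks, which is what the nbdkit xz filter relies on to serve the
compressed image directly. If you distribute compressed images,
something like the command below keeps them seekable; the 16MiB block
size is only a suggested trade-off between compression ratio and
random access, and disk.img is a placeholder.

  xz --best --block-size=16MiB disk.img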