Update talk notes.

author Richard W.M. Jones <rjones@redhat.com>

Thu, 11 Feb 2021 13:19:31 +0000 (13:19 +0000)

committer Richard W.M. Jones <rjones@redhat.com>

Thu, 11 Feb 2021 15:11:27 +0000 (15:11 +0000)
diff --git a/2021-pipelines/notes.txt b/2021-pipelines/notes.txt

index 6b2cb3b..fbdc5ad 100644 (file)
--- a/2021-pipelines/notes.txt
+++ b/2021-pipelines/notes.txt
@@ -5,20 +5,26 @@ February 15th 2021
  Introduction
  ----------------------------------------------------------------------
  
-Today I'm going to talk about a topic which seems very obvious to me.
-But I work in the business of moving disk images between systems, and
-it may not be obvious to other people.
+Virt-v2v is a project for "lifting and shifting" workloads from
+proprietary VMware systems to open source management platforms like
+RHV/oVirt, OpenStack and CNV/KubeVirt.  To do this we have to copy
+vast amounts of data quickly, modifying it in flight.
  
  
-INTRO INTRO INTRO
-INVOLVE PIPELINES DIAGRAM
+Nearly everything we have to copy is a virtual machine disk image of
+some kind, and there are particular techniques you can use to copy
+these very efficiently:
  
  
-The bad:
- - creating unbounded temporary files
- - slow copies
+ - without copying zeroes or deleted data
  
  
+ - without making temporary copies
  
  
+ - modifying the contents in flight
  
  
+ - without touching the originals
  
  
+To those working in the virtualization space, all the techniques I'm
+going to describe will be quite well-known and obvious.  But I see
+other projects making the same mistakes over and over.
  
  
  Simple copying
@@ -50,7 +56,7 @@ COMMAND:
  
    qemu-system-x86_64 -machine accel=kvm:tcg -cpu host \
                       -m 2048 -display none \
-                     -drive file=ssh://kool/mnt/scratch/fedora-33.img,format=raw,if=virtio \
+                     -drive file=ssh://kool/mnt/scratch/pipes/fedora-33.img,format=raw,if=virtio \
                       -serial stdio
  
  
@@ -62,15 +68,15 @@ DIAGRAM:
           ssh
    file -------> snapshot ------> qemu
  
-That command opens the remote file for writes.  If we want to prevent
-modifications to the remote file, we can place a snapshot into the
-path.  A snapshot in this case is a qcow2 file with the backing file
-set to the SSH URL.  Any modifications we make are saved into the
-snapshot.
+The command I just showed you opened the remote file for writes.  If
+we want to prevent modifications to the remote file, we can place a
+snapshot into the path.  A snapshot in this case is a qcow2 file with
+the backing file set to the SSH URL.  Any modifications we make are
+saved into the snapshot.  The original disk is untouched.
  
  COMMAND:
  
-  qemu-img create -f qcow2 -b ssh://kool/mnt/scratch/fedora-33.img snapshot.qcow2
+  qemu-img create -f qcow2 -b ssh://kool/mnt/scratch/pipes/fedora-33.img -F raw snapshot.qcow2
    qemu-system-x86_64 -machine accel=kvm:tcg -cpu host \
                       -m 2048 -display none \
                       -drive file=snapshot.qcow2,format=qcow2,if=virtio \
@@ -84,11 +90,12 @@ Instead of booting the disk, let's make a full local copy:
  
  COMMAND:
  
-  qemu-img convert -f qcow2 snapshot.qcow2 -O raw disk.img -p
+  qemu-img create -f qcow2 -b ssh://kool/mnt/scratch/pipes/fedora-33.img -F raw snapshot.qcow2
+  qemu-img convert -f qcow2 snapshot.qcow2 -O raw local.img -p
  
  
  
  
  
  
-Sparsification
+Disk images
  ----------------------------------------------------------------------
  
  DIAGRAM:
@@ -99,9 +106,8 @@ Now let's take side-step to talk about what's inside disk images.
  
  Firstly disk images are often HUGE.  A BluRay movie is 50 gigabytes,
  but that's really nothing compared to the larger disk images that we
-move about when we do "lift and shift" of workloads from foreign
-hypervisors to KVM.  Those can be hundreds of gigabytes or terabytes,
-and we move hundreds of them in a single batch.
+move about when we move to KVM.  Those can be hundreds of gigabytes or
+terabytes, and we move hundreds of them in a single batch.
  
  But the good news is that these disk images are often quite sparse.
  They may contain much less actual data than the virtual size.
@@ -112,10 +118,308 @@ long time accumulate lots of deleted files and other stuff that isn't
  needed by the operating system but also isn't zeroes.
  
  
+Metadata
+----------------------------------------------------------------------
+
+DIAGRAM:
+
+  [ XXXXXXXXXX .... DDDDDDDD XXXXXXXXXXXX .......... ]
+   < allocated >    < allocated         > < hole     >
+              < hole >
+
+What a lot of people don't know about disk images is there's another
+part to them - the metadata.  This records which parts of the disk
+image are allocated, and which parts are "holes".
+
+Because less-experienced system administrators don't know about this,
+the metadata often gets lost when files are copied around.
+
+For long-running virtual machines, deleted data may often still be
+allocated (although this depends on how the VM is set up).
+
+Some tools you can use to study the metadata of files:
+
+  ls -lsh
+  filefrag
+  qemu-img map
+  nbdinfo --map
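
The difference between a file's apparent size and its allocated size can
be seen with generic tools.  The following is only a sketch: it assumes
GNU coreutils on Linux and a filesystem that supports holes, and the
file names sparse.img and full.img are made up for the example.  It
creates a mostly-empty image, then makes a copy that writes out every
zero, which is how the metadata gets lost in practice.

```shell
#!/bin/sh
# Sketch only: assumes GNU coreutils (truncate, stat -c, cp --sparse).
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# 64 MiB virtual size, with 1 MiB of real data written at offset 16 MiB.
truncate -s 64M sparse.img
dd if=/dev/urandom of=sparse.img bs=1M seek=16 count=1 conv=notrunc status=none

# A copy that writes every zero explicitly, so the holes are lost.
cp --sparse=never sparse.img full.img

# %s = apparent size in bytes, %b = blocks allocated, %B = bytes per block.
for f in sparse.img full.img; do
    stat -c '%n: apparent=%s bytes, allocated=%b blocks of %B bytes' "$f"
done

# The two files have the same apparent size and the same contents;
# only the allocation metadata differs.
apparent_sparse=$(stat -c %s sparse.img)
apparent_full=$(stat -c %s full.img)
same_content=no
if cmp -s sparse.img full.img; then same_content=yes; fi
echo "same content: $same_content"
```

On a filesystem with hole support the allocated block count of
sparse.img should be much smaller than that of full.img, even though
"ls -l" reports the same size for both.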
+
+
+Sparsification
+----------------------------------------------------------------------
+
+DIAGRAM:
+
+  [ XXXXXXXXXX .... DDDDDDDD XXXXXXXXXXXX .......... ]
+   < allocated >    < allocated         > < hole     >
+              < hole >
+
+           |
+           v
+
+  [ XXXXXXXXXX .... DDDDDDDD XXXXXXXXXXXX .......... ]
+   < allocated >             < allocated >< hole     >
+              < hole         >
+
+
  We can cope with both of these things.  The technique is called
-"sparsification", and it's similar to the "fstrim" command.
+"sparsification".  Some tools you can use to sparsify a disk are:
+
+  fstrim
+  virt-sparsify --in-place
+
+Sparsification part 2
+----------------------------------------------------------------------
+
+         ssh
+  file -------> snapshot  <------ virt-sparsify
+                          ------> qemu-img convert
+                  ^
+                  |
+      zero clusters are saved in here
+
+I'm going to take the same scenario as before, but use
+sparsification before doing the copy.
+
+(Run these commands and show the output and ls of the snapshot)
+
+
+
+Benchmark A
+----------------------------------------------------------------------
+
+This might all seem a bit obscure, so how does it apply to copying
+disk images?  In this first benchmark, I've compared copying a disk in
+several different ways to see which is fastest.  All of the copying
+happens between two idle machines, over a slow network.
+
+The full methodology is in the background notes that accompany this
+talk, which I'll link at the end.
+
+  scp             scp remote:fedora-33.img local.img
+
+                       ssh
+  sparsify        file -> qcow2 snapshot <- virt-sparsify
+                                         -> qemu-img convert
+
+  without         (as above but without sparsifying)
+  sparsify
+
+                       ssh
+  nbdcopy         file -> nbdkit cow filter <- virt-sparsify
+                                            -> nbdcopy
+
+Which do you think will be faster?
+
+
+Benchmark A results
+----------------------------------------------------------------------
+
+(Same slides with timings added)
+
+Lock contention in the cow filter is thought to be the
+reason for the poor performance of nbdkit + nbdcopy.
+
+
+Opening OVA files
+----------------------------------------------------------------------
+
+DIAGRAM:
+
+  guest.ova -----> tar-filter <- virt-sparsify
+                              -> qemu-img convert
+
+  guest.ova------------+
+  | guest.ovf          |
+  | disk1.raw|vmdk     |
+  +--------------------+
+
+  tar file =  header - file - header - file - ...
+
+This technique isn't just useful for remote files.  Another trick we
+use in virt-v2v is using an nbdkit filter to unpack VMware's OVA files
+without any copies.  OVA files are really uncompressed tar files.  The
+disk inside can be in a variety of formats, often raw or VMDK.
+
+We can ask the 'tar' command to give us the offset and size of the
+disk image within the file and simply read it out of the file
+directly.
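
The idea of reading a member straight out of an uncompressed tar file
can be sketched with plain shell tools.  This is an illustration under
assumptions, not the code the nbdkit tar filter runs: it assumes GNU
tar, short member names with no extended headers, and uses made-up
stand-in files (guest.ova, guest.ovf, disk1.img).  GNU tar's -R
(--block-number) option reports the 512-byte block where each member's
header starts, and the data begins in the following block.

```shell
#!/bin/sh
# Sketch only: assumes GNU tar and short member names (no extended headers).
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Build a small stand-in for an OVA: a descriptor followed by a "disk".
printf '<Envelope/>\n' > guest.ovf
dd if=/dev/urandom of=disk1.img bs=1000 count=300 status=none
tar --format=gnu -cf guest.ova guest.ovf disk1.img

# With -R each listing line starts "block N:", where N is the 512-byte
# block holding that member's header.  Field 5 is the member size.
set -- $(tar -R -tvf guest.ova |
         awk '$NF == "disk1.img" { sub(":", "", $2); print $2, $5 }')
header_block=$1
size=$2

# The member data starts in the block immediately after its header.
offset=$(( (header_block + 1) * 512 ))
echo "disk1.img: offset=$offset size=$size"

# Read the disk directly out of the archive without unpacking it.
# tail -c +N is 1-based, hence offset + 1.
tail -c "+$((offset + 1))" guest.ova | head -c "$size" > extracted.img
cmp extracted.img disk1.img && echo "extracted copy matches"
```

Once the offset and size are known, any tool that can read a byte
range of a file can serve the disk image, so no unpacked copy is
needed.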
+
+
+Benchmark B
+----------------------------------------------------------------------
+
+  cp test.ova test2.ova
+
+  tar xf test.ova fedora-33.img
+
+
+   nbdkit -> tar filter <- sparsify
+                        -> qemu-img convert
+
+  nbdkit -f --exit-with-parent --filter=tar file test.ova tar-entry=fedora-33.img
+  qemu-img create -f qcow2 -b nbd:localhost:10809 -F raw snapshot.qcow2
+  virt-sparsify --inplace snapshot.qcow2
+  qemu-img convert -f qcow2 snapshot.qcow2 -O raw local.img
+
+Which is faster?
+
+Benchmark B results
+----------------------------------------------------------------------
+
+(Same as above, with results)
+
+The results are interesting, but if you remember what we said about
+the disk format and sparsification then it shouldn't be surprising.
+
+The copy and tar commands have to churn through the entire
+disk image - zeroes and deleted files.
+
+With nbdkit, sparsification and qemu-img convert we only copy a
+fraction of the data.
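
To get a rough feel for how much of an image is worth copying, one can
count the chunks that contain anything other than zeroes.  The sketch
below assumes GNU coreutils and GNU cmp, and works on a made-up 8 MiB
file called disk.img.  Note that it only detects zeroes: deleted data
that is still allocated looks like real data here, which is why
sparsification is run first in the pipelines above.

```shell
#!/bin/sh
# Sketch only: assumes GNU coreutils and GNU cmp (-n and -i SKIP1:SKIP2).
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical 8 MiB image holding 2 MiB of data at offset 3 MiB.
truncate -s 8M disk.img
dd if=/dev/urandom of=disk.img bs=1M seek=3 count=2 conv=notrunc status=none

chunk=$((1024 * 1024))
total=$(( $(stat -c %s disk.img) / chunk ))
nonzero=0
i=0
while [ "$i" -lt "$total" ]; do
    # Compare one chunk of the image against the same number of zero bytes.
    # -i SKIP1:SKIP2 skips into disk.img only; -n limits the comparison.
    if ! cmp -s -n "$chunk" -i "$((i * chunk)):0" disk.img /dev/zero; then
        nonzero=$((nonzero + 1))
    fi
    i=$((i + 1))
done
echo "$nonzero of $total chunks contain data"
```

A plain cp or tar has to read and write all of the chunks, whereas a
sparse-aware copy only needs to move the ones that contain data.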
+
+Note the two methods do NOT produce bit-for-bit equivalent outputs.
+Q: Is this a problem?
+A: No different from if the owner of the VM had run "fstrim".
+
+
+Modifications
+----------------------------------------------------------------------
+
+Virt-v2v doesn't only make efficient copies, it also modifies the disk
+image in flight.  Some kinds of modifications that are made:
+
+ - installing virtio drivers
+
+ - removing VMware tools
+
+ - modifying the bootloader
+
+ - rebuilding initramfs
+
+ - changing device names in /etc files
+
+ - changing the Windows registry
+
+ - (and much more)
+
+These are significant modifications, and they happen entirely during
+the transfer, without touching the source and without making large
+temporary copies.
+
+I'm not going to talk about this in great detail because it's a very
+complex topic.  Instead I will show you a simple demonstration of a
+similar technique.
+
+  qemu-img create -f qcow2 -b ssh://kool/mnt/scratch/pipes/fedora-33.img -F raw snapshot.qcow2
+  virt-sparsify --inplace snapshot.qcow2
+  guestfish -a snapshot.qcow2 -i write /etc/motd 'HEY, IT WORKED!'
+  ls -lh snapshot.qcow2
+  qemu-img convert -f qcow2 snapshot.qcow2 -O raw local.img -p
+  virt-cat -a local.img /etc/motd
+
+(Show this as a demo.  Show original untouched)
+
+
+Complete virt-v2v paths
+----------------------------------------------------------------------
+
+DIAGRAM:
+
+         proprietary
+         transport
+  VMware -----> nbdkit ----> nbdkit ----> qcow2
+  ESXi          vddk         rate         snapshot
+                plugin       filter
+
+  qcow2    <---- sparsify
+  snapshot <---- install drivers
+           -----> qemu-img convert
+
+                    nbd          HTTPS
+  qemu-img convert ----> nbdkit  -----> imageio
+                         python
+                         plugin
+
+Discuss:
+
+ - separate input and output sides
+
+ - NBD used extensively
+
+ - very efficient and no large temporary copies
  
  
-  qemu-img create -f qcow2 -b ssh://kool/mnt/scratch/fedora-33.img snapshot.qcow2
+ - rate filter
+
+ - many other tricks used
+
+
+Streaming and modifying a compressed image
+----------------------------------------------------------------------
+
+DIAGRAM:
+
+  (Screenshot from https://alt.fedoraproject.org/cloud/)
+
+  HTTPS
+  -----> nbdkit-curl-plugin --> xz filter --> qcow2 snapshot
+     <-- sparsify
+     <-- deactivate cloud-init
+     <-- write a file
+     --> qemu-img convert
+
+DEMO:
+
+  nbdkit curl https://download.fedoraproject.org/pub/fedora/linux/releases/33/Cloud/x86_64/images/Fedora-Cloud-Base-33-1.2.x86_64.raw.xz --filter=xz
+  qemu-img create -f qcow2 -b nbd://localhost -F raw snapshot.qcow2
    virt-sparsify --inplace snapshot.qcow2
-  ls -l snapshot.qcow2
+  virt-customize -a snapshot.qcow2 \
+                 --run-command 'systemctl disable cloud-init' \
+                 --write /hello:HELLO
+  ls -lsh snapshot.qcow2
+  qemu-img convert -f qcow2 snapshot.qcow2 -O raw local.img -p
+  guestfish --ro -a local.img -i ll /
+
+
+Conclusions
+----------------------------------------------------------------------
+
+Disk image pipelines:
+
+ - efficient
+
+ - flexible
+
+ - avoid local copies
+
+ - avoid copying zeroes/sparseness/deleted data
+
+ - sparsification
+
+ - modifications in flight
+
+
+Future work / other topics
+----------------------------------------------------------------------
+
+nbdcopy vs qemu-img convert
+
+copy-on-read, bounded caches
+
+block size adjustment
+
+reading from containers
+
+stop using gzip!
+
+
+References
+----------------------------------------------------------------------
+
+http://git.annexia.org/?p=libguestfs-talks.git;a=tree;f=2021-pipelines
+
+https://gitlab.com/nbdkit
+
+https://libguestfs.org
  
  
+https://libguestfs.org/virt-v2v.1.html