======================================================================

----------------------------------------------------------------------
Virt-v2v is a project for "lifting and shifting" workloads from
proprietary VMware systems to open source management platforms like
RHV/oVirt, OpenStack and CNV/KubeVirt.  To do this we have to copy
vast amounts of data quickly, modifying it in flight.

Nearly everything we have to copy is a virtual machine disk image of
some kind, and there are particular techniques you can use to copy
these very efficiently:

- without copying zeroes or deleted data

- without making temporary copies

- modifying the contents in flight

- without touching the originals

To those working in the virtualization space, all the techniques I'm
going to describe will be quite well-known and obvious.  But I see
other projects making the same mistakes over and over.
----------------------------------------------------------------------

Let's start with something trivial: let's boot a virtual machine from
a local disk image.  Normally I'd say use libvirt, but for this demo
I'm going to keep it very simple and run qemu directly.

qemu-system-x86_64 -machine accel=kvm:tcg -cpu host \
    -m 2048 -display none \
    -drive file=fedora-33.img,format=raw,if=virtio \
A lesser-known fact about qemu is that it contains an SSH client, so
you can boot from a remote file over SSH:

qemu-system-x86_64 -machine accel=kvm:tcg -cpu host \
    -m 2048 -display none \
    -drive file=ssh://kool/mnt/scratch/pipes/fedora-33.img,format=raw,if=virtio \
----------------------------------------------------------------------

file -------> snapshot ------> qemu

The command I just showed you opened the remote file for writes.  If
we want to prevent modifications to the remote file, we can place a
snapshot into the path.  A snapshot in this case is a qcow2 file with
the backing file set to the SSH URL.  Any modifications we make are
saved into the snapshot.  The original disk is untouched.

qemu-img create -f qcow2 -F raw \
    -b ssh://kool/mnt/scratch/pipes/fedora-33.img snapshot.qcow2
qemu-system-x86_64 -machine accel=kvm:tcg -cpu host \
    -m 2048 -display none \
    -drive file=snapshot.qcow2,format=qcow2,if=virtio \
----------------------------------------------------------------------

Instead of booting the disk, let's make a full local copy:

qemu-img create -f qcow2 -F raw \
    -b ssh://kool/mnt/scratch/pipes/fedora-33.img snapshot.qcow2
qemu-img convert -f qcow2 snapshot.qcow2 -O raw local.img -p
----------------------------------------------------------------------

[ XXXXXXXXXX .... DDDDDDDD XXXXXXXXXXXX .......... ]

Now let's take a side-step to talk about what's inside disk images.

Firstly, disk images are often HUGE.  A Blu-ray movie is 50 gigabytes,
but that's really nothing compared to the larger disk images we move
about when migrating to KVM.  Those can be hundreds of gigabytes or
terabytes, and we move hundreds of them in a single batch.

The good news is that these disk images are often quite sparse.  They
may contain much less actual data than their virtual size, and a lot
of the disk may be filled with zeroes.

The bad news is that virtual machines which have been running for a
long time accumulate lots of deleted files and other stuff that isn't
needed by the operating system but also isn't zeroes.
----------------------------------------------------------------------

[ XXXXXXXXXX .... DDDDDDDD XXXXXXXXXXXX .......... ]
  <  allocated  > <     allocated     > < hole   >

What a lot of people don't know about disk images is that there's
another part to them - the metadata.  This records which parts of the
disk image are allocated, and which parts are "holes".

Because less-experienced system administrators don't know about this,
the metadata often gets lost when files are copied around.
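As a quick aside, you can see this happen with ordinary files.  The
filenames below are invented for the demo, and `cp --sparse=never`
stands in for any hole-unaware copying tool:

```shell
# Create a 1G virtual-size file that allocates no real disk space.
truncate -s 1G sparse.img

# A hole-aware copy preserves the metadata: good.img stays sparse.
cp --sparse=always sparse.img good.img

# A naive byte-for-byte copy loses it: bad.img really allocates
# a gigabyte of zeroes on disk.
cp --sparse=never sparse.img bad.img

# Virtual size (ls -l) is identical; allocated size (du) is not.
ls -l sparse.img good.img bad.img
du -h sparse.img good.img bad.img
```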
For long-running virtual machines, deleted data may often still be
allocated (although this depends on how the VM is set up).

Some tools you can use to study the metadata of files:
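On Linux, for instance, plain coreutils already show the difference
between virtual and allocated size (the filename here is made up;
`filefrag` and `qemu-img map` give more detail if you have them):

```shell
# Make a sparse 1G file: big virtual size, no allocated blocks.
truncate -s 1G disk.img

# ls -ls prints allocated blocks (first column) beside the byte size.
ls -ls disk.img

# du reports the real allocated space, not the virtual size.
du -h disk.img

# More detailed views, if installed:
#   filefrag -v disk.img     - extent map from the filesystem
#   qemu-img map disk.img    - data vs holes as qemu sees them
```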
----------------------------------------------------------------------

[ XXXXXXXXXX .... DDDDDDDD XXXXXXXXXXXX .......... ]
  <  allocated  > <     allocated     > < hole   >

[ XXXXXXXXXX .... DDDDDDDD XXXXXXXXXXXX .......... ]
  <  allocated  > <    allocated     ><   hole    >
We can cope with both of these things.  The technique is called
"sparsification".  Some tools you can use to sparsify a disk are:

virt-sparsify --in-place
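virt-sparsify understands guest filesystems, so it can also turn
deleted-but-allocated data back into holes.  For a plain file that
merely contains runs of zeroes, util-linux fallocate gives a flavour
of the same idea (a simplified stand-in, not what virt-v2v uses):

```shell
# Write 64M of real zeroes: fully allocated, but all-zero content.
dd if=/dev/zero of=zeroes.img bs=1M count=64 status=none
du -h zeroes.img    # ~64M allocated

# Punch holes wherever the file content is zero.
fallocate --dig-holes zeroes.img
du -h zeroes.img    # same content, now almost nothing allocated
```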
Sparsification part 2
----------------------------------------------------------------------

file -------> snapshot <------ virt-sparsify
                       ------> qemu-img convert

              zero clusters are saved in here

I'm going to take the same scenario as before, but use
sparsification before doing the copy.

(Run these commands and show the output and ls of the snapshot)
----------------------------------------------------------------------

Now you might think this is all a bit obscure, but is it any good?
In this first benchmark, I've compared copying a disk in several
different ways to see which is fastest.  All of the copying happens
between two idle machines, over a slow network.

The full methodology is in the background notes that accompany this
talk, which I'll link at the end.

scp        scp remote:fedora-33.img local.img

sparsify   file -> qcow2 snapshot <- virt-sparsify

without    (as above but without sparsifying)

nbdcopy    file -> nbdkit cow filter <- virt-sparsify

Which do you think will be fastest?
----------------------------------------------------------------------

(Same slides with timings added)

Lock contention in the cow filter is thought to be the
reason for the poor performance of nbdkit + nbdcopy.
----------------------------------------------------------------------

guest.ova -----> tar filter <- virt-sparsify

guest.ova ------------+

 +--------------------+

tar file = header - file - header - file - ...

This technique isn't just useful for remote files.  Another trick we
use in virt-v2v is an nbdkit filter which unpacks VMware's OVA files
without any copies.  OVA files are really just uncompressed tar files.
The disk inside can be in a variety of formats, often raw or VMDK.

We can ask the 'tar' command to give us the offset and size of the
disk image within the file, and simply read it out of the file.
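You can try the offset trick with nothing but GNU tar: the -R
(--block-number) option reports each member's 512-byte header block,
and the member's data starts in the next block.  The filenames below
are invented for the demo:

```shell
# Build a stand-in "OVA": an uncompressed tar holding a disk image.
printf 'hello disk image' > disk.img
tar cf demo.ova disk.img

# tar -R reports the header block; member data starts one block later.
block=$(tar tRvf demo.ova | awk '/disk\.img/ {print $2}' | tr -d ':')
offset=$(( (block + 1) * 512 ))
size=$(stat -c %s disk.img)

# Read the disk image straight out of the archive - no unpacking.
dd if=demo.ova bs=1 skip="$offset" count="$size" status=none
```

nbdkit's tar filter does essentially this, then serves the byte range
as an NBD device.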
----------------------------------------------------------------------

cp test.ova test2.ova

tar xf test.ova fedora-33.img

nbdkit -> tar filter <- sparsify

nbdkit -f --exit-with-parent --filter=tar file test.ova tar-entry=fedora-33.img
qemu-img create -f qcow2 -F raw -b nbd:localhost:10809 snapshot.qcow2
virt-sparsify --inplace snapshot.qcow2
qemu-img convert -f qcow2 snapshot.qcow2 -O raw local.img
----------------------------------------------------------------------

(Same as above, with results)

The results are interesting, but if you remember what we said about
the disk format and sparsification then it shouldn't be surprising.

The cp and tar commands have to churn through the entire disk
image - zeroes, deleted files and all.

With nbdkit, sparsification and qemu-img convert we only copy a
fraction of the data.

Note the two methods do NOT produce bit-for-bit equivalent outputs.
Q: Is this a problem?
A: No different from if the owner of the VM had run "fstrim".
----------------------------------------------------------------------

Virt-v2v doesn't only make efficient copies, it also modifies the disk
image in flight.  Some kinds of modifications that are made:

- installing virtio drivers

- removing VMware tools

- modifying the bootloader

- rebuilding the initramfs

- changing device names in /etc files

- changing the Windows registry

These are significant modifications, and they happen entirely during
the transfer, without touching the source and without making large
temporary copies.

I'm not going to talk about this in great detail because it's a very
complex topic.  Instead I will show you a simple demonstration of a
modification made in flight.

(Screenshot from https://alt.fedoraproject.org/cloud/)
-----> nbdkit-curl-plugin --> xz filter --> qcow2 snapshot

                          <-- deactivate cloud-init

nbdkit curl https://download.fedoraproject.org/pub/fedora/linux/releases/33/Cloud/x86_64/images/Fedora-Cloud-Base-33-1.2.x86_64.raw.xz --filter=xz
qemu-img create -f qcow2 -b nbd://localhost -F raw snapshot.qcow2
virt-sparsify --inplace snapshot.qcow2
virt-customize -a snapshot.qcow2 \
    --run-command 'systemctl disable cloud-init'
ls -lsh snapshot.qcow2
qemu-img convert -f qcow2 snapshot.qcow2 -O raw local.img -p
guestfish --ro -a local.img -i ll /
Complete virt-v2v pipelines
----------------------------------------------------------------------

VMware -----> nbdkit ----> nbdkit ----> qcow2
ESXi          vddk         rate         snapshot

snapshot <---- install drivers
         ----> qemu-img convert

qemu-img convert ----> nbdkit -----> imageio

- separate input and output sides

- NBD used extensively

- very efficient and no large temporary copies

- virt-v2v may be on a separate machine

- many other tricks used
----------------------------------------------------------------------

Disk image pipelines:

- avoid copying zeroes/sparseness/deleted data

- modifications in flight

Future work / other topics
----------------------------------------------------------------------

nbdcopy vs qemu-img convert

copy-on-read, bounded caches

block size adjustment

reading from containers

----------------------------------------------------------------------

http://git.annexia.org/?p=libguestfs-talks.git;a=tree;f=2021-pipelines

https://gitlab.com/nbdkit

https://libguestfs.org

https://libguestfs.org/virt-v2v.1.html