images, with support for NBD on both source and destination. However,
most guest images are sparse, and we want to avoid naively reading
lots of zeroes on the source then writing lots of zeroes on the
-destination. Here's a case study of optimizing that, starting with a
-baseline of straightline copying, which matches qemu 3.0 behavior.
-XXX Is that correct version? what date?
-Let's convert a 100M image, which alternates between data and holes at
-each megabyte.
-
-* Heading: Dissecting that command
-- 41x0 - series of html pages to highlight aspects of the ./convert command
- - nbkdit, plugin server, --run command
- - delay filter
- - blocksize filter
- - stats, log filters
- - noextents filter
- - nozero filter
-
-Okay, I went a bit fast on that ./convert command. Looking closer, it
-is using nbdkit as a server with the memory plugin as the data sink,
-tied to a single invocation of qemu-img convert as the source over a
-Unix socket. There are lots of nbdkit filters: we want to slow down
-the operation so a small disk can still be useful in performance
-testing, and where the server defaults to a zero operation that is
-faster than writes. We want to demonstrate the fact that a single
-write zero operation (with no explicit network payload) can often
-cover a larger swath of the file in one operation than writes (which
-have a maximum payload per operation), by forcing writes to split
-smaller than the 1M striping of the source image. We want to collect
-statistics, both of the overall time spent, and which operations the
-client attempted, and include a sleep to avoid an output race. For
-this case study, we disable BLOCK_STATUS with noextents (more on why
-later). And finally we use the nozero filter as our knob for what the
-server advertises to the client, as well as how it responds to various
-zero-related requests.
+destination. Here's a case study of our last three years of
+optimizing that, starting with a baseline of straight-line copying,
+which matches qemu 2.7 behavior (Sep 2016). Let's convert a 100M
+image, which alternates between data and holes at each megabyte.
+
+The ./convert command shown here is rather long; if you're interested
+in its origins, my patch submission in Aug 2019 goes into more
+detail. But for now, just think of it as a fancy way to run
+'qemu-img convert' against a server where I can tweak server behavior
+to control which zeroing-related features are advertised or
+implemented.
* Heading: Writing zeroes: much ado about nothing
- 4200- .term
- ./convert zeromode=plugin fastzeromode=none for server A
- ./convert zeromode=emulate fastzeromode=none for server B
-XXX - verify versions/dates
In qemu 2.8 (Dec 2016), we implemented the NBD extension of
WRITE_ZEROES, with the initial goal of reducing network traffic (no
need to send an explicit payload of all zero bytes over the network).
Do we even have to worry whether WRITE_ZEROES will be fast or slow?
If we know that the destination already contains all zeroes, we could
entirely skip destination I/O for each hole in the source. qemu 2.12
-added support for NBD_CMD_BLOCK_STATUS to quickly learn whether a
-portion of a disk is a hole. But experiments with qemu-img convert
-showed that using BLOCK_STATUS as a way to avoid WRITE_ZEROES didn't
-really help, for a couple of reasons. If writing zeroes is fast,
-checking the destination first is either a mere tradeoff in commands
-(BLOCK_STATUS replacing WRITE_ZEROES when the destination is already
-zero) or a pessimization (BLOCK_STATUS still has to be followed by
-WRITE_ZEROES). And if writing zeroes is slow, we have a speedup holes
-when BLOCK_STATUS itself is fast on pre-existing destination holes,
-but we encountered situations such as tmpfs that has a linear rather
-than constant-time lseek(SEEK_HOLE) implementation, where we ended up
-with quadratic behavior all due to BLOCK_STATUS calls. Thus, for now,
-qemu-img convert does not use BLOCK_STATUS, and as mentioned earlier,
-I used the noextents filter in my test case to ensure BLOCK_STATUS is
-not interfering with timing results.
+(Apr 2018) added support for NBD_CMD_BLOCK_STATUS to quickly learn
+whether a portion of a disk is a hole. But experiments with qemu-img
+convert showed that using BLOCK_STATUS as a way to avoid WRITE_ZEROES
+didn't really help, for a couple of reasons. If writing zeroes is
+fast, checking the destination first is either a mere tradeoff in
+commands (BLOCK_STATUS replacing WRITE_ZEROES when the destination is
+already zero) or a pessimization (BLOCK_STATUS still has to be
+followed by WRITE_ZEROES). And if writing zeroes is slow, we get a
+speedup when BLOCK_STATUS itself is fast on pre-existing
+destination holes, but we encountered situations such as tmpfs, which
+has a linear rather than constant-time lseek(SEEK_HOLE)
+implementation, where we ended up with quadratic behavior all due to
+BLOCK_STATUS calls. Thus, for now, qemu-img convert does not use
+BLOCK_STATUS, and I used the nbdkit noextents filter in
+my test case to ensure BLOCK_STATUS is not interfering with timing
+results.
* Heading: Pre-zeroing: a tale of two servers
- 4400- .term