Where to go from here: Adding resize to NBD [About 4-5 mins]
- based somewhat on https://lists.debian.org/nbd/2017/01/msg00016.html

* Heading: Resize: where getting bigger is better
- 8000- slide
  - qemu -> (raw) -> qemu-nbd -> (qcow2) -> image.qcow2
  - qemu -> (qcow2) -> qemu-nbd -> (raw) -> image.qcow2
  XXX

With all the things we've added to NBD, what do we want to add next? Our biggest goal (pardon the pun) is to allow dynamic growth of image sizes.

There are two ways to consume qcow2 images over NBD. In the first, the server reads the qcow2 file and exposes only the raw guest-visible content to the client. If the guest writes a lot, the server may grow the .qcow2 file as needed, but the client cannot change the size of the guest-visible address range, and cannot access any qcow2 features such as backing files, dirty bitmaps, or internal snapshots.

In the second, the server exposes the qcow2 file as-is, and the client must parse the qcow2 metadata itself to reach the guest content. The client now has access to all qcow2 features (including the QMP block_resize command for altering the size reported to the guest). However, it cannot change the size of the underlying .qcow2 container; if guest writes and metadata updates need more space than the size the server originally exported, the operation fails with ENOSPC. Preallocation can work around this limitation, but pre-sizing things correctly is painful enough that current documentation recommends always running in the first mode (raw over the wire) rather than this mode (qcow2 over the wire).

The next few slides will discuss design tradeoffs to be considered when adding a resize extension.

* Heading: Automatic or explicit
- 8100- slide
  - automatic: NBD_CMD_WRITE past EOF -> server auto-resizes if possible
  - explicit: NBD_CMD_WRITE past EOF fails, NBD_CMD_RESIZE to update, NBD_CMD_WRITE now succeeds.

POSIX files support automatic growth, insofar as the underlying file system still has room. However, block devices do not.
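As a reminder of the POSIX file behavior being contrasted here, the following is a minimal sketch (not NBD code) of a write past EOF on a regular file. The helper name write_past_eof and the sizes used are purely illustrative, and the zero-filled gap is what a typical POSIX file system provides; a block device offers no equivalent.

```python
import os
import tempfile


def write_past_eof(initial_size, offset, payload):
    """Write payload at offset (at or beyond EOF) in a scratch regular file.

    Returns (new_size, gap), where gap holds the bytes between the old EOF
    and the start of the write. Illustrative helper only, not an NBD API.
    """
    assert offset >= initial_size, "this demo only covers writes at or past EOF"
    with tempfile.TemporaryFile() as f:
        f.truncate(initial_size)   # give the file its starting size
        f.seek(offset)             # seeking past EOF is allowed for files
        f.write(payload)           # the write grows the file automatically
        f.flush()
        new_size = os.fstat(f.fileno()).st_size
        f.seek(initial_size)
        gap = f.read(offset - initial_size)  # never-written region
    return new_size, gap


if __name__ == "__main__":
    size, gap = write_past_eof(4096, 8192, b"data")
    print("new size:", size)
    print("gap is all zeros:", gap == bytes(len(gap)))
```

The file grows with no explicit resize call, and the skipped region reads back as zeros; an NBD resize extension would have to decide whether to offer either of those properties.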
Should NBD require an explicit NBD_CMD_RESIZE before allowing access to additional size, or can an NBD_CMD_WRITE extending past EOF trigger an automatic resize? Should we guarantee zero contents, or may a server leave unspecified contents in not-yet-written offsets added by a resize? If resize can be automatic, should the server advertise this capability to the client? Or should automatic resize be something the client must opt in to using?

* Heading: Simple or structured
- 8200- slide
  - simple: NBD_CMD_RESIZE -> simple reply
  - structured: NBD_CMD_RESIZE -> NBD_REPLY_CHUNK_SIZE+DONE

Sometimes, the client knows when it needs more space, and wants to inform the server about a new requested size (this includes the case when resize is automatic). But even when the client requests one size, the server may pick a different one (due to rounding to granularities or to quotas). In other setups, the server can't resize on the fly at the request of the client, but can be resized by other means, and the client will thus need a way to learn whether the size has changed.

However, returning the server's notion of the current size requires a structured reply; servers that lack structured replies would be limited to a boolean success or failure result. Is it worth requiring structured replies to implement a resize command?

* Heading: Polling or notification
- 8300- slide
  - NBD_CMD_RESIZE(FLAG_NOTIFY) -> NBD_REPLY_CHUNK_RESIZE+NOT_DONE -> NBD_REPLY_CHUNK_RESIZE+NOT_DONE ...

If resize is automatic, or if the server supports external means for resizing, the client will want some way to learn the server's current size. The NBD protocol currently requires that all traffic be command/response pairs initiated by the client, with no means for the server to initiate a message unrequested by the client.
However, as just mentioned, getting a size back would already require a structured reply, and structured replies allow the server to send back more than one response before declaring the response complete. Is it worth setting up a command flag where the client can request subsequent notification of size changes as an open-ended request (perhaps good-until-canceled)? The server could then send a reply to that command on each size change, giving the client a way to receive events rather than having to poll periodically for size changes. Do we need to think about how a client guards against a denial of service from a malicious server that sends too many responses?

* Heading: Complexity tradeoffs
- 8400-

Should we specify all of the previous choices, with appropriate handshaking for each knob? Integration testing becomes more difficult the more knobs there are to test against. On the other hand, additional flexibility lets each server support as much or as little as is easy for it, which has already proven a worthwhile model with nbdkit plugins. Requiring support for structured replies may be necessary for some features (such as server notification), but is definitely overkill for an implementation where polling is adequate.

As with fast zeroes, the way forward will be to implement something that works in each of qemu, nbdkit, and libnbd, and show that they are interoperable, so that the NBD protocol specification can then document how other implementations may also interoperably add the same support.

* conclusion: XXX
- 9000- wrapup

Thanks for your time this afternoon. We hope this has been informative, and welcome any questions at this time.