2019-kvm-forum/notes-08-resize

   1 Where to go from here: Adding resize to NBD
   2 [About 4-5 mins]
   3
   4 - based somewhat on https://lists.debian.org/nbd/2017/01/msg00016.html
   5
   6 * Heading: Resize: where getting bigger is better
   7 - 8000- slide
   8   - qemu -> (raw) -> qemu-nbd -> (qcow2) -> image.qcow2
   9   - qemu -> (qcow2) -> qemu-nbd -> (raw) -> image.qcow2
  10
  11 XXX With all the things we've added to NBD, what do we want to add
  12 next?  Our biggest goal (pardon the pun) is to allow dynamic growth of
  13 image sizes.
  14
  15 There are two ways to consume qcow2 images over NBD.  In the first,
  16 the server reads the qcow2 file and exposes only the raw guest-visible
  17 content to the client.  If the guest writes a lot, the server may grow
  18 the .qcow2 file as needed, but the guest cannot change the size of the
  19 guest-visible address range, and cannot access any qcow2 features such
  20 as backing files, dirty bitmaps, or internal snapshots.
  21
  22 In the second, the server exposes the qcow2 file as-is, and the client
  23 must then parse that metadata into guest content.  The client now has
  24 access to all qcow2 features (including the QMP block_resize command
  25 for altering the size reported to the guest).  However, it cannot
  26 change the size of the underlying .qcow2 container; if more guest
  27 writes and metadata actions occur than the original server size
  28 supports, the operation fails with ENOSPC.  Use of preallocation can
  29 work around this limitation, but it is painful enough to pre-size
  30 things correctly that current documentation recommends always running
  31 in the first mode (raw over the wire) rather than this mode (qcow2
  32 over the wire).
  33
  34 The next few slides will discuss design tradeoffs to be considered
  35 when adding a resize extension.
  36
  37 * Heading: Automatic or explicit
  38 - 8100- slide
  39  - automatic: NBD_CMD_WRITE past EOF -> server auto-resizes if possible
  40  - explicit: NBD_CMD_WRITE past EOF fails, NBD_CMD_RESIZE to update,
  41    NBD_CMD_WRITE now succeeds.
  42
  43 POSIX files support automatic growth, insofar as the underlying file
  44 system still has room.  However, block devices do not.  Should NBD
  45 require an explicit NBD_CMD_RESIZE before allowing access to
  46 additional size, or can NBD_CMD_WRITE extending past EOF trigger an
  47 automatic resize?  Should we guarantee zero contents, or may a server
  48 leave unspecified contents in not-yet-written offsets added by a
  49 resize?  If resize can be automatic, should the server advertise this
  50 capability to the client?  Or should automatic resize be something the
  51 client must opt in to using?
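
A minimal sketch of the two policies, assuming a toy in-memory export
rather than a real NBD server.  ToyExport, auto_resize and max_size are
hypothetical names invented for this illustration; NBD_CMD_RESIZE is only
the proposed command, not part of the current protocol.  The sketch picks
zero-filled growth and ENOSPC as the failure code, which are just one
possible set of answers to the questions above.

```python
import errno


class ToyExport:
    """In-memory stand-in for an NBD export (hypothetical, not real server code)."""

    def __init__(self, size, auto_resize=False, max_size=None):
        self.data = bytearray(size)
        self.auto_resize = auto_resize  # assumed knob: grow on write past EOF
        self.max_size = max_size        # assumed quota; None means unlimited

    def resize(self, new_size):
        """Model of the proposed explicit resize; returns 0 or an errno value."""
        if self.max_size is not None and new_size > self.max_size:
            return errno.ENOSPC
        if new_size > len(self.data):
            # This sketch chooses the "guarantee zero contents" answer.
            self.data.extend(bytes(new_size - len(self.data)))
        else:
            del self.data[new_size:]
        return 0

    def write(self, offset, payload):
        """Model of NBD_CMD_WRITE; returns 0 or an errno value."""
        end = offset + len(payload)
        if end > len(self.data):
            if not self.auto_resize:
                # Explicit policy: a write past EOF fails until resize() is called.
                return errno.ENOSPC
            err = self.resize(end)
            if err:
                return err
        self.data[offset:end] = payload
        return 0
```

With auto_resize=False the client sees write fail, resize succeed, then
write succeed, matching the explicit flow on the slide; with
auto_resize=True the first write succeeds as long as the quota allows.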
  52
  53 * Heading: Simple or structured
  54 - 8200- slide
  55   - simple: NBD_CMD_RESIZE -> simple reply
  56   - structured: NBD_CMD_RESIZE -> NBD_REPLY_CHUNK_SIZE+DONE
  57
  58 Sometimes, the client knows when it needs more space, and wants to
  59 inform the server about a new requested size (this includes the case
  60 when resize is automatic).  But even when the client requests one
  61 size, the server may pick a different one (due to rounding to
  62 granularities or to quotas).  In other setups, the server can't resize
  63 on the fly at the request of the client, but can be resized by other
  64 means and will thus need a way for the client to learn whether the
  65 size has changed.  However, returning the server's notion of the
  66 current size requires a structured reply; servers that lack structured
  67 replies would be limited to a boolean success or failure result.  Is
  68 it worth requiring structured replies to implement a resize command?
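
To make the difference concrete, here is a rough sketch of a server
choosing a size that differs from the request, and of what each reply
style can tell the client.  The function names and the "round up to the
granularity, then clamp to the quota" policy are assumptions made for
illustration; a real server may round or limit differently.

```python
def choose_size(requested, granularity, quota):
    """Round the request up to the granularity, then clamp to the quota."""
    rounded = -(-requested // granularity) * granularity  # ceiling to a multiple
    limit = (quota // granularity) * granularity          # largest aligned size in quota
    return min(rounded, limit)


def resize_simple(requested, granularity, quota):
    """Simple reply: the client learns only success or failure."""
    return choose_size(requested, granularity, quota) >= requested


def resize_structured(requested, granularity, quota):
    """Structured reply: the client also learns the size the server picked."""
    actual = choose_size(requested, granularity, quota)
    return actual >= requested, actual
```

For a request of 1000 bytes with 512-byte granularity, the simple variant
reports only success, while the structured variant also reports that the
export is now 1024 bytes.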
  69
  70 * Heading: Polling or notification
  71 - 8300- slide
  72   - NBD_CMD_RESIZE(FLAG_NOTIFY) -> NBD_REPLY_CHUNK_RESIZE+NOT_DONE
  73     -> NBD_REPLY_CHUNK_RESIZE+NOT_DONE ...
  74
  75 If resize is automatic, or if the server supports external means for
  76 resizing, the client will want some way to learn the server's current
  77 size.  The NBD protocol currently requires that all traffic be
  78 command/response pairs initiated by the client, with no means for the
  79 server to initiate a message unrequested by the client.  However, as
  80 just mentioned, getting a size back would already require a structured
  81 reply, and structured replies allow the server to send back more than
  82 one response before declaring the response complete.  Is it worth
  83 adding a command flag that lets the client request notification of
  84 subsequent size changes as an open-ended request (perhaps
  85 good-until-canceled)?  The server could then send a reply to that
  86 command on each size change, so that the client receives events
  87 rather than having to poll periodically for size changes.  Do we
  88 need to think about how a client protects itself against a denial
  89 of service from a malicious server that sends too many
  90 responses?
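
One possible client-side answer to the denial-of-service question is a
sliding-window budget on notifications.  This is only a sketch under
assumptions: NotificationGuard, limit and window are hypothetical names,
and nothing in the protocol discussion so far prescribes this approach.

```python
class NotificationGuard:
    """Accept at most `limit` resize notifications per `window` seconds."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.stamps = []  # arrival times of recently accepted notifications

    def accept(self, now):
        """Return True to process a notification arriving at time `now`.

        False means the server exceeded the budget; the client might then
        cancel the request, disconnect, or fall back to polling.
        """
        cutoff = now - self.window
        # Forget notifications that have aged out of the window.
        self.stamps = [t for t in self.stamps if t > cutoff]
        if len(self.stamps) >= self.limit:
            return False
        self.stamps.append(now)
        return True
```

The caller passes in the arrival time (for example from
time.monotonic()), which keeps the guard easy to test.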
  91
  92 * Heading: Complexity tradeoffs
  93 - 8400-
  94
  95 Should we specify all of the previous choices, with appropriate
  96 handshaking for each knob?  Integration testing becomes more difficult
  97 the more knobs there are to test against.  On the other hand,
  98 additional flexibility allows for more servers to support as much or
  99 as little as easily possible, which has already proven to be a
 100 worthwhile model with nbdkit plugins.  Requiring support for
 101 structured replies may be necessary for some features (such as server
 102 notification), but is definitely overkill for an implementation where
 103 polling is adequate.
 104
 105 As with fast zeroes, the way forward will be to implement something
 106 that works in each of qemu, nbdkit, and libnbd, and show that they are
 107 interoperable, so that the NBD protocol specification can then
 108 document how other implementations may also interoperably add the same
 109 support.
 110
 111 * conclusion: XXX
 112 - 9000- wrapup
 113
 114 Thanks for your time this afternoon.  We hope this has been
 115 informative, and welcome any questions at this time.