--- /dev/null
+\documentclass[12pt,a4paper]{article}
+\usepackage[utf8x]{inputenc}
+\usepackage{parskip}
+\usepackage{hyperref}
+\usepackage{xcolor}
+\hypersetup{
+ colorlinks,
+ linkcolor={red!50!black},
+ citecolor={blue!50!black},
+ urlcolor={blue!80!black}
+}
+\usepackage{abstract}
+%\usepackage{graphicx}
+%\DeclareGraphicsExtensions{.pdf,.png,.jpg}
+\usepackage{eurosym}
+\usepackage{float}
+\floatstyle{boxed}
+\restylefloat{figure}
+\usepackage{fancyhdr}
+ \pagestyle{fancy}
+ %\fancyhead{}
+ %\fancyfoot{}
+
+\title{Better loopback mounts with NBD}
+\author{
+\large
+Richard W.M. Jones \\
+\normalsize Red Hat Inc. \\
+\normalsize \href{mailto:rjones@redhat.com}{rjones@redhat.com}
+}
+\date{February 2019}
+
+\begin{document}
+\maketitle
+
+\begin{abstract}
+Loopback mounts let you mount a raw file as a device. Network Block
+Device with the nbdkit server takes this concept to the next level.
+You can mount compressed files. Create block devices from
+concatenated files. Mount esoteric formats like VMDK. NBD is also
+useful for testing: you can create giant devices, inject errors on
+demand into your block devices to test error detection and recovery,
+and add delays to make disks deliberately slow. I will also
+show you how to write block devices using shell scripts, and do
+advanced visualization of how the kernel and filesystems use block
+devices.
+\end{abstract}
+
+
+\section{Network Block Device}
+
+\textit{In the talk there will be an introduction to and history of
+ Network Block Device. I'm not reproducing that here since you can
+ read about the history in articles such as
+ \url{https://www.linuxjournal.com/article/3778}. There will also be
+ a short introduction to nbdkit, our pluggable, scriptable NBD
+ server. For now, see \url{https://github.com/libguestfs/nbdkit}. }
+
+
+\section{Loopback mounts -- simple but very limited}
+
+Loopback mounting a file is simple:
+
+\begin{verbatim}
+# truncate -s 10M /tmp/test.img
+# mke2fs -t ext2 /tmp/test.img
+# losetup -f /tmp/test.img
+# blockdev --getsize64 /dev/loop0
+10485760
+# mount /dev/loop0 /mnt
+\end{verbatim}
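+
+The example above assumes that \texttt{losetup -f} picked
+\texttt{/dev/loop0}. The following minimal shell sketch instead asks
+\texttt{losetup} which device it chose, and also undoes the mount
+afterwards. \texttt{loop\_mount} and \texttt{loop\_umount} are
+hypothetical helper names, not part of any package, and both need
+root.
+
+\begin{verbatim}
+# Attach IMAGE to the first free loop device and mount it on
+# MOUNTPOINT.  "losetup --find --show" prints the device it chose,
+# so we never have to guess that it is /dev/loop0.
+loop_mount () {
+    local image mountpoint dev
+    image=$1
+    mountpoint=$2
+    dev=$(losetup --find --show "$image") || return 1
+    # If the mount fails, detach the loop device again.
+    mount "$dev" "$mountpoint" || { losetup -d "$dev"; return 1; }
+    echo "$dev"
+}
+
+# Undo loop_mount: unmount first, then detach the loop device.
+loop_umount () {
+    local dev mountpoint
+    dev=$1
+    mountpoint=$2
+    umount "$mountpoint" && losetup -d "$dev"
+}
+\end{verbatim}
+
+A typical session runs \texttt{dev=\$(loop\_mount /tmp/test.img /mnt)},
+uses the filesystem, then runs \texttt{loop\_umount "\$dev" /mnt}.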
+
+But this talk is about all the things you \textit{cannot} do with a
+loopback mount. What if the file you want to mount is compressed?
+What if you want to concatenate several files? What if you want to
+use another type of storage instead of a file?
+
+You can't do those things with a loopback mount, but there is now an
+alternative: A loopback Network Block Device, backed by our pluggable,
+scriptable \textbf{nbdkit} server. It's just as simple to use as
+loopback mounts, but far more flexible.
+
+
+\section{Preparation}
+
+If you want to follow these examples on your own machine, you will
+need to install the \texttt{nbd-client} package (on Fedora:
+\texttt{nbd}), and the \texttt{nbdkit} server. Most examples require
+nbdkit $\geq 1.7.3$.
+
+Linux Network Block Device is in general very reliable, but
+unfortunately the latest released versions, which several Linux
+distributions ship, contain a couple of bugs (fixed upstream).
+
+If your Linux distro ships with NBD 3.17, make sure it includes the
+following post-3.17 fix for kernel timeouts:
+\url{https://github.com/NetworkBlockDevice/nbd/pull/82}
+
+If your Linux distro uses kernel $<4.17$, then upgrading to
+4.17 or above is recommended.
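+
+A quick way to check the running kernel is sketched below in shell.
+\texttt{kernel\_at\_least} is a hypothetical helper name; it only
+compares the first two numbers of a release string such as
+\texttt{4.16.3-301.fc28.x86\_64}, which is enough for this check.
+
+\begin{verbatim}
+# Succeed if the kernel release (default: the running kernel) is at
+# least MAJOR.MINOR.  Usage: kernel_at_least MAJOR MINOR [RELEASE]
+kernel_at_least () {
+    local want_major want_minor release major minor rest
+    want_major=$1
+    want_minor=$2
+    release=${3:-$(uname -r)}
+    major=${release%%.*}           # "4" from "4.16.3-301.fc28"
+    rest=${release#*.}
+    minor=${rest%%.*}              # "16"
+    major=${major%%[!0-9]*}        # drop suffixes such as "-rc2"
+    minor=${minor%%[!0-9]*}
+    if [ "${major:-0}" -ne "$want_major" ]; then
+        [ "${major:-0}" -gt "$want_major" ]
+    else
+        [ "${minor:-0}" -ge "$want_minor" ]
+    fi
+}
+
+if ! kernel_at_least 4 17; then
+    echo "kernel $(uname -r) is older than 4.17; upgrade recommended" >&2
+fi
+\end{verbatim}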
+
+You may also need to run this command once before you start:
+
+\begin{verbatim}
+# modprobe nbd
+\end{verbatim}
+
+
+\section{Mounting xz-compressed disks}
+
+Loopback mounting a compressed disk will expose a block device
+containing the compressed data, which is not very useful.
+
+nbdkit has a couple of plugins for handling gzip and xz compressed
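+disks. The xz plugin is quite efficient: because an xz file can be
+split into independently compressed blocks, it allows read-only random
+access to compressed files without decompressing them first: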
+
+\begin{verbatim}
+# nbdkit xz fedora-26.xz
+\end{verbatim}
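+
+Random access is only fast if the xz file was split into many blocks
+when it was compressed; a file compressed as a single block has to be
+decompressed from the start to reach any offset. A minimal sketch of
+preparing an image that way follows, assuming xz-utils 5.2 or later.
+\texttt{compress\_for\_nbdkit} is a hypothetical helper name, and
+16~MiB is a reasonable block size rather than a required one: smaller
+blocks make seeks cheaper but compress slightly worse.
+
+\begin{verbatim}
+# Compress IMAGE to IMAGE.xz, keeping the original.  --block-size
+# makes xz split the data into independently compressed 16 MiB
+# blocks, so a reader can seek straight to the block it needs.
+compress_for_nbdkit () {
+    local image
+    image=$1
+    xz --keep --block-size=16MiB "$image" || return 1
+    # The "Blocks" column shows how many blocks were written.
+    xz --list "$image.xz"
+}
+\end{verbatim}
+
+Running \texttt{compress\_for\_nbdkit fedora-26} would produce the
+\texttt{fedora-26.xz} file served above.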
+
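+nbdkit listens on the standard NBD port, 10809. We can connect to it
+and create a loopback device called \texttt{/dev/nbd0} using one
+command (\texttt{-b 512} sets a 512-byte block size):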
+
+\begin{verbatim}
+# nbd-client -b 512 localhost 10809 /dev/nbd0
+\end{verbatim}
+
+Linux automatically creates block devices for each partition in the
+original (Fedora 26) disk image:
+
+\begin{verbatim}
+# ll /dev/nbd0<tab>
+nbd0 nbd0p1 nbd0p2 nbd0p3
+# file -bsL /dev/nbd0p3
+SGI XFS filesystem data (blksz 4096, inosz 512, v2 dirs)
+# mount /dev/nbd0p3 /mnt
+mount: /mnt: WARNING: device write-protected, mounted read-only.
+# cat /mnt/etc/redhat-release
+Fedora release 26 (Twenty Six)
+\end{verbatim}
+
+To clean up:
+
+\begin{verbatim}
+# umount /mnt
+# nbd-client -d /dev/nbd0
+# killall nbdkit
+\end{verbatim}
+
+
+\section{Creating a huge btrfs filesystem in memory}
+
+nbdkit is not limited to serving files or even to the limits of disk
+space. The memory plugin only allocates RAM for the parts of the disk
+that are actually written, so you can create enormous filesystems in
+memory:
+
+\begin{verbatim}
+# nbdkit memory size=$(( 2**63 - 1 ))
+# nbd-client -b 512 localhost 10809 /dev/nbd0
+\end{verbatim}
+
+How big is this? $2^{63}-1$ bytes is just under 8~EiB: about
+9.2~billion gigabytes, or 8.6~billion gibibytes. If you
+were to buy that amount of disk at retail it would cost you
+\textbf{\euro~300~million}\footnote{September 2018 prices, WD Red SATA
+ drives bought on Amazon.fr}.
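+
+The figures can be checked with shell arithmetic. One subtlety:
+$2^{63}$ itself does not fit in the shell's signed 64-bit integers, so
+this sketch builds $2^{63}-1$ from two halves, which never overflows
+at any step.
+
+\begin{verbatim}
+size=$(( (1 << 62) - 1 + (1 << 62) ))   # 2^63 - 1, without overflow
+echo "$size bytes"                      # 9223372036854775807 bytes
+echo "$(( size / (1 << 60) )) EiB"      # 7 EiB (rounded down)
+echo "$(( size / (1 << 30) )) GiB"      # 8589934591 GiB
+echo "$(( size / 1000000000 )) GB"      # 9223372036 GB
+\end{verbatim}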
+
+We can partition and create a filesystem just like any other device:
+
+\begin{verbatim}
+# gdisk /dev/nbd0
+Number Start (sector) End (sector) Size Code Name
+ 1 1024 9007199254740973 8.0 EiB 8300 Linux filesystem
+# mkfs.btrfs -K /dev/nbd0p1
+# mount /dev/nbd0p1 /mnt
+# df -h /mnt
+Filesystem Size Used Avail Use% Mounted on
+/dev/nbd0p1 8.0E 17M 8.0E 1% /mnt
+\end{verbatim}
+
+When you unmount the NBD partition and kill nbdkit, the device is
+gone, making this very useful for testing filesystems.
+
+
+\section{Concatenating files into a partitioned disk}
+
+\textit{In the talk this section will talk about creating a virtual
+ disk with a virtual partition table using the nbdkit
+ ``partitioning'' plugin.}
+
+
+\section{Mounting a VMware VMDK file}
+
+\textit{In the talk this section will talk about modifying VMware VMDK
+ files using the nbdkit ``vddk'' plugin.}
+
+
+\section{Testing a RAID array}
+
+Let's make a RAID array using in-memory block devices. But to test
+them we'll want a way to inject errors into those block devices.
+nbdkit makes this easy with its \textit{error filter}:
+
+\begin{verbatim}
+# nbdkit --filter=error memory size=1G \
+ error-file=/tmp/error0 error-rate=1 -p 10810
+# nbdkit --filter=error memory size=1G \
+ error-file=/tmp/error1 error-rate=1 -p 10811
+# nbdkit --filter=error memory size=1G \
+ error-file=/tmp/error2 error-rate=1 -p 10812
+# nbdkit --filter=error memory size=1G \
+ error-file=/tmp/error3 error-rate=1 -p 10813
+# nbdkit --filter=error memory size=1G \
+ error-file=/tmp/error4 error-rate=1 -p 10814
+# nbdkit --filter=error memory size=1G \
+ error-file=/tmp/error5 error-rate=1 -p 10815
+\end{verbatim}
+
+We can create 6 NBD devices from these:
+
+\begin{verbatim}
+# nbd-client localhost 10810 /dev/nbd0
+# nbd-client localhost 10811 /dev/nbd1
+# nbd-client localhost 10812 /dev/nbd2
+# nbd-client localhost 10813 /dev/nbd3
+# nbd-client localhost 10814 /dev/nbd4
+# nbd-client localhost 10815 /dev/nbd5
+\end{verbatim}
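+
+The six near-identical server and client commands can also be
+generated with a loop. A minimal shell sketch, using the same ports,
+sizes and error files as above (the function name
+\texttt{start\_raid\_members} is hypothetical):
+
+\begin{verbatim}
+# Start one nbdkit server per RAID member on ports 10810-10815,
+# each backed by 1G of RAM and failing requests while /tmp/errorN
+# exists, then attach server N to /dev/nbdN.  Needs root.
+start_raid_members () {
+    local i
+    for i in 0 1 2 3 4 5; do
+        nbdkit --filter=error memory size=1G \
+            error-file=/tmp/error$i error-rate=1 -p $(( 10810 + i ))
+    done
+    for i in 0 1 2 3 4 5; do
+        nbd-client localhost $(( 10810 + i )) /dev/nbd$i
+    done
+}
+\end{verbatim}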
+
+And we can create a RAID 5 device on top:
+
+\begin{verbatim}
+# mdadm -C /dev/md0 --level=5 \
+ --raid-devices=5 --spare-devices=1 \
+ /dev/nbd{0,1,2,3,4,5}
+mdadm: Defaulting to version 1.2 metadata
+mdadm: array /dev/md0 started.
+# mkfs -t ext4 /dev/md0
+# mount /dev/md0 /mnt
+\end{verbatim}
+
+You can see we have 5 drives and 1 spare in the array:
+
+\begin{verbatim}
+# cat /proc/mdstat
+Personalities : [raid6] [raid5] [raid4]
+md0 : active raid5 nbd4[6] nbd5[5](S) nbd3[3] nbd2[2] nbd1[1] nbd0[0]
+ 4186112 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
+\end{verbatim}
+
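+nbdkit's error filter is triggered by the presence of the error files
+\texttt{/tmp/error*}: while a file exists, \texttt{error-rate=1} makes
+every request to that device fail, and deleting the file stops the
+errors. By creating these files we can inject errors into specific
+devices and see how the RAID array responds.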
+
+First, I inject errors into \texttt{/dev/nbd0}:
+
+\begin{verbatim}
+# touch /tmp/error0
+\end{verbatim}
+
+After a while the kernel notices:
+
+\begin{verbatim}
+[10804.798999] print_req_error: I/O error, dev nbd0, sector 100360
+[10804.868378] md: recovery of RAID array md0
+[10805.202631] md/raid:md0: read error corrected (8 sectors at 69928 on nbd0)
+[10810.349550] md: md0: recovery done.
+\end{verbatim}
+
+Comparing \texttt{/proc/mdstat} before and after:
+
+\begin{verbatim}
+-md0 : active raid5 nbd4[6] nbd5[5](S) nbd3[3] nbd2[2] nbd1[1] nbd0[0]
++md0 : active raid5 nbd4[6] nbd5[5] nbd3[3] nbd2[2] nbd1[1] nbd0[0](F)
+\end{verbatim}
+
+shows that the spare drive, nbd5, is now in use and nbd0 is marked as
+failed \texttt{(F)}.
+
+I can inject errors into a second drive:
+
+\begin{verbatim}
+# touch /tmp/error1
+[11039.428009] block nbd1: Other side returned error (5)
+[11039.431659] print_req_error: I/O error, dev nbd1, sector 231424
+[11039.448757] block nbd1: Other side returned error (5)
+[11039.452367] print_req_error: I/O error, dev nbd1, sector 233280
+[11084.767968] md/raid:md0: Disk failure on nbd1, disabling device.
+ md/raid:md0: Operation continuing on 4 devices.
+\end{verbatim}
+
+and now the array is operating in a degraded state. At the filesystem
+level everything is still fine.
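+
+To finish the experiment, a failed member can be repaired and the
+whole setup torn down. The sketch below uses standard \texttt{mdadm}
+manage-mode options and the device and file names from above;
+\texttt{repair\_member} and \texttt{teardown\_raid} are hypothetical
+helper names.
+
+\begin{verbatim}
+# Stop injecting errors into /dev/nbdN and give it back to the array.
+# md rebuilds onto it if the array is degraded; if the array is
+# already complete it becomes a spare.
+repair_member () {
+    local n
+    n=$1
+    rm -f /tmp/error$n                   # error filter stops failing
+    mdadm /dev/md0 --remove /dev/nbd$n   # drop the failed member
+    mdadm /dev/md0 --add /dev/nbd$n      # re-add it; md resyncs it
+}
+
+# Tear everything down in the reverse order it was built.
+teardown_raid () {
+    local i
+    umount /mnt
+    mdadm --stop /dev/md0
+    for i in 0 1 2 3 4 5; do
+        nbd-client -d /dev/nbd$i
+    done
+    killall nbdkit
+    rm -f /tmp/error0 /tmp/error1 /tmp/error2 \
+          /tmp/error3 /tmp/error4 /tmp/error5
+}
+\end{verbatim}
+
+For example, \texttt{repair\_member 1} would bring the array back from
+its degraded state.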
+
+
+\section{Writing a Linux block device in shell script}
+
+\textit{nbdkit allows you to write plugins in various programming
+ languages, including shell script. In the talk I will demonstrate a
+ Linux block device being written as a shell script.}
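+
+As a taste of what this looks like, here is a minimal sketch of a disk
+that reads as all zeroes, written in shell. It assumes the calling
+convention of nbdkit's \texttt{sh} plugin: the script is run once per
+method call with the method name as \texttt{\$1}, \texttt{get\_size}
+prints the size in bytes, \texttt{pread} receives the handle, count
+and offset as \texttt{\$2}, \texttt{\$3} and \texttt{\$4} and writes
+the data to standard output, and exit status 2 means ``method not
+implemented''. The file name \texttt{/tmp/zero.sh} is arbitrary.
+
+\begin{verbatim}
+# Write a tiny plugin serving a 1 MiB disk of zeroes.
+cat > /tmp/zero.sh <<'EOF'
+#!/bin/sh
+case "$1" in
+    get_size)
+        echo 1048576 ;;
+    pread)
+        # $3 = count, $4 = offset.  Every byte is zero, so the
+        # offset does not matter.
+        head -c "$3" /dev/zero ;;
+    *)
+        # Tell nbdkit this method is not implemented.
+        exit 2 ;;
+esac
+EOF
+chmod +x /tmp/zero.sh
+\end{verbatim}
+
+It would then be served with \texttt{nbdkit sh /tmp/zero.sh} and
+connected with \texttt{nbd-client} exactly as in the earlier examples.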
+
+
+\section{Logging and visualization}
+
+\textit{I am planning some visualization tools that will let you see
+ exactly how a block device is being read and written during common
+ operations like filesystem creation, file allocation, fstrim, and so
+ on. The talk will end with a demonstration of these tools.}
+
+
+\end{document}