1. History and architecture
---------------------------

First of all I want to explain where libguestfs came from and how its
architecture works.

In about 2008 it was very clear that Red Hat had a lot of problems
reading and modifying disk images from virtual machines. For some
disk images, you could use tools like 'losetup' and 'kpartx' to mount
them using the host kernel. But that doesn't work in many cases, for
example:

- The disk image is not "raw format" (eg. qcow2).

- The disk image uses LVM (because names and UUIDs can conflict
  with LVM names/UUIDs used by the host or other VMs).

Mounting disk images this way also requires root privileges, which
means any program that wants to read a disk image needs to run as
root.

It's also insecure, since malformed disk images can exploit bugs in
the host kernel to gain root on the host (this *cannot* be protected
against using UIDs or SELinux).

1.1 Architecture of libguestfs
------------------------------

Libguestfs is the solution to the above problems.

Let's see how libguestfs works.

You'll be familiar with an ordinary Linux virtual machine. The Linux
VM runs inside a host process called "qemu". The Linux guest has a
kernel and userspace, and the qemu process translates requests from
the guest into accesses to a host disk image. The host disk image
could be stored in an ordinary file or in a host logical volume, and
it could be stored in several formats like raw, qcow2, VMDK and so
on.

That's an ordinary Linux VM. Libguestfs uses the same technique, but
with a special VM that we call the "libguestfs appliance" or just the
"appliance".

The appliance is a cut-down, much smaller Linux operating system,
running inside qemu. It has userspace tools like "lvm", "parted"
and so on. But it's also special because it only runs a single
process, called "guestfsd" (the guestfs daemon). It uses qemu to
access the disk image, in exactly the same way as an ordinary VM.

What creates the appliance - and who controls it?

Libguestfs is also a C library ("/usr/lib64/libguestfs.so.0"). It is
this library that creates the appliance -- just by running qemu. The
library also talks to the guestfs daemon over a simple command
channel, sending commands and receiving replies.

Commands are things like:

- Return a list of all the partitions ("part_list").

- Create a new filesystem ("mkfs").

- Write this data into a file ("write").

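You can type these commands interactively using guestfish (introduced
below). For example, a short session exercising "mkfs", "write" and
"part_list" against a scratch disk (a sketch; the disk path and sizes
are arbitrary):

  $ guestfish
  ><fs> sparse /tmp/scratch.img 100M
  ><fs> run
  ><fs> part-disk /dev/sda mbr
  ><fs> mkfs ext4 /dev/sda1
  ><fs> mount /dev/sda1 /
  ><fs> write /hello "hello, world"
  ><fs> part-list /dev/sda
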
1.2 libguestfs approach vs others
---------------------------------

Some advantages of this approach:

- We can support every qemu feature:
  qcow2 / ceph remote access / iSCSI / NBD / compressed / sparse / ...

- We can support every filesystem that the Linux kernel supports:
  ext4 / btrfs / xfs / NTFS / ...

- We're using the same drivers as Linux (eg. ext4.ko), so all the
  filesystem features work.

- LVM etc. "just works".

- It doesn't need root (because you can run qemu on the host
  as any user).

- It's secure (non-root, sVirt, libvirt containerization).

Some disadvantages:

- Architecturally complex.

- Slower than direct mounting.

The main job of libguestfs is to:

- Hide the complexity of the appliance.

- Make it simple to use, fast, and reliable.

- Offer a stable API to programs.

- Offer useful tools on top for everyday tasks.

As an example of how this works:

(1) A program linked to libguestfs calls "guestfs_part_list" (an API
    function).

(2) The library sends the "part_list" command.

(3) The command is serialized and sent as a message from the library
    to the guestfs daemon.

(4) The daemon runs "parted -m -s -- /dev/sda unit b print"
    (this happens inside the appliance).

(5) qemu does a lot of complicated translations, especially if the
    disk image uses qcow2. That happens "by magic": we don't see it
    or have to worry about it.

(6) "parted" prints out a list of partitions, which the daemon parses
    and serializes into a reply message.

(7) The reply message is sent back to the library, which unpacks the
    data and passes it back to the caller.

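The same sequence through the C API looks roughly like this (a
minimal sketch: the image name is an assumption and error reporting
is abbreviated; compile with "gcc test.c -o test -lguestfs"):

  #include <stdio.h>
  #include <stdlib.h>
  #include <inttypes.h>
  #include <guestfs.h>

  int
  main (void)
  {
    guestfs_h *g = guestfs_create ();     /* create a handle */
    if (g == NULL)
      exit (EXIT_FAILURE);

    /* Add the disk read-only, then launch the appliance. */
    if (guestfs_add_drive_opts (g, "centos-6.img",
                                GUESTFS_ADD_DRIVE_OPTS_READONLY, 1,
                                -1) == -1 ||
        guestfs_launch (g) == -1)
      exit (EXIT_FAILURE);

    /* Steps (2)-(7) above all happen behind this one call. */
    struct guestfs_partition_list *parts =
      guestfs_part_list (g, "/dev/sda");
    if (parts == NULL)
      exit (EXIT_FAILURE);

    for (uint32_t i = 0; i < parts->len; ++i)
      printf ("partition %d: start=%" PRIi64 " size=%" PRIi64 "\n",
              (int) parts->val[i].part_num,
              parts->val[i].part_start, parts->val[i].part_size);

    guestfs_free_partition_list (parts);
    guestfs_close (g);
    return 0;
  }
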
You can try this for yourself. "guestfish" is a C program that links
to the libguestfs.so.0 library. It is a very thin wrapper over the
libguestfs C API -- all it does really is parse commands and print out
replies.

  $ virt-builder centos-6

  $ guestfish

  Welcome to guestfish, the guest filesystem shell for
  editing virtual machine filesystems and disk images.

  Type: 'help' for help on commands
        'man' to read the manual
        'quit' to quit the shell

  ><fs> add centos-6.img readonly:true
  ><fs> run
  ><fs> part-list /dev/sda
  [0] = {
    part_num: 1
    ...
  }
  [1] = {
    part_num: 2
    part_start: 537919488
    part_end: 1611661311
    part_size: 1073741824
  }
  [2] = {
    part_num: 3
    part_start: 1611661312
    part_end: 6442450943
    part_size: 4830789632
  }

171 "add" [C API: guestfs_add_drive_opts] tells libguestfs to how to
172 construct the qemu command. It roughly translates into:
174 qemu -drive file=centos-6.img,snapshot=on
176 "run" [C API: guestfs_launch] is what runs the qemu command, creating
177 the appliance. It also sets up the message channel between the
178 library and the guestfs daemon.
180 "part-list" [C API: guestfs_part_list] translates directly into a
181 message sent to the guestfs daemon. Not all commands work like this:
182 some are further translated by the library, and may result in many
183 messages being sent to the daemon, or none at all.
Guestfish gives you a way to see the lower levels at work: just add
the "-v -x" flags to guestfish. "-x" traces all libguestfs API calls.
"-v" prints out all debug output from the library and the appliance,
which includes appliance kernel messages.

Almost all of the virt commands also take the "-v -x" flags (except
virt-win-reg, for obscure historical reasons).

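For example, to watch a whole session with tracing and debugging
enabled (assuming a disk image called centos-6.img):

  $ guestfish -v -x -a centos-6.img run

This shows every API call, how the appliance is launched, and the
appliance boot messages.
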
2. More about the appliance
---------------------------

2.1 Running the appliance: direct vs libvirt
--------------------------------------------

In RHEL we try to stop people running qemu directly, and point them
towards libvirt for managing virtual machines.

Libguestfs has the same concern: should it run the qemu command
directly, or should it use libvirt to run the qemu command? There
are arguments on both sides:

- Running qemu directly gives us the most flexibility, eg. if we
  need to use a new qemu feature which libvirt doesn't support.

- Libvirt implements extra security: SELinux (sVirt), separate
  UIDs, and so on.

- Libvirt is a big component with many complicated moving parts,
  meaning that using libvirt is less reliable.

Over time, we have added all the features we need to libvirt. In
fact, using libvirt we can now access *more* qemu features than by
running qemu directly. However there are still reliability issues:

RHEL 6: Always used the 'direct' method (running qemu directly).

RHEL 7: Defaults to the 'libvirt' method, but provides a fallback in
case users have reliability problems:

  export LIBGUESTFS_BACKEND=direct

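Programs can also select the backend per handle through the C API;
the environment variable just sets the default. A minimal sketch:

  #include <stdlib.h>
  #include <guestfs.h>

  int
  main (void)
  {
    guestfs_h *g = guestfs_create ();
    if (g == NULL)
      exit (EXIT_FAILURE);

    /* Force the direct backend for this handle, equivalent to
     * running the program with LIBGUESTFS_BACKEND=direct.
     */
    if (guestfs_set_backend (g, "direct") == -1)
      exit (EXIT_FAILURE);

    /* ... add drives, launch, etc. ... */

    guestfs_close (g);
    return 0;
  }
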
2.2 sVirt
---------

In the ordinary case where you are hosting many virtual machines on a
single physical machine, libvirt runs all those virtual machines as
the same non-root user ("qemu:qemu").

Unfortunately this means that if one VM is exploited because of some
bug in qemu, it could then go on to exploit other VMs on the same
host. This is because there is no protection between different
processes running as the same user.

sVirt prevents this using SELinux.

It gives each VM a different SELinux label. It labels every resource
that a VM needs (like all its disk images) with that SELinux label.
And it adds SELinux policies that prevent one VM from accessing
another VM's differently-labelled resources.

Libguestfs (when using the libvirt backend) uses the same mechanism.
It prevents an exploit of one disk image from possibly escalating to
other disk images, and is important for use cases like RHEV and
OpenStack where a single host user (eg. "vdsm") is using many
libguestfs handles at the same time.

2.3 Creating the appliance: supermin
------------------------------------

I didn't talk about how the appliance is built. It's a small
Linux-based OS, but how do we make it? Is it RHEL? Is it Fedora?
(The answer: sort of, but not really.)

We have several constraints when building the appliance, which may
not be obvious:

- We cannot compile our own kernel. It wouldn't be supported by RHEL.

- We cannot distribute a huge, binary-only blob. It would be too
  large to download, and static linking is generally forbidden in
  Fedora, RHEL, and most other Linux distros.

- We want to get bug/security fixes from the distro automatically.

The idea is that we build the appliance from the host distro. If the
host distro is RHEL 7, then we copy the programs we need from RHEL to
build the appliance.

All of the programs and libraries ("parted", "lvm", "libc.so.6") and
the kernel and kernel modules get copied to make the appliance. If a
program on the host gets updated (eg. to fix a bug), we copy in the
new program the next time libguestfs runs.

The appliance is created on the end-user's machine, at run time.
That's why libguestfs takes longer to run the first time you run it,
or just after you've done a "yum" command (since it rebuilds the
appliance if there are upgraded binaries).

This is quite complex, but it is controlled by a command line program
called "supermin" ("supermin5" on RHEL 7), which you can try out:

  $ supermin --build /usr/lib64/guestfs/supermin.d \
      -o /tmp/appliance.d --format ext2
  supermin: open: /usr/bin/chfn: Permission denied *
  supermin: open: /usr/bin/chsh: Permission denied

  $ ls -lh /tmp/appliance.d/
  ... 35 kernel -> /boot/vmlinuz-4.1.6-200.fc22.x86_64
  ...

"root" is the appliance (root disk).

* The "Permission denied" errors are harmless in this case. We are
  trying to get this changed in Fedora
  [https://fedorahosted.org/fpc/ticket/467].

2.4 libguestfs-winsupport
-------------------------

In RHEL 7.2, you have to install an additional package called
"libguestfs-winsupport" to enable NTFS (the Windows filesystem)
support.

This relies on an upstream project called ntfs-3g which has
reverse-engineered the NTFS internals. We don't ship ntfs-3g in RHEL,
so there are no "ntfs-3g programs" that can be copied from the host.

  $ rpm -ql libguestfs-winsupport
  /usr/lib64/guestfs/supermin.d/zz-winsupport.tar.gz

  $ zcat /usr/lib64/guestfs/supermin.d/zz-winsupport.tar.gz | tar tf -
  ...
  ./usr/lib64/libntfs-3g.so
  ./usr/lib64/libntfs-3g.so.86
  ./usr/lib64/libntfs-3g.so.86.0.0
  ...

As well as copying files from the host, supermin can also unpack a
tarball into the appliance.

In the case of libguestfs-winsupport, we provide a tarball containing
the ntfs-3g distribution (the ntfs-3g source is supplied in the
libguestfs-winsupport source RPM).

We only want to support customers using this for v2v and a few other
virt operations (like virt-win-reg), so there are checks in libguestfs
to stop it from being used for general filesystem access.

3. The virt tools
-----------------

Libguestfs is a C library with a C API, and guestfish is quite a
low-level tool which basically offers direct access to the C API.

To make things easier for end users, we built some higher-level virt
tools for particular tasks. These tools link to libguestfs, and some
of them also use other libraries (libxml2, libvirt directly,
"qemu-img" directly, Glance, etc.)

There are about a dozen virt tools provided by libguestfs. Notable
ones include:

- virt-edit: Edit a single file inside a VM.

- virt-inspector: Inspect a disk image to find out if it contains
  an operating system [see below].

- virt-builder: Make a new VM.

- virt-resize: Resize a VM.

- virt-v2v: Convert a VM from VMware/Xen to run on KVM.

The virt commands use the libguestfs APIs, but often in ways that
would be hard or complicated for end users to do directly. For
example, virt-resize does a lot of calculations to work out how the
resized partitions should be laid out, and those calculations are too
hard for most people to do by hand.

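For example, to give a guest a bigger root partition (a sketch; the
disk names, size and partition number are assumptions):

  $ truncate -s 20G newdisk.img
  $ virt-resize --expand /dev/sda3 centos-6.img newdisk.img

virt-resize copies the guest into the new, larger disk, working out
how to expand /dev/sda3 to use the extra space.
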
3.1 Inspection
--------------

Quite a fundamental libguestfs API operation is called "inspection".
Many of the virt tools start with inspection: eg. virt-edit, virt-v2v.

The basic idea is we have a disk image (eg. a qcow2 file). The disk
image comes from a virtual machine, but we don't know what operating
system is installed inside the disk image.

Inspection lets you look at any disk image, and will try to find any
operating system(s) installed on there, and tell you interesting
things about them, such as:

- The OS type, version, architecture (eg. "windows", 6.1, "x86_64").

- The Linux distro (eg. "centos").

- What applications are installed.

- Windows drive letter mappings (eg. "C:" => "/dev/sda2").

Inspection is quite fundamental for V2V, because what operations we
have to perform on a guest depend on what the guest is. If it's
Windows, we have to do a completely different set of things from a
Linux guest.

There is also a specific "virt-inspector" tool which just does
inspection and then presents the results as XML:

  $ virt-inspector -a /tmp/centos-6.img
  ...
  <distro>centos</distro>
  <product_name>CentOS release 6.6 (Final)</product_name>
  <major_version>6</major_version>
  <minor_version>6</minor_version>
  ...
  <mountpoint dev="/dev/sda3">/</mountpoint>
  <mountpoint dev="/dev/sda1">/boot</mountpoint>
  ...
  <name>ConsoleKit</name>
  <version>0.4.1</version>
  <release>3.el6</release>
  ...
  <name>ConsoleKit-libs</name>
  <version>0.4.1</version>
  <release>3.el6</release>
  ...

3.1.1 How does inspection work?
-------------------------------

Inspection is basically a large number of heuristics. For example:

- If the filesystem contains a file called "/etc/centos-release",
  then set the Linux distro to "centos".

- If the filesystem contains a binary called "/bin/bash", then look
  at the ELF header of that binary to find the OS architecture.

(But way more complicated, and handling Windows too.)

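You can poke at some of the same signals by hand in guestfish (a
sketch; the root device comes from the inspection output above):

  $ guestfish -a /tmp/centos-6.img
  ><fs> run
  ><fs> mount-ro /dev/sda3 /
  ><fs> cat /etc/centos-release
  CentOS release 6.6 (Final)
  ><fs> file-architecture /bin/bash
  x86_64
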
If you want the real details of how inspection works, I suggest
running virt-inspector with the -x option:

  $ virt-inspector -x -a /tmp/centos-6.img |& less

4. Misc topics (if we have time)
--------------------------------

4.1 remote images (ceph, NBD etc)
---------------------------------

Qemu has multiple "block drivers". Some of those are for using
different file formats like qcow2. Others enable remote disks to be
accessed. Because libguestfs uses qemu, we get this (almost) for
free.

To access a remote resource you can use commands like:

  guestfish -a nbd://example.com

  guestfish -a rbd://example.com/pool/disk

In RHEL, many drivers are intentionally disabled. Also, the drivers
which are enabled are not particularly well tested at the moment.

4.2 parameters, environment variables
-------------------------------------

There are a lot of parameters which control libguestfs. Some of these
are exposed as environment variables, allowing them to be set easily
outside programs. Examples: LIBGUESTFS_BACKEND (the "backend"
setting), LIBGUESTFS_DEBUG/LIBGUESTFS_TRACE (enable
debugging/tracing).

Documentation on environment variables:

http://libguestfs.org/guestfs.3.html#environment-variables

A question was asked about the "program" setting.

When the C library creates a handle, it saves the name of the current
program. You can also read or change this setting:

  $ guestfish
  ><fs> get-program
  guestfish
  ><fs> set-program virt-foo
  ><fs> get-program
  virt-foo

In upstream libguestfs, this setting has no use.

In RHEL we use it to enforce supportability requirements.

4.3 new features in libguestfs
------------------------------

Libguestfs upstream is mostly stable, but I am hoping to get a
new test framework upstream in the 1.32 cycle (next 3-4 months).

https://www.redhat.com/archives/libguestfs/2015-August/msg00022.html

4.4 copy_(device|file)_to_(device|file)
---------------------------------------

There is a problem in the API, which is that there is no difference
between "/dev/sda" meaning the disk/device and "/dev/sda" meaning a
file called "sda" in the "/dev" directory.

So instead of having a single 'copy' function, you need to tell
libguestfs whether you want to copy between files or devices, hence
the four separate functions.

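For example, in guestfish the first command below treats "/dev/sda1"
as a block device, while the second treats both of its arguments as
ordinary file paths (a sketch; the paths are assumptions):

  ><fs> copy-device-to-file /dev/sda1 /boot-backup.img
  ><fs> copy-file-to-file /etc/hosts /etc/hosts.bak
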
4.5 virt-customize vs. virt-builder
-----------------------------------

Virt-customize takes an existing disk image containing a guest that
you created somehow (maybe with virt-builder, maybe some other way),
and it runs a command such as 'yum install openssh' on it.

Virt-builder copies a template from
http://libguestfs.org/download/builder/ (or other places), expands it,
and then runs the 'yum install' command.

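Roughly equivalent invocations of the two tools (a sketch; the
package name is an assumption):

  $ virt-builder centos-6 --install openssh-server

  $ virt-customize -a centos-6.img --install openssh-server
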
So really they are quite similar, and in fact use exactly the same
code:

https://github.com/libguestfs/libguestfs/blob/master/customize/customize_run.ml#