1. History and architecture
---------------------------

I want to explain first of all where libguestfs came from and how the
architecture works.

In about 2008 it was very clear that Red Hat had a lot of problems
reading and modifying disk images from virtual machines.  For some
disk images, you could use tools like 'losetup' and 'kpartx' to mount
them on the host kernel.  But that doesn't work in many cases, for
example:

 - The disk image is not "raw format" (eg. qcow2).

 - The disk image uses LVM (because names and UUIDs can conflict with
   LVM names/UUIDs used by the host or other VMs).

That approach also requires root privileges, which means any program
that wanted to read a disk image would need to run as root.  It's
also insecure, since malformed disk images can exploit bugs in the
host kernel to gain root on the host (this *cannot* be protected
against using UIDs or SELinux).

1.1 Architecture of libguestfs
------------------------------

Libguestfs is the solution to the above problems.  Let's see how
libguestfs works.

[SLIDE 1]

You'll be familiar with an ordinary Linux virtual machine.  The Linux
VM runs inside a host process called "qemu".  The Linux guest has a
kernel and userspace, and the qemu process translates requests from
the guest into accesses to a host disk image.  The host disk image
could be stored in an ordinary file, or in a host logical volume, and
it could be stored in several formats like raw, qcow2, VMDK and so on.

[SLIDE 2]

That's an ordinary Linux VM.  Libguestfs uses the same technique, but
with a special VM that we call the "libguestfs appliance", or just
the "appliance".  The appliance is a cut-down, much smaller Linux
operating system, running inside qemu.  It has userspace tools like
"lvm", "parted" and so on.  But it's also special because it only
runs a single process, called "guestfsd" (the guestfs daemon).  It
uses qemu to access the disk image, in exactly the same way as an
ordinary VM.

What creates the appliance, and who controls it?

[SLIDE 3]

Libguestfs is also a C library ("/usr/lib64/libguestfs.so.0").  It is
this library that creates the appliance -- just by running qemu.  The
C library also talks to the guestfs daemon over a simple command
channel, and it sends commands to it.  Commands are things like:

 - Return a list of all the partitions ("part_list").

 - Create a new filesystem ("mkfs").

 - Write this data into a file ("write").

1.2 libguestfs approach vs others
---------------------------------

Some advantages of this approach:

 - We can support every qemu feature:
   qcow2 / ceph remote access / iscsi / NBD / compressed / sparse ...

 - We can support every filesystem that the Linux kernel supports:
   ext4 / btrfs / xfs / NTFS / ...

 - We're using the same drivers as Linux (eg. ext4.ko), so all the
   filesystem features work.

 - LVM etc. "just works".

 - It doesn't need root (because you can run qemu on the host as any
   user).

 - It's secure (non-root, sVirt, libvirt containerization).

Disadvantages:

 - Architecturally complex.

 - Slower than direct mounting.

The main job of libguestfs is to:

 - Hide the complexity of the appliance.

 - Make it simple to use, fast, and reliable.

 - Offer a stable API to programs.

 - Offer useful tools on top for everyday tasks.

1.3 Example
-----------

As an example of how this works:

(1) A program linked to libguestfs calls "guestfs_part_list" (an API).

(2) The library sends the "part_list" command.

(3) The command is serialized and sent as a message from the library
    to the guestfs daemon.

(4) The daemon runs "parted -m -s -- /dev/sda unit b print" (this is
    happening inside the appliance).

(5) qemu does a lot of complicated translations - especially if the
    disk image uses qcow2.  That happens "by magic"; we don't see it
    or have to worry about it.

(6) "parted" prints out a list of partitions, which the daemon parses
    and serializes into a reply message.

(7) The reply message is sent back to the library, which unpacks the
    data and passes it back to the caller.
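For reference, here is a minimal sketch of the caller's side of this
sequence, written against the libguestfs C API.  Error handling is
kept to a minimum and the image name is just an example; the
guestfish session below does the same thing interactively.

  #include <stdio.h>
  #include <stdlib.h>
  #include <inttypes.h>
  #include <guestfs.h>

  int
  main (void)
  {
    /* Create a libguestfs handle. */
    guestfs_h *g = guestfs_create ();
    if (g == NULL)
      exit (EXIT_FAILURE);

    /* Tell libguestfs about the disk image (cf. "add" in guestfish). */
    if (guestfs_add_drive_opts (g, "centos-6.img",
                                GUESTFS_ADD_DRIVE_OPTS_READONLY, 1,
                                -1) == -1)
      exit (EXIT_FAILURE);

    /* Boot the appliance (cf. "run" in guestfish). */
    if (guestfs_launch (g) == -1)
      exit (EXIT_FAILURE);

    /* This is the call that sends "part_list" to the daemon. */
    struct guestfs_partition_list *parts =
      guestfs_part_list (g, "/dev/sda");
    if (parts == NULL)
      exit (EXIT_FAILURE);

    for (uint32_t i = 0; i < parts->len; ++i)
      printf ("part_num: %" PRIi32 "  start: %" PRIu64
              "  size: %" PRIu64 "\n",
              parts->val[i].part_num,
              parts->val[i].part_start,
              parts->val[i].part_size);

    guestfs_free_partition_list (parts);
    guestfs_close (g);
    return 0;
  }

Compile it with something like "cc example.c -o example -lguestfs".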
You can try this for yourself.  "guestfish" is a C program that links
to the libguestfs.so.0 library.  It is a very thin wrapper over the
libguestfs C API -- all it does really is parse commands and print
out replies.

  $ virt-builder centos-6
  $ guestfish

  Welcome to guestfish, the guest filesystem shell for
  editing virtual machine filesystems and disk images.

  Type: 'help' for help on commands
        'man' to read the manual
        'quit' to quit the shell

  ><fs> add centos-6.img readonly:true
  ><fs> run
  ><fs> part-list /dev/sda
  [0] = {
    part_num: 1
    part_start: 1048576
    part_end: 537919487
    part_size: 536870912
  }
  [1] = {
    part_num: 2
    part_start: 537919488
    part_end: 1611661311
    part_size: 1073741824
  }
  [2] = {
    part_num: 3
    part_start: 1611661312
    part_end: 6442450943
    part_size: 4830789632
  }
  ><fs> exit

"add" [C API: guestfs_add_drive_opts] tells libguestfs how to
construct the qemu command.  It roughly translates into:

  qemu -drive file=centos-6.img,snapshot=on

"run" [C API: guestfs_launch] is what runs the qemu command, creating
the appliance.  It also sets up the message channel between the
library and the guestfs daemon.

"part-list" [C API: guestfs_part_list] translates directly into a
message sent to the guestfs daemon.  Not all commands work like this:
some are further translated by the library, and may result in many
messages being sent to the daemon, or none at all.

1.3.1 Debugging
---------------

guestfish gives you a way to see the lower levels at work.  Just add
the "-v -x" flags ("guestfish -v -x").  "-x" traces all libguestfs
API calls.  "-v" prints out all debug output from the library and the
appliance, which includes appliance kernel messages.

Almost all commands take the "-v -x" flags (except virt-win-reg, for
obscure historical reasons).

2. More about the appliance
---------------------------

2.1 Running the appliance: direct vs libvirt
--------------------------------------------

In RHEL we try to stop people running qemu directly, and point them
towards libvirt for managing virtual machines.  Libguestfs has the
same concern: should it run the qemu command directly, or should it
use libvirt to run the qemu command?

There are pros and cons:

 - Running qemu directly gives us the most flexibility, eg. if we
   need to use a new qemu feature which libvirt doesn't support.

 - Libvirt implements extra security: SELinux (sVirt), a separate
   'qemu' UID, cgroups.

 - Libvirt is a big component with many complicated moving parts,
   which makes using libvirt less reliable.

Over time, we have added all the features we need to libvirt.  In
fact, using libvirt we can now access *more* qemu features than by
running qemu directly.  However there are still reliability issues
with libvirt.

RHEL 6: always used the 'direct' method (running qemu directly).

RHEL 7: defaults to the 'libvirt' method, but provides a fallback in
case users have reliability problems:

  export LIBGUESTFS_BACKEND=direct
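The backend can also be selected per handle from a program, rather
than through the environment variable.  A minimal sketch using the C
API calls guestfs_set_backend / guestfs_get_backend (available in
reasonably recent libguestfs):

  #include <stdio.h>
  #include <stdlib.h>
  #include <guestfs.h>

  int
  main (void)
  {
    guestfs_h *g = guestfs_create ();
    if (g == NULL)
      exit (EXIT_FAILURE);

    /* Force the 'direct' backend for this handle, roughly equivalent
     * to exporting LIBGUESTFS_BACKEND=direct before running the
     * program.
     */
    if (guestfs_set_backend (g, "direct") == -1)
      exit (EXIT_FAILURE);

    /* Print the backend this handle will use. */
    char *backend = guestfs_get_backend (g);
    if (backend) {
      printf ("backend = %s\n", backend);
      free (backend);
    }

    guestfs_close (g);
    return 0;
  }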
2.2 SELinux / sVirt
-------------------

[SLIDE 4]

In the ordinary case where you are hosting many virtual machines on a
single physical machine, libvirt runs all those virtual machines as
the same non-root user ("qemu:qemu").

Unfortunately this means that if one VM is exploited because of some
bug in qemu, it could then go on to exploit other VMs on the same
host.  This is because there is no host protection between different
processes running as the same user.

[SLIDE 5]

sVirt prevents this using SELinux.  What it does is give each VM a
different SELinux label.  It labels every resource that a VM needs
(like all its disk images) with that SELinux label.  And it adds
SELinux policies that prevent one VM from accessing another VM's
differently-labelled resources.

Libguestfs (when using the libvirt backend) uses the same mechanism.
It prevents an exploit of one disk image from possibly escalating to
other disk images, and is important for use cases like RHEV and
OpenStack where a single host user (eg. "vdsm") is using many
libguestfs handles at the same time.

2.3 Creating the appliance: supermin
------------------------------------

I didn't talk about how the appliance is built.  It's a small
Linux-based OS, but how do we make it?  Is it RHEL?  Is it Fedora?
(The answer: sort of, but not really.)

We have several constraints when building the appliance, which may
not be obvious:

 - Cannot compile our own kernel.  It wouldn't be supported by RHEL.

 - Cannot distribute a huge, binary-only blob.  It would be too large
   to download, and static linking is generally forbidden in Fedora,
   RHEL, and most other Linux distros.

 - Want to get bug/security fixes from the distro automatically.

[SLIDE 6]

The idea is that we build the appliance from the host distro.  If the
host distro is RHEL 7, then we copy the programs we need from RHEL to
make the appliance.  All of the programs and libraries ("parted",
"lvm", "libc.so.6") and the kernel and kernel modules get copied to
make the appliance.  If a program on the host gets updated (eg. to
fix a bug), we copy in the new program the next time libguestfs runs.

The appliance is created on the end user's machine, at run time.
That's why libguestfs takes longer to run the first time you run it,
or just after you've done a "yum" command (since it rebuilds the
appliance if there are upgraded binaries).

This is quite complex, but it is controlled by a command line program
called "supermin" ("supermin5" on RHEL 7), which you can try out:

  $ supermin --build /usr/lib64/guestfs/supermin.d \
      -o /tmp/appliance.d --format ext2
  supermin: open: /usr/bin/chfn: Permission denied *
  supermin: open: /usr/bin/chsh: Permission denied

  $ ls -lh /tmp/appliance.d/
  1.2M initrd
    35 kernel -> /boot/vmlinuz-4.1.6-200.fc22.x86_64
  4.0G root

"root" is the appliance (root disk).

* The "Permission denied" errors are harmless in this case.  We are
trying to get this changed in Fedora
[https://fedorahosted.org/fpc/ticket/467].

2.4 libguestfs-winsupport
-------------------------

In RHEL 7.2, you have to install an additional package called
"libguestfs-winsupport" to enable NTFS (the Windows filesystem)
support.  This relies on an upstream project called ntfs-3g which has
reverse-engineered the NTFS internals.  We don't ship ntfs-3g in
RHEL, so there are no ntfs-3g programs that can be copied from the
host.  How does it work?
  $ rpm -ql libguestfs-winsupport
  /usr/lib64/guestfs/supermin.d/zz-winsupport.tar.gz

  $ zcat /usr/lib64/guestfs/supermin.d/zz-winsupport.tar.gz | tar tf -
  ./
  ./usr/
  ./usr/lib64/
  ./usr/lib64/libntfs-3g.so
  ./usr/lib64/libntfs-3g.so.86
  ./usr/lib64/libntfs-3g.so.86.0.0
  ./usr/bin/
  ./usr/bin/ntfsck
  ./usr/bin/ntfscat
  [etc]

As well as copying files from the host, supermin can also unpack a
tarball into the appliance.  In the case of libguestfs-winsupport, we
provide a tarball containing the ntfs-3g distribution (the ntfs-3g
source is supplied in the libguestfs-winsupport source RPM).

We only want to support customers using this for v2v and a few other
virt operations (like virt-win-reg), so there are checks in
libguestfs to stop it from being used for general filesystem access.

3. Some virt tools
------------------

Libguestfs is a C library with a C API, and guestfish is quite a
low-level tool which basically offers direct access to the C API.  To
make things easier for end users, we built some higher-level virt
tools for particular tasks.  These tools link to libguestfs, and some
of them also use other libraries (libXML, libvirt directly,
"qemu-img" directly, Glance, etc.)

There are about a dozen virt tools provided by libguestfs.  Notable
tools include:

 - virt-edit: edit a single file inside a VM.

 - virt-inspector: inspect a disk image to find out if it contains an
   operating system [see below].

 - virt-builder: make a new VM.

 - virt-resize: resize a VM.

 - virt-v2v: convert a VM from VMware/Xen to run on KVM.

The virt commands use the libguestfs APIs, but often in ways that
would be hard or complicated for end users to do directly.  For
example, virt-resize does a lot of calculations to work out how the
resized partitions should be laid out, and those calculations are too
hard for most people to do by hand.

3.1 Inspection
--------------

Quite a fundamental libguestfs API operation is called "inspection".
Many of the virt tools start with inspection: eg. virt-edit,
virt-v2v.

The basic idea is that we have a disk image (eg. a qcow2 file).  The
disk image comes from a virtual machine, but we don't know what
operating system is installed inside the disk image.

Inspection lets you look at any disk image, and will try to find any
operating system(s) installed on there, and tell you interesting
things, such as:

 - The OS type, version, architecture (eg. "windows", 6.1, "x86_64").

 - The Linux distro (eg. "centos").

 - What applications are installed.

 - Windows drive letter mappings (eg. "C:" => "/dev/sda2").

Inspection is quite fundamental for v2v, because the operations we
have to perform on a guest depend on what the guest is.  If it's
Windows, we have to do a completely different set of things than for
a RHEL guest.

There is also a specific "virt-inspector" tool which just does
inspection and then presents the results as XML:

  $ virt-inspector -a /tmp/centos-6.img
  <?xml version="1.0"?>
  <operatingsystems>
    <operatingsystem>
      <name>linux</name>
      <arch>x86_64</arch>
      <distro>centos</distro>
      <product_name>CentOS release 6.6 (Final)</product_name>
      <major_version>6</major_version>
      <minor_version>6</minor_version>
      <mountpoints>
        <mountpoint dev="...">/</mountpoint>
        <mountpoint dev="...">/boot</mountpoint>
      </mountpoints>
      <applications>
        <application>
          <name>ConsoleKit</name>
          <version>0.4.1</version>
          <release>3.el6</release>
          <arch>x86_64</arch>
        </application>
        <application>
          <name>ConsoleKit-libs</name>
          <version>0.4.1</version>
          <release>3.el6</release>
          <arch>x86_64</arch>
        </application>
        etc.

3.1.1 How does inspection work?
-------------------------------

Inspection is basically a large number of heuristics.  For example:

 - If the filesystem contains a file called "/etc/centos-release",
   then set the Linux distro to "centos".

 - If the filesystem contains a binary called "/bin/bash", then look
   at the ELF header of that binary to find the OS architecture.

(But way more complicated, and handling Windows too.)
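Programs get at the same inspection data through the C API.  A
minimal sketch (error checking of the individual inspection calls is
omitted for brevity, and the image path is just an example):

  #include <stdio.h>
  #include <stdlib.h>
  #include <guestfs.h>

  int
  main (void)
  {
    guestfs_h *g = guestfs_create ();
    if (g == NULL)
      exit (EXIT_FAILURE);

    if (guestfs_add_drive_opts (g, "/tmp/centos-6.img",
                                GUESTFS_ADD_DRIVE_OPTS_READONLY, 1,
                                -1) == -1 ||
        guestfs_launch (g) == -1)
      exit (EXIT_FAILURE);

    /* Returns a list of root devices, one per operating system found. */
    char **roots = guestfs_inspect_os (g);
    if (roots == NULL)
      exit (EXIT_FAILURE);

    for (size_t i = 0; roots[i] != NULL; ++i) {
      char *type = guestfs_inspect_get_type (g, roots[i]);     /* eg. "linux" */
      char *distro = guestfs_inspect_get_distro (g, roots[i]); /* eg. "centos" */
      char *arch = guestfs_inspect_get_arch (g, roots[i]);     /* eg. "x86_64" */
      int major = guestfs_inspect_get_major_version (g, roots[i]);

      printf ("%s: %s %s %d (%s)\n", roots[i], type, distro, major, arch);

      free (type);
      free (distro);
      free (arch);
      free (roots[i]);
    }
    free (roots);

    guestfs_close (g);
    return 0;
  }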
If you want the real details of how inspection works, I suggest
running virt-inspector with the -x option:

  $ virt-inspector -x -a /tmp/centos-6.img |& less

4. Misc topics (if we have time)
--------------------------------

4.1 remote images (ceph, NBD etc)
---------------------------------

Qemu has multiple "block drivers".  Some of those are for using
different file formats like qcow2.  Others enable remote disks to be
accessed.  Because libguestfs uses qemu, we get this (almost) for
free.  To access a remote resource you can use commands like:

  guestfish -a nbd://example.com
  guestfish -a rbd://example.com/pool/disk

In RHEL, many drivers are intentionally disabled.  Also, the drivers
which are enabled are not particularly well tested at the moment.

4.2 parameters, environment variables
-------------------------------------

There are a lot of parameters which control libguestfs.  Some of
these are exposed as environment variables, allowing them to be set
easily outside programs.  Examples: LIBGUESTFS_BACKEND (the "backend"
setting), LIBGUESTFS_DEBUG / LIBGUESTFS_TRACE (enable debugging /
tracing).

Documentation on environment variables:
http://libguestfs.org/guestfs.3.html#environment-variables

A question was asked about the "program" setting.  When the C library
creates a handle, it saves the name of the current program.  You can
also read or change this setting:

  $ guestfish
  ><fs> get-program
  guestfish
  ><fs> set-program virt-foo
  ><fs> get-program
  virt-foo

In upstream libguestfs, this setting has no use.  In RHEL we use it
to enforce supportability requirements.

4.3 new features in libguestfs
------------------------------

Libguestfs upstream is mostly stable, but I am hoping to get a new
test framework upstream in the 1.32 cycle (the next 3-4 months).

https://www.redhat.com/archives/libguestfs/2015-August/msg00022.html

4.4 copy_(device|file)_to_(device|file)
---------------------------------------

There is a problem in the API, which is that there is no difference
between "/dev/sda" meaning the disk/device, and "/dev/sda" meaning a
file called "sda" in the "/dev" directory.  So instead of having a
single 'copy' function, we need to tell libguestfs whether you want
to copy between files or devices -- hence the four separate calls
that the heading above expands to.

4.5 virt-customize vs. virt-builder
-----------------------------------

Virt-customize takes an existing disk image containing a guest that
you created somehow (maybe with virt-builder, maybe some other way),
and runs a command such as 'yum install openssh' on it.

Virt-builder copies a template from
http://libguestfs.org/download/builder/ (or other places), expands
it, and then runs the 'yum install' command.

So really they are quite similar, and in fact they use exactly the
same code:

https://github.com/libguestfs/libguestfs/blob/master/customize/customize_run.ml#L96
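Returning to section 4.4 for a moment, here is a minimal sketch of
how the separate copy calls look from the C API.  It assumes 'g' is a
handle that has already been launched with a filesystem mounted, and
the device and file names are only examples:

  #include <guestfs.h>

  /* Copy a whole block device into a file inside the guest.  The
   * first argument is treated as a device because of the call used.
   */
  static int
  backup_partition (guestfs_h *g)
  {
    return guestfs_copy_device_to_file (g, "/dev/sda1", "/backup.img", -1);
  }

  /* Copy between two ordinary files inside the guest filesystem. */
  static int
  copy_config (guestfs_h *g)
  {
    return guestfs_copy_file_to_file (g, "/etc/hosts", "/etc/hosts.bak", -1);
  }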