From: Richard W.M. Jones
Date: Mon, 14 Sep 2015 12:42:08 +0000 (+0100)
Subject: Add 2015 Red Hat QE talk.
X-Git-Url: http://git.annexia.org/?p=libguestfs-talks.git;a=commitdiff_plain;h=0f314fa1cd53ba9f485d5a3253abd6276e5fe0cf;hp=0a72f4babec87fdfb046a7f0ee7d7681f5aa4a60

Add 2015 Red Hat QE talk.
---

diff --git a/2015-qe/Makefile b/2015-qe/Makefile
new file mode 100644
index 0000000..24b39b9
--- /dev/null
+++ b/2015-qe/Makefile
@@ -0,0 +1,9 @@
+slides := slide1.svg slide2.svg slide3.svg slide4.svg slide5.svg slide6.svg
+
+all: slides.pdf
+
+clean:
+	rm slides.pdf
+
+slides.pdf: $(slides)
+	convert $^ $@
diff --git a/2015-qe/README b/2015-qe/README
new file mode 100644
index 0000000..5617ab1
--- /dev/null
+++ b/2015-qe/README
@@ -0,0 +1,6 @@
+This is a talk that I gave to the Red Hat internal QE team
+in Sept 2015.
+
+Please start with --> notes.txt
+
+For slides --> slides.pdf
diff --git a/2015-qe/notes.txt b/2015-qe/notes.txt
new file mode 100644
index 0000000..c2f2c88
--- /dev/null
+++ b/2015-qe/notes.txt
@@ -0,0 +1,549 @@
+1. History and architecture
+---------------------------
+
+I want to explain first of all where libguestfs came from and
+how the architecture works.
+
+In about 2008 it was very clear Red Hat had a lot of problems reading
+and modifying disk images from virtual machines. For some disk
+images, you could use tools like 'losetup' and 'kpartx' to mount them
+on the host kernel. But that doesn't work in many cases, for example:
+
+  - The disk image is not "raw format" (eg. qcow2).
+
+  - The disk image uses LVM (because names and UUIDs can conflict
+    with LVM names/UUIDs used by the host or other VMs).
+
+It also requires root privileges, which means any program that wanted
+to read a disk image would need to run as root.
+
+It's also insecure, since malformed disk images can exploit bugs in
+the host kernel to gain root on the host (this *cannot* be protected
+against using UIDs or SELinux).
+
+1.1 Architecture of libguestfs
+------------------------------
+
+Libguestfs is the solution to the above problems.
+
+Let's see how libguestfs works.
+
+[SLIDE 1]
+
+You'll be familiar with an ordinary Linux virtual machine. The Linux
+VM runs inside a host process called "qemu". The Linux guest has a
+kernel and userspace, and the qemu process translates requests from
+the guest into accesses to a host disk image. The host disk image
+could be stored in an ordinary file, or it could be stored in a host
+logical volume, and it could be stored in several formats like raw,
+qcow2, VMDK and so on.
+
+[SLIDE 2]
+
+That's an ordinary Linux VM. libguestfs uses the same technique, but
+using a special VM that we call the "libguestfs appliance" or just the
+"appliance".
+
+The appliance is a cut down, much smaller Linux operating system,
+running inside qemu. It has the userspace tools like "lvm", "parted"
+and so on. But it's also special because it only runs a single
+process, called "guestfsd" (the guestfs daemon). It uses qemu to
+access the disk image, in exactly the same way as an ordinary VM.
+
+What creates the appliance - and who controls it?
+
+[SLIDE 3]
+
+Libguestfs is also a C library ("/usr/lib64/libguestfs.so.0"). It is
+this library that creates the appliance -- just by running qemu. The
+C library also talks to the guestfs daemon over a simple command
+channel, and it sends commands to it.
+
+Commands are things like:
+
+  - Return a list of all the partitions ("part_list").
+
+  - Create a new filesystem ("mkfs").
+
+  - Write this data into a file ("write").
+
+1.2 libguestfs approach vs others
+---------------------------------
+
+Some advantages of this approach:
+
+  - We can support every qemu feature:
+    qcow2 / ceph remote access / iscsi / NBD / compressed / sparse ...
+
+  - We can support every filesystem that the Linux kernel supports:
+    ext4 / btrfs / xfs / NTFS / ...
+
+  - We're using the same drivers as Linux (eg. ext4.ko), so all the
+    filesystem features work.
+
+  - LVM etc. "just works"
+
+  - It doesn't need root (because you can run qemu on the host
+    as any user).
+
+  - It's secure (non-root, sVirt, libvirt containerization).
+
+Disadvantages:
+
+  - Architecturally complex.
+
+  - Slower than direct mounting.
+
+The main job of libguestfs is to:
+
+  - Hide the complexity of the appliance.
+
+  - Make it simple to use, fast, and reliable.
+
+  - Offer a stable API to programs.
+
+  - Offer useful tools on top for everyday tasks.
+
+1.3 Example
+-----------
+
+As an example of how this would work:
+
+(1) Program linked to libguestfs calls "guestfs_part_list" (an API).
+
+(2) The library sends the "part_list" command.
+
+(3) The command is serialized and sent as a message from the
+library to the guestfs daemon.
+
+(4) The daemon runs "parted -m -s -- /dev/sda unit b print"
+(this is happening inside the appliance).
+
+(5) qemu does a lot of complicated translations - especially if the
+disk image uses qcow2. That happens "by magic", we don't see it or
+have to worry about it.
+
+(6) "parted" prints out a list of partitions, which the daemon
+parses and serializes into a reply message.
+
+(7) The reply message is sent back to the library, which unpacks the
+data and passes it back to the caller.
+
+You can try this for yourself. "guestfish" is a C program that links
+to the libguestfs.so.0 library. It is a very thin wrapper over the
+libguestfs C API -- all it does really is parse commands and print out
+replies.
+
+$ virt-builder centos-6
+
+$ guestfish
+
+Welcome to guestfish, the guest filesystem shell for
+editing virtual machine filesystems and disk images.
+
+Type: 'help' for help on commands
+      'man' to read the manual
+      'quit' to quit the shell
+
+><fs> add centos-6.img readonly:true
+><fs> run
+><fs> part-list /dev/sda
+[0] = {
+  part_num: 1
+  part_start: 1048576
+  part_end: 537919487
+  part_size: 536870912
+}
+[1] = {
+  part_num: 2
+  part_start: 537919488
+  part_end: 1611661311
+  part_size: 1073741824
+}
+[2] = {
+  part_num: 3
+  part_start: 1611661312
+  part_end: 6442450943
+  part_size: 4830789632
+}
+><fs> exit
+
+"add" [C API: guestfs_add_drive_opts] tells libguestfs how to
+construct the qemu command. It roughly translates into:
+
+  qemu -drive file=centos-6.img,snapshot=on
+
+"run" [C API: guestfs_launch] is what runs the qemu command, creating
+the appliance. It also sets up the message channel between the
+library and the guestfs daemon.
+
+"part-list" [C API: guestfs_part_list] translates directly into a
+message sent to the guestfs daemon. Not all commands work like this:
+some are further translated by the library, and may result in many
+messages being sent to the daemon, or none at all.
+
+1.3.1 Debugging
+---------------
+
+guestfish gives you a way to see the lower levels at work. Just
+add the "-v -x" flags to the guestfish command. "-x" traces all
+libguestfs API calls. "-v" prints out all debug output from the
+library and the appliance, which includes appliance kernel messages.
+
+Almost all commands take the "-v -x" flags (except virt-win-reg
+for obscure historical reasons).
+
+
+
+2. More about the appliance
+---------------------------
+
+2.1 Running the appliance: direct vs libvirt
+--------------------------------------------
+
+In RHEL we try to stop people running qemu directly, and point them
+towards libvirt for managing virtual machines.
+
+Libguestfs has the same concern: should it run the qemu command
+directly, or should it use libvirt to run the qemu command? There are
+pros and cons:
+
+  - Running qemu directly gives us the most flexibility, eg. if we
+    need to use a new qemu feature which libvirt doesn't support.
+
+  - Libvirt implements extra security: SELinux (sVirt), separate
+    'qemu' UID, cgroups.
+
+  - Libvirt is a big component with many complicated moving parts,
+    meaning that using libvirt is less reliable.
+
+Over time, we have added all the features we need to libvirt. In
+fact, now using libvirt we can access *more* qemu features than by
+running qemu directly. However there are still reliability issues
+with libvirt.
+
+RHEL 6: Always used the 'direct' method (running qemu directly).
+
+RHEL 7: Defaults to 'libvirt' method, but provides a fallback in case
+users have reliability problems:
+
+  export LIBGUESTFS_BACKEND=direct
+
+2.2 SELinux / sVirt
+-------------------
+
+[SLIDE 4]
+
+In the ordinary case where you are hosting many virtual machines on a
+single physical machine, libvirt runs all those virtual machines as
+the same non-root user ("qemu:qemu").
+
+Unfortunately this means that if one VM is exploited because of some
+bug in qemu, it could then go on to exploit other VMs on the same
+host. This is because there is no host protection between different
+processes running as the same user.
+
+[SLIDE 5]
+
+sVirt prevents this using SELinux.
+
+sVirt gives each VM a different SELinux label. It labels
+every resource that a VM needs (like all its disk images) with that
+SELinux label. And it adds SELinux policies that prevent one VM from
+accessing another VM's differently-labelled resources.
+
+Libguestfs (when using the libvirt backend) uses the same mechanism.
+It prevents an exploit from one disk image from possibly escalating to
+other disk images, and is important for use cases like RHEV and
+OpenStack where a single host user (eg. "vdsm") is using many
+libguestfs handles at the same time.
+
+2.3 Creating the appliance: supermin
+------------------------------------
+
+I didn't talk about how the appliance is built. It's a small
+Linux-based OS, but how do we make it? Is it RHEL? Is it Fedora?
+(The answer: sort of, but not really).
+
+We have several constraints when building the appliance, which may
+not be obvious:
+
+  - Cannot compile our own kernel. It wouldn't be supported by RHEL.
+
+  - Cannot distribute a huge, binary-only blob. It would be too large
+    to download, and static linking is generally forbidden in Fedora,
+    RHEL, and most other Linux distros.
+
+  - Want to get bug/security fixes from the distro automatically.
+
+[SLIDE 6]
+
+The idea is that we build the appliance from the host distro. If the
+host distro is RHEL 7, then we copy the programs we need from RHEL to
+make the appliance.
+
+All of the programs and libraries ("parted", "lvm", "libc.so.6") and
+the kernel and kernel modules get copied to make the appliance. If a
+program on the host gets updated (eg. to fix a bug), we copy in the
+new program the next time libguestfs runs.
+
+The appliance is created on the end-user's machine, at run time.
+That's why libguestfs takes longer to run the first time you run it,
+or just after you've done a "yum" command (since it rebuilds the
+appliance if there are upgraded binaries).
+
+This is quite complex, but it is controlled by a command line program
+called "supermin" ("supermin5" on RHEL 7), which you can try out:
+
+$ supermin --build /usr/lib64/guestfs/supermin.d \
+    -o /tmp/appliance.d --format ext2
+supermin: open: /usr/bin/chfn: Permission denied *
+supermin: open: /usr/bin/chsh: Permission denied
+
+$ ls -lh /tmp/appliance.d/
+1.2M initrd
+  35 kernel -> /boot/vmlinuz-4.1.6-200.fc22.x86_64
+4.0G root
+
+"root" is the appliance (root disk).
+
+* The "Permission denied" errors are harmless in this case. We are
+trying to get this changed in Fedora
+[https://fedorahosted.org/fpc/ticket/467].
+
+2.4 libguestfs-winsupport
+-------------------------
+
+In RHEL 7.2, you have to install an additional package called
+"libguestfs-winsupport" to enable NTFS (Windows filesystem) support.
+
+This relies on an upstream project called ntfs-3g which has
+reverse-engineered the NTFS internals. We don't ship ntfs-3g in RHEL,
+so there are no "ntfs-3g programs" that can be copied from the host.
+How does it work?
+
+$ rpm -ql libguestfs-winsupport
+/usr/lib64/guestfs/supermin.d/zz-winsupport.tar.gz
+
+$ zcat /usr/lib64/guestfs/supermin.d/zz-winsupport.tar.gz | tar tf -
+./
+./usr/
+./usr/lib64/
+./usr/lib64/libntfs-3g.so
+./usr/lib64/libntfs-3g.so.86
+./usr/lib64/libntfs-3g.so.86.0.0
+./usr/bin/
+./usr/bin/ntfsck
+./usr/bin/ntfscat
+[etc]
+
+As well as copying files from the host, supermin can also unpack a
+tarball into the appliance.
+
+In the case of libguestfs-winsupport, we provide a tarball containing
+the ntfs-3g distribution (the ntfs-3g source is supplied in the
+libguestfs-winsupport source RPM).
+
+We only want to support customers using this for v2v and a few other
+virt operations (like virt-win-reg), so there are checks in libguestfs
+to stop it from being used for general filesystem access.
+
+
+
+3. Some virt tools
+------------------
+
+Libguestfs is a C library with a C API, and guestfish is quite a
+low-level tool which basically offers direct access to the C API.
+
+To make things easier for end users, we built some higher level virt
+tools for particular tasks. These tools link to libguestfs, and some
+of them also use other libraries (libxml2, libvirt directly, "qemu-img"
+directly, Glance, etc.)
+
+There are about a dozen virt tools provided by libguestfs. Notable
+tools include:
+
+  - virt-edit: Edit a single file inside a VM.
+
+  - virt-inspector: Inspect a disk image to find out if it contains
+    an operating system [see below].
+
+  - virt-builder: Make a new VM.
+
+  - virt-resize: Resize a VM.
+
+  - virt-v2v: Convert a VM from VMware/Xen to run on KVM.
+
+The virt commands use the libguestfs APIs, but often in ways that
+would be hard / complicated for end users to do directly. For
+example, virt-resize does a lot of calculations to work out how the
+resized partitions should be laid out, and those calculations are too
+hard for most people to do by hand.
+
+
+3.1 Inspection
+--------------
+
+Quite a fundamental libguestfs API operation is called "inspection".
+Many of the virt tools start with inspection: eg. virt-edit, virt-v2v.
+
+The basic idea is we have a disk image (eg. a qcow2 file). The disk
+image comes from a virtual machine, but we don't know what operating
+system is installed inside the disk image.
+
+Inspection lets you look at any disk image, and will try to find any
+operating system(s) installed on there, and tell you interesting
+things, such as:
+
+  - The OS type, version, architecture (eg. "windows", 6.1, "x86_64").
+
+  - The Linux distro (eg. "centos").
+
+  - What applications are installed.
+
+  - Windows drive letter mappings (eg. "C:" => "/dev/sda2").
+
+Inspection is quite fundamental for V2V, because the operations we
+have to perform on a guest depend on what the guest is. If it's
+Windows, we have to do a completely different set of things than for a
+RHEL guest.
+
+There is also a specific "virt-inspector" tool which just does
+inspection and then presents the results as XML:
+
+$ virt-inspector -a /tmp/centos-6.img
+<operatingsystems>
+  <operatingsystem>
+    <name>linux</name>
+    <arch>x86_64</arch>
+    <distro>centos</distro>
+    <product_name>CentOS release 6.6 (Final)</product_name>
+    <major_version>6</major_version>
+    <minor_version>6</minor_version>
+    <mountpoints>
+      <mountpoint>/</mountpoint>
+      <mountpoint>/boot</mountpoint>
+    </mountpoints>
+    <applications>
+      <application>
+        <name>ConsoleKit</name>
+        <version>0.4.1</version>
+        <release>3.el6</release>
+        <arch>x86_64</arch>
+      </application>
+      <application>
+        <name>ConsoleKit-libs</name>
+        <version>0.4.1</version>
+        <release>3.el6</release>
+        <arch>x86_64</arch>
+      </application>
+      etc.
+
+3.1.1 How does inspection work?
+-------------------------------
+
+Inspection is basically a large number of heuristics. For example:
+
+  - If the filesystem contains a file called "/etc/centos-release"
+    then set the Linux distro to "centos".
+
+  - If the filesystem contains a binary called "/bin/bash", then look
+    at the ELF header of that binary to find the OS architecture.
+
+(But way more complicated, and handling Windows too.)
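
To make the two heuristics above concrete, here is a minimal Python
sketch. It is not the libguestfs inspection code. It assumes the guest
filesystem is already visible as a plain directory tree, and the names
inspect_tree, elf_architecture and ELF_MACHINES are made up for this
illustration. The e_machine numbers are the ones defined by the ELF
specification; the architecture strings are only illustrative.

```python
import os
import struct

# e_machine values from the ELF specification, mapped to illustrative names.
ELF_MACHINES = {3: "i386", 40: "arm", 62: "x86_64", 183: "aarch64"}


def elf_architecture(path):
    """Return an architecture name read from the ELF header, or None if
    the file is not an ELF binary."""
    with open(path, "rb") as f:
        header = f.read(20)
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None
    # Byte 5 (EI_DATA) gives the byte order: 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18 of the ELF header.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return ELF_MACHINES.get(machine, "unknown")


def inspect_tree(root):
    """Apply the two heuristics to a filesystem tree found under `root`."""
    result = {"distro": "unknown", "arch": "unknown"}
    # Heuristic 1: /etc/centos-release exists => the distro is "centos".
    if os.path.exists(os.path.join(root, "etc", "centos-release")):
        result["distro"] = "centos"
    # Heuristic 2: read the ELF header of /bin/bash to find the architecture.
    bash = os.path.join(root, "bin", "bash")
    if os.path.isfile(bash):
        result["arch"] = elf_architecture(bash) or "unknown"
    return result
```

Real inspection goes through the libguestfs API and the appliance
rather than reading host paths, so treat this only as an illustration
of what "a heuristic" means here.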
+
+If you want the real details of how inspection works, I suggest
+running virt-inspector with the -x option:
+
+$ virt-inspector -x -a /tmp/centos-6.img |& less
+
+
+
+4. Misc topics (if we have time)
+--------------------------------
+
+4.1 remote images (ceph, NBD etc)
+---------------------------------
+
+Qemu has multiple "block drivers". Some of those are for using
+different file formats like qcow2. Others enable remote disks to be
+accessed. Because libguestfs uses qemu, we get this (almost) for
+free.
+
+To access a remote resource you can use commands like:
+
+  guestfish -a nbd://example.com
+
+  guestfish -a rbd://example.com/pool/disk
+
+In RHEL, many drivers are intentionally disabled. Also, the drivers
+which are enabled are not particularly well tested at the moment.
+
+
+4.2 parameters, environment variables
+-------------------------------------
+
+There are a lot of parameters which control libguestfs. Some of these
+are exposed as environment variables, allowing them to be set easily
+outside programs. Examples: LIBGUESTFS_BACKEND (the "backend" setting),
+LIBGUESTFS_DEBUG/LIBGUESTFS_TRACE (enable debugging/tracing).
+
+Documentation on environment variables:
+
+  http://libguestfs.org/guestfs.3.html#environment-variables
+
+A question was asked about the "program" setting.
+
+When the C library creates a handle, it saves the name of the current
+program. You can also read or change this setting:
+
+$ guestfish
+><fs> get-program
+guestfish
+><fs> set-program virt-foo
+><fs> get-program
+virt-foo
+
+In upstream libguestfs, this setting has no use.
+
+In RHEL we use it to enforce supportability requirements.
+
+
+4.3 new features in libguestfs
+------------------------------
+
+Libguestfs upstream is mostly stable, but I am hoping to get a
+new test framework upstream in the 1.32 cycle (next 3-4 months).
+
+https://www.redhat.com/archives/libguestfs/2015-August/msg00022.html
+
+
+4.4 copy_(device|file)_to_(device|file)
+---------------------------------------
+
+There is a problem in the API which is that there is no difference
+between "/dev/sda" meaning the disk/device, and "/dev/sda" meaning a
+file called "sda" in the "/dev" directory.
+
+So instead of having a single 'copy' function, we need to tell
+libguestfs whether you want to copy between files or devices.
+
+
+4.5 virt-customize vs. virt-builder
+-----------------------------------
+
+Virt-customize takes an existing disk image containing a guest that
+you created somehow (maybe with virt-builder, maybe some other way),
+and it runs a command such as 'yum install openssh' on it.
+
+Virt-builder copies a template from
+http://libguestfs.org/download/builder/ (or other places), expands it,
+and then runs the 'yum install' command.
+
+So really they are quite similar, and in fact use exactly the same
+code:
+
+https://github.com/libguestfs/libguestfs/blob/master/customize/customize_run.ml#L96
+
diff --git a/2015-qe/slide1.svg b/2015-qe/slide1.svg
new file mode 100644
index 0000000..0e019f7
--- /dev/null
+++ b/2015-qe/slide1.svg
@@ -0,0 +1,193 @@
[SVG markup not preserved; only the slide's text labels survive]
+ disk image (raw/qcow2/..)
+ qemu process
+ Linux kernel
+ Linux userspace (GNOME, emacs, libreoffice, etc.)
diff --git a/2015-qe/slide2.svg b/2015-qe/slide2.svg
new file mode 100644
index 0000000..e37091e
--- /dev/null
+++ b/2015-qe/slide2.svg
@@ -0,0 +1,252 @@
[SVG markup not preserved; only the slide's text labels survive]
+ disk image (raw/qcow2/..)
+ qemu process
+ Linux kernel
+ Cut down Linux operating system (parted, lvm, and other core tools)
+ guestfsd
diff --git a/2015-qe/slide3.svg b/2015-qe/slide3.svg
new file mode 100644
index 0000000..8c6cfeb
--- /dev/null
+++ b/2015-qe/slide3.svg
@@ -0,0 +1,293 @@
[SVG markup not preserved; only the slide's text labels survive]
+ disk image (raw/qcow2/..)
+ qemu process
+ Linux kernel
+ Cut down Linux operating system (parted, lvm, and other core tools)
+ guestfsd
+ /usr/lib64/libguestfs.so.0
+ virt-inspector
diff --git a/2015-qe/slide4.svg b/2015-qe/slide4.svg
new file mode 100644
index 0000000..2c81687
--- /dev/null
+++ b/2015-qe/slide4.svg
@@ -0,0 +1,157 @@
[SVG markup not preserved; only the slide's text labels survive]
+ qemu process
+ VM #1
+ qemu process
+ VM #2 (bad VM)
+ both running as UID:GID qemu:qemu
diff --git a/2015-qe/slide5.svg b/2015-qe/slide5.svg
new file mode 100644
index 0000000..96849b8
--- /dev/null
+++ b/2015-qe/slide5.svg
@@ -0,0 +1,172 @@
[SVG markup not preserved; only the slide's text labels survive]
+ qemu process
+ VM #1
+ qemu process
+ VM #2 (bad VM)
+ both running as UID:GID qemu:qemu
+ sVirt
diff --git a/2015-qe/slide6.svg b/2015-qe/slide6.svg
new file mode 100644
index 0000000..73d160e
--- /dev/null
+++ b/2015-qe/slide6.svg
@@ -0,0 +1,264 @@
[SVG markup not preserved; only the slide's text labels survive]
+ host filesystem
+   /
+   + /bin
+   + /boot
+   + /etc
+   + /lib  libc.so.6
+   + /tmp
+   + /usr
+     + /usr/sbin  parted
+ appliance
+   /
+   + /bin
+   + /boot
+   + /etc
+   + /lib  libc.so.6
+   + /tmp
+   + /usr
+     + /usr/sbin  parted
+ copied
+ copied
+ (is not copied)
diff --git a/2015-qe/slides.pdf b/2015-qe/slides.pdf
new file mode 100644
index 0000000..118229d
Binary files /dev/null and b/2015-qe/slides.pdf differ