X-Git-Url: http://git.annexia.org/?a=blobdiff_plain;f=src%2Fguestfs.pod;h=b8b8ca22b81fcc4979c35bb4a06a267459e513c1;hb=82f5fdb0dbbc0c7b04861edeadf70c86c9342df2;hp=9fcd6ec33fbf823b2e90553feac92699d5d2dc81;hpb=35dbedb1b18157b2329e0e55d0b5355f26431814;p=libguestfs.git diff --git a/src/guestfs.pod b/src/guestfs.pod index 9fcd6ec..b8b8ca2 100644 --- a/src/guestfs.pod +++ b/src/guestfs.pod @@ -14,7 +14,6 @@ guestfs - Library for accessing and modifying virtual machine images guestfs_mount (g, "/dev/sda1", "/"); guestfs_touch (g, "/hello"); guestfs_umount (g, "/"); - guestfs_sync (g); guestfs_close (g); cc prog.c -o prog -lguestfs @@ -52,6 +51,9 @@ need enough permissions to access the disk images. Libguestfs is a large API because it can do many things. For a gentle introduction, please read the L section next. +There are also some example programs in the L +manual page. + =head1 API OVERVIEW This section provides a gentler overview of the libguestfs API. We @@ -97,11 +99,10 @@ this: * disk image. */ guestfs_touch (g, "/hello"); - - /* You only need to call guestfs_sync if you have made - * changes to the guest image. (But if you've made changes - * then you *must* sync). See also: guestfs_umount and - * guestfs_umount_all calls. + + /* This is only needed for libguestfs < 1.5.24. Since then + * it is done automatically when you close the handle. See + * discussion of autosync in this page. */ guestfs_sync (g); @@ -114,7 +115,8 @@ functions that return integers return C<-1> on error, and all functions that return pointers return C on error. See section L below for how to handle errors, and consult the documentation for each function call below to see precisely how they -return error indications. +return error indications. See L for fully worked +examples. =head2 DISK IMAGES @@ -160,27 +162,33 @@ NAMING> below. Before you can read or write files, create directories and so on in a disk image that contains filesystems, you have to mount those -filesystems using L. If you already know that a disk -image contains (for example) one partition with a filesystem on that -partition, then you can mount it directly: +filesystems using L or L. +If you already know that a disk image contains (for example) one +partition with a filesystem on that partition, then you can mount it +directly: - guestfs_mount (g, "/dev/sda1", "/"); + guestfs_mount_options (g, "", "/dev/sda1", "/"); where C means literally the first partition (C<1>) of the first disk image that we added (C). If the disk contains -Linux LVM2 logical volumes you could refer to those instead (eg. C). +Linux LVM2 logical volumes you could refer to those instead +(eg. C). Note that these are libguestfs virtual devices, +and are nothing to do with host devices. If you are given a disk image and you don't know what it contains then you have to find out. Libguestfs can do that too: use L and L to list possible partitions and LVs, and either try mounting each to see what is mountable, or else examine them with L or -L. Libguestfs also has a set of APIs for inspection of -disk images (see L below). But you might find it easier -to look at higher level programs built on top of libguestfs, in +L. To list just filesystems, use +L. + +Libguestfs also has a set of APIs for inspection of unknown disk +images (see L below). But you might find it easier to +look at higher level programs built on top of libguestfs, in particular L. -To mount a disk image read-only, use L. There are +To mount a filesystem read-only, use L. There are several other variations of the C call. =head2 FILESYSTEM ACCESS AND MODIFICATION @@ -254,10 +262,9 @@ L. =head2 DOWNLOADING -Use L to download small, text only files. This call -is limited to files which are less than 2 MB and which cannot contain -any ASCII NUL (C<\0>) characters. However it has a very simple -to use API. +Use L to download small, text only files. This call is +limited to files which are less than 2 MB and which cannot contain any +ASCII NUL (C<\0>) characters. However the API is very simple to use. L can be used to read files which contain arbitrary 8 bit data, since it returns a (pointer, size) pair. @@ -332,6 +339,27 @@ Use L. See L above. =back +=head2 UPLOADING AND DOWNLOADING TO PIPES AND FILE DESCRIPTORS + +Calls like L, L, +L, L etc appear to only take +filenames as arguments, so it appears you can only upload and download +to files. However many Un*x-like hosts let you use the special device +files C, C, C and C +to read and write from stdin, stdout, stderr, and arbitrary file +descriptor N. + +For example, L writes its output to stdout by +doing: + + guestfs_download (g, filename, "/dev/stdout"); + +and you can write tar output to a pipe C by doing: + + char devfd[64]; + snprintf (devfd, sizeof devfd, "/dev/fd/%d", fd); + guestfs_tar_out (g, "/", devfd); + =head2 LISTING FILES L is just designed for humans to read (mainly when using @@ -404,7 +432,8 @@ to their advantage. A secure alternative is to use libguestfs to install a "firstboot" script (a script which runs when the guest next boots normally), and to have this script run the commands you want in the normal context of -the running guest, network security and so on. +the running guest, network security and so on. For information about +other security issues, see L. =back @@ -521,10 +550,11 @@ device (I the underlying encrypted block device). =head2 INSPECTION Libguestfs has APIs for inspecting an unknown disk image to find out -if it contains operating systems. (These APIs used to be in a -separate Perl-only library called L but since -version 1.5.3 the most frequently used part of this library has been -rewritten in C and moved into the core code). +if it contains operating systems, an install CD or a live CD. (These +APIs used to be in a separate Perl-only library called +L but since version 1.5.3 the most frequently +used part of this library has been rewritten in C and moved into the +core code). Add all disks belonging to the unknown virtual machine and call L in the usual way. @@ -575,13 +605,36 @@ inspection and caches the results in the guest handle. Subsequent calls to C return this cached information, but I re-read the disks. If you change the content of the guest disks, you can redo inspection by calling L -again. +again. (L works a little +differently from the other calls and does read the disks. See +documentation for that function for details). + +=head3 INSPECTING INSTALL DISKS + +Libguestfs (since 1.9.4) can detect some install disks, install +CDs, live CDs and more. + +Call L to return the format of the +operating system, which currently can be C (a regular +operating system) or C (some sort of install disk). + +Further information is available about the operating system that can +be installed using the regular inspection APIs like +L, +L etc. + +Some additional information specific to installer disks is also +available from the L, +L and L +calls. =head2 SPECIAL CONSIDERATIONS FOR WINDOWS GUESTS Libguestfs can mount NTFS partitions. It does this using the L driver. +=head3 DRIVE LETTERS AND PATHS + DOS and Windows still use drive letters, and the filesystems are always treated as case insensitive by Windows itself, and therefore you might find a Windows configuration file referring to a path like @@ -599,6 +652,8 @@ outside the scope of libguestfs, but something that you can easily do. Where we can help is in resolving the case insensitivity of paths. For this, call L. +=head3 ACCESSING THE WINDOWS REGISTRY + Libguestfs also provides some help for decoding Windows Registry "hive" files, through the library C which is part of the libguestfs project although ships as a separate tarball. You have to @@ -607,15 +662,42 @@ C functions. See also the programs L, L, L and L for more help on this issue. +=head3 SYMLINKS ON NTFS-3G FILESYSTEMS + +Ntfs-3g tries to rewrite "Junction Points" and NTFS "symbolic links" +to provide something which looks like a Linux symlink. The way it +tries to do the rewriting is described here: + +L + +The essential problem is that ntfs-3g simply does not have enough +information to do a correct job. NTFS links can contain drive letters +and references to external device GUIDs that ntfs-3g has no way of +resolving. It is almost certainly the case that libguestfs callers +should ignore what ntfs-3g does (ie. don't use L on +NTFS volumes). + +Instead if you encounter a symbolic link on an ntfs-3g filesystem, use +L to read the C extended +attribute, and read the raw reparse data from that (you can find the +format documented in various places around the web). + +=head3 EXTENDED ATTRIBUTES ON NTFS-3G FILESYSTEMS + +There are other useful extended attributes that can be read from +ntfs-3g filesystems (using L). See: + +L + =head2 USING LIBGUESTFS WITH OTHER PROGRAMMING LANGUAGES Although we don't want to discourage you from using the C API, we will mention here that the same API is also available in other languages. The API is broadly identical in all supported languages. This means -that the C call C is -C<$g-Emount($path)> in Perl, C in Python, -and C in OCaml. In other words, a +that the C call C is +C<$g-Eadd_drive_ro($file)> in Perl, C in Python, +and C in OCaml. In other words, a straightforward, predictable isomorphism between each language. Error messages are automatically transformed @@ -651,11 +733,11 @@ with libguestfs. =item B -For documentation see the file C. +See L. =item B -For documentation see L. +See L. =item B @@ -666,20 +748,15 @@ The PHP binding only works correctly on 64 bit machines. =item B -For documentation do: - - $ python - >>> import guestfs - >>> help (guestfs) +See L. =item B -Use the Guestfs module. There is no Ruby-specific documentation, but -you can find examples written in Ruby in the libguestfs source. +See L. =item B -For documentation see L. +See L. =back @@ -1006,6 +1083,166 @@ UUIDs and filesystem labels. =back +=head1 SECURITY + +This section discusses security implications of using libguestfs, +particularly with untrusted or malicious guests or disk images. + +=head2 GENERAL SECURITY CONSIDERATIONS + +Be careful with any files or data that you download from a guest (by +"download" we mean not just the L command but any +command that reads files, filenames, directories or anything else from +a disk image). An attacker could manipulate the data to fool your +program into doing the wrong thing. Consider cases such as: + +=over 4 + +=item * + +the data (file etc) not being present + +=item * + +being present but empty + +=item * + +being much larger than normal + +=item * + +containing arbitrary 8 bit data + +=item * + +being in an unexpected character encoding + +=item * + +containing homoglyphs. + +=back + +=head2 SECURITY OF MOUNTING FILESYSTEMS + +When you mount a filesystem under Linux, mistakes in the kernel +filesystem (VFS) module can sometimes be escalated into exploits by +deliberately creating a malicious, malformed filesystem. These +exploits are very severe for two reasons. Firstly there are very many +filesystem drivers in the kernel, and many of them are infrequently +used and not much developer attention has been paid to the code. +Linux userspace helps potential crackers by detecting the filesystem +type and automatically choosing the right VFS driver, even if that +filesystem type is obscure or unexpected for the administrator. +Secondly, a kernel-level exploit is like a local root exploit (worse +in some ways), giving immediate and total access to the system right +down to the hardware level. + +That explains why you should never mount a filesystem from an +untrusted guest on your host kernel. How about libguestfs? We run a +Linux kernel inside a qemu virtual machine, usually running as a +non-root user. The attacker would need to write a filesystem which +first exploited the kernel, and then exploited either qemu +virtualization (eg. a faulty qemu driver) or the libguestfs protocol, +and finally to be as serious as the host kernel exploit it would need +to escalate its privileges to root. This multi-step escalation, +performed by a static piece of data, is thought to be extremely hard +to do, although we never say 'never' about security issues. + +In any case callers can reduce the attack surface by forcing the +filesystem type when mounting (use L). + +=head2 PROTOCOL SECURITY + +The protocol is designed to be secure, being based on RFC 4506 (XDR) +with a defined upper message size. However a program that uses +libguestfs must also take care - for example you can write a program +that downloads a binary from a disk image and executes it locally, and +no amount of protocol security will save you from the consequences. + +=head2 INSPECTION SECURITY + +Parts of the inspection API (see L) return untrusted +strings directly from the guest, and these could contain any 8 bit +data. Callers should be careful to escape these before printing them +to a structured file (for example, use HTML escaping if creating a web +page). + +Guest configuration may be altered in unusual ways by the +administrator of the virtual machine, and may not reflect reality +(particularly for untrusted or actively malicious guests). For +example we parse the hostname from configuration files like +C that we find in the guest, but the guest +administrator can easily manipulate these files to provide the wrong +hostname. + +The inspection API parses guest configuration using two external +libraries: Augeas (Linux configuration) and hivex (Windows Registry). +Both are designed to be robust in the face of malicious data, although +denial of service attacks are still possible, for example with +oversized configuration files. + +=head2 RUNNING UNTRUSTED GUEST COMMANDS + +Be very cautious about running commands from the guest. By running a +command in the guest, you are giving CPU time to a binary that you do +not control, under the same user account as the library, albeit +wrapped in qemu virtualization. More information and alternatives can +be found in the section L. + +=head2 CVE-2010-3851 + +https://bugzilla.redhat.com/642934 + +This security bug concerns the automatic disk format detection that +qemu does on disk images. + +A raw disk image is just the raw bytes, there is no header. Other +disk images like qcow2 contain a special header. Qemu deals with this +by looking for one of the known headers, and if none is found then +assuming the disk image must be raw. + +This allows a guest which has been given a raw disk image to write +some other header. At next boot (or when the disk image is accessed +by libguestfs) qemu would do autodetection and think the disk image +format was, say, qcow2 based on the header written by the guest. + +This in itself would not be a problem, but qcow2 offers many features, +one of which is to allow a disk image to refer to another image +(called the "backing disk"). It does this by placing the path to the +backing disk into the qcow2 header. This path is not validated and +could point to any host file (eg. "/etc/passwd"). The backing disk is +then exposed through "holes" in the qcow2 disk image, which of course +is completely under the control of the attacker. + +In libguestfs this is rather hard to exploit except under two +circumstances: + +=over 4 + +=item 1. + +You have enabled the network or have opened the disk in write mode. + +=item 2. + +You are also running untrusted code from the guest (see +L). + +=back + +The way to avoid this is to specify the expected disk format when +adding disks (the optional C option to +L). You should always do this if the disk is +raw format, and it's a good idea for other cases too. + +For disks added from libvirt using calls like L, +the format is fetched from libvirt and passed through. + +For libguestfs tools, use the I<--format> command line parameter as +appropriate. + =head1 CONNECTION MANAGEMENT =head2 guestfs_h * @@ -1694,6 +1931,14 @@ The header contains the procedure number (C) which is how the receiver knows what type of args structure to expect, or none at all. +For functions that take optional arguments, the optional arguments are +encoded in the C_args> structure in the same way as +ordinary arguments. A bitmask in the header indicates which optional +arguments are meaningful. The bitmask is also checked to see if it +contains bits set which the daemon does not know about (eg. if more +optional arguments were added in a later version of the library), and +this causes the call to be rejected. + The reply message for ordinary functions is: total length (header + ret, @@ -1868,6 +2113,354 @@ dot-oh release won't necessarily be so stable at this point, but by backporting fixes from development, that branch will stabilize over time. +=head1 EXTENDING LIBGUESTFS + +=head2 ADDING A NEW API ACTION + +Large amounts of boilerplate code in libguestfs (RPC, bindings, +documentation) are generated, and this makes it easy to extend the +libguestfs API. + +To add a new API action there are two changes: + +=over 4 + +=item 1. + +You need to add a description of the call (name, parameters, return +type, tests, documentation) to C. + +There are two sorts of API action, depending on whether the call goes +through to the daemon in the appliance, or is serviced entirely by the +library (see L above). L is an example +of the former, since the sync is done in the appliance. +L is an example of the latter, since a trace flag +is maintained in the handle and all tracing is done on the library +side. + +Most new actions are of the first type, and get added to the +C list. Each function has a unique procedure number +used in the RPC protocol which is assigned to that action when we +publish libguestfs and cannot be reused. Take the latest procedure +number and increment it. + +For library-only actions of the second type, add to the +C list. Since these functions are serviced by +the library and do not travel over the RPC mechanism to the daemon, +these functions do not need a procedure number, and so the procedure +number is set to C<-1>. + +=item 2. + +Implement the action (in C): + +For daemon actions, implement the function CnameE> in the +C directory. + +For library actions, implement the function CnameE> +(note: double underscore) in the C directory. + +In either case, use another function as an example of what to do. + +=back + +After making these changes, use C to compile. + +Note that you don't need to implement the RPC, language bindings, +manual pages or anything else. It's all automatically generated from +the OCaml description. + +=head2 ADDING TESTS FOR AN API ACTION + +You can supply zero or as many tests as you want per API call. The +tests can either be added as part of the API description +(C), or in some rarer cases you may +want to drop a script into C. Note that adding a script +to C is slower, so if possible use the first method. + +The following describes the test environment used when you add an API +test in C. + +The test environment has 4 block devices: + +=over 4 + +=item C 500MB + +General block device for testing. + +=item C 50MB + +C is an ext2 filesystem used for testing +filesystem write operations. + +=item C 10MB + +Used in a few tests where two block devices are needed. + +=item C + +ISO with fixed content (see C). + +=back + +To be able to run the tests in a reasonable amount of time, the +libguestfs appliance and block devices are reused between tests. So +don't try testing L :-x + +Each test starts with an initial scenario, selected using one of the +C expressions, described in C. +These initialize the disks mentioned above in a particular way as +documented in C. You should not assume anything +about the previous contents of other disks that are not initialized. + +You can add a prerequisite clause to any individual test. This is a +run-time check, which, if it fails, causes the test to be skipped. +Useful if testing a command which might not work on all variations of +libguestfs builds. A test that has prerequisite of C means to +run unconditionally. + +In addition, packagers can skip individual tests by setting +environment variables before running C. + + SKIP_TEST__=1 + +eg: C skips test #3 of L. + +or: + + SKIP_TEST_=1 + +eg: C skips all L tests. + +Packagers can run only certain tests by setting for example: + + TEST_ONLY="vfs_type zerofree" + +See C for more details of how these environment +variables work. + +=head2 DEBUGGING NEW API ACTIONS + +Test new actions work before submitting them. + +You can use guestfish to try out new commands. + +Debugging the daemon is a problem because it runs inside a minimal +environment. However you can fprintf messages in the daemon to +stderr, and they will show up if you use C. + +=head2 FORMATTING CODE AND OTHER CONVENTIONS + +Our C source code generally adheres to some basic code-formatting +conventions. The existing code base is not totally consistent on this +front, but we do prefer that contributed code be formatted similarly. +In short, use spaces-not-TABs for indentation, use 2 spaces for each +indentation level, and other than that, follow the K&R style. + +If you use Emacs, add the following to one of one of your start-up files +(e.g., ~/.emacs), to help ensure that you get indentation right: + + ;;; In libguestfs, indent with spaces everywhere (not TABs). + ;;; Exceptions: Makefile and ChangeLog modes. + (add-hook 'find-file-hook + '(lambda () (if (and buffer-file-name + (string-match "/libguestfs\\>" + (buffer-file-name)) + (not (string-equal mode-name "Change Log")) + (not (string-equal mode-name "Makefile"))) + (setq indent-tabs-mode nil)))) + + ;;; When editing C sources in libguestfs, use this style. + (defun libguestfs-c-mode () + "C mode with adjusted defaults for use with libguestfs." + (interactive) + (c-set-style "K&R") + (setq c-indent-level 2) + (setq c-basic-offset 2)) + (add-hook 'c-mode-hook + '(lambda () (if (string-match "/libguestfs\\>" + (buffer-file-name)) + (libguestfs-c-mode)))) + +Enable warnings when compiling (and fix any problems this +finds): + + ./configure --enable-gcc-warnings + +Useful targets are: + + make syntax-check # checks the syntax of the C code + make check # runs the test suite + +=head2 DAEMON CUSTOM PRINTF FORMATTERS + +In the daemon code we have created custom printf formatters C<%Q> and +C<%R>, which are used to do shell quoting. + +=over 4 + +=item %Q + +Simple shell quoted string. Any spaces or other shell characters are +escaped for you. + +=item %R + +Same as C<%Q> except the string is treated as a path which is prefixed +by the sysroot. + +=back + +For example: + + asprintf (&cmd, "cat %R", path); + +would produce C + +I Do I use these when you are passing parameters to the +C functions. These parameters do NOT need to be +quoted because they are not passed via the shell (instead, straight to +exec). You probably want to use the C function +however. + +=head2 SUBMITTING YOUR NEW API ACTIONS + +Submit patches to the mailing list: +L +and CC to L. + +=head2 INTERNATIONALIZATION (I18N) SUPPORT + +We support i18n (gettext anyhow) in the library. + +However many messages come from the daemon, and we don't translate +those at the moment. One reason is that the appliance generally has +all locale files removed from it, because they take up a lot of space. +So we'd have to readd some of those, as well as copying our PO files +into the appliance. + +Debugging messages are never translated, since they are intended for +the programmers. + +=head2 SOURCE CODE SUBDIRECTORIES + +=over 4 + +=item C + +The libguestfs appliance, build scripts and so on. + +=item C + +Automated tests of the C API. + +=item C + +The L, L and L commands +and documentation. + +=item C + +Outside contributions, experimental parts. + +=item C + +The daemon that runs inside the libguestfs appliance and carries out +actions. + +=item C + +L command and documentation. + +=item C + +C API example code. + +=item C + +L, the command-line shell, and various shell scripts +built on top such as L, L, +L, L. + +=item C + +L, FUSE (userspace filesystem) built on top of libguestfs. + +=item C + +The crucially important generator, used to automatically generate +large amounts of boilerplate C code for things like RPC and bindings. + +=item C + +Files used by the test suite. + +Some "phony" guest images which we test against. + +=item C + +L, the virtual machine image inspector. + +=item C + +Logo used on the website. The fish is called Arthur by the way. + +=item C + +M4 macros used by autoconf. + +=item C + +Translations of simple gettext strings. + +=item C + +The build infrastructure and PO files for translations of manpages and +POD files. Eventually this will be combined with the C directory, +but that is rather complicated. + +=item C + +Regression tests. + +=item C + +L command and documentation. + +=item C + +Source code to the C library. + +=item C + +Command line tools written in Perl (L and many others). + +=item C + +Test tool for end users to test if their qemu/kernel combination +will work with libguestfs. + +=item C + +=item C + +=item C + +=item C + +=item C + +=item C + +=item C + +=item C + +Language bindings. + +=back + =head1 ENVIRONMENT VARIABLES =over 4 @@ -1908,23 +2501,31 @@ has the same effect as calling C. =item TMPDIR -Location of temporary directory, defaults to C. +Location of temporary directory, defaults to C except for the +cached supermin appliance which defaults to C. If libguestfs was compiled to use the supermin appliance then the real appliance is cached in this directory, shared between all handles belonging to the same EUID. You can use C<$TMPDIR> to -configure another directory to use in case C is not large +configure another directory to use in case C is not large enough. =back =head1 SEE ALSO +L, +L, +L, +L, L, L, L, +L, +L, L, L, +L, L, L, L, @@ -1932,6 +2533,8 @@ L, L, L, L, +L, +L, L, L, L,