X-Git-Url: http://git.annexia.org/?a=blobdiff_plain;f=src%2Fguestfs.pod;h=d8156b94f09709d6b8bc1f6a5bca0d4664f6ec16;hb=880374c6df2a694bb1457231f110d9ef7035e5b7;hp=9fcd6ec33fbf823b2e90553feac92699d5d2dc81;hpb=35dbedb1b18157b2329e0e55d0b5355f26431814;p=libguestfs.git diff --git a/src/guestfs.pod b/src/guestfs.pod index 9fcd6ec..d8156b9 100644 --- a/src/guestfs.pod +++ b/src/guestfs.pod @@ -14,7 +14,6 @@ guestfs - Library for accessing and modifying virtual machine images guestfs_mount (g, "/dev/sda1", "/"); guestfs_touch (g, "/hello"); guestfs_umount (g, "/"); - guestfs_sync (g); guestfs_close (g); cc prog.c -o prog -lguestfs @@ -52,6 +51,9 @@ need enough permissions to access the disk images. Libguestfs is a large API because it can do many things. For a gentle introduction, please read the L section next. +There are also some example programs in the L +manual page. + =head1 API OVERVIEW This section provides a gentler overview of the libguestfs API. We @@ -98,10 +100,9 @@ this: */ guestfs_touch (g, "/hello"); - /* You only need to call guestfs_sync if you have made - * changes to the guest image. (But if you've made changes - * then you *must* sync). See also: guestfs_umount and - * guestfs_umount_all calls. + /* This is only needed for libguestfs < 1.5.24. Since then + * it is done automatically when you close the handle. See + * discussion of autosync in this page. */ guestfs_sync (g); @@ -114,7 +115,8 @@ functions that return integers return C<-1> on error, and all functions that return pointers return C on error. See section L below for how to handle errors, and consult the documentation for each function call below to see precisely how they -return error indications. +return error indications. See L for fully worked +examples. =head2 DISK IMAGES @@ -160,27 +162,33 @@ NAMING> below. Before you can read or write files, create directories and so on in a disk image that contains filesystems, you have to mount those -filesystems using L. 
If you already know that a disk -image contains (for example) one partition with a filesystem on that -partition, then you can mount it directly: +filesystems using L or L. +If you already know that a disk image contains (for example) one +partition with a filesystem on that partition, then you can mount it +directly: - guestfs_mount (g, "/dev/sda1", "/"); + guestfs_mount_options (g, "", "/dev/sda1", "/"); where C means literally the first partition (C<1>) of the first disk image that we added (C). If the disk contains -Linux LVM2 logical volumes you could refer to those instead (eg. C). +Linux LVM2 logical volumes you could refer to those instead +(eg. C). Note that these are libguestfs virtual devices, +and are nothing to do with host devices. If you are given a disk image and you don't know what it contains then you have to find out. Libguestfs can do that too: use L and L to list possible partitions and LVs, and either try mounting each to see what is mountable, or else examine them with L or -L. Libguestfs also has a set of APIs for inspection of -disk images (see L below). But you might find it easier -to look at higher level programs built on top of libguestfs, in +L. To list just filesystems, use +L. + +Libguestfs also has a set of APIs for inspection of unknown disk +images (see L below). But you might find it easier to +look at higher level programs built on top of libguestfs, in particular L. -To mount a disk image read-only, use L. There are +To mount a filesystem read-only, use L. There are several other variations of the C call. =head2 FILESYSTEM ACCESS AND MODIFICATION @@ -254,10 +262,9 @@ L. =head2 DOWNLOADING -Use L to download small, text only files. This call -is limited to files which are less than 2 MB and which cannot contain -any ASCII NUL (C<\0>) characters. However it has a very simple -to use API. +Use L to download small, text only files. 
This call is +limited to files which are less than 2 MB and which cannot contain any +ASCII NUL (C<\0>) characters. However the API is very simple to use. L can be used to read files which contain arbitrary 8 bit data, since it returns a (pointer, size) pair. @@ -332,6 +339,27 @@ Use L. See L above. =back +=head2 UPLOADING AND DOWNLOADING TO PIPES AND FILE DESCRIPTORS + +Calls like L, L, +L, L etc appear to only take +filenames as arguments, so it appears you can only upload and download +to files. However many Un*x-like hosts let you use the special device +files C, C, C and C +to read and write from stdin, stdout, stderr, and arbitrary file +descriptor N. + +For example, L writes its output to stdout by +doing: + + guestfs_download (g, filename, "/dev/stdout"); + +and you can write tar output to a pipe C by doing: + + char devfd[64]; + snprintf (devfd, sizeof devfd, "/dev/fd/%d", fd); + guestfs_tar_out (g, "/", devfd); + =head2 LISTING FILES L is just designed for humans to read (mainly when using @@ -404,7 +432,8 @@ to their advantage. A secure alternative is to use libguestfs to install a "firstboot" script (a script which runs when the guest next boots normally), and to have this script run the commands you want in the normal context of -the running guest, network security and so on. +the running guest, network security and so on. For information about +other security issues, see L. =back @@ -575,13 +604,17 @@ inspection and caches the results in the guest handle. Subsequent calls to C return this cached information, but I<do not> re-read the disks. If you change the content of the guest disks, you can redo inspection by calling L -again. +again. (L works a little +differently from the other calls and does read the disks. See +documentation for that function for details). =head2 SPECIAL CONSIDERATIONS FOR WINDOWS GUESTS Libguestfs can mount NTFS partitions. It does this using the L driver. 
+=head3 DRIVE LETTERS AND PATHS + DOS and Windows still use drive letters, and the filesystems are always treated as case insensitive by Windows itself, and therefore you might find a Windows configuration file referring to a path like @@ -599,6 +632,8 @@ outside the scope of libguestfs, but something that you can easily do. Where we can help is in resolving the case insensitivity of paths. For this, call L. +=head3 ACCESSING THE WINDOWS REGISTRY + Libguestfs also provides some help for decoding Windows Registry "hive" files, through the library C which is part of the libguestfs project although ships as a separate tarball. You have to @@ -607,15 +642,42 @@ C functions. See also the programs L, L, L and L for more help on this issue. +=head3 SYMLINKS ON NTFS-3G FILESYSTEMS + +Ntfs-3g tries to rewrite "Junction Points" and NTFS "symbolic links" +to provide something which looks like a Linux symlink. The way it +tries to do the rewriting is described here: + +L + +The essential problem is that ntfs-3g simply does not have enough +information to do a correct job. NTFS links can contain drive letters +and references to external device GUIDs that ntfs-3g has no way of +resolving. It is almost certainly the case that libguestfs callers +should ignore what ntfs-3g does (ie. don't use L on +NTFS volumes). + +Instead if you encounter a symbolic link on an ntfs-3g filesystem, use +L to read the C extended +attribute, and read the raw reparse data from that (you can find the +format documented in various places around the web). + +=head3 EXTENDED ATTRIBUTES ON NTFS-3G FILESYSTEMS + +There are other useful extended attributes that can be read from +ntfs-3g filesystems (using L). See: + +L + =head2 USING LIBGUESTFS WITH OTHER PROGRAMMING LANGUAGES Although we don't want to discourage you from using the C API, we will mention here that the same API is also available in other languages. The API is broadly identical in all supported languages. 
This means -that the C call C is -C<$g-Emount($path)> in Perl, C in Python, -and C in OCaml. In other words, a +that the C call C is +C<$g-Eadd_drive_ro($file)> in Perl, C in Python, +and C in OCaml. In other words, a straightforward, predictable isomorphism between each language. Error messages are automatically transformed @@ -651,11 +713,11 @@ with libguestfs. =item B -For documentation see the file C. +See L. =item B -For documentation see L. +See L. =item B @@ -666,20 +728,15 @@ The PHP binding only works correctly on 64 bit machines. =item B -For documentation do: - - $ python - >>> import guestfs - >>> help (guestfs) +See L. =item B -Use the Guestfs module. There is no Ruby-specific documentation, but -you can find examples written in Ruby in the libguestfs source. +See L. =item B -For documentation see L. +See L. =back @@ -855,8 +912,8 @@ architecture for multithreaded programs using libvirt and libguestfs. =head2 PATH -Libguestfs needs a kernel and initrd.img, which it finds by looking -along an internal path. +Libguestfs needs a supermin appliance, which it finds by looking along +an internal path. By default it looks for these in the directory C<$libdir/guestfs> (eg. C or C). @@ -1006,6 +1063,166 @@ UUIDs and filesystem labels. =back +=head1 SECURITY + +This section discusses security implications of using libguestfs, +particularly with untrusted or malicious guests or disk images. + +=head2 GENERAL SECURITY CONSIDERATIONS + +Be careful with any files or data that you download from a guest (by +"download" we mean not just the L command but any +command that reads files, filenames, directories or anything else from +a disk image). An attacker could manipulate the data to fool your +program into doing the wrong thing. 
Consider cases such as: + +=over 4 + +=item * + +the data (file etc) not being present + +=item * + +being present but empty + +=item * + +being much larger than normal + +=item * + +containing arbitrary 8 bit data + +=item * + +being in an unexpected character encoding + +=item * + +containing homoglyphs. + +=back + +=head2 SECURITY OF MOUNTING FILESYSTEMS + +When you mount a filesystem under Linux, mistakes in the kernel +filesystem (VFS) module can sometimes be escalated into exploits by +deliberately creating a malicious, malformed filesystem. These +exploits are very severe for two reasons. Firstly there are very many +filesystem drivers in the kernel, and many of them are infrequently +used and not much developer attention has been paid to the code. +Linux userspace helps potential crackers by detecting the filesystem +type and automatically choosing the right VFS driver, even if that +filesystem type is obscure or unexpected for the administrator. +Secondly, a kernel-level exploit is like a local root exploit (worse +in some ways), giving immediate and total access to the system right +down to the hardware level. + +That explains why you should never mount a filesystem from an +untrusted guest on your host kernel. How about libguestfs? We run a +Linux kernel inside a qemu virtual machine, usually running as a +non-root user. The attacker would need to write a filesystem which +first exploited the kernel, and then exploited either qemu +virtualization (eg. a faulty qemu driver) or the libguestfs protocol, +and finally to be as serious as the host kernel exploit it would need +to escalate its privileges to root. This multi-step escalation, +performed by a static piece of data, is thought to be extremely hard +to do, although we never say 'never' about security issues. + +In any case callers can reduce the attack surface by forcing the +filesystem type when mounting (use L). 
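The advice above about forcing the filesystem type can be combined with a caller-side allowlist. The following is a minimal sketch, not part of libguestfs: the helper name C<is_allowed_vfs_type> and the list of accepted types are assumptions chosen for illustration. It assumes the caller has already obtained a type string for the device (for example from a call such as C<guestfs_vfs_type>) and only wants to mount filesystems it expects to see.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a libguestfs function): return 1 if the
 * given filesystem type is on the caller's allowlist, 0 otherwise.
 * The list below is an illustrative assumption; adjust it to the
 * filesystem types your application really expects.
 */
int
is_allowed_vfs_type (const char *vfs_type)
{
  static const char *const allowed[] = { "ext2", "ext3", "ext4", "xfs" };
  size_t i;

  /* Treat a missing type as "not allowed". */
  if (vfs_type == NULL)
    return 0;

  for (i = 0; i < sizeof allowed / sizeof allowed[0]; ++i) {
    if (strcmp (vfs_type, allowed[i]) == 0)
      return 1;
  }
  return 0;
}
```

A caller would typically refuse to mount when this returns 0, and otherwise pass the same type string explicitly to a mount call that accepts a filesystem type (such as C<guestfs_mount_vfs>), so that no autodetection of obscure filesystem drivers takes place.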
+ +=head2 PROTOCOL SECURITY + +The protocol is designed to be secure, being based on RFC 4506 (XDR) +with a defined upper message size. However a program that uses +libguestfs must also take care - for example you can write a program +that downloads a binary from a disk image and executes it locally, and +no amount of protocol security will save you from the consequences. + +=head2 INSPECTION SECURITY + +Parts of the inspection API (see L) return untrusted +strings directly from the guest, and these could contain any 8 bit +data. Callers should be careful to escape these before printing them +to a structured file (for example, use HTML escaping if creating a web +page). + +Guest configuration may be altered in unusual ways by the +administrator of the virtual machine, and may not reflect reality +(particularly for untrusted or actively malicious guests). For +example we parse the hostname from configuration files like +C that we find in the guest, but the guest +administrator can easily manipulate these files to provide the wrong +hostname. + +The inspection API parses guest configuration using two external +libraries: Augeas (Linux configuration) and hivex (Windows Registry). +Both are designed to be robust in the face of malicious data, although +denial of service attacks are still possible, for example with +oversized configuration files. + +=head2 RUNNING UNTRUSTED GUEST COMMANDS + +Be very cautious about running commands from the guest. By running a +command in the guest, you are giving CPU time to a binary that you do +not control, under the same user account as the library, albeit +wrapped in qemu virtualization. More information and alternatives can +be found in the section L. + +=head2 CVE-2010-3851 + +https://bugzilla.redhat.com/642934 + +This security bug concerns the automatic disk format detection that +qemu does on disk images. + +A raw disk image is just the raw bytes, there is no header. Other +disk images like qcow2 contain a special header. 
Qemu deals with this +by looking for one of the known headers, and if none is found then +assuming the disk image must be raw. + +This allows a guest which has been given a raw disk image to write +some other header. At next boot (or when the disk image is accessed +by libguestfs) qemu would do autodetection and think the disk image +format was, say, qcow2 based on the header written by the guest. + +This in itself would not be a problem, but qcow2 offers many features, +one of which is to allow a disk image to refer to another image +(called the "backing disk"). It does this by placing the path to the +backing disk into the qcow2 header. This path is not validated and +could point to any host file (eg. "/etc/passwd"). The backing disk is +then exposed through "holes" in the qcow2 disk image, which of course +is completely under the control of the attacker. + +In libguestfs this is rather hard to exploit except under two +circumstances: + +=over 4 + +=item 1. + +You have enabled the network or have opened the disk in write mode. + +=item 2. + +You are also running untrusted code from the guest (see +L). + +=back + +The way to avoid this is to specify the expected disk format when +adding disks (the optional C option to +L). You should always do this if the disk is +raw format, and it's a good idea for other cases too. + +For disks added from libvirt using calls like L, +the format is fetched from libvirt and passed through. + +For libguestfs tools, use the I<--format> command line parameter as +appropriate. + =head1 CONNECTION MANAGEMENT =head2 guestfs_h * @@ -1494,7 +1711,8 @@ indicator which shows the ratio of C:C. =item * If any progress notification is sent during a call, then a final -progress notification is always sent when C = C. +progress notification is always sent when C = C +(I<unless> the call fails with an error). 
This is to simplify caller code, so callers can easily set the progress indicator to "100%" at the end of the operation, without @@ -1694,6 +1912,14 @@ The header contains the procedure number (C) which is how the receiver knows what type of args structure to expect, or none at all. +For functions that take optional arguments, the optional arguments are +encoded in the C_args> structure in the same way as +ordinary arguments. A bitmask in the header indicates which optional +arguments are meaningful. The bitmask is also checked to see if it +contains bits set which the daemon does not know about (eg. if more +optional arguments were added in a later version of the library), and +this causes the call to be rejected. + The reply message for ordinary functions is: total length (header + ret, @@ -1868,6 +2094,353 @@ dot-oh release won't necessarily be so stable at this point, but by backporting fixes from development, that branch will stabilize over time. +=head1 EXTENDING LIBGUESTFS + +=head2 ADDING A NEW API ACTION + +Large amounts of boilerplate code in libguestfs (RPC, bindings, +documentation) are generated, and this makes it easy to extend the +libguestfs API. + +To add a new API action there are two changes: + +=over 4 + +=item 1. + +You need to add a description of the call (name, parameters, return +type, tests, documentation) to C. + +There are two sorts of API action, depending on whether the call goes +through to the daemon in the appliance, or is serviced entirely by the +library (see L above). L is an example +of the former, since the sync is done in the appliance. +L is an example of the latter, since a trace flag +is maintained in the handle and all tracing is done on the library +side. + +Most new actions are of the first type, and get added to the +C list. Each function has a unique procedure number +used in the RPC protocol which is assigned to that action when we +publish libguestfs and cannot be reused. 
Take the latest procedure +number and increment it. + +For library-only actions of the second type, add to the +C list. Since these functions are serviced by +the library and do not travel over the RPC mechanism to the daemon, +these functions do not need a procedure number, and so the procedure +number is set to C<-1>. + +=item 2. + +Implement the action (in C): + +For daemon actions, implement the function CnameE> in the +C directory. + +For library actions, implement the function CnameE> +(note: double underscore) in the C directory. + +In either case, use another function as an example of what to do. + +=back + +After making these changes, use C to compile. + +Note that you don't need to implement the RPC, language bindings, +manual pages or anything else. It's all automatically generated from +the OCaml description. + +=head2 ADDING TESTS FOR AN API ACTION + +You can supply zero or as many tests as you want per API call. The +tests can either be added as part of the API description +(C), or in some rarer cases you may +want to drop a script into C. Note that adding a script +to C is slower, so if possible use the first method. + +The following describes the test environment used when you add an API +test in C. + +The test environment has 4 block devices: + +=over 4 + +=item C 500MB + +General block device for testing. + +=item C 50MB + +C is an ext2 filesystem used for testing +filesystem write operations. + +=item C 10MB + +Used in a few tests where two block devices are needed. + +=item C + +ISO with fixed content (see C). + +=back + +To be able to run the tests in a reasonable amount of time, the +libguestfs appliance and block devices are reused between tests. So +don't try testing L :-x + +Each test starts with an initial scenario, selected using one of the +C expressions, described in C. +These initialize the disks mentioned above in a particular way as +documented in C. 
You should not assume anything +about the previous contents of other disks that are not initialized. + +You can add a prerequisite clause to any individual test. This is a +run-time check, which, if it fails, causes the test to be skipped. +Useful if testing a command which might not work on all variations of +libguestfs builds. A test that has a prerequisite of C means to +run unconditionally. + +In addition, packagers can skip individual tests by setting +environment variables before running C. + + SKIP_TEST__=1 + +eg: C skips test #3 of L. + +or: + + SKIP_TEST_=1 + +eg: C skips all L tests. + +Packagers can run only certain tests by setting for example: + + TEST_ONLY="vfs_type zerofree" + +See C for more details of how these environment +variables work. + +=head2 DEBUGGING NEW API ACTIONS + +Test that new actions work before submitting them. + +You can use guestfish to try out new commands. + +Debugging the daemon is a problem because it runs inside a minimal +environment. However you can fprintf messages in the daemon to +stderr, and they will show up if you use C. + +=head2 FORMATTING CODE AND OTHER CONVENTIONS + +Our C source code generally adheres to some basic code-formatting +conventions. The existing code base is not totally consistent on this +front, but we do prefer that contributed code be formatted similarly. +In short, use spaces-not-TABs for indentation, use 2 spaces for each +indentation level, and other than that, follow the K&R style. + +If you use Emacs, add the following to one of your start-up files +(e.g., ~/.emacs), to help ensure that you get indentation right: + + ;;; In libguestfs, indent with spaces everywhere (not TABs). + ;;; Exceptions: Makefile and ChangeLog modes. 
+ (add-hook 'find-file-hook + '(lambda () (if (and buffer-file-name + (string-match "/libguestfs\\>" + (buffer-file-name)) + (not (string-equal mode-name "Change Log")) + (not (string-equal mode-name "Makefile"))) + (setq indent-tabs-mode nil)))) + + ;;; When editing C sources in libguestfs, use this style. + (defun libguestfs-c-mode () + "C mode with adjusted defaults for use with libguestfs." + (interactive) + (c-set-style "K&R") + (setq c-indent-level 2) + (setq c-basic-offset 2)) + (add-hook 'c-mode-hook + '(lambda () (if (string-match "/libguestfs\\>" + (buffer-file-name)) + (libguestfs-c-mode)))) + +Enable warnings when compiling (and fix any problems this +finds): + + ./configure --enable-gcc-warnings + +Useful targets are: + + make syntax-check # checks the syntax of the C code + make check # runs the test suite + +=head2 DAEMON CUSTOM PRINTF FORMATTERS + +In the daemon code we have created custom printf formatters C<%Q> and +C<%R>, which are used to do shell quoting. + +=over 4 + +=item %Q + +Simple shell quoted string. Any spaces or other shell characters are +escaped for you. + +=item %R + +Same as C<%Q> except the string is treated as a path which is prefixed +by the sysroot. + +=back + +For example: + + asprintf (&cmd, "cat %R", path); + +would produce C + +I<Note:> Do I<not> use these when you are passing parameters to the +C functions. These parameters do NOT need to be +quoted because they are not passed via the shell (instead, straight to +exec). You probably want to use the C function +however. + +=head2 SUBMITTING YOUR NEW API ACTIONS + +Submit patches to the mailing list: +L +and CC to L. + +=head2 INTERNATIONALIZATION (I18N) SUPPORT + +We support i18n (gettext anyhow) in the library. + +However many messages come from the daemon, and we don't translate +those at the moment. One reason is that the appliance generally has +all locale files removed from it, because they take up a lot of space. 
+So we'd have to readd some of those, as well as copying our PO files +into the appliance. + +Debugging messages are never translated, since they are intended for +the programmers. + +=head2 SOURCE CODE SUBDIRECTORIES + +=over 4 + +=item C + +The libguestfs appliance, build scripts and so on. + +=item C + +Automated tests of the C API. + +=item C + +The L, L and L commands +and documentation. + +=item C + +Safety and liveness tests of components that libguestfs depends upon +(not of libguestfs itself). Mainly this is for qemu and the kernel. + +=item C + +Outside contributions, experimental parts. + +=item C + +The daemon that runs inside the libguestfs appliance and carries out +actions. + +=item C + +L command and documentation. + +=item C + +C API example code. + +=item C + +L, the command-line shell. + +=item C + +L, FUSE (userspace filesystem) built on top of libguestfs. + +=item C + +The crucially important generator, used to automatically generate +large amounts of boilerplate C code for things like RPC and bindings. + +=item C + +Files used by the test suite. + +Some "phony" guest images which we test against. + +=item C + +L, the virtual machine image inspector. + +=item C + +M4 macros used by autoconf. + +=item C + +Translations of simple gettext strings. + +=item C + +The build infrastructure and PO files for translations of manpages and +POD files. Eventually this will be combined with the C directory, +but that is rather complicated. + +=item C + +Regression tests. + +=item C + +L command and documentation. + +=item C + +Source code to the C library. + +=item C + +Command line tools written in Perl (L and many others). + +=item C + +Test tool for end users to test if their qemu/kernel combination +will work with libguestfs. + +=item C + +=item C + +=item C + +=item C + +=item C + +=item C + +=item C + +=item C + +Language bindings. 
+ +=back + =head1 ENVIRONMENT VARIABLES =over 4 @@ -1890,8 +2463,8 @@ example: =item LIBGUESTFS_PATH -Set the path that libguestfs uses to search for kernel and initrd.img. -See the discussion of paths in section PATH above. +Set the path that libguestfs uses to search for a supermin appliance. +See the discussion of paths in section L above. =item LIBGUESTFS_QEMU @@ -1920,11 +2493,16 @@ enough. =head1 SEE ALSO +L, +L, +L, +L, L, L, L, L, L, +L, L, L, L,