Libguestfs is a library that can be linked with C and C++ management
programs (or management programs written in OCaml, Perl, Python, Ruby,
-Java, Haskell or C#). You can also use it from shell scripts or the
+Java, PHP, Haskell or C#). You can also use it from shell scripts or the
command line.
You don't need to be root to use libguestfs, although obviously you do
you have created through L<posix_fallocate(3)>. Libguestfs lets you
do useful things to all of these.
-You can add a disk read-only using L</guestfs_add_drive_ro>, in which
+The call you should use in modern code for adding drives is
+L</guestfs_add_drive_opts>. To add a disk image, allowing writes, and
+specifying that the format is raw, do:
+
+ guestfs_add_drive_opts (g, filename,
+ GUESTFS_ADD_DRIVE_OPTS_FORMAT, "raw",
+ -1);
+
+You can add a disk read-only using:
+
+ guestfs_add_drive_opts (g, filename,
+ GUESTFS_ADD_DRIVE_OPTS_FORMAT, "raw",
+ GUESTFS_ADD_DRIVE_OPTS_READONLY, 1,
+ -1);
+
+or by calling the older function L</guestfs_add_drive_ro>. In either
case libguestfs won't modify the file.
Be extremely cautious if the disk image is in use, eg. if it is being
=head2 RUNNING COMMANDS
-Although libguestfs is a primarily an API for manipulating files
+Although libguestfs is primarily an API for manipulating files
inside guest images, we also provide some limited facilities for
running commands inside guests.
=item *
+The network may not be available unless you enable it
+(see L</guestfs_set_network>).
+
+=item *
+
Only supports Linux guests (not Windows, BSD, etc).
=item *
For SELinux guests, you may need to enable SELinux and load policy
first. See L</SELINUX> in this manpage.
+=item *
+
+I<Security:> It is not safe to run commands from untrusted, possibly
+malicious guests. These commands may attempt to exploit your program
+by sending unexpected output. They could also try to exploit the
+Linux kernel or qemu provided by the libguestfs appliance. They could
+use the network provided by the libguestfs appliance to bypass
+ordinary network partitions and firewalls. They could use the
+elevated privileges or different SELinux context of your program
+to their advantage.
+
+A secure alternative is to use libguestfs to install a "firstboot"
+script (a script which runs when the guest next boots normally), and
+to have this script run the commands you want in the normal context of
+the running guest, network security and so on. For information about
+other security issues, see L</SECURITY>.
+
=back
The two main API calls to run commands are L</guestfs_command> and
calls to C<guestfs_inspect_get_*> return this cached information, but
I<do not> re-read the disks. If you change the content of the guest
disks, you can redo inspection by calling L</guestfs_inspect_os>
-again.
+again. (L</guestfs_inspect_list_applications> works a little
+differently from the other calls and does read the disks. See
+documentation for that function for details).
=head2 SPECIAL CONSIDERATIONS FOR WINDOWS GUESTS
For documentation see L<Sys::Guestfs(3)>.
+=item B<PHP>
+
+For documentation see C<README-PHP> supplied with libguestfs
+sources or in the php-libguestfs package for your distribution.
+
+The PHP binding only works correctly on 64 bit machines.
+
=item B<Python>
For documentation do:
dirty guestfish scripts that forget to sync will work just fine, which
can make this very puzzling if you are trying to debug a problem.
+Update: Autosync is enabled by default for all API users starting from
+libguestfs 1.5.24.
+
=item Mount option C<-o sync> should not be the default.
If you use L</guestfs_mount>, then C<-o sync,noatime> are added
This could be fixed in the generator by specially marking parameters
and return values which take bytes or other units.
-=item Protocol should return errno with error messages.
+=item Ambiguity between devices and paths
+
+There is a subtle ambiguity in the API between a device name
+(eg. C</dev/sdb2>) and a similar pathname. A file might just happen
+to be called C<sdb2> in the directory C</dev> (consider some non-Unix
+VM image).
+
+In the current API we usually resolve this ambiguity by having two
+separate calls, for example L</guestfs_checksum> and
+L</guestfs_checksum_device>. Some API calls are ambiguous and
+(incorrectly) resolve the problem by detecting if the path supplied
+begins with C</dev/>.
-It would be a nice-to-have to be able to get the original value of
-'errno' from inside the appliance along error paths (where set).
-Currently L<guestmount(1)> goes through hoops to try to reverse the
-error message string into an errno, see the function error() in
-fuse/guestmount.c.
+To avoid both the ambiguity and the need to duplicate some calls, we
+could make paths/devices into structured names. One way to do this
+would be to use a notation like grub (C<hd(0,0)>), although nobody
+really likes this aspect of grub. Another way would be to use a
+structured type, equivalent to this OCaml type:
+
+ type path = Path of string | Device of int | Partition of int * int
+
+which would allow you to pass arguments like:
+
+ Path "/foo/bar"
+ Device 1 (* /dev/sdb, or perhaps /dev/sda *)
+ Partition (1, 2) (* /dev/sdb2 (or is it /dev/sda2 or /dev/sdb3?) *)
+ Path "/dev/sdb2" (* not a device *)
+
+As you can see there are still problems to resolve even with this
+representation. Also consider how it might work in guestfish.
=back
this is a concern, scrub the swap partition or don't use libguestfs on
encrypted devices.
+=head2 MULTIPLE HANDLES AND MULTIPLE THREADS
+
+All high-level libguestfs actions are synchronous. If you want
+to use libguestfs asynchronously then you must create a thread.
+
+Only use the handle from a single thread. Either use the handle
+exclusively from one thread, or provide your own mutex so that two
+threads cannot issue calls on the same handle at the same time.
+
+See the graphical program guestfs-browser for one possible
+architecture for multithreaded programs using libvirt and libguestfs.
+
+=head2 PATH
+
+Libguestfs needs a kernel and initrd.img, which it finds by looking
+along an internal path.
+
+By default it looks for these in the directory C<$libdir/guestfs>
+(eg. C</usr/local/lib/guestfs> or C</usr/lib64/guestfs>).
+
+Use L</guestfs_set_path> or set the environment variable
+L</LIBGUESTFS_PATH> to change the directories that libguestfs will
+search in. The value is a colon-separated list of paths. The current
+directory is I<not> searched unless the path contains an empty element
+or C<.>. For example C<LIBGUESTFS_PATH=:/usr/lib/guestfs> would
+search the current directory and then C</usr/lib/guestfs>.
+
+=head2 QEMU WRAPPERS
+
+If you want to compile your own qemu, run qemu from a non-standard
+location, or pass extra arguments to qemu, then you can write a
+shell-script wrapper around qemu.
+
+There is one important rule to remember: you I<must C<exec qemu>> as
+the last command in the shell script (so that qemu replaces the shell
+and becomes the direct child of the libguestfs-using program). If you
+don't do this, then the qemu process won't be cleaned up correctly.
+
+Here is an example of a wrapper, where I have built my own copy of
+qemu from source:
+
+ #!/bin/sh -
+ qemudir=/home/rjones/d/qemu
+ exec $qemudir/x86_64-softmmu/qemu-system-x86_64 -L $qemudir/pc-bios "$@"
+
+Save this script as C</tmp/qemu.wrapper> (or wherever), C<chmod +x>,
+and then use it by setting the LIBGUESTFS_QEMU environment variable.
+For example:
+
+ LIBGUESTFS_QEMU=/tmp/qemu.wrapper guestfish
+
+Note that libguestfs also calls qemu with the -help and -version
+options in order to determine features.
+
+=head2 ABI GUARANTEE
+
+We guarantee the libguestfs ABI (binary interface), for public,
+high-level actions as outlined in this section. Although we will
+deprecate some actions, for example if they get replaced by newer
+calls, we will keep the old actions forever. This allows you the
+developer to program in confidence against the libguestfs API.
+
+=head2 BLOCK DEVICE NAMING
+
+In the kernel there is now quite a profusion of schemata for naming
+block devices (in this context, by I<block device> I mean a physical
+or virtual hard drive). The original Linux IDE driver used names
+starting with C</dev/hd*>. SCSI devices have historically used a
+different naming scheme, C</dev/sd*>. When the Linux kernel I<libata>
+driver became a popular replacement for the old IDE driver
+(particularly for SATA devices) those devices also used the
+C</dev/sd*> scheme. Additionally we now have virtual machines with
+paravirtualized drivers. This has created several different naming
+systems, such as C</dev/vd*> for virtio disks and C</dev/xvd*> for Xen
+PV disks.
+
+As discussed above, libguestfs uses a qemu appliance running an
+embedded Linux kernel to access block devices. We can run a variety
+of appliances based on a variety of Linux kernels.
+
+This causes a problem for libguestfs because many API calls use device
+or partition names. Working scripts and the recipe (example) scripts
+that we make available over the internet could fail if the naming
+scheme changes.
+
+Therefore libguestfs defines C</dev/sd*> as the I<standard naming
+scheme>. Internally C</dev/sd*> names are translated, if necessary,
+to other names as required. For example, under RHEL 5 which uses the
+C</dev/hd*> scheme, any device parameter C</dev/sda2> is translated to
+C</dev/hda2> transparently.
+
+Note that this I<only> applies to parameters. The
+L</guestfs_list_devices>, L</guestfs_list_partitions> and similar calls
+return the true names of the devices and partitions as known to the
+appliance.
+
+=head3 ALGORITHM FOR BLOCK DEVICE NAME TRANSLATION
+
+Usually this translation is transparent. However in some (very rare)
+cases you may need to know the exact algorithm. Such cases include
+where you use L</guestfs_config> to add a mixture of virtio and IDE
+devices to the qemu-based appliance, so have a mixture of C</dev/sd*>
+and C</dev/vd*> devices.
+
+The algorithm is applied only to I<parameters> which are known to be
+either device or partition names. Return values from functions such
+as L</guestfs_list_devices> are never changed.
+
+=over 4
+
+=item *
+
+Is the string a parameter which is a device or partition name?
+
+=item *
+
+Does the string begin with C</dev/sd>?
+
+=item *
+
+Does the named device exist? If so, we use that device.
+However if I<not> then we continue with this algorithm.
+
+=item *
+
+Replace initial C</dev/sd> string with C</dev/hd>.
+
+For example, change C</dev/sda2> to C</dev/hda2>.
+
+If that named device exists, use it. If not, continue.
+
+=item *
+
+Replace initial C</dev/sd> string with C</dev/vd>.
+
+If that named device exists, use it. If not, return an error.
+
+=back
+
+=head3 PORTABILITY CONCERNS WITH BLOCK DEVICE NAMING
+
+Although the standard naming scheme and automatic translation is
+useful for simple programs and guestfish scripts, for larger programs
+it is best not to rely on this mechanism.
+
+Where possible for maximum future portability programs using
+libguestfs should use these future-proof techniques:
+
+=over 4
+
+=item *
+
+Use L</guestfs_list_devices> or L</guestfs_list_partitions> to list
+actual device names, and then use those names directly.
+
+Since those device names exist by definition, they will never be
+translated.
+
+=item *
+
+Use higher level ways to identify filesystems, such as LVM names,
+UUIDs and filesystem labels.
+
+=back
+
+=head1 SECURITY
+
+This section discusses security implications of using libguestfs,
+particularly with untrusted or malicious guests or disk images.
+
+=head2 GENERAL SECURITY CONSIDERATIONS
+
+Be careful with any files or data that you download from a guest (by
+"download" we mean not just the L</guestfs_download> command but any
+command that reads files, filenames, directories or anything else from
+a disk image). An attacker could manipulate the data to fool your
+program into doing the wrong thing. Consider cases such as:
+
+=over 4
+
+=item *
+
+the data (file etc) not being present
+
+=item *
+
+being present but empty
+
+=item *
+
+being much larger than normal
+
+=item *
+
+containing arbitrary 8 bit data
+
+=item *
+
+being in an unexpected character encoding
+
+=item *
+
+containing homoglyphs.
+
+=back
+
+=head2 SECURITY OF MOUNTING FILESYSTEMS
+
+When you mount a filesystem under Linux, mistakes in the kernel
+filesystem (VFS) module can sometimes be escalated into exploits by
+deliberately creating a malicious, malformed filesystem. These
+exploits are very severe for two reasons. Firstly there are very many
+filesystem drivers in the kernel, and many of them are infrequently
+used and not much developer attention has been paid to the code.
+Linux userspace helps potential crackers by detecting the filesystem
+type and automatically choosing the right VFS driver, even if that
+filesystem type is obscure or unexpected for the administrator.
+Secondly, a kernel-level exploit is like a local root exploit (worse
+in some ways), giving immediate and total access to the system right
+down to the hardware level.
+
+That explains why you should never mount a filesystem from an
+untrusted guest on your host kernel. How about libguestfs? We run a
+Linux kernel inside a qemu virtual machine, usually running as a
+non-root user. The attacker would need to write a filesystem which
+first exploited the kernel, and then exploited either qemu
+virtualization (eg. a faulty qemu driver) or the libguestfs protocol,
+and finally to be as serious as the host kernel exploit it would need
+to escalate its privileges to root. This multi-step escalation,
+performed by a static piece of data, is thought to be extremely hard
+to do, although we never say 'never' about security issues.
+
+In any case callers can reduce the attack surface by forcing the
+filesystem type when mounting (use L</guestfs_mount_vfs>).
+
+=head2 PROTOCOL SECURITY
+
+The protocol is designed to be secure, being based on RFC 4506 (XDR)
+with a defined upper message size. However a program that uses
+libguestfs must also take care - for example you can write a program
+that downloads a binary from a disk image and executes it locally, and
+no amount of protocol security will save you from the consequences.
+
+=head2 INSPECTION SECURITY
+
+Parts of the inspection API (see L</INSPECTION>) return untrusted
+strings directly from the guest, and these could contain any 8 bit
+data. Callers should be careful to escape these before printing them
+to a structured file (for example, use HTML escaping if creating a web
+page).
+
+Guest configuration may be altered in unusual ways by the
+administrator of the virtual machine, and may not reflect reality
+(particularly for untrusted or actively malicious guests). For
+example we parse the hostname from configuration files like
+C</etc/sysconfig/network> that we find in the guest, but the guest
+administrator can easily manipulate these files to provide the wrong
+hostname.
+
+The inspection API parses guest configuration using two external
+libraries: Augeas (Linux configuration) and hivex (Windows Registry).
+Both are designed to be robust in the face of malicious data, although
+denial of service attacks are still possible, for example with
+oversized configuration files.
+
+=head2 RUNNING UNTRUSTED GUEST COMMANDS
+
+Be very cautious about running commands from the guest. By running a
+command in the guest, you are giving CPU time to a binary that you do
+not control, under the same user account as the library, albeit
+wrapped in qemu virtualization. More information and alternatives can
+be found in the section L</RUNNING COMMANDS>.
+
+=head2 CVE-2010-3851
+
+https://bugzilla.redhat.com/642934
+
+This security bug concerns the automatic disk format detection that
+qemu does on disk images.
+
+A raw disk image is just the raw bytes, there is no header. Other
+disk images like qcow2 contain a special header. Qemu deals with this
+by looking for one of the known headers, and if none is found then
+assuming the disk image must be raw.
+
+This allows a guest which has been given a raw disk image to write
+some other header. At next boot (or when the disk image is accessed
+by libguestfs) qemu would do autodetection and think the disk image
+format was, say, qcow2 based on the header written by the guest.
+
+This in itself would not be a problem, but qcow2 offers many features,
+one of which is to allow a disk image to refer to another image
+(called the "backing disk"). It does this by placing the path to the
+backing disk into the qcow2 header. This path is not validated and
+could point to any host file (eg. "/etc/passwd"). The backing disk is
+then exposed through "holes" in the qcow2 disk image, which of course
+is completely under the control of the attacker.
+
+In libguestfs this is rather hard to exploit except under two
+circumstances:
+
+=over 4
+
+=item 1.
+
+You have enabled the network or have opened the disk in write mode.
+
+=item 2.
+
+You are also running untrusted code from the guest (see
+L</RUNNING COMMANDS>).
+
+=back
+
+The way to avoid this is to specify the expected disk format when
+adding disks (the optional C<format> option to
+L</guestfs_add_drive_opts>). You should always do this if the disk is
+raw format, and it's a good idea for other cases too.
+
+For disks added from libvirt using calls like L</guestfs_add_domain>,
+the format is fetched from libvirt and passed through.
+
+For libguestfs tools, use the I<--format> command line parameter as
+appropriate.
+
=head1 CONNECTION MANAGEMENT
=head2 guestfs_h *
Create a connection handle.
-You have to call L</guestfs_add_drive> on the handle at least once.
+You have to call L</guestfs_add_drive_opts> (or one of the equivalent
+calls) on the handle at least once.
This function returns a non-NULL pointer to a handle on success or
NULL on error.
=head1 ERROR HANDLING
-The convention in all functions that return C<int> is that they return
-C<-1> to indicate an error. You can get additional information on
-errors by calling L</guestfs_last_error> and/or by setting up an error
-handler with L</guestfs_set_error_handler>.
+API functions can return errors. For example, almost all functions
+that return C<int> will return C<-1> to indicate an error.
+
+Additional information is available for errors: an error message
+string and optionally an error number (errno) if the thing that failed
+was a system call.
+
+You can get at the additional information about the last error on the
+handle by calling L</guestfs_last_error>, L</guestfs_last_errno>,
+and/or by setting up an error handler with
+L</guestfs_set_error_handler>.
+
+When the handle is created, a default error handler is installed which
+prints the error message string to C<stderr>. For small short-running
+command line programs it is sufficient to do:
-The default error handler prints the information string to C<stderr>.
+ if (guestfs_launch (g) == -1)
+ exit (EXIT_FAILURE);
+
+since the default error handler will ensure that an error message has
+been printed to C<stderr> before the program exits.
+
+For other programs the caller will almost certainly want to install an
+alternate error handler or do error handling in-line like this:
+
+ g = guestfs_create ();
+
+ /* This disables the default behaviour of printing errors
+ on stderr. */
+ guestfs_set_error_handler (g, NULL, NULL);
+
+ if (guestfs_launch (g) == -1) {
+ /* Examine the error message and print it etc. */
+ char *msg = guestfs_last_error (g);
+ int errnum = guestfs_last_errno (g);
+ fprintf (stderr, "%s\n", msg);
+ /* ... */
+ }
Out of memory errors are handled differently. The default action is
to call L<abort(3)>. If this is undesirable, then you can set a
handler using L</guestfs_set_out_of_memory_handler>.
+L</guestfs_create> returns C<NULL> if the handle cannot be created,
+and because there is no handle if this happens there is no way to get
+additional error information. However L</guestfs_create> is supposed
+to be a lightweight operation which can only fail because of
+insufficient memory (it returns NULL in this case).
+
=head2 guestfs_last_error
const char *guestfs_last_error (guestfs_h *g);
The lifetime of the returned string is until the next error occurs, or
L</guestfs_close> is called.
-The error string is not localized (ie. is always in English), because
-this makes searching for error messages in search engines give the
-largest number of results.
+=head2 guestfs_last_errno
+
+ int guestfs_last_errno (guestfs_h *g);
+
+This returns the last error number (errno) that happened on C<g>.
+
+If successful, an errno integer not equal to zero is returned.
+
+If no error, this returns 0. This call can return 0 in three
+situations:
+
+=over 4
+
+=item 1.
+
+There has not been any error on the handle.
+
+=item 2.
+
+There has been an error but the errno was meaningless. This
+corresponds to the case where the error did not come from a
+failed system call, but for some other reason.
+
+=item 3.
+
+There was an error from a failed system call, but for some
+reason the errno was not captured and returned. This usually
+indicates a bug in libguestfs.
+
+=back
+
+Libguestfs tries to convert the errno from inside the applicance into
+a corresponding errno for the caller (not entirely trivial: the
+appliance might be running a completely different operating system
+from the library and error numbers are not standardized across
+Un*xen). If this could not be done, then the error is translated to
+C<EINVAL>. In practice this should only happen in very rare
+circumstances.
=head2 guestfs_set_error_handler
typedef void (*guestfs_error_handler_cb) (guestfs_h *g,
- void *data,
+ void *opaque,
const char *msg);
void guestfs_set_error_handler (guestfs_h *g,
guestfs_error_handler_cb cb,
- void *data);
+ void *opaque);
The callback C<cb> will be called if there is an error. The
parameters passed to the callback are an opaque data pointer and the
error message string.
+C<errno> is not passed to the callback. To get that the callback must
+call L</guestfs_last_errno>.
+
Note that the message string C<msg> is freed as soon as the callback
function returns, so if you want to stash it somewhere you must make
your own copy.
=head2 guestfs_get_error_handler
guestfs_error_handler_cb guestfs_get_error_handler (guestfs_h *g,
- void **data_rtn);
+ void **opaque_rtn);
Returns the current error handler callback.
This returns the current out of memory handler.
-=head1 PATH
+=head1 API CALLS
-Libguestfs needs a kernel and initrd.img, which it finds by looking
-along an internal path.
+@ACTIONS@
-By default it looks for these in the directory C<$libdir/guestfs>
-(eg. C</usr/local/lib/guestfs> or C</usr/lib64/guestfs>).
-
-Use L</guestfs_set_path> or set the environment variable
-L</LIBGUESTFS_PATH> to change the directories that libguestfs will
-search in. The value is a colon-separated list of paths. The current
-directory is I<not> searched unless the path contains an empty element
-or C<.>. For example C<LIBGUESTFS_PATH=:/usr/lib/guestfs> would
-search the current directory and then C</usr/lib/guestfs>.
-
-=head1 HIGH-LEVEL API ACTIONS
-
-=head2 ABI GUARANTEE
-
-We guarantee the libguestfs ABI (binary interface), for public,
-high-level actions as outlined in this section. Although we will
-deprecate some actions, for example if they get replaced by newer
-calls, we will keep the old actions forever. This allows you the
-developer to program in confidence against the libguestfs API.
-
-@ACTIONS@
-
-=head1 STRUCTURES
+=head1 STRUCTURES
@STRUCTS@
=head2 SINGLE CALLS AT COMPILE TIME
-If you need to test whether a single libguestfs function is
-available at compile time, we recommend using build tools
-such as autoconf or cmake. For example in autotools you could
-use:
+Since version 1.5.8, C<E<lt>guestfs.hE<gt>> defines symbols
+for each C API function, such as:
+
+ #define LIBGUESTFS_HAVE_DD 1
+
+if L</guestfs_dd> is available.
+
+Before version 1.5.8, if you needed to test whether a single
+libguestfs function is available at compile time, we recommended using
+build tools such as autoconf or cmake. For example in autotools you
+could use:
AC_CHECK_LIB([guestfs],[guestfs_create])
AC_CHECK_FUNCS([guestfs_dd])
at run time, as in this example program (note that you still
need the compile time check as well):
- #include <config.h>
-
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
main ()
{
- #ifdef HAVE_GUESTFS_DD
+ #ifdef LIBGUESTFS_HAVE_DD
void *dl;
int has_function;
Requires: libguestfs >= 1.0.80
-=begin html
+=head1 CALLS WITH OPTIONAL ARGUMENTS
-<!-- old anchor for the next section -->
-<a name="state_machine_and_low_level_event_api"/>
+A recent feature of the API is the introduction of calls which take
+optional arguments. In C these are declared 3 ways. The main way is
+as a call which takes variable arguments (ie. C<...>), as in this
+example:
-=end html
+ int guestfs_add_drive_opts (guestfs_h *g, const char *filename, ...);
-=head1 ARCHITECTURE
+Call this with a list of optional arguments, terminated by C<-1>.
+So to call with no optional arguments specified:
-Internally, libguestfs is implemented by running an appliance (a
-special type of small virtual machine) using L<qemu(1)>. Qemu runs as
-a child process of the main program.
+ guestfs_add_drive_opts (g, filename, -1);
- ___________________
- / \
- | main program |
- | |
- | | child process / appliance
- | | __________________________
- | | / qemu \
- +-------------------+ RPC | +-----------------+ |
- | libguestfs <--------------------> guestfsd | |
- | | | +-----------------+ |
- \___________________/ | | Linux kernel | |
- | +--^--------------+ |
- \_________|________________/
- |
- _______v______
- / \
- | Device or |
- | disk image |
- \______________/
+With a single optional argument:
-The library, linked to the main program, creates the child process and
-hence the appliance in the L</guestfs_launch> function.
+ guestfs_add_drive_opts (g, filename,
+ GUESTFS_ADD_DRIVE_OPTS_FORMAT, "qcow2",
+ -1);
-Inside the appliance is a Linux kernel and a complete stack of
-userspace tools (such as LVM and ext2 programs) and a small
-controlling daemon called L</guestfsd>. The library talks to
-L</guestfsd> using remote procedure calls (RPC). There is a mostly
-one-to-one correspondence between libguestfs API calls and RPC calls
-to the daemon. Lastly the disk image(s) are attached to the qemu
-process which translates device access by the appliance's Linux kernel
-into accesses to the image.
+With two:
-A common misunderstanding is that the appliance "is" the virtual
-machine. Although the disk image you are attached to might also be
-used by some virtual machine, libguestfs doesn't know or care about
-this. (But you will care if both libguestfs's qemu process and your
-virtual machine are trying to update the disk image at the same time,
-since these usually results in massive disk corruption).
+ guestfs_add_drive_opts (g, filename,
+ GUESTFS_ADD_DRIVE_OPTS_FORMAT, "qcow2",
+ GUESTFS_ADD_DRIVE_OPTS_READONLY, 1,
+ -1);
-=head1 STATE MACHINE
+and so forth. Don't forget the terminating C<-1> otherwise
+Bad Things will happen!
-libguestfs uses a state machine to model the child process:
+=head2 USING va_list FOR OPTIONAL ARGUMENTS
- |
- guestfs_create
- |
- |
- ____V_____
- / \
- | CONFIG |
- \__________/
- ^ ^ ^ \
- / | \ \ guestfs_launch
- / | _\__V______
- / | / \
- / | | LAUNCHING |
- / | \___________/
- / | /
- / | guestfs_launch
- / | /
- ______ / __|____V
- / \ ------> / \
- | BUSY | | READY |
- \______/ <------ \________/
+The second variant has the same name with the suffix C<_va>, which
+works the same way but takes a C<va_list>. See the C manual for
+details. For the example function, this is declared:
-The normal transitions are (1) CONFIG (when the handle is created, but
-there is no child process), (2) LAUNCHING (when the child process is
-booting up), (3) alternating between READY and BUSY as commands are
-issued to, and carried out by, the child process.
+ int guestfs_add_drive_opts_va (guestfs_h *g, const char *filename,
+ va_list args);
-The guest may be killed by L</guestfs_kill_subprocess>, or may die
-asynchronously at any time (eg. due to some internal error), and that
-causes the state to transition back to CONFIG.
+=head2 CONSTRUCTING OPTIONAL ARGUMENTS
-Configuration commands for qemu such as L</guestfs_add_drive> can only
-be issued when in the CONFIG state.
+The third variant is useful where you need to construct these
+calls. You pass in a structure where you fill in the optional
+fields. The structure has a bitmask as the first element which
+you must set to indicate which fields you have filled in. For
+our example function the structure and call are declared:
-The high-level API offers two calls that go from CONFIG through
-LAUNCHING to READY. L</guestfs_launch> blocks until the child process
-is READY to accept commands (or until some failure or timeout).
-L</guestfs_launch> internally moves the state from CONFIG to LAUNCHING
-while it is running.
+ struct guestfs_add_drive_opts_argv {
+ uint64_t bitmask;
+ int readonly;
+ const char *format;
+ /* ... */
+ };
+ int guestfs_add_drive_opts_argv (guestfs_h *g, const char *filename,
+ const struct guestfs_add_drive_opts_argv *optargs);
-High-level API actions such as L</guestfs_mount> can only be issued
-when in the READY state. These high-level API calls block waiting for
-the command to be carried out (ie. the state to transition to BUSY and
-then back to READY). But using the low-level event API, you get
-non-blocking versions. (But you can still only carry out one
-operation per handle at a time - that is a limitation of the
-communications protocol we use).
+You could call it like this:
-Finally, the child process sends asynchronous messages back to the
-main program, such as kernel log messages. Mostly these are ignored
-by the high-level API, but using the low-level event API you can
-register to receive these messages.
+ struct guestfs_add_drive_opts_argv optargs = {
+ .bitmask = GUESTFS_ADD_DRIVE_OPTS_READONLY_BITMASK |
+ GUESTFS_ADD_DRIVE_OPTS_FORMAT_BITMASK,
+ .readonly = 1,
+ .format = "qcow2"
+ };
+
+ guestfs_add_drive_opts_argv (g, filename, &optargs);
+
+Notes:
+
+=over 4
+
+=item *
+
+The C<_BITMASK> suffix on each option name when specifying the
+bitmask.
+
+=item *
+
+You do not need to fill in all fields of the structure.
+
+=item *
+
+There must be a one-to-one correspondence between fields of the
+structure that are filled in, and bits set in the bitmask.
+
+=back
+
+=head2 OPTIONAL ARGUMENTS IN OTHER LANGUAGES
+
+In other languages, optional arguments are expressed in the
+way that is natural for that language. We refer you to the
+language-specific documentation for more details on that.
+
+For guestfish, see L<guestfish(1)/OPTIONAL ARGUMENTS>.
=head2 SETTING CALLBACKS TO HANDLE EVENTS
up by the time this is called, and if your callback then jumps
into some HLL function).
-=head1 BLOCK DEVICE NAMING
+=head2 guestfs_set_progress_callback
-In the kernel there is now quite a profusion of schemata for naming
-block devices (in this context, by I<block device> I mean a physical
-or virtual hard drive). The original Linux IDE driver used names
-starting with C</dev/hd*>. SCSI devices have historically used a
-different naming scheme, C</dev/sd*>. When the Linux kernel I<libata>
-driver became a popular replacement for the old IDE driver
-(particularly for SATA devices) those devices also used the
-C</dev/sd*> scheme. Additionally we now have virtual machines with
-paravirtualized drivers. This has created several different naming
-systems, such as C</dev/vd*> for virtio disks and C</dev/xvd*> for Xen
-PV disks.
+ typedef void (*guestfs_progress_cb) (guestfs_h *g, void *opaque,
+ int proc_nr, int serial,
+ uint64_t position, uint64_t total);
+ void guestfs_set_progress_callback (guestfs_h *g,
+ guestfs_progress_cb cb,
+ void *opaque);
-As discussed above, libguestfs uses a qemu appliance running an
-embedded Linux kernel to access block devices. We can run a variety
-of appliances based on a variety of Linux kernels.
+Some long-running operations can generate progress messages. If
+this callback is registered, then it will be called each time a
+progress message is generated (usually two seconds after the
+operation started, and three times per second thereafter until
+it completes, although the frequency may change in future versions).
-This causes a problem for libguestfs because many API calls use device
-or partition names. Working scripts and the recipe (example) scripts
-that we make available over the internet could fail if the naming
-scheme changes.
+The callback receives two numbers: C<position> and C<total>.
+The units of C<total> are not defined, although for some
+operations C<total> may relate in some way to the amount of
+data to be transferred (eg. in bytes or megabytes), and
+C<position> may be the portion which has been transferred.
-Therefore libguestfs defines C</dev/sd*> as the I<standard naming
-scheme>. Internally C</dev/sd*> names are translated, if necessary,
-to other names as required. For example, under RHEL 5 which uses the
-C</dev/hd*> scheme, any device parameter C</dev/sda2> is translated to
-C</dev/hda2> transparently.
+The only defined and stable parts of the API are:
-Note that this I<only> applies to parameters. The
-L</guestfs_list_devices>, L</guestfs_list_partitions> and similar calls
-return the true names of the devices and partitions as known to the
-appliance.
+=over 4
-=head2 ALGORITHM FOR BLOCK DEVICE NAME TRANSLATION
+=item *
-Usually this translation is transparent. However in some (very rare)
-cases you may need to know the exact algorithm. Such cases include
-where you use L</guestfs_config> to add a mixture of virtio and IDE
-devices to the qemu-based appliance, so have a mixture of C</dev/sd*>
-and C</dev/vd*> devices.
+The callback can display to the user some type of progress bar or
+indicator which shows the ratio of C<position>:C<total>.
-The algorithm is applied only to I<parameters> which are known to be
-either device or partition names. Return values from functions such
-as L</guestfs_list_devices> are never changed.
+=item *
-=over 4
+0 E<lt>= C<position> E<lt>= C<total>
=item *
-Is the string a parameter which is a device or partition name?
+If any progress notification is sent during a call, then a final
+progress notification is always sent when C<position> = C<total>.
-=item *
+This is to simplify caller code, so callers can easily set the
+progress indicator to "100%" at the end of the operation, without
+requiring special code to detect this case.
-Does the string begin with C</dev/sd>?
+=back
-=item *
+The callback also receives the procedure number and serial number of
+the call. These are only useful for debugging protocol issues, and
+the callback can normally ignore them. The callback may want to
+print these numbers in error messages or debugging messages.
-Does the named device exist? If so, we use that device.
-However if I<not> then we continue with this algorithm.
+=head1 PRIVATE DATA AREA
-=item *
+You can attach named pieces of private data to the libguestfs handle,
+and fetch them by name for the lifetime of the handle. This is called
+the private data area and is only available from the C API.
-Replace initial C</dev/sd> string with C</dev/hd>.
+To attach a named piece of data, use the following call:
-For example, change C</dev/sda2> to C</dev/hda2>.
+ void guestfs_set_private (guestfs_h *g, const char *key, void *data);
-If that named device exists, use it. If not, continue.
+C<key> is the name to associate with this data, and C<data> is an
+arbitrary pointer (which can be C<NULL>). Any previous item with the
+same name is overwritten.
-=item *
+You can use any C<key> you want, but names beginning with an
+underscore character are reserved for internal libguestfs purposes
+(for implementing language bindings). It is recommended to prefix the
+name with some unique string to avoid collisions with other users.
-Replace initial C</dev/sd> string with C</dev/vd>.
+To retrieve the pointer, use:
-If that named device exists, use it. If not, return an error.
+ void *guestfs_get_private (guestfs_h *g, const char *key);
-=back
+This function returns C<NULL> if either no data is found associated
+with C<key>, or if the user previously set the C<key>'s C<data>
+pointer to C<NULL>.
-=head2 PORTABILITY CONCERNS
+Libguestfs does not try to look at or interpret the C<data> pointer in
+any way. As far as libguestfs is concerned, it need not be a valid
+pointer at all. In particular, libguestfs does I<not> try to free the
+data when the handle is closed. If the data must be freed, then the
+caller must either free it before calling L</guestfs_close> or must
+set up a close callback to do it (see L</guestfs_set_close_callback>,
+and note that only one callback can be registered for a handle).
-Although the standard naming scheme and automatic translation is
-useful for simple programs and guestfish scripts, for larger programs
-it is best not to rely on this mechanism.
+The private data area is implemented using a hash table, and should be
+reasonably efficient for moderate numbers of keys.
-Where possible for maximum future portability programs using
-libguestfs should use these future-proof techniques:
+=begin html
-=over 4
+<!-- old anchor for the next section -->
+<a name="state_machine_and_low_level_event_api"/>
-=item *
+=end html
-Use L</guestfs_list_devices> or L</guestfs_list_partitions> to list
-actual device names, and then use those names directly.
+=head1 ARCHITECTURE
-Since those device names exist by definition, they will never be
-translated.
+Internally, libguestfs is implemented by running an appliance (a
+special type of small virtual machine) using L<qemu(1)>. Qemu runs as
+a child process of the main program.
-=item *
+ ___________________
+ / \
+ | main program |
+ | |
+ | | child process / appliance
+ | | __________________________
+ | | / qemu \
+ +-------------------+ RPC | +-----------------+ |
+ | libguestfs <--------------------> guestfsd | |
+ | | | +-----------------+ |
+ \___________________/ | | Linux kernel | |
+ | +--^--------------+ |
+ \_________|________________/
+ |
+ _______v______
+ / \
+ | Device or |
+ | disk image |
+ \______________/
-Use higher level ways to identify filesystems, such as LVM names,
-UUIDs and filesystem labels.
+The library, linked to the main program, creates the child process and
+hence the appliance in the L</guestfs_launch> function.
-=back
+Inside the appliance is a Linux kernel and a complete stack of
+userspace tools (such as LVM and ext2 programs) and a small
+controlling daemon called L</guestfsd>. The library talks to
+L</guestfsd> using remote procedure calls (RPC). There is a mostly
+one-to-one correspondence between libguestfs API calls and RPC calls
+to the daemon. Lastly the disk image(s) are attached to the qemu
+process which translates device access by the appliance's Linux kernel
+into accesses to the image.
+
+A common misunderstanding is that the appliance "is" the virtual
+machine. Although the disk image you are attached to might also be
+used by some virtual machine, libguestfs doesn't know or care about
+this. (But you will care if both libguestfs's qemu process and your
+virtual machine are trying to update the disk image at the same time,
+since these usually results in massive disk corruption).
+
+=head1 STATE MACHINE
+
+libguestfs uses a state machine to model the child process:
+
+ |
+ guestfs_create
+ |
+ |
+ ____V_____
+ / \
+ | CONFIG |
+ \__________/
+ ^ ^ ^ \
+ / | \ \ guestfs_launch
+ / | _\__V______
+ / | / \
+ / | | LAUNCHING |
+ / | \___________/
+ / | /
+ / | guestfs_launch
+ / | /
+ ______ / __|____V
+ / \ ------> / \
+ | BUSY | | READY |
+ \______/ <------ \________/
+
+The normal transitions are (1) CONFIG (when the handle is created, but
+there is no child process), (2) LAUNCHING (when the child process is
+booting up), (3) alternating between READY and BUSY as commands are
+issued to, and carried out by, the child process.
+
+The guest may be killed by L</guestfs_kill_subprocess>, or may die
+asynchronously at any time (eg. due to some internal error), and that
+causes the state to transition back to CONFIG.
+
+Configuration commands for qemu such as L</guestfs_add_drive> can only
+be issued when in the CONFIG state.
+
+The API offers one call that goes from CONFIG through LAUNCHING to
+READY. L</guestfs_launch> blocks until the child process is READY to
+accept commands (or until some failure or timeout).
+L</guestfs_launch> internally moves the state from CONFIG to LAUNCHING
+while it is running.
+
+API actions such as L</guestfs_mount> can only be issued when in the
+READY state. These API calls block waiting for the command to be
+carried out (ie. the state to transition to BUSY and then back to
+READY). There are no non-blocking versions, and no way to issue more
+than one command per handle at the same time.
+
+Finally, the child process sends asynchronous messages back to the
+main program, such as kernel log messages. You can register a
+callback to receive these messages.
=head1 INTERNALS
=head3 INITIAL MESSAGE
-Because the underlying channel (QEmu -net channel) doesn't have any
-sort of connection control, when the daemon launches it sends an
-initial word (C<GUESTFS_LAUNCH_FLAG>) which indicates that the guest
-and daemon is alive. This is what L</guestfs_launch> waits for.
-
-=head1 MULTIPLE HANDLES AND MULTIPLE THREADS
-
-All high-level libguestfs actions are synchronous. If you want
-to use libguestfs asynchronously then you must create a thread.
-
-Only use the handle from a single thread. Either use the handle
-exclusively from one thread, or provide your own mutex so that two
-threads cannot issue calls on the same handle at the same time.
-
-=head1 QEMU WRAPPERS
-
-If you want to compile your own qemu, run qemu from a non-standard
-location, or pass extra arguments to qemu, then you can write a
-shell-script wrapper around qemu.
-
-There is one important rule to remember: you I<must C<exec qemu>> as
-the last command in the shell script (so that qemu replaces the shell
-and becomes the direct child of the libguestfs-using program). If you
-don't do this, then the qemu process won't be cleaned up correctly.
+When the daemon launches it sends an initial word
+(C<GUESTFS_LAUNCH_FLAG>) which indicates that the guest and daemon is
+alive. This is what L</guestfs_launch> waits for.
-Here is an example of a wrapper, where I have built my own copy of
-qemu from source:
+=head3 PROGRESS NOTIFICATION MESSAGES
- #!/bin/sh -
- qemudir=/home/rjones/d/qemu
- exec $qemudir/x86_64-softmmu/qemu-system-x86_64 -L $qemudir/pc-bios "$@"
+The daemon may send progress notification messages at any time. These
+are distinguished by the normal length word being replaced by
+C<GUESTFS_PROGRESS_FLAG>, followed by a fixed size progress message.
-Save this script as C</tmp/qemu.wrapper> (or wherever), C<chmod +x>,
-and then use it by setting the LIBGUESTFS_QEMU environment variable.
-For example:
+The library turns them into progress callbacks (see
+C<guestfs_set_progress_callback>) if there is a callback registered,
+or discards them if not.
- LIBGUESTFS_QEMU=/tmp/qemu.wrapper guestfish
-
-Note that libguestfs also calls qemu with the -help and -version
-options in order to determine features.
+The daemon self-limits the frequency of progress messages it sends
+(see C<daemon/proto.c:notify_progress>). Not all calls generate
+progress messages.
=head1 LIBGUESTFS VERSION NUMBERS
Location of temporary directory, defaults to C</tmp>.
-If libguestfs was compiled to use the supermin appliance then each
-handle will require rather a large amount of space in this directory
-for short periods of time (~ 80 MB). You can use C<$TMPDIR> to
+If libguestfs was compiled to use the supermin appliance then the
+real appliance is cached in this directory, shared between all
+handles belonging to the same EUID. You can use C<$TMPDIR> to
configure another directory to use in case C</tmp> is not large
enough.