hivex - Windows Registry "hive" extraction library

 hive_h *hivex_open (const char *filename, int flags);
 int hivex_close (hive_h *h);

libhivex is a library for extracting the contents of Windows Registry
"hive" files. It is designed to be secure against buggy or malicious

Unlike many other tools in this area, it doesn't use the textual .REG
format for output, because parsing that is as much trouble as parsing
the original binary format. Instead it makes the file available
through a C API, or there is a separate program to export the hive as
XML (see L<hivexml(1)>), or to get individual keys (see
=head2 OPENING AND CLOSING A HIVE

=item hive_h *hivex_open (const char *filename, int flags);

Opens the hive named C<filename> for reading.

C<flags> is an ORed list of the open flags (or C<0> if you don't
want to pass any flags). These flags are defined:

=item HIVEX_OPEN_VERBOSE

=item HIVEX_OPEN_DEBUG

Very verbose messages, suitable for debugging problems in the library

This is also selected if the C<HIVEX_DEBUG> environment variable

=item HIVEX_OPEN_WRITE

Open the hive for writing. If omitted, the hive is read-only.

See L</WRITING TO HIVE FILES>.

C<hivex_open> returns a hive handle. On error this returns NULL and
sets C<errno> to indicate the error.

=item int hivex_close (hive_h *h);

Close a hive handle and free all associated resources.

Note that any uncommitted writes are I<not> committed by this call,
but instead are lost. See L</WRITING TO HIVE FILES>.

Returns 0 on success. On error this returns -1 and sets errno.
=head2 NAVIGATING THE TREE OF HIVE SUBKEYS

This is a node handle, an integer but opaque outside the library.
Valid node handles cannot be 0. The library returns 0 in some
situations to indicate an error.

=item hive_node_h hivex_root (hive_h *h);

Return the root node of the hive. All valid registries must contain

On error this returns 0 and sets errno.

=item char *hivex_node_name (hive_h *h, hive_node_h node);

Return the name of the node. The name is reencoded as UTF-8
and returned as a C string.

The string should be freed by the caller when it is no longer needed.

Note that the name of the root node is a dummy, such as
C<$$$PROTO.HIV> (other names are possible: it seems to depend on the
tool or program that created the hive in the first place). You can
only know the "real" name of the root node by knowing which registry
file this hive originally comes from, which is knowledge that is
outside the scope of this library.

On error this returns NULL and sets errno.

=item hive_node_h *hivex_node_children (hive_h *h, hive_node_h node);

Return a 0-terminated array of nodes which are the subkeys
(children) of C<node>.

The array should be freed by the caller when it is no longer needed.

On error this returns NULL and sets errno.
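
The 0-terminated convention described above can be walked with a plain
loop. The following is a minimal sketch, not library code:
C<node_handle> and C<count_handles> are hypothetical names, and the
handle type is assumed to be an unsigned integer, since the library only
promises "an integer but opaque".

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the library's node handle type (an assumption; the real
 * type is opaque outside the library). */
typedef size_t node_handle;

/* Count the entries in a 0-terminated handle array, such as the one
 * hivex_node_children is documented to return. */
size_t
count_handles (const node_handle *handles)
{
  assert (handles != NULL);   /* NULL signals an error, not an empty list */

  size_t n = 0;
  while (handles[n] != 0)     /* 0 is never a valid handle, so it ends the list */
    n++;
  return n;
}
```

A node with no subkeys would yield an array holding only the terminating
0, so the count is 0. The caller still owns the array and frees it once
it is done.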
=item hive_node_h hivex_node_get_child (hive_h *h, hive_node_h node, const char *name);

Return the child of C<node> with the name C<name>, if it exists.

The name is matched case-insensitively.

If the child node does not exist, this returns 0 without

On error this returns 0 and sets errno.
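
Because 0 is returned both for "no such child" and for a real error, a
caller has to look at errno to tell them apart. The sketch below shows
one way to do that, using a toy lookup in place of the library call:
C<toy_get_child>, C<lookup_status> and C<equal_ignore_case> are
hypothetical, and the ASCII-only comparison is a simplification.

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>

/* ASCII-only case-insensitive string equality. */
static int
equal_ignore_case (const char *a, const char *b)
{
  while (*a && *b) {
    if (tolower ((unsigned char) *a) != tolower ((unsigned char) *b))
      return 0;
    a++;
    b++;
  }
  return *a == *b;              /* equal only if both strings ended */
}

/* Toy stand-in for a child lookup: returns a nonzero handle on a match,
 * 0 with errno untouched if there is no such child, and 0 with errno
 * set if the arguments are invalid. */
size_t
toy_get_child (const char *const *names, size_t n, const char *name)
{
  if (names == NULL || name == NULL) {
    errno = EINVAL;
    return 0;
  }
  for (size_t i = 0; i < n; i++)
    if (equal_ignore_case (names[i], name))
      return i + 1;             /* handles are never 0 */
  return 0;                     /* not found: errno deliberately left alone */
}

/* Classify a lookup: 1 = found, 0 = no such child, -1 = error. */
int
lookup_status (const char *const *names, size_t n, const char *name)
{
  errno = 0;                    /* clear first so a stale value cannot mislead */
  size_t child = toy_get_child (names, n, name);
  if (child != 0)
    return 1;
  return errno != 0 ? -1 : 0;
}
```

The important step is clearing errno before the call. Without it, a
leftover value from an earlier failure would make a plain "not found"
look like an error.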
=item hive_node_h hivex_node_parent (hive_h *h, hive_node_h node);

Return the parent of C<node>.

On error this returns 0 and sets errno.

The parent pointer of the root node in registry files that we
have examined seems to be invalid, and so this function will
return an error if called on the root node.
=head2 GETTING VALUES AT A NODE

The enum below describes the possible types for the value(s)

   hive_t_expand_string = 2,
   hive_t_multiple_strings = 7,
   hive_t_resource_list = 8,
   hive_t_full_resource_description = 9,
   hive_t_resource_requirements_list = 10,

This is a value handle, an integer but opaque outside the library.
Valid value handles cannot be 0. The library returns 0 in some
situations to indicate an error.
=item hive_value_h *hivex_node_values (hive_h *h, hive_node_h node);

Return the 0-terminated array of (key, value) pairs attached to

The array should be freed by the caller when it is no longer needed.

On error this returns NULL and sets errno.

=item hive_value_h hivex_node_get_value (hive_h *h, hive_node_h node, const char *key);

Return the value attached to this node which has the name C<key>,

The key name is matched case-insensitively.

Note that to get the default key, you should pass the empty
string C<""> here. The default key is often written C<"@">, but
inside hives that has no meaning and won't give you the

If no such key exists, this returns 0 and does not set errno.

On error this returns 0 and sets errno.

=item char *hivex_value_key (hive_h *h, hive_value_h value);

Return the key (name) of a (key, value) pair. The name
is reencoded as UTF-8 and returned as a C string.

The string should be freed by the caller when it is no longer needed.

Note that this function can return a zero-length string. In the
context of Windows Registries, this means that this value is the
default key for this node in the tree. This is usually written

On error this returns NULL and sets errno.

=item int hivex_value_type (hive_h *h, hive_value_h value, hive_type *t, size_t *len);

Return the data type and length of the value in this (key, value)
pair. See also C<hivex_value_value>, which returns all this
information and the value itself, and the C<hivex_value_*> functions
below, which return the value in a more useful form when
you know the type in advance.

Returns 0 on success. On error this returns -1 and sets errno.

=item char *hivex_value_value (hive_h *h, hive_value_h value, hive_type *t, size_t *len);

Return the value of this (key, value) pair. The value should
be interpreted according to its type (see C<enum hive_type>).

The value is returned in an array of bytes of length C<len>.

The value should be freed by the caller when it is no longer needed.

On error this returns NULL and sets errno.
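
When interpreting a raw value according to its type, it can help to turn
the numeric type code into a readable label first. The helper below is a
hypothetical sketch (C<type_code_name> is not part of the library). It
covers only the type codes that are spelled out in this document and
reports everything else as "unknown".

```c
/* Map a registry value type code to a short name.  Only the codes that
 * appear in this document are listed; anything else yields "unknown". */
const char *
type_code_name (int code)
{
  switch (code) {
  case 1:  return "string";
  case 2:  return "expand_string";
  case 4:  return "dword";
  case 7:  return "multiple_strings";
  case 8:  return "resource_list";
  case 9:  return "full_resource_description";
  case 10: return "resource_requirements_list";
  default: return "unknown";
  }
}
```

A caller could print this label next to the byte length reported for the
value before deciding how to decode the bytes.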
=item char *hivex_value_string (hive_h *h, hive_value_h value);

If this value is a string, return the string reencoded as UTF-8
(as a C string). This only works for values which have type
C<hive_t_string>, C<hive_t_expand_string> or C<hive_t_link>.

The string should be freed by the caller when it is no longer needed.

On error this returns NULL and sets errno.

=item char **hivex_value_multiple_strings (hive_h *h, hive_value_h value);

If this value is a multiple-string, return the strings reencoded
as UTF-8 (as a NULL-terminated array of C strings). This only
works for values which have type C<hive_t_multiple_strings>.

The string array and each string in it should be freed by the
caller when they are no longer needed.

On error this returns NULL and sets errno.
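
Freeing a multiple-string result takes two steps: each string first,
then the array that held them. A minimal sketch of such a cleanup helper
follows. C<free_string_list> is a hypothetical name, and the sketch
assumes the strings and the array were allocated with C<malloc> or a
compatible allocator.

```c
#include <stdlib.h>

/* Free a NULL-terminated array of heap-allocated strings, then the
 * array itself.  Returns the number of strings released. */
size_t
free_string_list (char **strings)
{
  size_t n = 0;
  if (strings == NULL)          /* nothing to free, e.g. after an error */
    return 0;
  for (; strings[n] != NULL; n++)
    free (strings[n]);
  free (strings);
  return n;
}
```

Accepting NULL keeps the error path simple: the caller can run the same
cleanup whether or not the lookup succeeded.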
=item int32_t hivex_value_dword (hive_h *h, hive_value_h value);

If this value is a DWORD (Windows int32), return it. This only works
for values which have type C<hive_t_dword> or C<hive_t_dword_be>.

=item int64_t hivex_value_qword (hive_h *h, hive_value_h value);

If this value is a QWORD (Windows int64), return it. This only
works for values which have type C<hive_t_qword>.
=head2 VISITING ALL NODES

The visitor pattern is useful if you want to visit all nodes
in the tree or all nodes below a certain point in the tree.

First you set up your own C<struct hivex_visitor> with your

Each of these callback functions should return 0 on success or -1
on error. If any callback returns -1, then the entire visit
terminates immediately. If you don't need a callback function at
all, set the function pointer to NULL.

 struct hivex_visitor {
   int (*node_start) (hive_h *, void *opaque, hive_node_h, const char *name);
   int (*node_end) (hive_h *, void *opaque, hive_node_h, const char *name);
   int (*value_string) (hive_h *, void *opaque, hive_node_h, hive_value_h,
       hive_type t, size_t len, const char *key, const char *str);
   int (*value_multiple_strings) (hive_h *, void *opaque, hive_node_h,
       hive_value_h, hive_type t, size_t len, const char *key, char **argv);
   int (*value_string_invalid_utf16) (hive_h *, void *opaque, hive_node_h,
       hive_value_h, hive_type t, size_t len, const char *key,
   int (*value_dword) (hive_h *, void *opaque, hive_node_h, hive_value_h,
       hive_type t, size_t len, const char *key, int32_t);
   int (*value_qword) (hive_h *, void *opaque, hive_node_h, hive_value_h,
       hive_type t, size_t len, const char *key, int64_t);
   int (*value_binary) (hive_h *, void *opaque, hive_node_h, hive_value_h,
       hive_type t, size_t len, const char *key, const char *value);
   int (*value_none) (hive_h *, void *opaque, hive_node_h, hive_value_h,
       hive_type t, size_t len, const char *key, const char *value);
   int (*value_other) (hive_h *, void *opaque, hive_node_h, hive_value_h,
       hive_type t, size_t len, const char *key, const char *value);
   /* If value_any callback is not NULL, then the other value_*
    * callbacks are not used, and value_any is called on all values.
    */
   int (*value_any) (hive_h *, void *opaque, hive_node_h, hive_value_h,
       hive_type t, size_t len, const char *key, const char *value);
 };
=item int hivex_visit (hive_h *h, const struct hivex_visitor *visitor, size_t len, void *opaque, int flags);

Visit all the nodes recursively in the hive C<h>.

C<visitor> should be a C<hivex_visitor> structure with callback
fields filled in as required (unwanted callbacks can be set to
NULL). C<len> must be the length of the 'visitor' struct (you
should pass C<sizeof (struct hivex_visitor)> for this).

This returns 0 if the whole recursive visit was completed
successfully. On error this returns -1. If one of the callback
functions returned an error then we don't touch errno. If the
error was generated internally then we set errno.

You can skip bad registry entries by setting C<flags> to
C<HIVEX_VISIT_SKIP_BAD>. If this flag is not set, then a bad registry
causes the function to return an error immediately.

This function is robust if the registry contains cycles or
pointers which are invalid or outside the registry. It detects
these cases and returns an error.
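
The callback contract described above (NULL callbacks are skipped, and
the first callback that returns -1 ends the whole visit) can be
illustrated on a toy in-memory tree. This is a sketch of the pattern
only, not of the library's implementation: every C<toy_*> name, the
C<counter> struct and C<count_node> are hypothetical, and the toy does
no cycle or pointer checking.

```c
#include <stddef.h>

struct toy_node {
  const char *name;
  const struct toy_node *children;   /* array of nchildren nodes */
  size_t nchildren;
};

struct toy_visitor {
  int (*node_start) (void *opaque, const struct toy_node *node);
  int (*node_end) (void *opaque, const struct toy_node *node);
};

/* Depth-first walk.  NULL callbacks are skipped; the first callback to
 * return -1 stops the whole visit and -1 is passed back to the caller. */
int
toy_visit (const struct toy_node *node, const struct toy_visitor *v,
           void *opaque)
{
  if (v->node_start && v->node_start (opaque, node) == -1)
    return -1;
  for (size_t i = 0; i < node->nchildren; i++)
    if (toy_visit (&node->children[i], v, opaque) == -1)
      return -1;
  if (v->node_end && v->node_end (opaque, node) == -1)
    return -1;
  return 0;
}

/* Example callback: count nodes, giving up once a caller-chosen limit
 * has been reached. */
struct counter {
  size_t seen;
  size_t limit;
};

int
count_node (void *opaque, const struct toy_node *node)
{
  struct counter *c = opaque;
  (void) node;                  /* the name is not needed for counting */
  if (c->seen == c->limit)
    return -1;                  /* abort the visit */
  c->seen++;
  return 0;
}
```

The C<opaque> pointer is how per-visit state (here, the counter) reaches
the callbacks without global variables, which is the same role the
C<opaque> argument plays in the signatures above.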
=item int hivex_visit_node (hive_h *h, hive_node_h node, const struct hivex_visitor *visitor, size_t len, void *opaque);

Same as C<hivex_visit> but instead of starting out at the root, this
=head1 THE STRUCTURE OF THE WINDOWS REGISTRY

Note: To understand the relationship between hives and the common
Windows Registry keys (like C<HKEY_LOCAL_MACHINE>) please see the
Wikipedia page on the Windows Registry.

The Windows Registry is split across various binary files, each
file being known as a "hive". This library only handles a single

Hives are n-ary trees with a single root. Each node in the tree

Each node in the tree (including non-leaf nodes) may have an
arbitrary list of (key, value) pairs attached to it. It may
be the case that one of these pairs has an empty key. This
is referred to as the default key for the node.

The (key, value) pairs are the place where the useful data is
stored in the registry. The key is always a string (possibly the
empty string for the default key). The value is a typed object
(e.g. string, int32, binary, etc.).
=head2 RELATIONSHIP TO .REG FILES

Although this library does not care about or deal with Windows reg
files, it's useful to look at the relationship between the registry
itself and reg files because they are so common.

A reg file is a text representation of the registry, or part of the
registry. The actual registry hives that Windows uses are binary
files. There are a number of Windows and Linux tools that let you
generate reg files, or merge reg files back into the registry hives.
Notable amongst them is Microsoft's REGEDIT program (formerly known as

A typical reg file will contain many sections looking like this:

 [HKEY_LOCAL_MACHINE\SOFTWARE\Classes\Stack]
 "TileInfo"="prop:System.FileCount"
 "TilePath"=str(2):"%systemroot%\\system32"
 "ThumbnailCutoff"=dword:00000000
 "FriendlyTypeName"=hex(2):40,00,25,00,53,00,79,00,73,00,74,00,65,00,6d,00,52,00,6f,00,\
  6f,00,74,00,25,00,5c,00,53,00,79,00,73,00,74,00,65,00,6d,00,\
  33,00,32,00,5c,00,73,00,65,00,61,00,72,00,63,00,68,00,66,00,\
  6f,00,6c,00,64,00,65,00,72,00,2e,00,64,00,6c,00,6c,00,2c,00,\
  2d,00,39,00,30,00,32,00,38,00,00,00,d8

Taking this one piece at a time:

 [HKEY_LOCAL_MACHINE\SOFTWARE\Classes\Stack]

This is the path to this node in the registry tree. The first part,
C<HKEY_LOCAL_MACHINE\SOFTWARE> means that this comes from a hive
(file) called C<SOFTWARE>. C<\Classes\Stack> is the real path part,
starting at the root node of the C<SOFTWARE> hive.
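
The split just described can be sketched as a small string helper.
C<split_reg_path> is a hypothetical function, not part of this library,
and it assumes the path has the same shape as the example: a root key
name, then the hive name, then the path inside the hive. How other root
keys map onto hive files is not covered.

```c
#include <stddef.h>
#include <string.h>

/* Split a reg-file section path of the form ROOTKEY\HIVE\rest into the
 * hive name and the path inside that hive.  Copies the hive name into
 * hive (capacity hive_size) and returns a pointer to the in-hive path
 * (starting with a backslash, or "" if the path names the hive root).
 * Returns NULL if there is no hive component or hive is too small. */
const char *
split_reg_path (const char *path, char *hive, size_t hive_size)
{
  const char *first = strchr (path, '\\');    /* end of the root key name */
  if (first == NULL)
    return NULL;

  const char *start = first + 1;              /* start of the hive name */
  const char *end = strchr (start, '\\');     /* end of the hive name, if any */
  size_t len = end ? (size_t) (end - start) : strlen (start);
  if (len == 0 || len >= hive_size)
    return NULL;

  memcpy (hive, start, len);
  hive[len] = '\0';
  return end ? end : start + len;             /* "" when nothing follows */
}
```

For the section header above, the hive name comes out as C<SOFTWARE> and
the remaining path as C<\Classes\Stack>, whose components could then be
looked up one at a time starting from the root node.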
Below the node name is a list of zero or more key-value pairs. Any
interior or leaf node in the registry may have key-value pairs

This is the "default key". In reality (i.e. inside the binary hive)
the key string is the empty string. In reg files this is written as
C<@> but this has no meaning either in the hives themselves or in this
library. The value is a string (type 1 - see C<enum hive_type>

 "TileInfo"="prop:System.FileCount"

This is a regular (key, value) pair, with the value being a type 1
string. Note that inside the binary file the string is likely to be
UTF-16 encoded. This library converts to and from UTF-8 strings

 "TilePath"=str(2):"%systemroot%\\system32"

The value in this case has type 2 (expanded string) meaning that some
%...% variables get expanded by Windows. (This library doesn't know
or care about variable expansion).

 "ThumbnailCutoff"=dword:00000000

The value in this case is a dword (type 4).

 "FriendlyTypeName"=hex(2):40,00,....

This value is an expanded string (type 2) represented in the reg file
as a series of hex bytes. In this case the string appears to be a
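
To see what such a hex listing contains, the byte pairs can be decoded
as little-endian UTF-16. The sketch below is deliberately limited and is
not how this library converts strings: C<utf16le_to_ascii> is a
hypothetical helper that handles only the ASCII range and replaces every
other code unit with C<?>.

```c
#include <assert.h>
#include <stddef.h>

/* Decode UTF-16LE bytes into out, stopping at a 16-bit NUL terminator
 * or when fewer than two bytes remain.  Code units outside the ASCII
 * range are replaced by '?'.  Returns the number of characters written
 * (excluding the terminating NUL). */
size_t
utf16le_to_ascii (const unsigned char *bytes, size_t nbytes,
                  char *out, size_t out_size)
{
  assert (out != NULL && out_size > 0);

  size_t n = 0;
  for (size_t i = 0; i + 1 < nbytes && n + 1 < out_size; i += 2) {
    /* Low byte comes first in little-endian UTF-16. */
    unsigned unit = bytes[i] | ((unsigned) bytes[i + 1] << 8);
    if (unit == 0)
      break;                    /* string terminator */
    out[n++] = unit < 0x80 ? (char) unit : '?';
  }
  out[n] = '\0';
  return n;
}
```

Fed the first bytes of the C<FriendlyTypeName> listing (40,00,25,00,53,00
and so on), this produces text beginning C<@%System>. Decoding stops at
the 00,00 pair, so the trailing odd byte is ignored.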
=head1 NOTE ON THE USE OF ERRNO

Many functions in this library set errno to indicate errors. These
are the values of errno you may encounter (this list is not

Corrupt or unsupported Registry file format.

Passed an invalid argument to the function.

Followed a Registry pointer which goes outside
the registry or outside a registry block.

Registry contains cycles.

Field in the registry out of range.

=head1 ENVIRONMENT VARIABLES

Setting HIVEX_DEBUG=1 will enable very verbose messages. This is
useful for debugging problems with the library itself.
L<http://libguestfs.org/>,
L<http://en.wikipedia.org/wiki/Windows_Registry>.

Richard W.M. Jones (C<rjones at redhat dot com>)

Copyright (C) 2009-2010 Red Hat Inc.

Derived from code by Petter Nordahl-Hagen under a compatible license:
Copyright (C) 1997-2007 Petter Nordahl-Hagen.

Derived from code by Markus Stephany under a compatible license:
Copyright (C) 2000-2004 Markus Stephany.

This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation;
version 2.1 of the License.

This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.

See file LICENSE for the full license.