# Copyright (C) 2009-2010 Red Hat Inc.
# Derived from code by Petter Nordahl-Hagen under a compatible license:
#   Copyright (c) 1997-2007 Petter Nordahl-Hagen.
# Derived from code by Markus Stephany under a compatible license:
#   Copyright (c) 2000-2004, Markus Stephany.
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

Win::Hivex::Regedit - Helper for reading and writing regedit format files

 use Win::Hivex::Regedit qw(reg_import reg_export);

 $h = Win::Hivex->open ('SOFTWARE', write => 1);

 open FILE, "updates.reg";
 reg_import (\*FILE, $h);

 reg_export ($h, "\\Microsoft\\Windows NT\\CurrentVersion", \*OUTFILE,
    prefix => "HKEY_LOCAL_MACHINE\\SOFTWARE");

Win::Hivex::Regedit is a helper library for reading and writing the
Windows regedit (or C<.REG>) file format. This is the textual format
that is commonly used on Windows for distributing groups of Windows
Registry changes, and this format is read and written by the
proprietary C<reg.exe> and C<regedit.exe> programs supplied with
Windows. It is I<not> the same as the binary "hive" format which the
hivex library itself can read and write. Note that the regedit format
is not well-specified, and hence deviations can occur between what the
Windows programs can read/write and what we can read/write. (Please
file bugs for any deviations found.)

Win::Hivex::Regedit is the low-level Perl library. There is also a
command line tool for combining hive files and reg files
(L<hivexregedit(1)>). If you have a Windows virtual machine that you need
to merge regedit-format changes into, use the high-level
L<virt-win-reg(1)> tool (part of libguestfs tools).

package Win::Hivex::Regedit;

use Carp qw(croak confess);
use Encode qw(encode);

use vars qw(@EXPORT_OK @ISA);

@EXPORT_OK = qw(reg_import reg_export);

 reg_import ($fh, ($h|$map), [encoding => "UTF-16LE"]);

This function imports the registry keys from file handle C<$fh> either
into the hive C<$h> or via a map function.

The hive handle C<$h> must have been opened for writing, i.e.
using the C<write =E<gt> 1> flag to C<Win::Hivex-E<gt>open>.

In the binary hive file, the first part of the key name (e.g.
C<HKEY_LOCAL_MACHINE\SOFTWARE>) is not stored. You just have to know
(somehow) that this maps to the C<SOFTWARE> hive. Therefore if you
are given a file containing a mixture of keys that have to be added to
different hives, you have to have a way to map these to the hive
handles. This is outside the scope of the hivex library, but if the
second argument is a CODEREF (i.e. a reference to a function) then this
C<$map> function is called on each key name:

The function should return a pair: the hive handle and the true
key name (with the prefix stripped off). For example:

   if ($_[0] =~ /^HKEY_LOCAL_MACHINE\\SOFTWARE(.*)/i) {
     return ($software_h, $1);
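
The mapping idea can be tried out without any hive files. The following is only a sketch: the C<%hives> table, the C<map_key> name and the plain strings standing in for hive handles are all made up for illustration, and a real program would store handles returned by C<Win::Hivex-E<gt>open> instead.

```perl
use strict;
use warnings;

# Hypothetical table of key prefixes. The values are placeholder strings;
# in real use they would be Win::Hivex handles opened with write => 1.
my %hives = (
    'HKEY_LOCAL_MACHINE\\SOFTWARE' => 'software-handle',
    'HKEY_LOCAL_MACHINE\\SYSTEM'   => 'system-handle',
);

# Map a full key name to (handle, path inside that hive).
# Prefixes are matched case-insensitively, longest prefix first.
sub map_key {
    my $key = shift;
    foreach my $prefix (sort { length ($b) <=> length ($a) } keys %hives) {
        if ($key =~ /^\Q$prefix\E(\\.*)?$/i) {
            # A key equal to the prefix itself maps to the hive root.
            my $rest = defined $1 ? $1 : "\\";
            return ($hives{$prefix}, $rest);
        }
    }
    die "no hive known for key: $key\n";
}

my ($h, $path) = map_key ('HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Windows');
print "$h $path\n";
```

A reference to such a function (C<\&map_key> here) is what would be passed as the second argument of C<reg_import> in place of a single hive handle.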

C<encoding> is the encoding used by default for strings. If not
specified, this defaults to C<"UTF-16LE">; however, we strongly advise
you to specify it. See L</ENCODING STRINGS> below.

As with the regedit program, we merge the new registry keys with
existing ones, and new node values with old ones. You can use the
C<-> (minus) character to delete individual keys and values. This is
explained in detail in the Wikipedia page on the Windows Registry.
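
To make the merge and delete syntax concrete, here is a small sketch that classifies lines of a regedit file using the same regular expressions that appear in the import code of this module. The C<classify_line> function and its category names are invented for this example, and unlike the real parser it does not track whether it is inside a node.

```perl
use strict;
use warnings;

# Classify one (already joined) line of a regedit file.
# Order matters: the delete forms are tested before the ordinary forms.
sub classify_line {
    my $line = shift;
    return "merge-node"     if $line =~ /^\[(.*)\]\s*$/;
    return "delete-node"    if $line =~ /^-\[(.*)\]\s*$/;
    return "delete-value"   if $line =~ /^(".*)=-\s*$/;
    return "delete-default" if $line =~ /^@=-\s*$/;
    return "set-value"      if $line =~ /^".*"=/;
    return "set-default"    if $line =~ /^@=(.*)/;
    return "blank"          if $line =~ /^\s*$/;
    return "unexpected";
}

my @sample = (
    '[HKEY_LOCAL_MACHINE\SOFTWARE\Foo]',
    '"Bar"=dword:00000001',
    '"Old"=-',
    '@=-',
    '',
    '-[HKEY_LOCAL_MACHINE\SOFTWARE\Obsolete]',
);
printf "%-16s %s\n", classify_line ($_), $_ foreach @sample;
```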

Remember you need to call C<$h-E<gt>commit (undef)> on the hivex
handle before any changes are written to the hive file. See
L<hivex(3)/WRITING TO HIVE FILES>.

    my $encoding = $params{encoding} || "utf-16le";

        # Join continuation lines. This is recipe 8.1 from the Perl
        # Cookbook. Note we allow spaces after the final \ because
        # this is fairly common in pasted regedit files.
            redo unless eof ($fh);

        #print STDERR "reg_import: parsing <<<$_>>>\n";

        if ($state eq "outer") {
            # Ignore blank lines, headers.
            # .* is needed before Windows Registry Editor Version.. in
            # order to eat a possible Unicode BOM which regedit writes
            next if /^.*Windows Registry Editor Version.*/;

            # Expect to see [...] or -[...]
            # to merge or delete a node respectively.
            if (/^\[(.*)\]\s*$/) {
            } elsif (/^-\[(.*)\]\s*$/) {
                _delete_node ($hmap, \%params, $1);
                croak (_unexpected ($_, $lineno));
        } elsif ($state eq "inner") {
            if (/^(".*)=-\s*$/) { # delete value
                my $key = _parse_quoted_string ($_);
                croak (_parse_error ($_, $lineno)) unless defined $key;
                push @delvalues, $key;
            } elsif (/^@=-\s*$/) { # delete default key
            } elsif (/^".*"=/) { # ordinary value
                my $value = _parse_key_value ($_, $encoding);
                croak (_parse_error ($_, $lineno)) unless defined $value;
                push @newvalues, $value;
            } elsif (/^@=(.*)/) { # default key
                my $value = _parse_value ("", $1, $encoding);
                croak (_parse_error ($_, $lineno)) unless defined $value;
                push @newvalues, $value;
            } elsif (/^\s*$/) { # blank line after values
                _merge_node ($hmap, \%params, $newnode, \@newvalues, \@delvalues);
                croak (_unexpected ($_, $lineno));

    # Still got a node left over to merge?
    if ($state eq "inner") {
        _merge_node ($hmap, \%params, $newnode, \@newvalues, \@delvalues);

    my $encoding = shift;

    ($key, $_) = _parse_quoted_string ($_);
    return undef unless defined $key;
    return undef unless substr ($_, 0, 1) eq "=";
    return _parse_value ($key, substr ($_, 1), $encoding);

# Parse a double-quoted string, returning the string. \ is used to
# escape double-quotes and other backslash characters.
#
# If called in array context and if there is anything after the quoted
# string, it is returned as the second element of the array.
#
# Returns undef if there was a parse error.
sub _parse_quoted_string
    # No initial quote character.
    return undef if substr ($_, 0, 1) ne "\"";

    for ($i = 1; $i < length; ++$i) {
        my $c = substr ($_, $i, 1);
        } elsif ($c eq "\\") {
            $c = substr ($_, $i, 1);

    # No final quote character.
    return undef if $i == length;

    $_ = substr ($_, $i+1);

# Parse the value, optionally prefixed by a type.
    my $encoding = shift; # default encoding for strings

    if (m/^dword:([[:xdigit:]]{8})$/) { # DWORD
        $data = _dword_le (hex ($1));
    } elsif (m/^hex:(.*)$/) { # hex digits
        $data = _data_from_hex_digits ($1);
        return undef unless defined $data;
    } elsif (m/^hex\(([[:xdigit:]]+)\):(.*)$/) { # hex digits
        $data = _data_from_hex_digits ($2);
        return undef unless defined $data;
    } elsif (m/^str:(".*")$/) { # only in Wine fake-registries, I think
        $data = _parse_quoted_string ($1);
        return undef unless defined $data;
        $data .= "\0"; # Value strings are implicitly ASCIIZ.
        $data = encode ($encoding, $data);
    } elsif (m/^str\(([[:xdigit:]]+)\):(".*")$/) {
        $data = _parse_quoted_string ($2);
        return undef unless defined $data;
        $data .= "\0"; # Value strings are implicitly ASCIIZ.
        $data = encode ($encoding, $data);
    } elsif (m/^(".*")$/) {
        $data = _parse_quoted_string ($1);
        return undef unless defined $data;
        $data .= "\0"; # Value strings are implicitly ASCIIZ.
        $data = encode ($encoding, $data);

    my %h = ( key => $key, t => $type, value => $data );

sub _data_from_hex_digits

    my $newvalues = shift;
    my $delvalues = shift;

    ($h, $path) = _map_handle ($hmap, $path);

    my $node = _node_lookup ($h, $path);
    if (!defined $node) { # Need to create this node.
        $name = $1 if $path =~ /([^\\]+)$/;
        my $parentpath = $path;
        $parentpath =~ s/[^\\]+$//;
        my $parent = _node_lookup ($h, $parentpath);
        if (!defined $parent) {
            confess "reg_import: cannot create $path since parent $parentpath does not exist"
        $node = $h->node_add_child ($parent, $name);

    # Get the current set of values at this node.
    my @values = $h->node_values ($node);

    # Delete values in @delvalues original and values that are going
    # to be replaced.
    my @delvalues = @$delvalues;
    foreach (@$newvalues) {
        push @delvalues, $_->{key};
    @values = grep { ! _imember ($h->value_key ($_), @delvalues) } @values;

    # Get the actual values from the hive.
        my $key = $h->value_key ($_);
        my ($type, $data) = $h->value_value ($_);
        my %h = ( key => $key, t => $type, value => $data );

    # Add the new values.
    push @values, @$newvalues;

    $h->node_set_values ($node, \@values);

    ($h, $path) = _map_handle ($hmap, $path);

    my $node = _node_lookup ($h, $path);
    # Not an error to delete a nonexistent node.
    return unless defined $node;

    # However you cannot delete the root node.
    confess "reg_import: the root node of a hive cannot be deleted"
        if $node == $h->root ();

    $h->node_delete_child ($node);

# Call the map function, if necessary.
    local $_; # called function may use this

    if (ref ($hmap) eq "CODE") {
        ($h, $path) = &$hmap ($path);

        return 1 if lc ($_) eq lc ($item);

    "reg_import: parse error: unexpected text found at line $lineno near\n$_"

    "reg_import: parse error: at line $lineno near\n$_"

 reg_export ($h, $key, $fh, [prefix => $prefix]);

This function exports the registry keys starting at the root
C<$key> and recursively downwards into the file handle C<$fh>.

C<$key> is a case-insensitive path of the node to start from, relative
to the root of the hive. It is an error if this path does not exist.
Path elements should be separated by backslash characters.

C<$prefix> is prefixed to each key name. The usual use for this is to
make key names appear as they would on Windows. For example the key
C<\Foo> in the SOFTWARE Registry, with C<$prefix>
C<HKEY_LOCAL_MACHINE\SOFTWARE>, would be written as:

 [HKEY_LOCAL_MACHINE\SOFTWARE\Foo]
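
The effect of the prefix can be sketched in a few lines. The C<key_header> function below is hypothetical and not part of this module. It assumes, based on the export code, that one trailing backslash is chopped from the prefix before the prefix and the node path are simply concatenated.

```perl
use strict;
use warnings;

# Build the "[...]" header line for a node path, given an optional prefix.
sub key_header {
    my ($path, $prefix) = @_;
    if (defined $prefix && length $prefix) {
        # Drop one trailing backslash so that the join does not double it.
        chop $prefix if substr ($prefix, -1, 1) eq "\\";
        $path = $prefix . $path;
    }
    return "[$path]";
}

print key_header ("\\Foo", "HKEY_LOCAL_MACHINE\\SOFTWARE"), "\n";
```

Writing the prefix with or without a final backslash gives the same header line in this sketch.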

The output is written as pure 7-bit ASCII, with line endings that are
the default for the local host. You may need to convert the file's
encoding using L<iconv(1)> and the line endings using L<unix2dos(1)>
before sending it to a Windows user. Strings are always encoded as hex
bytes. See L</ENCODING STRINGS> below.
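
As an illustration of the "hex bytes" rule, the sketch below formats a single value in the way the export code appears to: a 4-byte value of type 4 (REG_DWORD) is written as C<dword:>, and everything else, strings included, as C<hex(TYPE):> followed by comma-separated bytes. The C<format_value> function is made up for this example. It skips quote escaping and any line wrapping of long values, and it assumes that an empty key name denotes the default value.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Format one registry value as a regedit line (without the newline).
sub format_value {
    my ($key, $type, $data) = @_;
    # Assumption: an empty key name means the default value, written "@=".
    my $lhs = $key eq "" ? '@=' : '"' . $key . '"=';
    if ($type == 4 && length ($data) == 4) {
        # Only little-endian 32-bit DWORDs get a special notation.
        return $lhs . sprintf ("dword:%08x", unpack ("V", $data));
    }
    # Everything else is dumped byte by byte, tagged with its type number.
    my $hex = join (",", map { sprintf "%02x", ord } split (//, $data));
    return $lhs . sprintf ("hex(%x):", $type) . $hex;
}

print format_value ("Version", 4, pack ("V", 1)), "\n";
print format_value ("Name", 1, encode ("UTF-16LE", "Hi\0")), "\n";
```

The second line shows a type 1 (REG_SZ) string stored as UTF-16LE with its terminating NUL, so each character occupies two bytes in the output.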

Nodes and keys are sorted alphabetically in the output.

This function does I<not> print a header. The real regedit program
will print a header like:

 Windows Registry Editor Version 5.00

followed by a blank line. (Other headers are possible; see the
Wikipedia page on the Windows Registry.) If you want a header, you
need to write it out yourself.
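
Writing the header yourself only takes one C<print> before the export. The sketch below uses an in-memory file handle so that it runs without a hive file, and the key line it prints afterwards is a stand-in for what a call to C<reg_export> would produce.

```perl
use strict;
use warnings;

# Collect the output in a string instead of a real file.
my $out = "";
open (my $fh, ">", \$out) or die "cannot open in-memory file: $!";

# The header line, followed by the blank line that regedit also writes.
print $fh "Windows Registry Editor Version 5.00\n\n";

# A real program would call reg_export ($h, $key, $fh, ...) at this point.
# This placeholder line only stands in for the exported keys.
print $fh "[HKEY_LOCAL_MACHINE\\SOFTWARE\\Foo]\n";

close ($fh);
print $out;
```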

    my $node = _node_lookup ($h, $key);
    croak "$key: path not found in this hive" unless $node;

    reg_export_node ($h, $node, @_);

=head2 reg_export_node

 reg_export_node ($h, $node, $fh, ...);

This is exactly the same as L</reg_export> except that instead
of specifying the path to a key as a string, you pass a hivex
library C<$node> handle.

    confess "reg_export_node: \$node parameter was undef" unless defined $node;

    # Get the canonical path of this node.
    my $path = _node_canonical_path ($h, $node);

    my $prefix = $params{prefix};
    if (defined $prefix) {
        chop $prefix if substr ($prefix, -1, 1) eq "\\";

    my @values = $h->node_values ($node);

        my $key = $h->value_key ($_);
        my ($type, $data) = $h->value_value ($_);
        $_ = { key => $key, type => $type, data => $data }

    @values = sort { $a->{key} cmp $b->{key} } @values;

        my $type = $_->{type};
        my $data = $_->{data};

            print $fh '@=' # default key
            print $fh '"', _escape_quotes ($key), '"='

        if ($type eq 4 && length ($data) == 4) { # only handle dword specially
            my $dword = unpack ("V", $data);
            printf $fh "dword:%08x\n", $dword
            # Encode everything else as hex, see encoding section below.
            printf $fh "hex(%x):", $type;
            my $hex = join (",", map { sprintf "%02x", ord } split (//, $data));

    my @children = $h->node_children ($node);
    @children = sort { $h->node_name ($a) cmp $h->node_name ($b) } @children;
    reg_export_node ($h, $_, $fh, @_) foreach @children;

# Escape " and \ when printing keys.

# Look up a node in the registry starting from the path.
# Return undef if it doesn't exist.
    my @path = split /\\/, $path;
    shift @path if @path > 0 && $path[0] eq "";

    my $node = $h->root ();
        $node = $h->node_get_child ($node, $_);
        return undef unless defined $node;

# Return the canonical path of node in the hive.
sub _node_canonical_path
    return "\\" if $node == $h->root ();
    $_ = $h->node_name ($node);
    my $parent = $h->node_parent ($node);
    my $path = _node_canonical_path ($h, $parent);

=head1 ENCODING STRINGS

The situation with encoding strings in the Registry on Windows is very
confused. There are two main encodings that you would find in the
binary (hive) file: 7-bit ASCII and UTF-16LE. (Other encodings are
possible; it is also possible to have arbitrary binary data incorrectly
marked with a string type.)

The hive file itself doesn't contain any indication of string
encoding. Windows probably guesses the encoding.

We think that regedit probably either guesses which encoding to use
based on the file encoding, or else has different defaults for
different versions of Windows. Neither choice is appropriate for a
tool used in a real operating system.

When using L</reg_import>, you should specify the default encoding for
strings using the C<encoding> parameter. If not specified, it
defaults to UTF-16LE.
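
The difference the C<encoding> parameter makes to the stored bytes can be seen with the core L<Encode> module alone. This is a sketch of what the import code does to a quoted string (append a NUL, then encode); the C<encode_reg_string> helper is a name invented for the example.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Turn a parsed string into the bytes stored in the hive.
# Value strings are implicitly NUL-terminated before encoding.
sub encode_reg_string {
    my ($string, $encoding) = @_;
    $encoding = "UTF-16LE" unless defined $encoding;
    return encode ($encoding, $string . "\0");
}

my $utf16 = encode_reg_string ("Hi");
my $ascii = encode_reg_string ("Hi", "ascii");

printf "UTF-16LE: %s\n", join (" ", map { sprintf "%02x", ord } split (//, $utf16));
printf "ASCII:    %s\n", join (" ", map { sprintf "%02x", ord } split (//, $ascii));
```

The same two-character string becomes six bytes in UTF-16LE and three bytes in ASCII, which is why a wrong guess about the encoding corrupts string values.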

The file itself that is imported should be in the local encoding for
files (usually UTF-8 on modern Linux systems). This means if you
receive a regedit file from a Windows system, you may sometimes have
to re-encode it:

 iconv -f utf-16le -t utf-8 < input.reg | dos2unix > output.reg
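
If the external tools are not available, roughly the same conversion can be done in Perl with the core L<Encode> module. This is a sketch under the assumption that the input really is UTF-16LE with CRLF line endings. The C<reg_utf16_to_utf8> name is made up, and unlike the shell pipeline it also strips a leading byte order mark.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Convert the raw bytes of a UTF-16LE regedit file to UTF-8 with
# Unix line endings.
sub reg_utf16_to_utf8 {
    my $bytes = shift;
    my $text = decode ("UTF-16LE", $bytes);
    $text =~ s/^\x{FEFF}//;    # drop the byte order mark, if present
    $text =~ s/\r\n/\n/g;      # CRLF to LF, like dos2unix
    return encode ("UTF-8", $text);
}

# A tiny stand-in for the contents of a file saved by regedit on Windows.
my $input = encode ("UTF-16LE",
                    "\x{FEFF}Windows Registry Editor Version 5.00\r\n\r\n");
print reg_utf16_to_utf8 ($input);
```

For a real file, the bytes would be read from a handle opened with the C<:raw> layer so that no decoding happens before C<decode> is called.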

When writing regedit files (L</reg_export>) we bypass this madness
completely. I<All> strings (even pure ASCII) are written as hex bytes
so there is no doubt about how they should be encoded when they are
read back in.

Copyright (C) 2010 Red Hat Inc.

Please see the file COPYING.LIB for the full license.

L<http://libguestfs.org>,