2 Formally proving tiny bits of qemu using Frama-C
6 In early October I was talking to one of the developers of Frama-C,
7 which is a modular framework for verifying C programs. It's been on
8 my radar to learn for a long time, and I'm a co-maintainer of the
9 package in Fedora. I tried to follow one tutorial but never really
10 got past installation and UI issues. But the developer pointed me to
11 a very good, short set of blog postings about using the command line
14 I will provide links to tutorials etc at the end.
17 = OVERVIEW OF FRAMA-C ECOSYSTEM =
19 The name stands for "Framework for Static Analysis of the C language".
21 It's modular, with a core program that reads C source code
22 and turns it into Abstract Syntax Trees. And a set of plugins
23 to do static analysis by annotating these syntax trees.
24 Plugins can cooperate, so analysis can be passed between
27 The following slides are taken from David Mentré‘s 2016 presentation.
30 I decided to spend a day or two last month seeing if I could formally
31 prove some code inside qemu, and I arbitrarily picked one of the
32 smallest pieces of code in the "util/" subdirectory:
34 $ wc -l util/*.c | sort -nr | tail -20
36 Actually there are two files:
38 $ ls -l util/range.c include/qemu/range.h
40 This is what the Range type from the header file looks like. This is
41 the complete definition upstream, with comments which are supposed to
45 * Operations on 64 bit address ranges.
47 * - Ranges must not wrap around 0, but can include UINT64_MAX.
51 * Do not access members directly, use the functions!
52 * A non-empty range has @lob <= @upb.
53 * An empty range has @lob == @upb + 1.
55 uint64_t lob; /* inclusive lower bound */
56 uint64_t upb; /* inclusive upper bound */
58 typedef struct Range Range;
60 +-----------+-----------+------------- - - - - -
62 +-----------+-----------+------------- - - - - -
66 Using the tutorial as my guide I wrote some simple predicates, a
67 "predicate" being a statement which can be true or false:
69 * A non-empty range has @lob <= @upb.
70 * An empty range has @lob == @upb + 1.
73 predicate valid_range(struct Range range) =
74 range.lob <= range.upb + 1;
76 predicate empty_range(struct Range range) =
77 range.lob == range.upb + 1;
80 Notice a few things here:
82 - These are statements written in a formal language.
84 - I'm using structs directly from the C code.
86 - The upstream comments translate into machine-checkable code.
88 The first upstream function is:
91 static inline bool range_is_empty(const Range *range)
93 return range->lob > range->upb;
96 and using the predicates we can write a specification:
98 $ cat snippets/range_is_empty.c
100 And we can compile and prove that:
102 $ frama-c -wp -wp-rte snippets/range_is_empty.c
104 Frama-C parsed the C code and the formal specification and machine
105 checked it, and it's correct - the code is bug-free.
108 Going back to what we proved so far:
111 predicate valid_range(struct Range range) =
112 range.lob <= range.upb + 1;
113 predicate empty_range(struct Range range) =
114 range.lob == range.upb + 1;
117 /* Is range empty? */
119 requires \valid_read(range);
120 requires valid_range(*range);
122 ensures \result <==> empty_range(*range);
124 static inline bool range_is_empty(const Range *range)
126 return range->lob > range->upb;
129 $ frama-c -wp -wp-rte snippets/range_is_empty.c
131 The "@"-comments are ACSL. The C code is parsed by Frama-C into an
132 abstract syntax tree. We told Frama-C to apply the WP plugin. The
133 specification and C code were translated into first-order logic
134 statements and passed to the Alt-Ergo SMT prover to see if it could
135 find a Proof Obligation (PO).
137 We need to be very clear about what happened:
139 - The preconditions must be fulfilled in order for the proof to be
140 correct. If you are checking a whole program then that would check
141 that all callers of this function are fulfilling the preconditions,
142 but if you check only single functions then you are not doing that.
144 - The proof only shows that the post-condition holds.
146 - This only shows that the source code is correct. The compiler
147 could still compile the program incorrectly. If you really want to
148 get into this you have to use a verified C compiler like CompCert,
149 and there are even verified CPU implementations.
151 - Given those assumptions, the code is bug free - you don't need to
155 Obviously this is a single line, very trivial function, but I was
156 quite pleased that I was able to prove it quickly. I kept going on
157 the range file. The next function is:
159 /* Does range contain val? */
160 static inline bool range_contains(const Range *range, uint64_t val)
162 return val >= range->lob && val <= range->upb;
165 This is similarly easy to prove after adding another predicate at the
169 predicate value_in_range(struct Range range, uint64_t value) =
170 range.lob <= value <= range.upb;
173 /* Does range contain val? */
175 requires \valid_read(range);
176 requires valid_range(*range);
178 ensures \result <==> value_in_range(*range, val);
180 static inline bool range_contains(const Range *range, uint64_t val)
182 return val >= range->lob && val <= range->upb;
186 The next function is range_make_empty, again easy to prove using the
187 already existing empty_range predicate. Notice how we declare which
188 memory locations this function assigns to:
190 /* Initialize range to the empty range */
192 requires \valid(range);
193 requires valid_range(*range);
195 ensures empty_range(*range);
197 static inline void range_make_empty(Range *range)
199 *range = range_empty;
200 assert(range_is_empty(range));
204 I'm going to skip forward a few functions to get to an interesting one.
207 /* Get the size of range. */
208 static inline uint64_t range_size(const Range *range)
210 return range->upb - range->lob + 1;
213 My first attempt at a proof was:
216 logic integer size_of_bounds(integer lob, integer upb) =
219 logic integer size_of_range(struct Range range) =
220 size_of_bounds(range.lob, range.upb);
223 /* Get the size of range. */
225 requires \valid(range);
226 requires valid_range(*range);
228 ensures \result == size_of_range(*range);
230 static inline uint64_t range_size(const Range *range)
232 return range->upb - range->lob + 1;
235 A few things to notice about this:
237 - I'm using ACSL "logic" statements, which are similar to functions.
239 - ACSL has an "integer" type which is an unbounded integer.
241 Let's try to prove this one:
243 $ frama-c -wp -wp-rte snippets/range_size.c
246 This was confusing to me because at this point I'd probably spent an
247 hour, and I'd proven about 5 functions successfully, and this function
248 looks equally trivial and yet it cannot be proven. Why?
250 One way to find the problem would be to find a COUNTEREXAMPLE. A
251 counterexample is an instance of an input that satisfies all of the
252 preconditions, but makes the postcondition false. Frama-C has
253 pluggable provers, and one prover called Z3, originally written by
254 Microsoft, can be used with Frama-C and can sometimes find
257 $ frama-c -wp -wp-rte snippets/range_size.c -wp-prover alt-ergo,why3:z3-ce
259 Unfortunately Z3 cannot find a counterexample in this case. I even
260 upped the timeout to run Z3 longer but it still couldn't find one.
262 [Z3 model_compact issue: https://git.frama-c.com/pub/frama-c/-/issues/33]
264 So this is the counterexample which I worked out myself:
267 logic integer size_of_bounds(integer lob, integer upb) =
270 logic integer size_of_range(struct Range range) =
271 size_of_bounds(range.lob, range.upb);
274 /* Get the size of range. */
276 requires \valid(range);
277 requires valid_range(*range);
279 ensures \result == size_of_range(*range);
281 static inline uint64_t range_size(const Range *range)
283 return range->upb - range->lob + 1;
286 +-------------------------------------------------------+
287 |XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX|
288 +-------------------------------------------------------+
289 range->lob == 0 range->upb == UINT64_MAX
291 In this case size_of_range() returns an integer. Recall that integers
292 are an unbounded type. So it returns UINT64_MAX + 1 (a 65 bit
293 number). Our C function on the other hand returns 0.
295 This is a bug in the function. There are really two ways to
298 (1) The function shouldn't be called on empty ranges (size 0), and so
299 if the function returns 0 it means "the total range". Or:
301 (2) The function should have a different signature so it can return a
302 separate indication for "empty range" vs "total range".
304 I went with the first, so the final working specification is this. If
305 we were proving the entire program then we could statically verify
306 that callers do not try to call range_size() on the empty range.
309 requires \valid(range);
310 requires valid_range(*range);
311 requires !empty_range(*range);
313 // case UINT64_MAX+1 -> 0
314 ensures \result == (uint64_t)size_of_range(*range);
316 static inline uint64_t range_size(const Range *range)
318 return range->upb - range->lob + 1;
321 $ frama-c -wp -wp-rte snippets/range_size-good.c
324 On to the next function. Again this seems very simple, but in fact it
325 contains a serious problem:
328 * Initialize range to span the interval [lob,lob + size - 1].
329 * size may be 0. Range must not overflow.
331 static inline void range_init_nofail(Range *range, uint64_t lob,
335 range->upb = lob + size - 1;
338 What does lob == 0, size == 0 mean? It will create a total range.
340 +-------------------------------------------------------+
341 |XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX|
342 +-------------------------------------------------------+
343 range->lob == 0 range->upb == UINT64_MAX
345 What does lob == 1, size == 0 mean? It will create an empty range.
347 Because I didn't want to change the code, that makes the proof very
348 long, and it demonstrates another feature of ACSL specifications
352 requires \valid(range);
353 requires lob + (integer)size <= UINT64_MAX;
356 assumes lob > 0 && size == 0;
357 ensures empty_range(*range);
360 ensures range->lob == lob;
361 ensures size_of_bounds(lob, range->upb) == size;
363 assumes lob == 0 && size == 0;
364 ensures total_range(*range);
369 I also had to modify my earlier empty_range predicate:
372 predicate empty_range(struct Range range) =
373 range.lob == range.upb + 1;
377 predicate empty_range(struct Range range) =
378 range.lob > 0 && range.lob == range.upb + 1;
380 predicate total_range(struct Range range) =
381 range.lob == 0 && range.upb == UINT64_MAX;
384 What can we say about this? If you were developing the Range type and
385 the proof at the same time, you probably wouldn't define the Range
386 type in this way, and you probably wouldn't choose an initialization
387 function which had this complex set of behaviours.
389 I did prove most of the rest of the Range functions, but I wanted to
390 point out a few things about the final proof:
392 - The original file includes glib headers, but those cannot be parsed
393 by Frama-C's parser. The problem appears to be some unsupported
394 __attribute__s. Some attributes seem to work, and others are not
395 supported, and this of course limits what you can prove if it
396 depends on any external library that might use these.
398 - For the same reason I had to exclude a glib function for operating
399 on lists of ranges from the proof.
401 - You can run frama-c on the original source file, with a few #ifdef
402 modifications, so it would be possible to maintain the annotations
403 upstream, and run the proof checker as a CI test.
405 - It probably doesn't make sense for qemu right now though:
406 * concerted effort to prove more substantial pieces of code
407 * would need upstream effort to modularise qemu
408 * co-develop modules and proofs
413 This is a function from nbdkit:
415 /* Returns true if v is a power of 2.
417 * Uses the algorithm described at
418 * http://graphics.stanford.edu/~seander/bithacks.html#DetermineIfPowerOf2
421 is_power_of_2 (unsigned long v)
423 return v && ((v & (v - 1)) == 0);
426 and my initial specification started by defining:
429 predicate positive_power_of_2 (integer i) =
431 (i == 1 || ((i & 1) == 0 && positive_power_of_2 (i >> 1)));
433 lemma positive_power_of_2 (1);
434 lemma positive_power_of_2 (2);
435 lemma positive_power_of_2 (4);
436 lemma !positive_power_of_2 (7);
439 I was using the lemmas (which are like assertions, except statically
440 verified), to test my predicate, but sadly Frama-C was not able to
441 prove them even though they appear to be trivial.
443 This led to some discussion on stackoverflow:
445 https://stackoverflow.com/questions/64268418/how-do-i-write-an-is-power-of-2-predicate-in-acsl
447 Ultimately it was possible to fix this, but I discovered a few things
448 about Alt-Ergo (the default prover):
450 - It can't handle bitwise operations very well.
452 - You can guide Alt-Ergo, but it requires using the Frama-C GUI which
453 is very non-intuitive. You do end up with a script but I couldn't
454 work out how to integrate this with the command line.
456 - Z3 should be able to solve this, but I was not able to get
457 Z3 working with Frama-C on Fedora.
459 Essentially bitwise tricks like this are a hard case for automated
460 theorem proving. I gave up.
463 = TIMEVAL DIFFERENCE =
465 This is another nbdkit function:
467 /* Return the number of µs (microseconds) in y - x. */
468 static inline int64_t
469 tvdiff_usec (const struct timeval *x, const struct timeval *y)
473 usec = (y->tv_sec - x->tv_sec) * 1000000;
474 usec += y->tv_usec - x->tv_usec;
478 My first attempt at a proof was:
481 predicate valid_timeval (struct timeval tv) =
482 tv.tv_sec >= 0 && tv.tv_usec >= 0 && tv.tv_usec < 1000000;
483 logic integer tv_to_microseconds (struct timeval tv) =
484 tv.tv_sec * 1000000 + tv.tv_usec;
487 /* Return the number of µs (microseconds) in y - x. */
489 requires \valid_read (x) && \valid_read (y);
490 requires valid_timeval (*x) && valid_timeval (*y);
491 ensures \result == tv_to_microseconds (*y) - tv_to_microseconds (*x);
493 static inline int64_t
494 tvdiff_usec (const struct timeval *x, const struct timeval *y)
498 usec = (y->tv_sec - x->tv_sec) * 1000000;
499 usec += y->tv_usec - x->tv_usec;
503 This is not provable, and again the problem is integer overflow. The
504 tv_sec field is a 64 bit integer so it's quite easy to construct two
505 inputs which overflow the output.
507 Fixing this involves a considerably more complex specification:
510 requires \valid_read (x) && \valid_read (y);
511 requires valid_timeval (*x) && valid_timeval (*y);
515 ensures \result == 0;
516 ensures \let diff = tv_to_microseconds (*y) - tv_to_microseconds (*x);
517 INT64_MIN <= diff <= INT64_MAX &&
520 ensures \result == -1;
522 ensures \let diff = tv_to_microseconds (*y) - tv_to_microseconds (*x);
523 !(INT64_MIN <= diff <= INT64_MAX);
528 tvdiff_usec (const struct timeval *x, const struct timeval *y,
532 and of course that is not sufficient because I also had to fix the
533 code so it can return an overflow indication. And actually writing
534 that code is left as an exercise for the reader as it is not at all
537 A few things I learned from this:
539 - This is a function we have had and used for years, and I don't
540 think anyone ever thought it had a problem.
542 - Co-developing the specification alongside the code could help
545 - You can end up with functions that are much harder to use. If we
546 were proof-checking the whole program we could check all callers.
548 - GCC has __builtin_add_overflow and similar which can be used to do
549 mathematical operations and detect overflow, but Frama-C does not
550 understand them. We could contribute these to the Frama-C standard
556 Uli sent me this function from glibc:
558 [https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strverscmp.c]
560 To be honest I had problems even fully understanding the semantics of
561 this function, let alone trying to translate that into ACSL.
563 So I thought I'd look at a simpler function:
565 [https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strlen.c]
567 I want to point out first that this "strlen" is a fallback which is
568 only used if you don't have an optimized function for your hardware,
569 and for the vast majority of users they'll be using something like an
570 AVX-optimized strlen, not this one.
572 This function works in two halves, the first part iterates over single
573 characters until we reach an aligned boundary. And then the second
574 part looks at whole words and uses a clever trick to determine if any
575 byte in the word is zero.
577 In theory a simple specification which would apply to any strlen-style
581 requires valid_read_string (str);
583 ensures (\forall integer i; 0 <= i < \result ==> str[i] != '\0') &&
584 str[\result] == '\0';
586 size_t strlen (const char *str)
590 It would be possible to use this to prove a simple strlen
591 implementation that just worked character by character.
593 Unfortunately for the actual glibc function we hit another limitation
594 of the WP plugin, called the Typed Memory Model.
596 WP models memory as a set of cells, with a type attached to each cell.
597 So a string for example is a set of char cells, each cell being typed
598 as char. You cannot cast these cells to something like a word and try
599 to do a proof over that. Simply a limitation which is well
602 So a proof of the glibc function eludes me.
605 There is a set of open source licensed string functions with Frama-C
608 https://github.com/evdenis/verker
610 and this is what the strlen function with proof looks like from that:
612 [https://github.com/evdenis/verker/blob/master/src/strlen.c]
615 Now you might be asking what happens when you write a function that
616 uses strlen, for example this trivial function with a working
619 $ cat snippets/last_char.c
620 $ frama-c -wp -wp-rte snippets/last_char.c
622 The questions here are:
624 - Is this proving the glibc strlen function?
626 - Are we calling glibc strlen() from the specification?
628 $ less /usr/share/frama-c/libc/string.h
629 $ less /usr/share/frama-c/libc/__fc_string_axiomatic.h
631 And the answers are no and no. In fact what's happening here is we
632 are using Frama-C's own "string.h" header. This header defines and
633 contains a specification for how strlen. But this is also not a proof
634 of strlen: in fact a second file defines what's known as an "axiomatic
635 definition" or theory of how strlen works.
637 Essentially what's happening is we're assuming that strlen in your
638 standard library (which might not even be glibc) works. To make a
639 complete proof you'd also need to additionally prove all libraries you
642 Another little fun thing is Frama-C's strlen function can return -1,
643 which is used to represent an unbounded string. Of course unbounded
644 strings cannot exist on real computers, but they can exist on
650 * Frama-C is a real open source tool used by companies to verify
651 safety-critical software. If you've flown in an Airbus or used
652 electricity from a French nuclear power plant then you've used this
655 * Not as hard to use as I imagined. I spent about 6-12 hours from
656 zero to being able to fully prove a whole file of production code.
658 * Mostly useful for small functions. There are other tools, like TLA+
659 for proving distributed algorithms.
661 * Forces you to think very hard about assumptions and corner cases,
662 but that's the nature of the problem.
664 * Very well documented, questions answered quickly on stackoverflow.
666 But it has its limits:
668 * Typed Memory Model turns out to be restrictive.
670 * No support for malloc.
672 * Not good at bitwise ops (can Z3 help?).
674 * Cannot parse some non-standard __attribute__s (glib).
676 * Lacks some __builtin_* functions.
678 Alternatives and related programs:
680 * Ada SPARK Pro (https://www.adacore.com/sparkpro) is a formally
681 verified subset of the Ada programming language
682 (tagline: "You simply can't write better code").
684 * KLEE (http://klee.github.io/) has similarities but is not a formal
687 * CompCert (http://compcert.inria.fr/) is INRIA's verified C compiler
692 * Tutorial that I followed in October which I thought was a good
694 https://maniagnosis.crsr.net/2017/06/AFL-brute-force-search.html
695 https://maniagnosis.crsr.net/2017/06/AFL-bug-in-quicksearch.html
696 https://maniagnosis.crsr.net/2017/07/AFL-correctness-of-quicksearch.html
698 * Frama-C on Stack Overflow:
699 https://stackoverflow.com/questions/tagged/frama-c
701 * Allan Blanchard's tutorial:
702 https://allan-blanchard.fr/publis/frama-c-wp-tutorial-en.pdf
704 * David Mentre's introduction:
705 https://www.cs.umd.edu/class/spring2016/cmsc838G/frama-c/introduction-to-frama-c_v2.pdf
707 * Upstream documentation:
708 Manual: https://frama-c.com/download/frama-c-user-manual.pdf
709 WP manual: https://frama-c.com/download/frama-c-wp-manual.pdf
710 ACSL: https://frama-c.com/acsl.html
712 * ACSL specifications for various string functions:
713 https://github.com/evdenis/verker