Monday 16th Nov 2020 Formally proving tiny bits of qemu using Frama-C In early October I was talking to one of the developers of Frama-C, which is a modular framework for verifying C programs. It's been on my radar to learn for a long time, and I'm a co-maintainer of the package in Fedora. I tried to follow one tutorial but never really got past installation and UI issues. But the developer pointed me to a very good, short set of blog postings about using the command line tool. I will provide links to tutorials etc at the end. = OVERVIEW OF FRAMA-C ECOSYSTEM = The name stands for "Framework for Static Analysis of the C language". It's modular, with a core program that reads C source code and turns it into Abstract Syntax Trees. And a set of plugins to do static analysis by annotating these syntax trees. Plugins can cooperate, so analysis can be passed between plugins. The following slides are taken from David Mentré‘s 2016 presentation. I decided to spend a day or two last month seeing if I could formally prove some code inside qemu, and I arbitrarily picked one of the smallest pieces of code in the "util/" subdirectory: $ wc -l util/*.c | sort -nr | tail -20 Actually there are two files: $ ls -l util/range.c include/qemu/range.h This is what the Range type from the header file looks like. This is the complete definition upstream, with comments which are supposed to explain how it works: /* * Operations on 64 bit address ranges. * Notes: * - Ranges must not wrap around 0, but can include UINT64_MAX. */ struct Range { /* * Do not access members directly, use the functions! * A non-empty range has @lob <= @upb. * An empty range has @lob == @upb + 1. */ uint64_t lob; /* inclusive lower bound */ uint64_t upb; /* inclusive upper bound */ }; typedef struct Range Range; +-----------+-----------+------------- - - - - - | | | +-----------+-----------+------------- - - - - - ^ ^ lob ----- upb Using the tutorial as my guide I wrote some simple predicates, a "predicate" being a statement which can be true or false: * A non-empty range has @lob <= @upb. * An empty range has @lob == @upb + 1. /*@ predicate valid_range(struct Range range) = range.lob <= range.upb + 1; predicate empty_range(struct Range range) = range.lob == range.upb + 1; */ Notice a few things here: - These are statements written in a formal language. - I'm using structs directly from the C code. - The upstream comments translate into machine-checkable code. The first upstream function is: /* Is range empty? */ static inline bool range_is_empty(const Range *range) { return range->lob > range->upb; } and using the predicates we can write a specification: $ cat snippets/range_is_empty.c And we can compile and prove that: $ frama-c -wp -wp-rte snippets/range_is_empty.c Frama-C parsed the C code and the formal specification and machine checked it, and it's correct - the code is bug-free. Going back to what we proved so far: /*@ predicate valid_range(struct Range range) = range.lob <= range.upb + 1; predicate empty_range(struct Range range) = range.lob == range.upb + 1; */ /* Is range empty? */ /*@ requires \valid_read(range); requires valid_range(*range); assigns \nothing; ensures \result <==> empty_range(*range); */ static inline bool range_is_empty(const Range *range) { return range->lob > range->upb; } $ frama-c -wp -wp-rte snippets/range_is_empty.c The "@"-comments are ACSL. The C code is parsed by Frama-C into an abstract syntax tree. We told Frama-C to apply the WP plugin. The specification and C code were translated into first-order logic statements and passed to the Alt-Ergo SMT prover to see if it could find a Proof Obligation (PO). We need to be very clear about what happened: - The preconditions must be fulfilled in order for the proof to be correct. If you are checking a whole program then that would check that all callers of this function are fulfilling the preconditions, but if you check only single functions then you are not doing that. - The proof only shows that the post-condition holds. - This only shows that the source code is correct. The compiler could still compile the program incorrectly. If you really want to get into this you have to use a verified C compiler like CompCert, and there are even verified CPU implementations. - Given those assumptions, the code is bug free - you don't need to write any tests. Obviously this is a single line, very trivial function, but I was quite pleased that I was able to prove it quickly. I kept going on the range file. The next function is: /* Does range contain val? */ static inline bool range_contains(const Range *range, uint64_t val) { return val >= range->lob && val <= range->upb; } This is similarly easy to prove after adding another predicate at the top of the file: /*@ predicate value_in_range(struct Range range, uint64_t value) = range.lob <= value <= range.upb; */ /* Does range contain val? */ /*@ requires \valid_read(range); requires valid_range(*range); assigns \nothing; ensures \result <==> value_in_range(*range, val); */ static inline bool range_contains(const Range *range, uint64_t val) { return val >= range->lob && val <= range->upb; } The next function is range_make_empty, again easy to prove using the already existing empty_range predicate. Notice how we declare which memory locations this function assigns to: /* Initialize range to the empty range */ /*@ requires \valid(range); requires valid_range(*range); assigns *range; ensures empty_range(*range); */ static inline void range_make_empty(Range *range) { *range = range_empty; assert(range_is_empty(range)); } I'm going to skip forward a few functions to get to an interesting one. This seems trivial: /* Get the size of range. */ static inline uint64_t range_size(const Range *range) { return range->upb - range->lob + 1; } My first attempt at a proof was: /*@ logic integer size_of_bounds(integer lob, integer upb) = upb - lob + 1; logic integer size_of_range(struct Range range) = size_of_bounds(range.lob, range.upb); */ /* Get the size of range. */ /*@ requires \valid(range); requires valid_range(*range); assigns \nothing; ensures \result == size_of_range(*range); */ static inline uint64_t range_size(const Range *range) { return range->upb - range->lob + 1; } A few things to notice about this: - I'm using ACSL "logic" statements, which are similar to functions. - ACSL has an "integer" type which is an unbounded integer. Let's try to prove this one: $ frama-c -wp -wp-rte snippets/range_size.c It fails! This was confusing to me because at this point I'd probably spent an hour, and I'd proven about 5 functions successfully, and this function looks equally trivial and yet it cannot be proven. Why? One way to find the problem would be to find a COUNTEREXAMPLE. A counterexample is an instance of an input that satisfies all of the preconditions, but makes the postcondition false. Frama-C has pluggable provers, and one prover called Z3, originally written by Microsoft, can be used with Frama-C and can sometimes find counterexamples. $ frama-c -wp -wp-rte snippets/range_size.c -wp-prover alt-ergo,why3:z3-ce Unfortunately Z3 cannot find a counterexample in this case. I even upped the timeout to run Z3 longer but it still couldn't find one. [Z3 model_compact issue: https://git.frama-c.com/pub/frama-c/-/issues/33] So this is the counterexample which I worked out myself: /*@ logic integer size_of_bounds(integer lob, integer upb) = upb - lob + 1; logic integer size_of_range(struct Range range) = size_of_bounds(range.lob, range.upb); */ /* Get the size of range. */ /*@ requires \valid(range); requires valid_range(*range); assigns \nothing; ensures \result == size_of_range(*range); */ static inline uint64_t range_size(const Range *range) { return range->upb - range->lob + 1; } +-------------------------------------------------------+ |XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX| +-------------------------------------------------------+ range->lob == 0 range->upb == UINT64_MAX In this case size_of_range() returns an integer. Recall that integers are an unbounded type. So it returns UINT64_MAX + 1 (a 65 bit number). Our C function on the other hand returns 0. This is a bug in the function. There are really two ways to look at this: (1) The function shouldn't be called on empty ranges (size 0), and so if the function returns 0 it means "the total range". Or: (2) The function should have a different signature so it can return a separate indication for "empty range" vs "total range". I went with the first, so the final working specification is this. If we were proving the entire program then we could statically verify that callers do not try to call range_size() on the empty range. /*@ requires \valid(range); requires valid_range(*range); requires !empty_range(*range); assigns \nothing; // case UINT64_MAX+1 -> 0 ensures \result == (uint64_t)size_of_range(*range); */ static inline uint64_t range_size(const Range *range) { return range->upb - range->lob + 1; } $ frama-c -wp -wp-rte snippets/range_size-good.c On to the next function. Again this seems very simple, but in fact it contains a serious problem: /* * Initialize range to span the interval [lob,lob + size - 1]. * size may be 0. Range must not overflow. */ static inline void range_init_nofail(Range *range, uint64_t lob, uint64_t size) { range->lob = lob; range->upb = lob + size - 1; } What does lob == 0, size == 0 mean? It will create a total range. +-------------------------------------------------------+ |XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX| +-------------------------------------------------------+ range->lob == 0 range->upb == UINT64_MAX What does lob == 1, size == 0 mean? It will create an empty range. Because I didn't want to change the code, that makes the proof very long, and it demonstrates another feature of ACSL specifications ("behaviors"): /*@ requires \valid(range); requires lob + (integer)size <= UINT64_MAX; assigns *range; behavior empty: assumes lob > 0 && size == 0; ensures empty_range(*range); behavior non_empty: assumes size > 0; ensures range->lob == lob; ensures size_of_bounds(lob, range->upb) == size; behavior total: assumes lob == 0 && size == 0; ensures total_range(*range); complete behaviors; disjoint behaviors; */ I also had to modify my earlier empty_range predicate: /*@ // before: predicate empty_range(struct Range range) = range.lob == range.upb + 1; */ /*@ // after: predicate empty_range(struct Range range) = range.lob > 0 && range.lob == range.upb + 1; predicate total_range(struct Range range) = range.lob == 0 && range.upb == UINT64_MAX; */ What can we say about this? If you were developing the Range type and the proof at the same time, you probably wouldn't define the Range type in this way, and you probably wouldn't choose an initialization function which had this complex set of behaviours. I did prove most of the rest of the Range functions, but I wanted to point out a few things about the final proof: - The original file includes glib headers, but those cannot be parsed by Frama-C's parser. The problem appears to be some unsupported __attribute__s. Some attributes seem to work, and others are not supported, and this of course limits what you can prove if it depends on any external library that might use these. - For the same reason I had to exclude a glib function for operating on lists of ranges from the proof. - You can run frama-c on the original source file, with a few #ifdef modifications, so it would be possible to maintain the annotations upstream, and run the proof checker as a CI test. - It probably doesn't make sense for qemu right now though: * concerted effort to prove more substantial pieces of code * would need upstream effort to modularise qemu * co-develop modules and proofs = POWER OF 2 = This is a function from nbdkit: /* Returns true if v is a power of 2. * * Uses the algorithm described at * http://graphics.stanford.edu/~seander/bithacks.html#DetermineIfPowerOf2 */ static inline bool is_power_of_2 (unsigned long v) { return v && ((v & (v - 1)) == 0); } and my initial specification started by defining: /*@ predicate positive_power_of_2 (integer i) = i > 0 && (i == 1 || ((i & 1) == 0 && positive_power_of_2 (i >> 1))); lemma positive_power_of_2 (1); lemma positive_power_of_2 (2); lemma positive_power_of_2 (4); lemma !positive_power_of_2 (7); */ I was using the lemmas (which are like assertions, except statically verified), to test my predicate, but sadly Frama-C was not able to prove them even though they appear to be trivial. This led to some discussion on stackoverflow: https://stackoverflow.com/questions/64268418/how-do-i-write-an-is-power-of-2-predicate-in-acsl Ultimately it was possible to fix this, but I discovered a few things about Alt-Ergo (the default prover): - It can't handle bitwise operations very well. - You can guide Alt-Ergo, but it requires using the Frama-C GUI which is very non-intuitive. You do end up with a script but I couldn't work out how to integrate this with the command line. - Z3 should be able to solve this, but I was not able to get Z3 working with Frama-C on Fedora. Essentially bitwise tricks like this are a hard case for automated theorem proving. I gave up. = TIMEVAL DIFFERENCE = This is another nbdkit function: /* Return the number of µs (microseconds) in y - x. */ static inline int64_t tvdiff_usec (const struct timeval *x, const struct timeval *y) { int64_t usec; usec = (y->tv_sec - x->tv_sec) * 1000000; usec += y->tv_usec - x->tv_usec; return usec; } My first attempt at a proof was: /*@ predicate valid_timeval (struct timeval tv) = tv.tv_sec >= 0 && tv.tv_usec >= 0 && tv.tv_usec < 1000000; logic integer tv_to_microseconds (struct timeval tv) = tv.tv_sec * 1000000 + tv.tv_usec; */ /* Return the number of µs (microseconds) in y - x. */ /*@ requires \valid_read (x) && \valid_read (y); requires valid_timeval (*x) && valid_timeval (*y); ensures \result == tv_to_microseconds (*y) - tv_to_microseconds (*x); */ static inline int64_t tvdiff_usec (const struct timeval *x, const struct timeval *y) { int64_t usec; usec = (y->tv_sec - x->tv_sec) * 1000000; usec += y->tv_usec - x->tv_usec; return usec; } This is not provable, and again the problem is integer overflow. The tv_sec field is a 64 bit integer so it's quite easy to construct two inputs which overflow the output. Fixing this involves a considerably more complex specification: /*@ requires \valid_read (x) && \valid_read (y); requires valid_timeval (*x) && valid_timeval (*y); requires \valid (r); assigns *r; behavior success: ensures \result == 0; ensures \let diff = tv_to_microseconds (*y) - tv_to_microseconds (*x); INT64_MIN <= diff <= INT64_MAX && *r == diff; behavior failure: ensures \result == -1; ensures *r == 0; ensures \let diff = tv_to_microseconds (*y) - tv_to_microseconds (*x); !(INT64_MIN <= diff <= INT64_MAX); complete behaviors; disjoint behaviors; */ static inline int tvdiff_usec (const struct timeval *x, const struct timeval *y, int64_t *r) ... and of course that is not sufficient because I also had to fix the code so it can return an overflow indication. And actually writing that code is left as an exercise for the reader as it is not at all simple. A few things I learned from this: - This is a function we have had and used for years, and I don't think anyone ever thought it had a problem. - Co-developing the specification alongside the code could help here. - You can end up with functions that are much harder to use. If we were proof-checking the whole program we could check all callers. - GCC has __builtin_add_overflow and similar which can be used to do mathematical operations and detect overflow, but Frama-C does not understand them. We could contribute these to the Frama-C standard library. = STRING FUNCTIONS = Uli sent me this function from glibc: [https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strverscmp.c] To be honest I had problems even fully understanding the semantics of this function, let alone trying to translate that into ACSL. So I thought I'd look at a simpler function: [https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strlen.c] I want to point out first that this "strlen" is a fallback which is only used if you don't have an optimized function for your hardware, and for the vast majority of users they'll be using something like an AVX-optimized strlen, not this one. This function works in two halves, the first part iterates over single characters until we reach an aligned boundary. And then the second part looks at whole words and uses a clever trick to determine if any byte in the word is zero. In theory a simple specification which would apply to any strlen-style function would be: /*@ requires valid_read_string (str); assigns \nothing; ensures (\forall integer i; 0 <= i < \result ==> str[i] != '\0') && str[\result] == '\0'; */ size_t strlen (const char *str) { ... It would be possible to use this to prove a simple strlen implementation that just worked character by character. Unfortunately for the actual glibc function we hit another limitation of the WP plugin, called the Typed Memory Model. WP models memory as a set of cells, with a type attached to each cell. So a string for example is a set of char cells, each cell being typed as char. You cannot cast these cells to something like a word and try to do a proof over that. Simply a limitation which is well documented. So a proof of the glibc function eludes me. There is a set of open source licensed string functions with Frama-C proofs available: https://github.com/evdenis/verker and this is what the strlen function with proof looks like from that: [https://github.com/evdenis/verker/blob/master/src/strlen.c] Now you might be asking what happens when you write a function that uses strlen, for example this trivial function with a working specification: $ cat snippets/last_char.c $ frama-c -wp -wp-rte snippets/last_char.c The questions here are: - Is this proving the glibc strlen function? - Are we calling glibc strlen() from the specification? $ less /usr/share/frama-c/libc/string.h $ less /usr/share/frama-c/libc/__fc_string_axiomatic.h And the answers are no and no. In fact what's happening here is we are using Frama-C's own "string.h" header. This header defines and contains a specification for how strlen. But this is also not a proof of strlen: in fact a second file defines what's known as an "axiomatic definition" or theory of how strlen works. Essentially what's happening is we're assuming that strlen in your standard library (which might not even be glibc) works. To make a complete proof you'd also need to additionally prove all libraries you depend on. Another little fun thing is Frama-C's strlen function can return -1, which is used to represent an unbounded string. Of course unbounded strings cannot exist on real computers, but they can exist on theoretical ones! = IN CONCLUSION = * Frama-C is a real open source tool used by companies to verify safety-critical software. If you've flown in an Airbus or used electricity from a French nuclear power plant then you've used this software already. * Not as hard to use as I imagined. I spent about 6-12 hours from zero to being able to fully prove a whole file of production code. * Mostly useful for small functions. There are other tools, like TLA+ for proving distributed algorithms. * Forces you to think very hard about assumptions and corner cases, but that's the nature of the problem. * Very well documented, questions answered quickly on stackoverflow. But it has its limits: * Typed Memory Model turns out to be restrictive. * No support for malloc. * Not good at bitwise ops (can Z3 help?). * Cannot parse some non-standard __attribute__s (glib). * Lacks some __builtin_* functions. Alternatives and related programs: * Ada SPARK Pro (https://www.adacore.com/sparkpro) is a formally verified subset of the Ada programming language (tagline: "You simply can't write better code"). * KLEE (http://klee.github.io/) has similarities but is not a formal proof system. * CompCert (http://compcert.inria.fr/) is INRIA's verified C compiler (not open source). Resources: * Tutorial that I followed in October which I thought was a good introduction: https://maniagnosis.crsr.net/2017/06/AFL-brute-force-search.html https://maniagnosis.crsr.net/2017/06/AFL-bug-in-quicksearch.html https://maniagnosis.crsr.net/2017/07/AFL-correctness-of-quicksearch.html * Frama-C on Stack Overflow: https://stackoverflow.com/questions/tagged/frama-c * Allan Blanchard's tutorial: https://allan-blanchard.fr/publis/frama-c-wp-tutorial-en.pdf * David Mentre's introduction: https://www.cs.umd.edu/class/spring2016/cmsc838G/frama-c/introduction-to-frama-c_v2.pdf * Upstream documentation: Manual: https://frama-c.com/download/frama-c-user-manual.pdf WP manual: https://frama-c.com/download/frama-c-wp-manual.pdf ACSL: https://frama-c.com/acsl.html * ACSL specifications for various string functions: https://github.com/evdenis/verker