Monday 16th Nov 2020
Formally proving tiny bits of qemu using Frama-C



In early October I was talking to one of the developers of Frama-C,
which is a modular framework for verifying C programs.  It's been on
my radar to learn for a long time, and I'm a co-maintainer of the
package in Fedora.  I tried to follow one tutorial but never really
got past installation and UI issues.  But the developer pointed me to
a very good, short set of blog postings about using the command line
tool.

I will provide links to tutorials etc at the end.


= OVERVIEW OF FRAMA-C ECOSYSTEM =

The name stands for "Framework for Static Analysis of the C language".

It's modular, with a core program that reads C source code
and turns it into Abstract Syntax Trees.  And a set of plugins
to do static analysis by annotating these syntax trees.
Plugins can cooperate, so analysis can be passed between
plugins.

The following slides are taken from David Mentré‘s 2016 presentation.


I decided to spend a day or two last month seeing if I could formally
prove some code inside qemu, and I arbitrarily picked one of the
smallest pieces of code in the "util/" subdirectory:

  $ wc -l util/*.c | sort -nr | tail -20

Actually there are two files:

  $ ls -l util/range.c include/qemu/range.h

This is what the Range type from the header file looks like.  This is
the complete definition upstream, with comments which are supposed to
explain how it works:

  /*
   * Operations on 64 bit address ranges.
   * Notes:
   * - Ranges must not wrap around 0, but can include UINT64_MAX.
   */
  struct Range {
      /*
       * Do not access members directly, use the functions!
       * A non-empty range has @lob <= @upb.
       * An empty range has @lob == @upb + 1.
       */
      uint64_t lob;        /* inclusive lower bound */
      uint64_t upb;        /* inclusive upper bound */
  };
  typedef struct Range Range;

  +-----------+-----------+------------- - - - - -
  |           |           |
  +-----------+-----------+------------- - - - - -
               ^         ^
              lob ----- upb

Using the tutorial as my guide I wrote some simple predicates, a
"predicate" being a statement which can be true or false:

       * A non-empty range has @lob <= @upb.
       * An empty range has @lob == @upb + 1.

  /*@
    predicate valid_range(struct Range range) =
      range.lob <= range.upb + 1;

    predicate empty_range(struct Range range) =
      range.lob == range.upb + 1;
  */

Notice a few things here:

 - These are statements written in a formal language.

 - I'm using structs directly from the C code.

 - The upstream comments translate into machine-checkable code.

The first upstream function is:

  /* Is range empty? */
  static inline bool range_is_empty(const Range *range)
  {
      return range->lob > range->upb;
  }

and using the predicates we can write a specification:

  $ cat snippets/range_is_empty.c

And we can compile and prove that:

  $ frama-c -wp -wp-rte snippets/range_is_empty.c

Frama-C parsed the C code and the formal specification and machine
checked it, and it's correct - the code is bug-free.


Going back to what we proved so far:

  /*@
    predicate valid_range(struct Range range) =
      range.lob <= range.upb + 1;
    predicate empty_range(struct Range range) =
      range.lob == range.upb + 1;
  */

  /* Is range empty? */
  /*@
    requires \valid_read(range);
    requires valid_range(*range);
    assigns \nothing;
    ensures \result <==> empty_range(*range);
   */
  static inline bool range_is_empty(const Range *range)
  {
      return range->lob > range->upb;
  }

  $ frama-c -wp -wp-rte snippets/range_is_empty.c

The "@"-comments are ACSL.  The C code is parsed by Frama-C into an
abstract syntax tree.  We told Frama-C to apply the WP plugin.  The
specification and C code were translated into first-order logic
statements and passed to the Alt-Ergo SMT prover to see if it could
find a Proof Obligation (PO).

We need to be very clear about what happened:

 - The preconditions must be fulfilled in order for the proof to be
   correct.  If you are checking a whole program then that would check
   that all callers of this function are fulfilling the preconditions,
   but if you check only single functions then you are not doing that.

 - The proof only shows that the post-condition holds.

 - This only shows that the source code is correct.  The compiler
   could still compile the program incorrectly.  If you really want to
   get into this you have to use a verified C compiler like CompCert,
   and there are even verified CPU implementations.

 - Given those assumptions, the code is bug free - you don't need to
   write any tests.


Obviously this is a single line, very trivial function, but I was
quite pleased that I was able to prove it quickly.  I kept going on
the range file.  The next function is:

  /* Does range contain val? */
  static inline bool range_contains(const Range *range, uint64_t val)
  {
      return val >= range->lob && val <= range->upb;
  }

This is similarly easy to prove after adding another predicate at the
top of the file:

  /*@
    predicate value_in_range(struct Range range, uint64_t value) =
      range.lob <= value <= range.upb;
  */

  /* Does range contain val? */
  /*@
    requires \valid_read(range);
    requires valid_range(*range);
    assigns \nothing;
    ensures \result <==> value_in_range(*range, val);
   */
  static inline bool range_contains(const Range *range, uint64_t val)
  {
      return val >= range->lob && val <= range->upb;
  }


The next function is range_make_empty, again easy to prove using the
already existing empty_range predicate.  Notice how we declare which
memory locations this function assigns to:

  /* Initialize range to the empty range */
  /*@
    requires \valid(range);
    requires valid_range(*range);
    assigns *range;
    ensures empty_range(*range);
   */
  static inline void range_make_empty(Range *range)
  {
      *range = range_empty;
      assert(range_is_empty(range));
  }


I'm going to skip forward a few functions to get to an interesting one.
This seems trivial:

  /* Get the size of range. */
  static inline uint64_t range_size(const Range *range)
  {
      return range->upb - range->lob + 1;
  }

My first attempt at a proof was:

  /*@
    logic integer size_of_bounds(integer lob, integer upb) =
      upb - lob + 1;

    logic integer size_of_range(struct Range range) =
      size_of_bounds(range.lob, range.upb);
  */

  /* Get the size of range. */
  /*@
    requires \valid(range);
    requires valid_range(*range);
    assigns \nothing;
    ensures \result == size_of_range(*range);
  */
  static inline uint64_t range_size(const Range *range)
  {
      return range->upb - range->lob + 1;
  }

A few things to notice about this:

 - I'm using ACSL "logic" statements, which are similar to functions.

 - ACSL has an "integer" type which is an unbounded integer.

Let's try to prove this one:

  $ frama-c -wp -wp-rte snippets/range_size.c
  It fails!

This was confusing to me because at this point I'd probably spent an
hour, and I'd proven about 5 functions successfully, and this function
looks equally trivial and yet it cannot be proven.  Why?

One way to find the problem would be to find a COUNTEREXAMPLE.  A
counterexample is an instance of an input that satisfies all of the
preconditions, but makes the postcondition false.  Frama-C has
pluggable provers, and one prover called Z3, originally written by
Microsoft, can be used with Frama-C and can sometimes find
counterexamples.

 $ frama-c -wp -wp-rte snippets/range_size.c -wp-prover alt-ergo,why3:z3-ce

Unfortunately Z3 cannot find a counterexample in this case.  I even
upped the timeout to run Z3 longer but it still couldn't find one.

[Z3 model_compact issue: https://git.frama-c.com/pub/frama-c/-/issues/33]

So this is the counterexample which I worked out myself:

  /*@
    logic integer size_of_bounds(integer lob, integer upb) =
      upb - lob + 1;

    logic integer size_of_range(struct Range range) =
      size_of_bounds(range.lob, range.upb);
  */

  /* Get the size of range. */
  /*@
    requires \valid(range);
    requires valid_range(*range);
    assigns \nothing;
    ensures \result == size_of_range(*range);
  */
  static inline uint64_t range_size(const Range *range)
  {
      return range->upb - range->lob + 1;
  }

  +-------------------------------------------------------+
  |XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX|
  +-------------------------------------------------------+
  range->lob == 0                 range->upb == UINT64_MAX

In this case size_of_range() returns an integer.  Recall that integers
are an unbounded type.  So it returns UINT64_MAX + 1 (a 65 bit
number).  Our C function on the other hand returns 0.

This is a bug in the function.  There are really two ways to
look at this:

(1) The function shouldn't be called on empty ranges (size 0), and so
if the function returns 0 it means "the total range".  Or:

(2) The function should have a different signature so it can return a
separate indication for "empty range" vs "total range".

I went with the first, so the final working specification is this.  If
we were proving the entire program then we could statically verify
that callers do not try to call range_size() on the empty range.

  /*@
    requires \valid(range);
    requires valid_range(*range);
    requires !empty_range(*range);
    assigns \nothing;
                        // case UINT64_MAX+1 -> 0
    ensures \result == (uint64_t)size_of_range(*range);
   */
  static inline uint64_t range_size(const Range *range)
  {
      return range->upb - range->lob + 1;
  }

  $ frama-c -wp -wp-rte snippets/range_size-good.c


On to the next function.  Again this seems very simple, but in fact it
contains a serious problem:

  /*
   * Initialize range to span the interval [lob,lob + size - 1].
   * size may be 0. Range must not overflow.
   */
  static inline void range_init_nofail(Range *range, uint64_t lob,
                                       uint64_t size)
  {
      range->lob = lob;
      range->upb = lob + size - 1;
  }

What does lob == 0, size == 0 mean?  It will create a total range.

  +-------------------------------------------------------+
  |XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX|
  +-------------------------------------------------------+
  range->lob == 0                 range->upb == UINT64_MAX

What does lob == 1, size == 0 mean?  It will create an empty range.

Because I didn't want to change the code, that makes the proof very
long, and it demonstrates another feature of ACSL specifications
("behaviors"):

  /*@
    requires \valid(range);
    requires lob + (integer)size <= UINT64_MAX;
    assigns *range;
    behavior empty:
      assumes lob > 0 && size == 0;
      ensures empty_range(*range);
    behavior non_empty:
      assumes size > 0;
      ensures range->lob == lob;
      ensures size_of_bounds(lob, range->upb) == size;
    behavior total:
      assumes lob == 0 && size == 0;
      ensures total_range(*range);
    complete behaviors;
    disjoint behaviors;
   */

I also had to modify my earlier empty_range predicate:

  /*@ // before:
    predicate empty_range(struct Range range) =
      range.lob == range.upb + 1;
  */

  /*@ // after:
    predicate empty_range(struct Range range) =
      range.lob > 0 && range.lob == range.upb + 1;

    predicate total_range(struct Range range) =
      range.lob == 0 && range.upb == UINT64_MAX;
  */

What can we say about this?  If you were developing the Range type and
the proof at the same time, you probably wouldn't define the Range
type in this way, and you probably wouldn't choose an initialization
function which had this complex set of behaviours.

I did prove most of the rest of the Range functions, but I wanted to
point out a few things about the final proof:

 - The original file includes glib headers, but those cannot be parsed
   by Frama-C's parser.  The problem appears to be some unsupported
   __attribute__s.  Some attributes seem to work, and others are not
   supported, and this of course limits what you can prove if it
   depends on any external library that might use these.

 - For the same reason I had to exclude a glib function for operating
   on lists of ranges from the proof.

 - You can run frama-c on the original source file, with a few #ifdef
   modifications, so it would be possible to maintain the annotations
   upstream, and run the proof checker as a CI test.

 - It probably doesn't make sense for qemu right now though:
   * concerted effort to prove more substantial pieces of code
   * would need upstream effort to modularise qemu
   * co-develop modules and proofs


= POWER OF 2 =

This is a function from nbdkit:

  /* Returns true if v is a power of 2.
   *
   * Uses the algorithm described at
   * http://graphics.stanford.edu/~seander/bithacks.html#DetermineIfPowerOf2
   */
  static inline bool
  is_power_of_2 (unsigned long v)
  {
    return v && ((v & (v - 1)) == 0);
  }

and my initial specification started by defining:

  /*@
    predicate positive_power_of_2 (integer i) =
      i > 0 &&
      (i == 1 || ((i & 1) == 0 && positive_power_of_2 (i >> 1)));

    lemma positive_power_of_2 (1);
    lemma positive_power_of_2 (2);
    lemma positive_power_of_2 (4);
    lemma !positive_power_of_2 (7);
   */

I was using the lemmas (which are like assertions, except statically
verified), to test my predicate, but sadly Frama-C was not able to
prove them even though they appear to be trivial.

This led to some discussion on stackoverflow:

  https://stackoverflow.com/questions/64268418/how-do-i-write-an-is-power-of-2-predicate-in-acsl

Ultimately it was possible to fix this, but I discovered a few things
about Alt-Ergo (the default prover):

 - It can't handle bitwise operations very well.

 - You can guide Alt-Ergo, but it requires using the Frama-C GUI which
   is very non-intuitive.  You do end up with a script but I couldn't
   work out how to integrate this with the command line.

 - Z3 should be able to solve this, but I was not able to get
   Z3 working with Frama-C on Fedora.

Essentially bitwise tricks like this are a hard case for automated
theorem proving.  I gave up.


= TIMEVAL DIFFERENCE =

This is another nbdkit function:

  /* Return the number of µs (microseconds) in y - x. */
  static inline int64_t
  tvdiff_usec (const struct timeval *x, const struct timeval *y)
  {
    int64_t usec;

    usec = (y->tv_sec - x->tv_sec) * 1000000;
    usec += y->tv_usec - x->tv_usec;
    return usec;
  }

My first attempt at a proof was:

  /*@
    predicate valid_timeval (struct timeval tv) =
      tv.tv_sec >= 0 && tv.tv_usec >= 0 && tv.tv_usec < 1000000;
    logic integer tv_to_microseconds (struct timeval tv) =
      tv.tv_sec * 1000000 + tv.tv_usec;
   */

  /* Return the number of µs (microseconds) in y - x. */
  /*@
    requires \valid_read (x) && \valid_read (y);
    requires valid_timeval (*x) && valid_timeval (*y);
    ensures \result == tv_to_microseconds (*y) - tv_to_microseconds (*x);
   */
  static inline int64_t
  tvdiff_usec (const struct timeval *x, const struct timeval *y)
  {
    int64_t usec;

    usec = (y->tv_sec - x->tv_sec) * 1000000;
    usec += y->tv_usec - x->tv_usec;
    return usec;
  }

This is not provable, and again the problem is integer overflow.  The
tv_sec field is a 64 bit integer so it's quite easy to construct two
inputs which overflow the output.

Fixing this involves a considerably more complex specification:

  /*@
    requires \valid_read (x) && \valid_read (y);
    requires valid_timeval (*x) && valid_timeval (*y);
    requires \valid (r);
    assigns *r;
    behavior success:
      ensures \result == 0;
      ensures \let diff = tv_to_microseconds (*y) - tv_to_microseconds (*x);
              INT64_MIN <= diff <= INT64_MAX &&
              *r == diff;
    behavior failure:
      ensures \result == -1;
      ensures *r == 0;
      ensures \let diff = tv_to_microseconds (*y) - tv_to_microseconds (*x);
              !(INT64_MIN <= diff <= INT64_MAX);
    complete behaviors;
    disjoint behaviors;
   */
  static inline int
  tvdiff_usec (const struct timeval *x, const struct timeval *y,
               int64_t *r)
  ...

and of course that is not sufficient because I also had to fix the
code so it can return an overflow indication.  And actually writing
that code is left as an exercise for the reader as it is not at all
simple.

A few things I learned from this:

 - This is a function we have had and used for years, and I don't
   think anyone ever thought it had a problem.

 - Co-developing the specification alongside the code could help
   here.

 - You can end up with functions that are much harder to use.  If we
   were proof-checking the whole program we could check all callers.

 - GCC has __builtin_add_overflow and similar which can be used to do
   mathematical operations and detect overflow, but Frama-C does not
   understand them.  We could contribute these to the Frama-C standard
   library.


= STRING FUNCTIONS =

Uli sent me this function from glibc:

  [https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strverscmp.c]

To be honest I had problems even fully understanding the semantics of
this function, let alone trying to translate that into ACSL.

So I thought I'd look at a simpler function:

  [https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strlen.c]

I want to point out first that this "strlen" is a fallback which is
only used if you don't have an optimized function for your hardware,
and for the vast majority of users they'll be using something like an
AVX-optimized strlen, not this one.

This function works in two halves, the first part iterates over single
characters until we reach an aligned boundary.  And then the second
part looks at whole words and uses a clever trick to determine if any
byte in the word is zero.

In theory a simple specification which would apply to any strlen-style
function would be:

  /*@
    requires valid_read_string (str);
    assigns \nothing;
    ensures (\forall integer i; 0 <= i < \result ==> str[i] != '\0') &&
            str[\result] == '\0';
   */
  size_t strlen (const char *str)
  {
    ...

It would be possible to use this to prove a simple strlen
implementation that just worked character by character.

Unfortunately for the actual glibc function we hit another limitation
of the WP plugin, called the Typed Memory Model.

WP models memory as a set of cells, with a type attached to each cell.
So a string for example is a set of char cells, each cell being typed
as char.  You cannot cast these cells to something like a word and try
to do a proof over that.  Simply a limitation which is well
documented.

So a proof of the glibc function eludes me.


There is a set of open source licensed string functions with Frama-C
proofs available:

https://github.com/evdenis/verker

and this is what the strlen function with proof looks like from that:

  [https://github.com/evdenis/verker/blob/master/src/strlen.c]


Now you might be asking what happens when you write a function that
uses strlen, for example this trivial function with a working
specification:

  $ cat snippets/last_char.c
  $ frama-c -wp -wp-rte snippets/last_char.c

The questions here are:

 - Is this proving the glibc strlen function?

 - Are we calling glibc strlen() from the specification?

  $ less /usr/share/frama-c/libc/string.h
  $ less /usr/share/frama-c/libc/__fc_string_axiomatic.h

And the answers are no and no.  In fact what's happening here is we
are using Frama-C's own "string.h" header.  This header defines and
contains a specification for how strlen.  But this is also not a proof
of strlen: in fact a second file defines what's known as an "axiomatic
definition" or theory of how strlen works.

Essentially what's happening is we're assuming that strlen in your
standard library (which might not even be glibc) works.  To make a
complete proof you'd also need to additionally prove all libraries you
depend on.

Another little fun thing is Frama-C's strlen function can return -1,
which is used to represent an unbounded string.  Of course unbounded
strings cannot exist on real computers, but they can exist on
theoretical ones!


= IN CONCLUSION =

* Frama-C is a real open source tool used by companies to verify
  safety-critical software.  If you've flown in an Airbus or used
  electricity from a French nuclear power plant then you've used this
  software already.

* Not as hard to use as I imagined.  I spent about 6-12 hours from
  zero to being able to fully prove a whole file of production code.

* Mostly useful for small functions.  There are other tools, like TLA+
  for proving distributed algorithms.

* Forces you to think very hard about assumptions and corner cases,
  but that's the nature of the problem.

* Very well documented, questions answered quickly on stackoverflow.

But it has its limits:

* Typed Memory Model turns out to be restrictive.

* No support for malloc.

* Not good at bitwise ops (can Z3 help?).

* Cannot parse some non-standard __attribute__s (glib).

* Lacks some __builtin_* functions.

Alternatives and related programs:

* Ada SPARK Pro (https://www.adacore.com/sparkpro) is a formally
  verified subset of the Ada programming language
  (tagline: "You simply can't write better code").

* KLEE (http://klee.github.io/) has similarities but is not a formal
  proof system.

* CompCert (http://compcert.inria.fr/) is INRIA's verified C compiler
  (not open source).

Resources:

* Tutorial that I followed in October which I thought was a good
  introduction:
  https://maniagnosis.crsr.net/2017/06/AFL-brute-force-search.html
  https://maniagnosis.crsr.net/2017/06/AFL-bug-in-quicksearch.html
  https://maniagnosis.crsr.net/2017/07/AFL-correctness-of-quicksearch.html

* Frama-C on Stack Overflow:
  https://stackoverflow.com/questions/tagged/frama-c

* Allan Blanchard's tutorial:
  https://allan-blanchard.fr/publis/frama-c-wp-tutorial-en.pdf

* David Mentre's introduction:
  https://www.cs.umd.edu/class/spring2016/cmsc838G/frama-c/introduction-to-frama-c_v2.pdf

* Upstream documentation:
  Manual: https://frama-c.com/download/frama-c-user-manual.pdf
  WP manual: https://frama-c.com/download/frama-c-wp-manual.pdf
  ACSL: https://frama-c.com/acsl.html

* ACSL specifications for various string functions:
  https://github.com/evdenis/verker