2020-frama-c/notes.txt

   1 16th Nov 2020
   2 Formally proving tiny bits of qemu using Frama-C
   3
   4 [in medias res]
   5
   6 In early October I was talking to one of the developers of Frama-C,
   7 which is a modular framework for verifying C programs.  It's been on
   8 my radar to learn for a long time, and I'm a co-maintainer of the
   9 package in Fedora.  I tried to follow one tutorial but never really
  10 got past installation and UI issues.  But the developer pointed me to
  11 a very good, short set of blog postings about using the command line
  12 tool.
  13
  14 I will provide links to tutorials etc at the end.
  15
  16 I decided to spend a day or two last month seeing if I could formally
  17 prove some code inside qemu, and I arbitrarily picked one of the
  18 smallest pieces of code in the "util/" subdirectory:
  19
  20   $ wc -l util/*.c | sort -nr | tail -20
  21
  22 Actually there are two files:
  23
  24   $ ls -l util/range.c include/qemu/range.h
  25
  26 This is what the Range type from the header file looks like.  This is
  27 the complete definition upstream, with comments which are supposed to
  28 explain how it works:
  29
  30   /*
  31    * Operations on 64 bit address ranges.
  32    * Notes:
  33    * - Ranges must not wrap around 0, but can include UINT64_MAX.
  34    */
  35   struct Range {
  36       /*
  37        * Do not access members directly, use the functions!
  38        * A non-empty range has @lob <= @upb.
  39        * An empty range has @lob == @upb + 1.
  40        */
  41       uint64_t lob;        /* inclusive lower bound */
  42       uint64_t upb;        /* inclusive upper bound */
  43   };
  44   typedef struct Range Range;
  45
  46   +-----------+-----------+------------- - - - - -
  47   |           |           |
  48   +-----------+-----------+------------- - - - - -
  49                ^         ^
  50               lob ----- upb
  51
  52 Using the tutorial as my guide I wrote some simple predicates, a
  53 "predicate" being a statement which can be true or false:
  54
  55        * A non-empty range has @lob <= @upb.
  56        * An empty range has @lob == @upb + 1.
  57
  58   /*@
  59     predicate valid_range(struct Range range) =
  60       range.lob <= range.upb + 1;
  61
  62     predicate empty_range(struct Range range) =
  63       range.lob == range.upb + 1;
  64   */
  65
  66 Notice a few things here:
  67
  68  - These are statements written in a formal language.
  69
  70  - I'm using structs directly from the C code.
  71
  72  - The comments in the upstream code translate into predicates.
  73
  74 The first upstream function is:
  75
  76   /* Is range empty? */
  77   static inline bool range_is_empty(const Range *range)
  78   {
  79       return range->lob > range->upb;
  80   }
  81
  82 and using the predicates we can write a specification:
  83
  84   $ less snippets/range_is_empty.c
  85
  86 And we can compile and prove that:
  87
  88   $ frama-c -wp -wp-rte snippets/range_is_empty.c
  89
  90 Frama-C parsed the C code and the formal specification and machine
  91 checked it, and it's correct - the code is bug-free.
  92
  93 = OVERVIEW OF FRAMA-C ECOSYSTEM =
  94
  95
  96 XXX Modular Framework for analysis of C
  97
  98 XXX Take some slides from David Mentre's presentation.
  99
 100 XXX Explain which companies are using Frama-C.
 101
 102 XXX WP plugin
 103
 104 XXX ACSL language
 105
 106
 107
 108 = BACK TO RANGE.C =
 109
 110 Going back to what we proved so far:
 111
 112   /*@
 113     predicate valid_range(struct Range range) =
 114       range.lob <= range.upb + 1;
 115     predicate empty_range(struct Range range) =
 116       range.lob == range.upb + 1;
 117   */
 118
 119   /* Is range empty? */
 120   /*@
 121     requires \valid_read(range);
 122     requires valid_range(*range);
 123     assigns \nothing;
 124     ensures \result <==> empty_range(*range);
 125    */
 126   static inline bool range_is_empty(const Range *range)
 127   {
 128       return range->lob > range->upb;
 129   }
 130
 131   $ frama-c -wp -wp-rte snippets/range_is_empty.c
 132
 133 The "@"-comments are ACSL.  The C code is parsed by Frama-C into an
 134 abstract syntax tree.  We told Frama-C to apply the WP plugin.  The
 135 specification and C code were translated into first-order logic
 136 statements and passed to the Alt-Ergo SMT prover to see if it could
 137 find a Proof Obligation (PO).
 138
 139 We need to be very clear about what happened:
 140
 141  - The preconditions must be fulfilled in order for the proof to be
 142    correct.  If you are checking a whole program then that would check
 143    that all callers of this function are fulfilling the preconditions,
 144    but if you check only single functions then you are not doing that.
 145
 146  - The proof only shows that the post-condition holds.
 147
 148  - This only shows that the source code is correct.  The compiler
 149    could still compile the program incorrectly.  If you really want to
 150    get into this you have to use a verified C compiler like CompCert,
 151    and there are even verified CPU implementations.
 152
 153  - Given those assumptions, the code is bug free - you don't need to
 154    write any tests.
 155
 156 Obviously this is a single line, very trivial function, but I was
 157 quite pleased that I was able to prove it quickly.  I kept going on
 158 the range file.  The next function is:
 159
 160   /* Does range contain val? */
 161   static inline bool range_contains(const Range *range, uint64_t val)
 162   {
 163       return val >= range->lob && val <= range->upb;
 164   }
 165
 166 This is similarly easy to prove after adding another predicate at the
 167 top of the file:
 168
 169   /*@
 170     predicate value_in_range(struct Range range, uint64_t value) =
 171       range.lob <= value <= range.upb;
 172   */
 173
 174   /* Does range contain val? */
 175   /*@
 176     requires \valid_read(range);
 177     requires valid_range(*range);
 178     assigns \nothing;
 179     ensures \result <==> value_in_range(*range, val);
 180    */
 181   static inline bool range_contains(const Range *range, uint64_t val)
 182   {
 183       return val >= range->lob && val <= range->upb;
 184   }
 185
 186 The next function is range_make_empty, again easy to prove using the
 187 already existing empty_range predicate.  Notice how we declare which
 188 memory locations this function assigns to:
 189
 190   /* Initialize range to the empty range */
 191   /*@
 192     requires \valid(range);
 193     requires valid_range(*range);
 194     assigns *range;
 195     ensures empty_range(*range);
 196    */
 197   static inline void range_make_empty(Range *range)
 198   {
 199       *range = range_empty;
 200       assert(range_is_empty(range));
 201   }
 202
 203 I'm going to skip forward a few functions to get to an interesting one.
 204 This seems trivial:
 205
 206   /* Get the size of range. */
 207   static inline uint64_t range_size(const Range *range)
 208   {
 209       return range->upb - range->lob + 1;
 210   }
 211
 212 My first attempt at a proof was:
 213
 214   /*@
 215     logic integer size_of_bounds(integer lob, integer upb) =
 216       upb - lob + 1;
 217
 218     logic integer size_of_range(struct Range range) =
 219       size_of_bounds(range.lob, range.upb);
 220   */
 221
 222   /* Get the size of range. */
 223   /*@
 224     requires \valid(range);
 225     requires valid_range(*range);
 226     assigns \nothing;
 227     ensures \result == size_of_range(*range);
 228   */
 229   static inline uint64_t range_size(const Range *range)
 230   {
 231       return range->upb - range->lob + 1;
 232   }
 233
 234 A few things to notice about this:
 235
 236  - I'm using ACSL "logic" statements, which are similar to functions.
 237
 238  - ACSL has an "integer" type which is an unbounded integer.
 239
 240 Let's try to prove this one:
 241
 242   $ frama-c -wp -wp-rte snippets/range_size.c
 243   It fails!
 244
 245 This was confusing to me because at this point I'd probably spent an
 246 hour, and I'd proven about 5 functions successfully, and this function
 247 looks equally trivial and yet it cannot be proven.  Why?
 248
 249 One way to find the problem would be to find a COUNTEREXAMPLE.  A
 250 counterexample is an instance of an input that satisfies all of the
 251 preconditions, but makes the postcondition false.  Now Frama-C has
 252 pluggable provers, and one prover called Z3, originally written by
 253 Microsoft, can be used with Frama-C and can sometimes find
 254 counterexamples.  However for some reason the version of Z3 in Fedora
 255 does not like to work with the version of Frama-C in Fedora and I
 256 don't know why.
 257
 258 So this is the counterexample which I worked out myself:
 259
 260   /*@
 261     logic integer size_of_bounds(integer lob, integer upb) =
 262       upb - lob + 1;
 263
 264     logic integer size_of_range(struct Range range) =
 265       size_of_bounds(range.lob, range.upb);
 266   */
 267
 268   /* Get the size of range. */
 269   /*@
 270     requires \valid(range);
 271     requires valid_range(*range);
 272     assigns \nothing;
 273     ensures \result == size_of_range(*range);
 274   */
 275   static inline uint64_t range_size(const Range *range)
 276   {
 277       return range->upb - range->lob + 1;
 278   }
 279
 280   +-------------------------------------------------------+
 281   |XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX|
 282   +-------------------------------------------------------+
 283   range->lob == 0                 range->upb == UINT64_MAX
 284
 285 In this case size_of_range() returns an integer.  Recall that integers
 286 are an unbounded type.  So it returns UINT64_MAX + 1 (a 65 bit
 287 number).  Our C function on the other hand returns 0.
 288
 289 This is a bug in the function.  There are really two ways to
 290 look at this:
 291
 292 (1) The function shouldn't be called on empty ranges (size 0), and so
 293 if the function returns 0 it means "the total range".  Or:
 294
 295 (2) The function should have a different signature so it can return a
 296 separate indication for "empty range" vs "total range".
 297
 298 I went with the first, so the final working specification is this.  If
 299 we were proving the entire program then we could statically verify
 300 that callers do not try to call range_size() on the empty range.
 301
 302   /*@
 303     requires \valid(range);
 304     requires valid_range(*range);
 305     requires !empty_range(*range);
 306     assigns \nothing;
 307                         // case UINT64_MAX+1 -> 0
 308     ensures \result == (uint64_t)size_of_range(*range);
 309    */
 310   static inline uint64_t range_size(const Range *range)
 311   {
 312       return range->upb - range->lob + 1;
 313   }
 314
 315   $ frama-c -wp -wp-rte snippets/range_size-good.c
 316
 317 On to the next function.  Again this seems very simple, but in fact it
 318 contains a serious problem:
 319
 320   /*
 321    * Initialize range to span the interval [lob,lob + size - 1].
 322    * size may be 0. Range must not overflow.
 323    */
 324   static inline void range_init_nofail(Range *range, uint64_t lob,
 325                                        uint64_t size)
 326   {
 327       range->lob = lob;
 328       range->upb = lob + size - 1;
 329   }
 330
 331 What does lob == 0, size == 0 mean?  It will create a total range.
 332
 333   +-------------------------------------------------------+
 334   |XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX|
 335   +-------------------------------------------------------+
 336   range->lob == 0                 range->upb == UINT64_MAX
 337
 338 What does lob == 1, size == 0 mean?  It will create an empty range.
 339
 340 Because I didn't want to change the code, that makes the proof very
 341 long, and it demonstrates another feature of ACSL specifications
 342 ("behaviors"):
 343
 344   /*@
 345     requires \valid(range);
 346     requires lob + (integer)size <= UINT64_MAX;
 347     assigns *range;
 348     behavior empty:
 349       assumes lob > 0 && size == 0;
 350       ensures empty_range(*range);
 351     behavior non_empty:
 352       assumes size > 0;
 353       ensures range->lob == lob;
 354       ensures size_of_bounds(lob, range->upb) == size;
 355     behavior total:
 356       assumes lob == 0 && size == 0;
 357       ensures total_range(*range);
 358     complete behaviors;
 359     disjoint behaviors;
 360    */
 361
 362 I also had to modify my earlier empty_range predicate:
 363
 364   /*@ // before:
 365     predicate empty_range(struct Range range) =
 366       range.lob == range.upb + 1;
 367   */
 368
 369   /*@ // after:
 370     predicate empty_range(struct Range range) =
 371       range.lob > 0 && range.lob == range.upb + 1;
 372
 373     predicate total_range(struct Range range) =
 374       range.lob == 0 && range.upb == UINT64_MAX;
 375   */
 376
 377 What can we say about this?  If you were developing the Range type and
 378 the proof at the same time, you probably wouldn't define the Range
 379 type in this way, and you probably wouldn't choose an initialization
 380 function which had this complex set of behaviours.
 381
 382 I did prove most of the rest of the Range functions, but I wanted to
 383 point out a few things about the final proof:
 384
 385  - The original file includes glib headers, but those cannot be parsed
 386    by Frama-C's parser.  The problem appears to be some unsupported
 387    __attribute__s.  Some attributes seem to work, and others are not
 388    supported, and this of course limits what you can prove if it
 389    depends on any external library that might use these.
 390
 391  - For the same reason I had to exclude a glib function for operating
 392    on lists of ranges from the proof.
 393
 394  - You can run frama-c on the original source file, with a few #ifdef
 395    modifications, so it would be possible to maintain the annotations
 396    upstream, and run the proof checker as a CI test.
 397
 398  - It probably doesn't make sense for qemu right now though, unless we
 399    could prove more substantial pieces of code.
 400
 401 = POWER OF 2 =
 402
 403 This is a function from nbdkit:
 404
 405   /* Returns true if v is a power of 2.
 406    *
 407    * Uses the algorithm described at
 408    * http://graphics.stanford.edu/~seander/bithacks.html#DetermineIfPowerOf2
 409    */
 410   static inline bool
 411   is_power_of_2 (unsigned long v)
 412   {
 413     return v && ((v & (v - 1)) == 0);
 414   }
 415
 416 and my initial specification started by defining:
 417
 418   /*@
 419     predicate positive_power_of_2 (integer i) =
 420       i > 0 &&
 421       (i == 1 || ((i & 1) == 0 && positive_power_of_2 (i >> 1)));
 422
 423     lemma positive_power_of_2 (1);
 424     lemma positive_power_of_2 (2);
 425     lemma positive_power_of_2 (4);
 426     lemma !positive_power_of_2 (7);
 427    */
 428
 429 I was using the lemmas (which are like assertions, except statically
 430 verified), to test my predicate, but sadly Frama-C was not able to
 431 prove them even though they appear to be trivial.
 432
 433 This led to some discussion on stackoverflow:
 434
 435   https://stackoverflow.com/questions/64268418/how-do-i-write-an-is-power-of-2-predicate-in-acsl
 436
 437 Ultimately it was possible to fix this, but I discovered a few things
 438 about Alt-Ergo (the default prover):
 439
 440  - It can't handle bitwise operations very well.
 441
 442  - You can guide Alt-Ergo, but it requires using the Frama-C GUI which
 443    is very non-intuitive.  You do end up with a script but I couldn't
 444    work out how to integrate this with the command line.
 445
 446  - Z3 should be able to solve this, but I was not able to get
 447    Z3 working with Frama-C on Fedora.
 448
 449 Essentially bitwise tricks like this are a hard case for automated
 450 theorem proving.  I gave up.
 451
 452 = TIMEVAL DIFFERENCE =
 453
 454 This is another nbdkit function:
 455
 456   /* Return the number of µs (microseconds) in y - x. */
 457   static inline int64_t
 458   tvdiff_usec (const struct timeval *x, const struct timeval *y)
 459   {
 460     int64_t usec;
 461
 462     usec = (y->tv_sec - x->tv_sec) * 1000000;
 463     usec += y->tv_usec - x->tv_usec;
 464     return usec;
 465   }
 466
 467 My first attempt at a proof was:
 468
 469   /*@
 470     predicate valid_timeval (struct timeval tv) =
 471       tv.tv_sec >= 0 && tv.tv_usec >= 0 && tv.tv_usec < 1000000;
 472     logic integer tv_to_microseconds (struct timeval tv) =
 473       tv.tv_sec * 1000000 + tv.tv_usec;
 474    */
 475
 476   /* Return the number of µs (microseconds) in y - x. */
 477   /*@
 478     requires \valid_read (x) && \valid_read (y);
 479     requires valid_timeval (*x) && valid_timeval (*y);
 480     ensures \result == tv_to_microseconds (*y) - tv_to_microseconds (*x);
 481    */
 482   static inline int64_t
 483   tvdiff_usec (const struct timeval *x, const struct timeval *y)
 484   {
 485     int64_t usec;
 486
 487     usec = (y->tv_sec - x->tv_sec) * 1000000;
 488     usec += y->tv_usec - x->tv_usec;
 489     return usec;
 490   }
 491
 492 This is not provable, and again the problem is integer overflow.  The
 493 tv_sec field is a 64 bit integer so it's quite easy to construct two
 494 inputs which overflow the output.
 495
 496 Fixing this involves a considerably more complex specification:
 497
 498   /*@
 499     requires \valid_read (x) && \valid_read (y);
 500     requires valid_timeval (*x) && valid_timeval (*y);
 501     requires \valid (r);
 502     assigns *r;
 503     behavior success:
 504       ensures \result == 0;
 505       ensures \let diff = tv_to_microseconds (*y) - tv_to_microseconds (*x);
 506               INT64_MIN <= diff <= INT64_MAX &&
 507               *r == diff;
 508     behavior failure:
 509       ensures \result == -1;
 510       ensures *r == 0;
 511       ensures \let diff = tv_to_microseconds (*y) - tv_to_microseconds (*x);
 512               !(INT64_MIN <= diff <= INT64_MAX);
 513     complete behaviors;
 514     disjoint behaviors;
 515    */
 516   static inline int
 517   tvdiff_usec (const struct timeval *x, const struct timeval *y,
 518                int64_t *r)
 519   ...
 520
 521 and of course that is not sufficient because I also had to fix the
 522 code so it can return an overflow indication.  And actually writing
 523 that code is left as an exercise for the reader as it is not at all
 524 simple.
 525
 526 A few things I learned from this:
 527
 528  - This is a function we have had and used for years, and I don't
 529    think anyone ever thought it had a problem.
 530
 531  - Co-developing the specification alongside the code could help
 532    here.
 533
 534  - You can end up with functions that are much harder to use.  If we
 535    were proof-checking the whole program we could check all callers.
 536
 537  - GCC has __builtin_add_overflow and similar which can be used to do
 538    mathematical operations and detect overflow, but Frama-C does not
 539    understand them.  We could contribute these to the Frama-C standard
 540    library.
 541
 542 = STRING FUNCTIONS =
 543
 544 Uli sent me this function from glibc:
 545
 546   [https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strverscmp.c]
 547
 548 To be honest I had problems even fully understanding the semantics of
 549 this function, let alone trying to translate that into ACSL.
 550
 551 So I thought I'd look at a simpler function:
 552
 553   [https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strlen.c]
 554
 555 I want to point out first that this "strlen" is a fallback which is
 556 only used if you don't have an optimized function for your hardware,
 557 and for the vast majority of users they'll be using something like an
 558 AVX-optimized strlen, not this one.
 559
 560 This function works in two halves, the first part iterates over single
 561 characters until we reach an aligned boundary.  And then the second
 562 part looks at whole words and uses a clever trick to determine if any
 563 byte in the word is zero.
 564
 565 In theory a simple specification which would apply to any strlen-style
 566 function would be:
 567
 568   /*@
 569     requires valid_read_string (str);
 570     assigns \nothing;
 571     ensures (\forall integer i; 0 <= i < \result ==> str[i] != '\0') &&
 572             str[\result] == '\0';
 573    */
 574   size_t strlen (const char *str)
 575   {
 576     ...
 577
 578 It would be possible to use this to prove a simple strlen
 579 implementation that just worked character by character.
 580
 581 Unfortunately for the actual glibc function we hit another limitation
 582 of the WP plugin, called the Typed Memory Model.
 583
 584 WP models memory as a set of cells, with a type attached to each cell.
 585 So a string for example is a set of char cells, each cell being typed
 586 as char.  You cannot cast these cells to something like a word and try
 587 to do a proof over that.  Simply a limitation which is well
 588 documented.
 589
 590 So a proof of the glibc function eludes me.
 591
 592 There is a set of open source licensed string functions with Frama-C
 593 proofs available:
 594
 595 https://github.com/evdenis/verker
 596
 597 and this is what the strlen function with proof looks like from that:
 598
 599   [https://github.com/evdenis/verker/blob/master/src/strlen.c
 600
 601 Now you might be asking what happens when you write a function that
 602 uses strlen, for example this trivial function with a working
 603 specification:
 604
 605   $ cat snippets/last_char.c
 606   $ frama-c -wp -wp-rte snippets/last_char.c
 607
 608 The questions here are:
 609
 610  - Is this proving the glibc strlen function?
 611
 612  - Are we calling glibc strlen() from the specification?
 613
 614   $ less /usr/share/frama-c/libc/string.h
 615   $ less /usr/share/frama-c/libc/__fc_string_axiomatic.h
 616
 617 And the answers are no and no.  In fact what's happening here is we
 618 are using Frama-C's own "string.h" header.  This header defines and
 619 contains a specification for how strlen.  But this is also not a proof
 620 of strlen: in fact a second file defines what's known as an "axiomatic
 621 definition" or theory of how strlen works.
 622
 623 Essentially what's happening is we're assuming that strlen in your
 624 standard library (which might not even be glibc) works.  To make a
 625 complete proof you'd also need to additionally prove all libraries you
 626 depend on.
 627
 628 Another little fun thing is Frama-C's strlen function can return -1,
 629 which is used to represent an unbounded string.  Of course unbounded
 630 strings cannot exist on real computers, but they can exist on
 631 theoretical ones!
 632
 633 = IN CONCLUSION =
 634
 635 * Frama-C is a real open source tool used by companies to verify
 636   safety-critical software.  If you've flown in an Airbus or used
 637   electricity from a French nuclear power plant then you've used this
 638   software already.
 639
 640 * Not as hard to use as I imagined.  I spent about 6-12 hours from
 641   zero to being able to fully prove a whole file of production code.
 642
 643 * Mostly useful for small functions.  There are other tools, like TLA+
 644   for proving distributed algorithms.
 645
 646 * Forces you to think very hard about assumptions and corner cases,
 647   but that's the nature of the problem.
 648
 649 * Very well documented, questions answered quickly on stackoverflow.
 650
 651 But it has its limits:
 652
 653 * Typed Memory Model turns out to be restrictive.
 654
 655 * No support for malloc.
 656
 657 * Not good at bitwise ops (can Z3 help?).
 658
 659 * Cannot parse some non-standard __attribute__s (glib).
 660
 661 * Lacks some __builtin_* functions.
 662
 663 Alternatives and related programs:
 664
 665 * Ada SPARK Pro (https://www.adacore.com/sparkpro) is a formally
 666   verified subset of the Ada programming language
 667   (tagline: "You simply can't write better code").
 668
 669 * KLEE (http://klee.github.io/) has similarities but is not a formal
 670   proof system.
 671
 672 * CompCert (http://compcert.inria.fr/) is INRIA's verified C compiler
 673   (not open source).
 674
 675 Resources:
 676
 677 * Tutorial that I followed in October which I thought was a good
 678   introduction:
 679   https://maniagnosis.crsr.net/2017/06/AFL-brute-force-search.html
 680   https://maniagnosis.crsr.net/2017/06/AFL-bug-in-quicksearch.html
 681   https://maniagnosis.crsr.net/2017/07/AFL-correctness-of-quicksearch.html
 682
 683 * Frama-C on Stack Overflow:
 684   https://stackoverflow.com/questions/tagged/frama-c
 685
 686 * Allan Blanchard's tutorial:
 687   https://allan-blanchard.fr/publis/frama-c-wp-tutorial-en.pdf
 688
 689 * David Mentre's introduction:
 690   https://www.cs.umd.edu/class/spring2016/cmsc838G/frama-c/introduction-to-frama-c_v2.pdf
 691
 692 * Upstream documentation:
 693   Manual: https://frama-c.com/download/frama-c-user-manual.pdf
 694   WP manual: https://frama-c.com/download/frama-c-wp-manual.pdf
 695   ACSL: https://frama-c.com/acsl.html
 696
 697 * ACSL specifications for various string functions:
 698   https://github.com/evdenis/verker