2020-frama-c/notes.txt

   1 16th Nov 2020
   2 Formally proving tiny bits of qemu using Frama-C
   3
   4 [in medias res]
   5
   6 In early October I was talking to one of the developers of Frama-C,
   7 which is a modular framework for verifying C programs.  It's been on
   8 my radar to learn for a long time, and I'm a co-maintainer of the
   9 package in Fedora.  I tried to follow one tutorial but never really
  10 got past installation and UI issues.  But the developer pointed me to
  11 a very good, short set of blog postings about using the command line
  12 tool.
  13
  14 I will provide links to tutorials etc at the end.
  15
  16 \f
  17 I decided to spend a day or two last month seeing if I could formally
  18 prove some code inside qemu, and I arbitrarily picked one of the
  19 smallest pieces of code in the "util/" subdirectory:
  20
  21   $ wc -l util/*.c | sort -nr | tail -20
  22
  23 Actually there are two files:
  24
  25   $ ls -l util/range.c include/qemu/range.h
  26
  27 This is what the Range type from the header file looks like.  This is
  28 the complete definition upstream, with comments which are supposed to
  29 explain how it works:
  30
  31   /*
  32    * Operations on 64 bit address ranges.
  33    * Notes:
  34    * - Ranges must not wrap around 0, but can include UINT64_MAX.
  35    */
  36   struct Range {
  37       /*
  38        * Do not access members directly, use the functions!
  39        * A non-empty range has @lob <= @upb.
  40        * An empty range has @lob == @upb + 1.
  41        */
  42       uint64_t lob;        /* inclusive lower bound */
  43       uint64_t upb;        /* inclusive upper bound */
  44   };
  45   typedef struct Range Range;
  46
  47   +-----------+-----------+------------- - - - - -
  48   |           |           |
  49   +-----------+-----------+------------- - - - - -
  50                ^         ^
  51               lob ----- upb
  52
  53 Using the tutorial as my guide I wrote some simple predicates, a
  54 "predicate" being a statement which can be true or false:
  55
  56        * A non-empty range has @lob <= @upb.
  57        * An empty range has @lob == @upb + 1.
  58
  59   /*@
  60     predicate valid_range(struct Range range) =
  61       range.lob <= range.upb + 1;
  62
  63     predicate empty_range(struct Range range) =
  64       range.lob == range.upb + 1;
  65   */
  66
  67 Notice a few things here:
  68
  69  - These are statements written in a formal language.
  70
  71  - I'm using structs directly from the C code.
  72
  73  - The comments in the upstream code translate into predicates.
  74
  75 The first upstream function is:
  76
  77   /* Is range empty? */
  78   static inline bool range_is_empty(const Range *range)
  79   {
  80       return range->lob > range->upb;
  81   }
  82
  83 and using the predicates we can write a specification:
  84
  85   $ less snippets/range_is_empty.c
  86
  87 And we can compile and prove that:
  88
  89   $ frama-c -wp -wp-rte snippets/range_is_empty.c
  90
  91 Frama-C parsed the C code and the formal specification and machine
  92 checked it, and it's correct - the code is bug-free.
  93
  94 \f
  95 = OVERVIEW OF FRAMA-C ECOSYSTEM =
  96
  97
  98 XXX Modular Framework for analysis of C
  99
 100 XXX Take some slides from David Mentre's presentation.
 101
 102 XXX Explain which companies are using Frama-C.
 103
 104 XXX WP plugin
 105
 106 XXX ACSL language
 107
 108
 109
 110 \f
 111 = BACK TO RANGE.C =
 112
 113 Going back to what we proved so far:
 114
 115   /*@
 116     predicate valid_range(struct Range range) =
 117       range.lob <= range.upb + 1;
 118     predicate empty_range(struct Range range) =
 119       range.lob == range.upb + 1;
 120   */
 121
 122   /* Is range empty? */
 123   /*@
 124     requires \valid_read(range);
 125     requires valid_range(*range);
 126     assigns \nothing;
 127     ensures \result <==> empty_range(*range);
 128    */
 129   static inline bool range_is_empty(const Range *range)
 130   {
 131       return range->lob > range->upb;
 132   }
 133
 134   $ frama-c -wp -wp-rte snippets/range_is_empty.c
 135
 136 The "@"-comments are ACSL.  The C code is parsed by Frama-C into an
 137 abstract syntax tree.  We told Frama-C to apply the WP plugin.  The
 138 specification and C code were translated into first-order logic
 139 statements and passed to the Alt-Ergo SMT prover to see if it could
 140 find a Proof Obligation (PO).
 141
 142 We need to be very clear about what happened:
 143
 144  - The preconditions must be fulfilled in order for the proof to be
 145    correct.  If you are checking a whole program then that would check
 146    that all callers of this function are fulfilling the preconditions,
 147    but if you check only single functions then you are not doing that.
 148
 149  - The proof only shows that the post-condition holds.
 150
 151  - This only shows that the source code is correct.  The compiler
 152    could still compile the program incorrectly.  If you really want to
 153    get into this you have to use a verified C compiler like CompCert,
 154    and there are even verified CPU implementations.
 155
 156  - Given those assumptions, the code is bug free - you don't need to
 157    write any tests.
 158
 159 \f
 160 Obviously this is a single line, very trivial function, but I was
 161 quite pleased that I was able to prove it quickly.  I kept going on
 162 the range file.  The next function is:
 163
 164   /* Does range contain val? */
 165   static inline bool range_contains(const Range *range, uint64_t val)
 166   {
 167       return val >= range->lob && val <= range->upb;
 168   }
 169
 170 This is similarly easy to prove after adding another predicate at the
 171 top of the file:
 172
 173   /*@
 174     predicate value_in_range(struct Range range, uint64_t value) =
 175       range.lob <= value <= range.upb;
 176   */
 177
 178   /* Does range contain val? */
 179   /*@
 180     requires \valid_read(range);
 181     requires valid_range(*range);
 182     assigns \nothing;
 183     ensures \result <==> value_in_range(*range, val);
 184    */
 185   static inline bool range_contains(const Range *range, uint64_t val)
 186   {
 187       return val >= range->lob && val <= range->upb;
 188   }
 189
 190 \f
 191 The next function is range_make_empty, again easy to prove using the
 192 already existing empty_range predicate.  Notice how we declare which
 193 memory locations this function assigns to:
 194
 195   /* Initialize range to the empty range */
 196   /*@
 197     requires \valid(range);
 198     requires valid_range(*range);
 199     assigns *range;
 200     ensures empty_range(*range);
 201    */
 202   static inline void range_make_empty(Range *range)
 203   {
 204       *range = range_empty;
 205       assert(range_is_empty(range));
 206   }
 207
 208 \f
 209 I'm going to skip forward a few functions to get to an interesting one.
 210 This seems trivial:
 211
 212   /* Get the size of range. */
 213   static inline uint64_t range_size(const Range *range)
 214   {
 215       return range->upb - range->lob + 1;
 216   }
 217
 218 My first attempt at a proof was:
 219
 220   /*@
 221     logic integer size_of_bounds(integer lob, integer upb) =
 222       upb - lob + 1;
 223
 224     logic integer size_of_range(struct Range range) =
 225       size_of_bounds(range.lob, range.upb);
 226   */
 227
 228   /* Get the size of range. */
 229   /*@
 230     requires \valid(range);
 231     requires valid_range(*range);
 232     assigns \nothing;
 233     ensures \result == size_of_range(*range);
 234   */
 235   static inline uint64_t range_size(const Range *range)
 236   {
 237       return range->upb - range->lob + 1;
 238   }
 239
 240 A few things to notice about this:
 241
 242  - I'm using ACSL "logic" statements, which are similar to functions.
 243
 244  - ACSL has an "integer" type which is an unbounded integer.
 245
 246 Let's try to prove this one:
 247
 248   $ frama-c -wp -wp-rte snippets/range_size.c
 249   It fails!
 250
 251 This was confusing to me because at this point I'd probably spent an
 252 hour, and I'd proven about 5 functions successfully, and this function
 253 looks equally trivial and yet it cannot be proven.  Why?
 254
 255 One way to find the problem would be to find a COUNTEREXAMPLE.  A
 256 counterexample is an instance of an input that satisfies all of the
 257 preconditions, but makes the postcondition false.  Now Frama-C has
 258 pluggable provers, and one prover called Z3, originally written by
 259 Microsoft, can be used with Frama-C and can sometimes find
 260 counterexamples.  However for some reason the version of Z3 in Fedora
 261 does not like to work with the version of Frama-C in Fedora and I
 262 don't know why.
 263
 264 So this is the counterexample which I worked out myself:
 265
 266   /*@
 267     logic integer size_of_bounds(integer lob, integer upb) =
 268       upb - lob + 1;
 269
 270     logic integer size_of_range(struct Range range) =
 271       size_of_bounds(range.lob, range.upb);
 272   */
 273
 274   /* Get the size of range. */
 275   /*@
 276     requires \valid(range);
 277     requires valid_range(*range);
 278     assigns \nothing;
 279     ensures \result == size_of_range(*range);
 280   */
 281   static inline uint64_t range_size(const Range *range)
 282   {
 283       return range->upb - range->lob + 1;
 284   }
 285
 286   +-------------------------------------------------------+
 287   |XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX|
 288   +-------------------------------------------------------+
 289   range->lob == 0                 range->upb == UINT64_MAX
 290
 291 In this case size_of_range() returns an integer.  Recall that integers
 292 are an unbounded type.  So it returns UINT64_MAX + 1 (a 65 bit
 293 number).  Our C function on the other hand returns 0.
 294
 295 This is a bug in the function.  There are really two ways to
 296 look at this:
 297
 298 (1) The function shouldn't be called on empty ranges (size 0), and so
 299 if the function returns 0 it means "the total range".  Or:
 300
 301 (2) The function should have a different signature so it can return a
 302 separate indication for "empty range" vs "total range".
 303
 304 I went with the first, so the final working specification is this.  If
 305 we were proving the entire program then we could statically verify
 306 that callers do not try to call range_size() on the empty range.
 307
 308   /*@
 309     requires \valid(range);
 310     requires valid_range(*range);
 311     requires !empty_range(*range);
 312     assigns \nothing;
 313                         // case UINT64_MAX+1 -> 0
 314     ensures \result == (uint64_t)size_of_range(*range);
 315    */
 316   static inline uint64_t range_size(const Range *range)
 317   {
 318       return range->upb - range->lob + 1;
 319   }
 320
 321   $ frama-c -wp -wp-rte snippets/range_size-good.c
 322
 323 \f
 324 On to the next function.  Again this seems very simple, but in fact it
 325 contains a serious problem:
 326
 327   /*
 328    * Initialize range to span the interval [lob,lob + size - 1].
 329    * size may be 0. Range must not overflow.
 330    */
 331   static inline void range_init_nofail(Range *range, uint64_t lob,
 332                                        uint64_t size)
 333   {
 334       range->lob = lob;
 335       range->upb = lob + size - 1;
 336   }
 337
 338 What does lob == 0, size == 0 mean?  It will create a total range.
 339
 340   +-------------------------------------------------------+
 341   |XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX|
 342   +-------------------------------------------------------+
 343   range->lob == 0                 range->upb == UINT64_MAX
 344
 345 What does lob == 1, size == 0 mean?  It will create an empty range.
 346
 347 Because I didn't want to change the code, that makes the proof very
 348 long, and it demonstrates another feature of ACSL specifications
 349 ("behaviors"):
 350
 351   /*@
 352     requires \valid(range);
 353     requires lob + (integer)size <= UINT64_MAX;
 354     assigns *range;
 355     behavior empty:
 356       assumes lob > 0 && size == 0;
 357       ensures empty_range(*range);
 358     behavior non_empty:
 359       assumes size > 0;
 360       ensures range->lob == lob;
 361       ensures size_of_bounds(lob, range->upb) == size;
 362     behavior total:
 363       assumes lob == 0 && size == 0;
 364       ensures total_range(*range);
 365     complete behaviors;
 366     disjoint behaviors;
 367    */
 368
 369 I also had to modify my earlier empty_range predicate:
 370
 371   /*@ // before:
 372     predicate empty_range(struct Range range) =
 373       range.lob == range.upb + 1;
 374   */
 375
 376   /*@ // after:
 377     predicate empty_range(struct Range range) =
 378       range.lob > 0 && range.lob == range.upb + 1;
 379
 380     predicate total_range(struct Range range) =
 381       range.lob == 0 && range.upb == UINT64_MAX;
 382   */
 383
 384 What can we say about this?  If you were developing the Range type and
 385 the proof at the same time, you probably wouldn't define the Range
 386 type in this way, and you probably wouldn't choose an initialization
 387 function which had this complex set of behaviours.
 388
 389 I did prove most of the rest of the Range functions, but I wanted to
 390 point out a few things about the final proof:
 391
 392  - The original file includes glib headers, but those cannot be parsed
 393    by Frama-C's parser.  The problem appears to be some unsupported
 394    __attribute__s.  Some attributes seem to work, and others are not
 395    supported, and this of course limits what you can prove if it
 396    depends on any external library that might use these.
 397
 398  - For the same reason I had to exclude a glib function for operating
 399    on lists of ranges from the proof.
 400
 401  - You can run frama-c on the original source file, with a few #ifdef
 402    modifications, so it would be possible to maintain the annotations
 403    upstream, and run the proof checker as a CI test.
 404
 405  - It probably doesn't make sense for qemu right now though, unless we
 406    could prove more substantial pieces of code.
 407
 408 \f
 409 = POWER OF 2 =
 410
 411 This is a function from nbdkit:
 412
 413   /* Returns true if v is a power of 2.
 414    *
 415    * Uses the algorithm described at
 416    * http://graphics.stanford.edu/~seander/bithacks.html#DetermineIfPowerOf2
 417    */
 418   static inline bool
 419   is_power_of_2 (unsigned long v)
 420   {
 421     return v && ((v & (v - 1)) == 0);
 422   }
 423
 424 and my initial specification started by defining:
 425
 426   /*@
 427     predicate positive_power_of_2 (integer i) =
 428       i > 0 &&
 429       (i == 1 || ((i & 1) == 0 && positive_power_of_2 (i >> 1)));
 430
 431     lemma positive_power_of_2 (1);
 432     lemma positive_power_of_2 (2);
 433     lemma positive_power_of_2 (4);
 434     lemma !positive_power_of_2 (7);
 435    */
 436
 437 I was using the lemmas (which are like assertions, except statically
 438 verified), to test my predicate, but sadly Frama-C was not able to
 439 prove them even though they appear to be trivial.
 440
 441 This led to some discussion on stackoverflow:
 442
 443   https://stackoverflow.com/questions/64268418/how-do-i-write-an-is-power-of-2-predicate-in-acsl
 444
 445 Ultimately it was possible to fix this, but I discovered a few things
 446 about Alt-Ergo (the default prover):
 447
 448  - It can't handle bitwise operations very well.
 449
 450  - You can guide Alt-Ergo, but it requires using the Frama-C GUI which
 451    is very non-intuitive.  You do end up with a script but I couldn't
 452    work out how to integrate this with the command line.
 453
 454  - Z3 should be able to solve this, but I was not able to get
 455    Z3 working with Frama-C on Fedora.
 456
 457 Essentially bitwise tricks like this are a hard case for automated
 458 theorem proving.  I gave up.
 459
 460 \f
 461 = TIMEVAL DIFFERENCE =
 462
 463 This is another nbdkit function:
 464
 465   /* Return the number of µs (microseconds) in y - x. */
 466   static inline int64_t
 467   tvdiff_usec (const struct timeval *x, const struct timeval *y)
 468   {
 469     int64_t usec;
 470
 471     usec = (y->tv_sec - x->tv_sec) * 1000000;
 472     usec += y->tv_usec - x->tv_usec;
 473     return usec;
 474   }
 475
 476 My first attempt at a proof was:
 477
 478   /*@
 479     predicate valid_timeval (struct timeval tv) =
 480       tv.tv_sec >= 0 && tv.tv_usec >= 0 && tv.tv_usec < 1000000;
 481     logic integer tv_to_microseconds (struct timeval tv) =
 482       tv.tv_sec * 1000000 + tv.tv_usec;
 483    */
 484
 485   /* Return the number of µs (microseconds) in y - x. */
 486   /*@
 487     requires \valid_read (x) && \valid_read (y);
 488     requires valid_timeval (*x) && valid_timeval (*y);
 489     ensures \result == tv_to_microseconds (*y) - tv_to_microseconds (*x);
 490    */
 491   static inline int64_t
 492   tvdiff_usec (const struct timeval *x, const struct timeval *y)
 493   {
 494     int64_t usec;
 495
 496     usec = (y->tv_sec - x->tv_sec) * 1000000;
 497     usec += y->tv_usec - x->tv_usec;
 498     return usec;
 499   }
 500
 501 This is not provable, and again the problem is integer overflow.  The
 502 tv_sec field is a 64 bit integer so it's quite easy to construct two
 503 inputs which overflow the output.
 504
 505 Fixing this involves a considerably more complex specification:
 506
 507   /*@
 508     requires \valid_read (x) && \valid_read (y);
 509     requires valid_timeval (*x) && valid_timeval (*y);
 510     requires \valid (r);
 511     assigns *r;
 512     behavior success:
 513       ensures \result == 0;
 514       ensures \let diff = tv_to_microseconds (*y) - tv_to_microseconds (*x);
 515               INT64_MIN <= diff <= INT64_MAX &&
 516               *r == diff;
 517     behavior failure:
 518       ensures \result == -1;
 519       ensures *r == 0;
 520       ensures \let diff = tv_to_microseconds (*y) - tv_to_microseconds (*x);
 521               !(INT64_MIN <= diff <= INT64_MAX);
 522     complete behaviors;
 523     disjoint behaviors;
 524    */
 525   static inline int
 526   tvdiff_usec (const struct timeval *x, const struct timeval *y,
 527                int64_t *r)
 528   ...
 529
 530 and of course that is not sufficient because I also had to fix the
 531 code so it can return an overflow indication.  And actually writing
 532 that code is left as an exercise for the reader as it is not at all
 533 simple.
 534
 535 A few things I learned from this:
 536
 537  - This is a function we have had and used for years, and I don't
 538    think anyone ever thought it had a problem.
 539
 540  - Co-developing the specification alongside the code could help
 541    here.
 542
 543  - You can end up with functions that are much harder to use.  If we
 544    were proof-checking the whole program we could check all callers.
 545
 546  - GCC has __builtin_add_overflow and similar which can be used to do
 547    mathematical operations and detect overflow, but Frama-C does not
 548    understand them.  We could contribute these to the Frama-C standard
 549    library.
 550
 551 \f
 552 = STRING FUNCTIONS =
 553
 554 Uli sent me this function from glibc:
 555
 556   [https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strverscmp.c]
 557
 558 To be honest I had problems even fully understanding the semantics of
 559 this function, let alone trying to translate that into ACSL.
 560
 561 So I thought I'd look at a simpler function:
 562
 563   [https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strlen.c]
 564
 565 I want to point out first that this "strlen" is a fallback which is
 566 only used if you don't have an optimized function for your hardware,
 567 and for the vast majority of users they'll be using something like an
 568 AVX-optimized strlen, not this one.
 569
 570 This function works in two halves, the first part iterates over single
 571 characters until we reach an aligned boundary.  And then the second
 572 part looks at whole words and uses a clever trick to determine if any
 573 byte in the word is zero.
 574
 575 In theory a simple specification which would apply to any strlen-style
 576 function would be:
 577
 578   /*@
 579     requires valid_read_string (str);
 580     assigns \nothing;
 581     ensures (\forall integer i; 0 <= i < \result ==> str[i] != '\0') &&
 582             str[\result] == '\0';
 583    */
 584   size_t strlen (const char *str)
 585   {
 586     ...
 587
 588 It would be possible to use this to prove a simple strlen
 589 implementation that just worked character by character.
 590
 591 Unfortunately for the actual glibc function we hit another limitation
 592 of the WP plugin, called the Typed Memory Model.
 593
 594 WP models memory as a set of cells, with a type attached to each cell.
 595 So a string for example is a set of char cells, each cell being typed
 596 as char.  You cannot cast these cells to something like a word and try
 597 to do a proof over that.  Simply a limitation which is well
 598 documented.
 599
 600 So a proof of the glibc function eludes me.
 601
 602 \f
 603 There is a set of open source licensed string functions with Frama-C
 604 proofs available:
 605
 606 https://github.com/evdenis/verker
 607
 608 and this is what the strlen function with proof looks like from that:
 609
 610   [https://github.com/evdenis/verker/blob/master/src/strlen.c]
 611
 612 \f
 613 Now you might be asking what happens when you write a function that
 614 uses strlen, for example this trivial function with a working
 615 specification:
 616
 617   $ cat snippets/last_char.c
 618   $ frama-c -wp -wp-rte snippets/last_char.c
 619
 620 The questions here are:
 621
 622  - Is this proving the glibc strlen function?
 623
 624  - Are we calling glibc strlen() from the specification?
 625
 626   $ less /usr/share/frama-c/libc/string.h
 627   $ less /usr/share/frama-c/libc/__fc_string_axiomatic.h
 628
 629 And the answers are no and no.  In fact what's happening here is we
 630 are using Frama-C's own "string.h" header.  This header defines and
 631 contains a specification for how strlen.  But this is also not a proof
 632 of strlen: in fact a second file defines what's known as an "axiomatic
 633 definition" or theory of how strlen works.
 634
 635 Essentially what's happening is we're assuming that strlen in your
 636 standard library (which might not even be glibc) works.  To make a
 637 complete proof you'd also need to additionally prove all libraries you
 638 depend on.
 639
 640 Another little fun thing is Frama-C's strlen function can return -1,
 641 which is used to represent an unbounded string.  Of course unbounded
 642 strings cannot exist on real computers, but they can exist on
 643 theoretical ones!
 644
 645 \f
 646 = IN CONCLUSION =
 647
 648 * Frama-C is a real open source tool used by companies to verify
 649   safety-critical software.  If you've flown in an Airbus or used
 650   electricity from a French nuclear power plant then you've used this
 651   software already.
 652
 653 * Not as hard to use as I imagined.  I spent about 6-12 hours from
 654   zero to being able to fully prove a whole file of production code.
 655
 656 * Mostly useful for small functions.  There are other tools, like TLA+
 657   for proving distributed algorithms.
 658
 659 * Forces you to think very hard about assumptions and corner cases,
 660   but that's the nature of the problem.
 661
 662 * Very well documented, questions answered quickly on stackoverflow.
 663
 664 But it has its limits:
 665
 666 * Typed Memory Model turns out to be restrictive.
 667
 668 * No support for malloc.
 669
 670 * Not good at bitwise ops (can Z3 help?).
 671
 672 * Cannot parse some non-standard __attribute__s (glib).
 673
 674 * Lacks some __builtin_* functions.
 675
 676 Alternatives and related programs:
 677
 678 * Ada SPARK Pro (https://www.adacore.com/sparkpro) is a formally
 679   verified subset of the Ada programming language
 680   (tagline: "You simply can't write better code").
 681
 682 * KLEE (http://klee.github.io/) has similarities but is not a formal
 683   proof system.
 684
 685 * CompCert (http://compcert.inria.fr/) is INRIA's verified C compiler
 686   (not open source).
 687
 688 Resources:
 689
 690 * Tutorial that I followed in October which I thought was a good
 691   introduction:
 692   https://maniagnosis.crsr.net/2017/06/AFL-brute-force-search.html
 693   https://maniagnosis.crsr.net/2017/06/AFL-bug-in-quicksearch.html
 694   https://maniagnosis.crsr.net/2017/07/AFL-correctness-of-quicksearch.html
 695
 696 * Frama-C on Stack Overflow:
 697   https://stackoverflow.com/questions/tagged/frama-c
 698
 699 * Allan Blanchard's tutorial:
 700   https://allan-blanchard.fr/publis/frama-c-wp-tutorial-en.pdf
 701
 702 * David Mentre's introduction:
 703   https://www.cs.umd.edu/class/spring2016/cmsc838G/frama-c/introduction-to-frama-c_v2.pdf
 704
 705 * Upstream documentation:
 706   Manual: https://frama-c.com/download/frama-c-user-manual.pdf
 707   WP manual: https://frama-c.com/download/frama-c-wp-manual.pdf
 708   ACSL: https://frama-c.com/acsl.html
 709
 710 * ACSL specifications for various string functions:
 711   https://github.com/evdenis/verker