From: Richard W.M. Jones <rjones@redhat.com>
Date: Tue, 13 Oct 2020 17:29:26 +0000 (+0100)
Subject: Update notes for talk on Frama-C.
X-Git-Url: http://git.annexia.org/?a=commitdiff_plain;h=dfb5e4bcf891d3068d245d195376f54575d8655e;p=libguestfs-talks.git

Update notes for talk on Frama-C.
---

diff --git a/2020-frama-c/notes.txt b/2020-frama-c/notes.txt
index edd3143..2fbb76d 100644
--- a/2020-frama-c/notes.txt
+++ b/2020-frama-c/notes.txt
@@ -13,8 +13,8 @@ tool.
 
 I will provide links to tutorials etc at the end.
 
-I decided to spend a day or two seeing if I could formally prove some
-production code inside qemu, and I arbitrarily picked one of the
+I decided to spend a day or two last month seeing if I could formally
+prove some code inside qemu, and I arbitrarily picked one of the
 smallest pieces of code in the "util/" subdirectory:
 
   $ wc -l util/*.c | sort -nr | tail -20
@@ -23,9 +23,9 @@ Actually there are two files:
 
   $ ls -l util/range.c include/qemu/range.h
 
-This is what the Range type looks like.  This is the complete
-definition upstream, with comments which are supposed to explain how
-it works:
+This is what the Range type from the header file looks like.  This is
+the complete definition upstream, with comments which are supposed to
+explain how it works:
 
   /*
    * Operations on 64 bit address ranges.
@@ -81,17 +81,7 @@ The first upstream function is:
 
 and using the predicates we can write a specification:
 
-  /* Is range empty? */
-  /*@
-    requires \valid_read(range);
-    requires valid_range(*range);
-    assigns \nothing;
-    ensures \result <==> empty_range(*range);
-   */
-  static inline bool range_is_empty(const Range *range)
-  {
-      return range->lob > range->upb;
-  }
+  $ less snippets/range_is_empty.c
 
 And we can compile and prove that:
 
@@ -122,7 +112,6 @@ Going back to what we proved so far:
   /*@
     predicate valid_range(struct Range range) =
       range.lob <= range.upb + 1;
-
     predicate empty_range(struct Range range) =
       range.lob == range.upb + 1;
   */
@@ -140,7 +129,6 @@ Going back to what we proved so far:
   }
 
   $ frama-c -wp -wp-rte snippets/range_is_empty.c
-  XXX OUTPUT OF FRAMA-C XXX
 
 The "@"-comments are ACSL.  The C code is parsed by Frama-C into an
 abstract syntax tree.  We told Frama-C to apply the WP plugin.  The
@@ -251,8 +239,8 @@ A few things to notice about this:
 
 Let's try to prove this one:
 
-  $ frama-c ... snippets/range_size.c
-  XXX It fails!
+  $ frama-c -wp -wp-rte snippets/range_size.c
+  It fails!
 
 This was confusing to me because at this point I'd probably spent an
 hour, and I'd proven about 5 functions successfully, and this function
@@ -324,6 +312,8 @@ that callers do not try to call range_size() on the empty range.
       return range->upb - range->lob + 1;
   }
 
+  $ frama-c -wp -wp-rte snippets/range_size-good.c
+
 On to the next function.  Again this seems very simple, but in fact it
 contains a serious problem:
 
@@ -379,6 +369,9 @@ I also had to modify my earlier empty_range predicate:
   /*@ // after:
     predicate empty_range(struct Range range) =
       range.lob > 0 && range.lob == range.upb + 1;
+
+    predicate total_range(struct Range range) =
+      range.lob == 0 && range.upb == UINT64_MAX;
   */
 
 What can we say about this?  If you were developing the Range type and
@@ -525,7 +518,6 @@ Fixing this involves a considerably more complex specification:
                int64_t *r)
   ...
 
-
 and of course that is not sufficient because I also had to fix the
 code so it can return an overflow indication.  And actually writing
 that code is left as an exercise for the reader as it is not at all
@@ -547,12 +539,96 @@ A few things I learned from this:
    understand them.  We could contribute these to the Frama-C standard
    library.
 
+= STRING FUNCTIONS =
+
+Uli sent me this function from glibc:
+
+  [https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strverscmp.c]
+
+To be honest I had problems even fully understanding the semantics of
+this function, let alone trying to translate that into ACSL.
+
+So I thought I'd look at a simpler function:
+
+  [https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strlen.c]
+
+I want to point out first that this "strlen" is a fallback which is
+only used if you don't have an optimized function for your hardware,
+and for the vast majority of users they'll be using something like an
+AVX-optimized strlen, not this one.
+
+This function works in two halves, the first part iterates over single
+characters until we reach an aligned boundary.  And then the second
+part looks at whole words and uses a clever trick to determine if any
+byte in the word is zero.
 
+In theory a simple specification which would apply to any strlen-style
+function would be:
 
+  /*@
+    requires valid_read_string (str);
+    assigns \nothing;
+    ensures (\forall integer i; 0 <= i < \result ==> str[i] != '\0') &&
+            str[\result] == '\0';
+   */
+  size_t strlen (const char *str)
+  {
+    ...
+
+It would be possible to use this to prove a simple strlen
+implementation that just worked character by character.
+
+Unfortunately for the actual glibc function we hit another limitation
+of the WP plugin, called the Typed Memory Model.
+
+WP models memory as a set of cells, with a type attached to each cell.
+So a string for example is a set of char cells, each cell being typed
+as char.  You cannot cast these cells to something like a word and try
+to do a proof over that.  Simply a limitation which is well
+documented.
+
+So a proof of the glibc function eludes me.
+
+There is a set of open source licensed string functions with Frama-C
+proofs available:
+
+https://github.com/evdenis/verker
+
+and this is what the strlen function with proof looks like from that:
 
+  [https://github.com/evdenis/verker/blob/master/src/strlen.c
 
+Now you might be asking what happens when you write a function that
+uses strlen, for example this trivial function with a working
+specification:
 
+  $ cat snippets/last_char.c
+  $ frama-c -wp -wp-rte snippets/last_char.c
 
+The questions here are:
+
+ - Is this proving the glibc strlen function?
+
+ - Are we calling glibc strlen() from the specification?
+
+  $ less /usr/share/frama-c/libc/string.h
+  $ less /usr/share/frama-c/libc/__fc_string_axiomatic.h
+
+And the answers are no and no.  In fact what's happening here is we
+are using Frama-C's own "string.h" header.  This header defines and
+contains a specification for how strlen.  But this is also not a proof
+of strlen: in fact a second file defines what's known as an "axiomatic
+definition" or theory of how strlen works.
+
+Essentially what's happening is we're assuming that strlen in your
+standard library (which might not even be glibc) works.  To make a
+complete proof you'd also need to additionally prove all libraries you
+depend on.
+
+Another little fun thing is Frama-C's strlen function can return -1,
+which is used to represent an unbounded string.  Of course unbounded
+strings cannot exist on real computers, but they can exist on
+theoretical ones!
 
 = IN CONCLUSION =
 
@@ -563,7 +639,6 @@ A few things I learned from this:
 
 * Not as hard to use as I imagined.  I spent about 6-12 hours from
   zero to being able to fully prove a whole file of production code.
-  (But I did study logic under Prof Crispin Wright at university).
 
 * Mostly useful for small functions.  There are other tools, like TLA+
   for proving distributed algorithms.
@@ -582,3 +657,17 @@ But it has its limits:
 * Not good at bitwise ops (can Z3 help?).
 
 * Cannot parse some non-standard __attribute__s (glib).
+
+* Lacks some __builtin_* functions.
+
+Alternatives and related programs:
+
+* Ada SPARK Pro (https://www.adacore.com/sparkpro) is a formally
+  verified subset of the Ada programming language
+  (tagline: "You simply can't write better code").
+
+* KLEE (http://klee.github.io/) has similarities but is not a formal
+  proof system.
+
+* CompCert (http://compcert.inria.fr/) is INRIA's verified C compiler
+  (not open source).
diff --git a/2020-frama-c/snippets/last_char.c b/2020-frama-c/snippets/last_char.c
new file mode 100644
index 0000000..69c252f
--- /dev/null
+++ b/2020-frama-c/snippets/last_char.c
@@ -0,0 +1,20 @@
+#include <string.h>
+
+/*@
+  requires valid_read_string (str);
+  assigns \nothing;
+  behavior good:
+    assumes strlen(str) > 0;
+    ensures \result == str[strlen(str)-1];
+  behavior bad:
+    assumes strlen(str) == 0;
+    ensures \result == -1;
+  complete behaviors;
+  disjoint behaviors;
+*/
+char
+last_char (const char *str)
+{
+  size_t n = strlen (str);
+  return n > 0 ? str[n-1] : -1;
+}
diff --git a/2020-frama-c/snippets/range_is_empty.c b/2020-frama-c/snippets/range_is_empty.c
new file mode 100644
index 0000000..364d338
--- /dev/null
+++ b/2020-frama-c/snippets/range_is_empty.c
@@ -0,0 +1,27 @@
+#include <stdbool.h>
+#include <stdint.h>
+
+struct Range {
+    uint64_t lob;
+    uint64_t upb;
+};
+typedef struct Range Range;
+
+/*@
+  predicate valid_range(struct Range range) =
+    range.lob <= range.upb + 1;
+  predicate empty_range(struct Range range) =
+    range.lob == range.upb + 1;
+*/
+
+/* Is range empty? */
+/*@
+  requires \valid_read(range);
+  requires valid_range(*range);
+  assigns \nothing;
+  ensures \result <==> empty_range(*range);
+ */
+static inline bool range_is_empty(const Range *range)
+{
+    return range->lob > range->upb;
+}
diff --git a/2020-frama-c/snippets/range_size-good.c b/2020-frama-c/snippets/range_size-good.c
new file mode 100644
index 0000000..7343dfa
--- /dev/null
+++ b/2020-frama-c/snippets/range_size-good.c
@@ -0,0 +1,32 @@
+#include <stdint.h>
+
+struct Range {
+    uint64_t lob;
+    uint64_t upb;
+};
+typedef struct Range Range;
+
+/*@
+  predicate valid_range(struct Range range) =
+    range.lob <= range.upb + 1;
+  predicate empty_range(struct Range range) =
+    range.lob == range.upb + 1;
+  logic integer size_of_bounds(integer lob, integer upb) =
+    upb - lob + 1;
+  logic integer size_of_range(struct Range range) =
+    size_of_bounds(range.lob, range.upb);
+*/
+
+/* Get the size of range. */
+/*@
+  requires \valid(range);
+  requires valid_range(*range);
+  requires !empty_range(*range);
+  assigns \nothing;
+                      // case UINT64_MAX+1 -> 0
+  ensures \result == (uint64_t)size_of_range(*range);
+*/
+static inline uint64_t range_size(const Range *range)
+{
+    return range->upb - range->lob + 1;
+}
diff --git a/2020-frama-c/snippets/range_size.c b/2020-frama-c/snippets/range_size.c
new file mode 100644
index 0000000..d5c36ed
--- /dev/null
+++ b/2020-frama-c/snippets/range_size.c
@@ -0,0 +1,28 @@
+#include <stdint.h>
+
+struct Range {
+    uint64_t lob;
+    uint64_t upb;
+};
+typedef struct Range Range;
+
+/*@
+  predicate valid_range(struct Range range) =
+    range.lob <= range.upb + 1;
+  logic integer size_of_bounds(integer lob, integer upb) =
+    upb - lob + 1;
+  logic integer size_of_range(struct Range range) =
+    size_of_bounds(range.lob, range.upb);
+*/
+
+/* Get the size of range. */
+/*@
+  requires \valid(range);
+  requires valid_range(*range);
+  assigns \nothing;
+  ensures \result == size_of_range(*range);
+*/
+static inline uint64_t range_size(const Range *range)
+{
+    return range->upb - range->lob + 1;
+}