2020-frama-c/notes.txt

   1 16th Nov 2020
   2 Formally proving tiny bits of qemu using Frama-C
   3
   4 [in medias res]
   5
   6 In early October I was talking to one of the developers of Frama-C,
   7 which is a modular framework for verifying C programs.  It's been on
   8 my radar to learn for a long time, and I'm a co-maintainer of the
   9 package in Fedora.  I tried to follow one tutorial but never really
  10 got past installation and UI issues.  But the developer pointed me to
  11 a very good, short set of blog postings about using the command line
  12 tool.
  13
  14 I will provide links to tutorials etc at the end.
  15
  16 \f
  17 I decided to spend a day or two last month seeing if I could formally
  18 prove some code inside qemu, and I arbitrarily picked one of the
  19 smallest pieces of code in the "util/" subdirectory:
  20
  21   $ wc -l util/*.c | sort -nr | tail -20
  22
  23 Actually there are two files:
  24
  25   $ ls -l util/range.c include/qemu/range.h
  26
  27 This is what the Range type from the header file looks like.  This is
  28 the complete definition upstream, with comments which are supposed to
  29 explain how it works:
  30
  31   /*
  32    * Operations on 64 bit address ranges.
  33    * Notes:
  34    * - Ranges must not wrap around 0, but can include UINT64_MAX.
  35    */
  36   struct Range {
  37       /*
  38        * Do not access members directly, use the functions!
  39        * A non-empty range has @lob <= @upb.
  40        * An empty range has @lob == @upb + 1.
  41        */
  42       uint64_t lob;        /* inclusive lower bound */
  43       uint64_t upb;        /* inclusive upper bound */
  44   };
  45   typedef struct Range Range;
  46
  47   +-----------+-----------+------------- - - - - -
  48   |           |           |
  49   +-----------+-----------+------------- - - - - -
  50                ^         ^
  51               lob ----- upb
  52
  53 Using the tutorial as my guide I wrote some simple predicates, a
  54 "predicate" being a statement which can be true or false:
  55
  56        * A non-empty range has @lob <= @upb.
  57        * An empty range has @lob == @upb + 1.
  58
  59   /*@
  60     predicate valid_range(struct Range range) =
  61       range.lob <= range.upb + 1;
  62
  63     predicate empty_range(struct Range range) =
  64       range.lob == range.upb + 1;
  65   */
  66
  67 Notice a few things here:
  68
  69  - These are statements written in a formal language.
  70
  71  - I'm using structs directly from the C code.
  72
  73  - The upstream comments translate into machine-checkable code.
  74
  75 The first upstream function is:
  76
  77   /* Is range empty? */
  78   static inline bool range_is_empty(const Range *range)
  79   {
  80       return range->lob > range->upb;
  81   }
  82
  83 and using the predicates we can write a specification:
  84
  85   $ cat snippets/range_is_empty.c
  86
  87 And we can compile and prove that:
  88
  89   $ frama-c -wp -wp-rte snippets/range_is_empty.c
  90
  91 Frama-C parsed the C code and the formal specification and machine
  92 checked it, and it's correct - the code is bug-free.
  93
  94 \f
  95 = OVERVIEW OF FRAMA-C ECOSYSTEM =
  96
  97 Let's interrupt this and talk about Frama-C.  The name stands
  98 for "Framework for Static Analysis of the C language".
  99
 100 It's modular, with a core program that reads C source code
 101 and turns it into Abstract Syntax Trees.  And a set of plugins
 102 to do static analysis by annotating these syntax trees.
 103 Plugins can cooperate, so analysis can be passed between
 104 plugins.
 105
 106 The following slides are taken from David Mentré‘s 2016 presentation.
 107
 108 \f
 109 = BACK TO RANGE.C =
 110
 111 Going back to what we proved so far:
 112
 113   /*@
 114     predicate valid_range(struct Range range) =
 115       range.lob <= range.upb + 1;
 116     predicate empty_range(struct Range range) =
 117       range.lob == range.upb + 1;
 118   */
 119
 120   /* Is range empty? */
 121   /*@
 122     requires \valid_read(range);
 123     requires valid_range(*range);
 124     assigns \nothing;
 125     ensures \result <==> empty_range(*range);
 126    */
 127   static inline bool range_is_empty(const Range *range)
 128   {
 129       return range->lob > range->upb;
 130   }
 131
 132   $ frama-c -wp -wp-rte snippets/range_is_empty.c
 133
 134 The "@"-comments are ACSL.  The C code is parsed by Frama-C into an
 135 abstract syntax tree.  We told Frama-C to apply the WP plugin.  The
 136 specification and C code were translated into first-order logic
 137 statements and passed to the Alt-Ergo SMT prover to see if it could
 138 find a Proof Obligation (PO).
 139
 140 We need to be very clear about what happened:
 141
 142  - The preconditions must be fulfilled in order for the proof to be
 143    correct.  If you are checking a whole program then that would check
 144    that all callers of this function are fulfilling the preconditions,
 145    but if you check only single functions then you are not doing that.
 146
 147  - The proof only shows that the post-condition holds.
 148
 149  - This only shows that the source code is correct.  The compiler
 150    could still compile the program incorrectly.  If you really want to
 151    get into this you have to use a verified C compiler like CompCert,
 152    and there are even verified CPU implementations.
 153
 154  - Given those assumptions, the code is bug free - you don't need to
 155    write any tests.
 156
 157 \f
 158 Obviously this is a single line, very trivial function, but I was
 159 quite pleased that I was able to prove it quickly.  I kept going on
 160 the range file.  The next function is:
 161
 162   /* Does range contain val? */
 163   static inline bool range_contains(const Range *range, uint64_t val)
 164   {
 165       return val >= range->lob && val <= range->upb;
 166   }
 167
 168 This is similarly easy to prove after adding another predicate at the
 169 top of the file:
 170
 171   /*@
 172     predicate value_in_range(struct Range range, uint64_t value) =
 173       range.lob <= value <= range.upb;
 174   */
 175
 176   /* Does range contain val? */
 177   /*@
 178     requires \valid_read(range);
 179     requires valid_range(*range);
 180     assigns \nothing;
 181     ensures \result <==> value_in_range(*range, val);
 182    */
 183   static inline bool range_contains(const Range *range, uint64_t val)
 184   {
 185       return val >= range->lob && val <= range->upb;
 186   }
 187
 188 \f
 189 The next function is range_make_empty, again easy to prove using the
 190 already existing empty_range predicate.  Notice how we declare which
 191 memory locations this function assigns to:
 192
 193   /* Initialize range to the empty range */
 194   /*@
 195     requires \valid(range);
 196     requires valid_range(*range);
 197     assigns *range;
 198     ensures empty_range(*range);
 199    */
 200   static inline void range_make_empty(Range *range)
 201   {
 202       *range = range_empty;
 203       assert(range_is_empty(range));
 204   }
 205
 206 \f
 207 I'm going to skip forward a few functions to get to an interesting one.
 208 This seems trivial:
 209
 210   /* Get the size of range. */
 211   static inline uint64_t range_size(const Range *range)
 212   {
 213       return range->upb - range->lob + 1;
 214   }
 215
 216 My first attempt at a proof was:
 217
 218   /*@
 219     logic integer size_of_bounds(integer lob, integer upb) =
 220       upb - lob + 1;
 221
 222     logic integer size_of_range(struct Range range) =
 223       size_of_bounds(range.lob, range.upb);
 224   */
 225
 226   /* Get the size of range. */
 227   /*@
 228     requires \valid(range);
 229     requires valid_range(*range);
 230     assigns \nothing;
 231     ensures \result == size_of_range(*range);
 232   */
 233   static inline uint64_t range_size(const Range *range)
 234   {
 235       return range->upb - range->lob + 1;
 236   }
 237
 238 A few things to notice about this:
 239
 240  - I'm using ACSL "logic" statements, which are similar to functions.
 241
 242  - ACSL has an "integer" type which is an unbounded integer.
 243
 244 Let's try to prove this one:
 245
 246   $ frama-c -wp -wp-rte snippets/range_size.c
 247   It fails!
 248
 249 This was confusing to me because at this point I'd probably spent an
 250 hour, and I'd proven about 5 functions successfully, and this function
 251 looks equally trivial and yet it cannot be proven.  Why?
 252
 253 One way to find the problem would be to find a COUNTEREXAMPLE.  A
 254 counterexample is an instance of an input that satisfies all of the
 255 preconditions, but makes the postcondition false.  Now Frama-C has
 256 pluggable provers, and one prover called Z3, originally written by
 257 Microsoft, can be used with Frama-C and can sometimes find
 258 counterexamples.  However for some reason the version of Z3 in Fedora
 259 does not like to work with the version of Frama-C in Fedora and I
 260 don't know why.
 261
 262 So this is the counterexample which I worked out myself:
 263
 264   /*@
 265     logic integer size_of_bounds(integer lob, integer upb) =
 266       upb - lob + 1;
 267
 268     logic integer size_of_range(struct Range range) =
 269       size_of_bounds(range.lob, range.upb);
 270   */
 271
 272   /* Get the size of range. */
 273   /*@
 274     requires \valid(range);
 275     requires valid_range(*range);
 276     assigns \nothing;
 277     ensures \result == size_of_range(*range);
 278   */
 279   static inline uint64_t range_size(const Range *range)
 280   {
 281       return range->upb - range->lob + 1;
 282   }
 283
 284   +-------------------------------------------------------+
 285   |XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX|
 286   +-------------------------------------------------------+
 287   range->lob == 0                 range->upb == UINT64_MAX
 288
 289 In this case size_of_range() returns an integer.  Recall that integers
 290 are an unbounded type.  So it returns UINT64_MAX + 1 (a 65 bit
 291 number).  Our C function on the other hand returns 0.
 292
 293 This is a bug in the function.  There are really two ways to
 294 look at this:
 295
 296 (1) The function shouldn't be called on empty ranges (size 0), and so
 297 if the function returns 0 it means "the total range".  Or:
 298
 299 (2) The function should have a different signature so it can return a
 300 separate indication for "empty range" vs "total range".
 301
 302 I went with the first, so the final working specification is this.  If
 303 we were proving the entire program then we could statically verify
 304 that callers do not try to call range_size() on the empty range.
 305
 306   /*@
 307     requires \valid(range);
 308     requires valid_range(*range);
 309     requires !empty_range(*range);
 310     assigns \nothing;
 311                         // case UINT64_MAX+1 -> 0
 312     ensures \result == (uint64_t)size_of_range(*range);
 313    */
 314   static inline uint64_t range_size(const Range *range)
 315   {
 316       return range->upb - range->lob + 1;
 317   }
 318
 319   $ frama-c -wp -wp-rte snippets/range_size-good.c
 320
 321 \f
 322 On to the next function.  Again this seems very simple, but in fact it
 323 contains a serious problem:
 324
 325   /*
 326    * Initialize range to span the interval [lob,lob + size - 1].
 327    * size may be 0. Range must not overflow.
 328    */
 329   static inline void range_init_nofail(Range *range, uint64_t lob,
 330                                        uint64_t size)
 331   {
 332       range->lob = lob;
 333       range->upb = lob + size - 1;
 334   }
 335
 336 What does lob == 0, size == 0 mean?  It will create a total range.
 337
 338   +-------------------------------------------------------+
 339   |XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX|
 340   +-------------------------------------------------------+
 341   range->lob == 0                 range->upb == UINT64_MAX
 342
 343 What does lob == 1, size == 0 mean?  It will create an empty range.
 344
 345 Because I didn't want to change the code, that makes the proof very
 346 long, and it demonstrates another feature of ACSL specifications
 347 ("behaviors"):
 348
 349   /*@
 350     requires \valid(range);
 351     requires lob + (integer)size <= UINT64_MAX;
 352     assigns *range;
 353     behavior empty:
 354       assumes lob > 0 && size == 0;
 355       ensures empty_range(*range);
 356     behavior non_empty:
 357       assumes size > 0;
 358       ensures range->lob == lob;
 359       ensures size_of_bounds(lob, range->upb) == size;
 360     behavior total:
 361       assumes lob == 0 && size == 0;
 362       ensures total_range(*range);
 363     complete behaviors;
 364     disjoint behaviors;
 365    */
 366
 367 I also had to modify my earlier empty_range predicate:
 368
 369   /*@ // before:
 370     predicate empty_range(struct Range range) =
 371       range.lob == range.upb + 1;
 372   */
 373
 374   /*@ // after:
 375     predicate empty_range(struct Range range) =
 376       range.lob > 0 && range.lob == range.upb + 1;
 377
 378     predicate total_range(struct Range range) =
 379       range.lob == 0 && range.upb == UINT64_MAX;
 380   */
 381
 382 What can we say about this?  If you were developing the Range type and
 383 the proof at the same time, you probably wouldn't define the Range
 384 type in this way, and you probably wouldn't choose an initialization
 385 function which had this complex set of behaviours.
 386
 387 I did prove most of the rest of the Range functions, but I wanted to
 388 point out a few things about the final proof:
 389
 390  - The original file includes glib headers, but those cannot be parsed
 391    by Frama-C's parser.  The problem appears to be some unsupported
 392    __attribute__s.  Some attributes seem to work, and others are not
 393    supported, and this of course limits what you can prove if it
 394    depends on any external library that might use these.
 395
 396  - For the same reason I had to exclude a glib function for operating
 397    on lists of ranges from the proof.
 398
 399  - You can run frama-c on the original source file, with a few #ifdef
 400    modifications, so it would be possible to maintain the annotations
 401    upstream, and run the proof checker as a CI test.
 402
 403  - It probably doesn't make sense for qemu right now though, unless we
 404    could prove more substantial pieces of code.
 405
 406 \f
 407 = POWER OF 2 =
 408
 409 This is a function from nbdkit:
 410
 411   /* Returns true if v is a power of 2.
 412    *
 413    * Uses the algorithm described at
 414    * http://graphics.stanford.edu/~seander/bithacks.html#DetermineIfPowerOf2
 415    */
 416   static inline bool
 417   is_power_of_2 (unsigned long v)
 418   {
 419     return v && ((v & (v - 1)) == 0);
 420   }
 421
 422 and my initial specification started by defining:
 423
 424   /*@
 425     predicate positive_power_of_2 (integer i) =
 426       i > 0 &&
 427       (i == 1 || ((i & 1) == 0 && positive_power_of_2 (i >> 1)));
 428
 429     lemma positive_power_of_2 (1);
 430     lemma positive_power_of_2 (2);
 431     lemma positive_power_of_2 (4);
 432     lemma !positive_power_of_2 (7);
 433    */
 434
 435 I was using the lemmas (which are like assertions, except statically
 436 verified), to test my predicate, but sadly Frama-C was not able to
 437 prove them even though they appear to be trivial.
 438
 439 This led to some discussion on stackoverflow:
 440
 441   https://stackoverflow.com/questions/64268418/how-do-i-write-an-is-power-of-2-predicate-in-acsl
 442
 443 Ultimately it was possible to fix this, but I discovered a few things
 444 about Alt-Ergo (the default prover):
 445
 446  - It can't handle bitwise operations very well.
 447
 448  - You can guide Alt-Ergo, but it requires using the Frama-C GUI which
 449    is very non-intuitive.  You do end up with a script but I couldn't
 450    work out how to integrate this with the command line.
 451
 452  - Z3 should be able to solve this, but I was not able to get
 453    Z3 working with Frama-C on Fedora.
 454
 455 Essentially bitwise tricks like this are a hard case for automated
 456 theorem proving.  I gave up.
 457
 458 \f
 459 = TIMEVAL DIFFERENCE =
 460
 461 This is another nbdkit function:
 462
 463   /* Return the number of µs (microseconds) in y - x. */
 464   static inline int64_t
 465   tvdiff_usec (const struct timeval *x, const struct timeval *y)
 466   {
 467     int64_t usec;
 468
 469     usec = (y->tv_sec - x->tv_sec) * 1000000;
 470     usec += y->tv_usec - x->tv_usec;
 471     return usec;
 472   }
 473
 474 My first attempt at a proof was:
 475
 476   /*@
 477     predicate valid_timeval (struct timeval tv) =
 478       tv.tv_sec >= 0 && tv.tv_usec >= 0 && tv.tv_usec < 1000000;
 479     logic integer tv_to_microseconds (struct timeval tv) =
 480       tv.tv_sec * 1000000 + tv.tv_usec;
 481    */
 482
 483   /* Return the number of µs (microseconds) in y - x. */
 484   /*@
 485     requires \valid_read (x) && \valid_read (y);
 486     requires valid_timeval (*x) && valid_timeval (*y);
 487     ensures \result == tv_to_microseconds (*y) - tv_to_microseconds (*x);
 488    */
 489   static inline int64_t
 490   tvdiff_usec (const struct timeval *x, const struct timeval *y)
 491   {
 492     int64_t usec;
 493
 494     usec = (y->tv_sec - x->tv_sec) * 1000000;
 495     usec += y->tv_usec - x->tv_usec;
 496     return usec;
 497   }
 498
 499 This is not provable, and again the problem is integer overflow.  The
 500 tv_sec field is a 64 bit integer so it's quite easy to construct two
 501 inputs which overflow the output.
 502
 503 Fixing this involves a considerably more complex specification:
 504
 505   /*@
 506     requires \valid_read (x) && \valid_read (y);
 507     requires valid_timeval (*x) && valid_timeval (*y);
 508     requires \valid (r);
 509     assigns *r;
 510     behavior success:
 511       ensures \result == 0;
 512       ensures \let diff = tv_to_microseconds (*y) - tv_to_microseconds (*x);
 513               INT64_MIN <= diff <= INT64_MAX &&
 514               *r == diff;
 515     behavior failure:
 516       ensures \result == -1;
 517       ensures *r == 0;
 518       ensures \let diff = tv_to_microseconds (*y) - tv_to_microseconds (*x);
 519               !(INT64_MIN <= diff <= INT64_MAX);
 520     complete behaviors;
 521     disjoint behaviors;
 522    */
 523   static inline int
 524   tvdiff_usec (const struct timeval *x, const struct timeval *y,
 525                int64_t *r)
 526   ...
 527
 528 and of course that is not sufficient because I also had to fix the
 529 code so it can return an overflow indication.  And actually writing
 530 that code is left as an exercise for the reader as it is not at all
 531 simple.
 532
 533 A few things I learned from this:
 534
 535  - This is a function we have had and used for years, and I don't
 536    think anyone ever thought it had a problem.
 537
 538  - Co-developing the specification alongside the code could help
 539    here.
 540
 541  - You can end up with functions that are much harder to use.  If we
 542    were proof-checking the whole program we could check all callers.
 543
 544  - GCC has __builtin_add_overflow and similar which can be used to do
 545    mathematical operations and detect overflow, but Frama-C does not
 546    understand them.  We could contribute these to the Frama-C standard
 547    library.
 548
 549 \f
 550 = STRING FUNCTIONS =
 551
 552 Uli sent me this function from glibc:
 553
 554   [https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strverscmp.c]
 555
 556 To be honest I had problems even fully understanding the semantics of
 557 this function, let alone trying to translate that into ACSL.
 558
 559 So I thought I'd look at a simpler function:
 560
 561   [https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strlen.c]
 562
 563 I want to point out first that this "strlen" is a fallback which is
 564 only used if you don't have an optimized function for your hardware,
 565 and for the vast majority of users they'll be using something like an
 566 AVX-optimized strlen, not this one.
 567
 568 This function works in two halves, the first part iterates over single
 569 characters until we reach an aligned boundary.  And then the second
 570 part looks at whole words and uses a clever trick to determine if any
 571 byte in the word is zero.
 572
 573 In theory a simple specification which would apply to any strlen-style
 574 function would be:
 575
 576   /*@
 577     requires valid_read_string (str);
 578     assigns \nothing;
 579     ensures (\forall integer i; 0 <= i < \result ==> str[i] != '\0') &&
 580             str[\result] == '\0';
 581    */
 582   size_t strlen (const char *str)
 583   {
 584     ...
 585
 586 It would be possible to use this to prove a simple strlen
 587 implementation that just worked character by character.
 588
 589 Unfortunately for the actual glibc function we hit another limitation
 590 of the WP plugin, called the Typed Memory Model.
 591
 592 WP models memory as a set of cells, with a type attached to each cell.
 593 So a string for example is a set of char cells, each cell being typed
 594 as char.  You cannot cast these cells to something like a word and try
 595 to do a proof over that.  Simply a limitation which is well
 596 documented.
 597
 598 So a proof of the glibc function eludes me.
 599
 600 \f
 601 There is a set of open source licensed string functions with Frama-C
 602 proofs available:
 603
 604 https://github.com/evdenis/verker
 605
 606 and this is what the strlen function with proof looks like from that:
 607
 608   [https://github.com/evdenis/verker/blob/master/src/strlen.c]
 609
 610 \f
 611 Now you might be asking what happens when you write a function that
 612 uses strlen, for example this trivial function with a working
 613 specification:
 614
 615   $ cat snippets/last_char.c
 616   $ frama-c -wp -wp-rte snippets/last_char.c
 617
 618 The questions here are:
 619
 620  - Is this proving the glibc strlen function?
 621
 622  - Are we calling glibc strlen() from the specification?
 623
 624   $ less /usr/share/frama-c/libc/string.h
 625   $ less /usr/share/frama-c/libc/__fc_string_axiomatic.h
 626
 627 And the answers are no and no.  In fact what's happening here is we
 628 are using Frama-C's own "string.h" header.  This header defines and
 629 contains a specification for how strlen.  But this is also not a proof
 630 of strlen: in fact a second file defines what's known as an "axiomatic
 631 definition" or theory of how strlen works.
 632
 633 Essentially what's happening is we're assuming that strlen in your
 634 standard library (which might not even be glibc) works.  To make a
 635 complete proof you'd also need to additionally prove all libraries you
 636 depend on.
 637
 638 Another little fun thing is Frama-C's strlen function can return -1,
 639 which is used to represent an unbounded string.  Of course unbounded
 640 strings cannot exist on real computers, but they can exist on
 641 theoretical ones!
 642
 643 \f
 644 = IN CONCLUSION =
 645
 646 * Frama-C is a real open source tool used by companies to verify
 647   safety-critical software.  If you've flown in an Airbus or used
 648   electricity from a French nuclear power plant then you've used this
 649   software already.
 650
 651 * Not as hard to use as I imagined.  I spent about 6-12 hours from
 652   zero to being able to fully prove a whole file of production code.
 653
 654 * Mostly useful for small functions.  There are other tools, like TLA+
 655   for proving distributed algorithms.
 656
 657 * Forces you to think very hard about assumptions and corner cases,
 658   but that's the nature of the problem.
 659
 660 * Very well documented, questions answered quickly on stackoverflow.
 661
 662 But it has its limits:
 663
 664 * Typed Memory Model turns out to be restrictive.
 665
 666 * No support for malloc.
 667
 668 * Not good at bitwise ops (can Z3 help?).
 669
 670 * Cannot parse some non-standard __attribute__s (glib).
 671
 672 * Lacks some __builtin_* functions.
 673
 674 Alternatives and related programs:
 675
 676 * Ada SPARK Pro (https://www.adacore.com/sparkpro) is a formally
 677   verified subset of the Ada programming language
 678   (tagline: "You simply can't write better code").
 679
 680 * KLEE (http://klee.github.io/) has similarities but is not a formal
 681   proof system.
 682
 683 * CompCert (http://compcert.inria.fr/) is INRIA's verified C compiler
 684   (not open source).
 685
 686 Resources:
 687
 688 * Tutorial that I followed in October which I thought was a good
 689   introduction:
 690   https://maniagnosis.crsr.net/2017/06/AFL-brute-force-search.html
 691   https://maniagnosis.crsr.net/2017/06/AFL-bug-in-quicksearch.html
 692   https://maniagnosis.crsr.net/2017/07/AFL-correctness-of-quicksearch.html
 693
 694 * Frama-C on Stack Overflow:
 695   https://stackoverflow.com/questions/tagged/frama-c
 696
 697 * Allan Blanchard's tutorial:
 698   https://allan-blanchard.fr/publis/frama-c-wp-tutorial-en.pdf
 699
 700 * David Mentre's introduction:
 701   https://www.cs.umd.edu/class/spring2016/cmsc838G/frama-c/introduction-to-frama-c_v2.pdf
 702
 703 * Upstream documentation:
 704   Manual: https://frama-c.com/download/frama-c-user-manual.pdf
 705   WP manual: https://frama-c.com/download/frama-c-wp-manual.pdf
 706   ACSL: https://frama-c.com/acsl.html
 707
 708 * ACSL specifications for various string functions:
 709   https://github.com/evdenis/verker