c2lib documentation index

c2lib is a library of basic tools for use by C programmers. It contains features heavily influenced by both Perl's string handling and C++'s Standard Template Library (STL).

The primary aims of c2lib are:

Tutorial and programming examples

Join a list of strings and print

   #include <pool.h>
   #include <pstring.h>
  
   const char *strings[] = { "John", "Paul", "George", "Ringo" };
5 
   main ()
   {
     pool pool = global_pool;
     vector v = pvectora (pool, strings, 4);
10   printf ("Introducing the Beatles: %s\n", pjoin (pool, v, ", "));
   }

When run, this program prints:

Introducing the Beatles: John, Paul, George, Ringo

Compare this to the equivalent Perl code:

#!/usr/bin/perl

printf "Introducing the Beatles: %s\n",
    join(", ", "John", "Paul", "George", "Ringo");

The pjoin(3) function on line 10 is equivalent to the plain join function in Perl. It takes a list of strings, joins them with a separator string (in this case ", "), and returns the result as a new string, which is then printed.

The pvectora(3) function (line 9) takes a normal C array of strings and converts it into a c2lib vector. You will find out more about vectors later.

In this case all our allocations are done in a standard pool which is created automatically before main is called and deleted after main returns. This pool is called global_pool(3). You will find out more about pools below.

Notice that, as with most c2lib programs, there is no need to explicitly deallocate (free) objects once you have finished using them. Almost all of the time, objects are freed automatically for you by the system.
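For comparison, here is a minimal sketch of the same join operation in plain C without pools. The helper name join_strings is hypothetical and is not part of c2lib; it only illustrates the bookkeeping (measuring, allocating, and later freeing) that pjoin(3) and the pool take care of for you.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of c2lib: join n strings with sep.
 * Returns a malloc'd string that the caller must free, or NULL if
 * memory could not be allocated. */
char *
join_strings (const char *const *items, size_t n, const char *sep)
{
  size_t sep_len, total = 1;    /* 1 = room for the trailing '\0' */
  size_t i;
  char *result, *p;

  assert (sep != NULL && (n == 0 || items != NULL));
  sep_len = strlen (sep);

  /* First pass: work out how long the joined string will be. */
  for (i = 0; i < n; ++i)
    total += strlen (items[i]) + (i > 0 ? sep_len : 0);

  result = malloc (total);
  if (result == NULL)
    return NULL;

  /* Second pass: copy the items, with sep between each pair. */
  p = result;
  for (i = 0; i < n; ++i)
    {
      size_t len = strlen (items[i]);

      if (i > 0)
        {
          memcpy (p, sep, sep_len);
          p += sep_len;
        }
      memcpy (p, items[i], len);
      p += len;
    }
  *p = '\0';
  return result;
}
```

Unlike the c2lib version, every caller of this helper has to remember to call free(3) on the result.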

A vector of integers

   #include <pool.h>
   #include <vector.h>
   #include <pstring.h>
   
5  main ()
   {
     pool pool = global_pool;
     vector v = new_vector (pool, int);
     int i, prod = 1;
10 
     for (i = 1; i <= 10; ++i)
       vector_push_back (v, i);
   
     for (i = 0; i < vector_size (v); ++i)
15     {
         int elem;
   
         vector_get (v, i, elem);
         prod *= elem;
20     }
   
     printf ("product of integers: %s = %d\n",
   	     pjoin (pool, pvitostr (pool, v), " * "),
   	     prod);
25 }

When run:

product of integers: 1 * 2 * 3 * 4 * 5 * 6 * 7 * 8 * 9 * 10 = 3628800

The call to new_vector(3) on line 8 creates a new vector object (abstract data type). In this case the vector is allocated in the global pool and you have told it that each element of the vector will be of type int. Vectors are arrays which automatically expand when you push elements onto them. This vector behaves very much like a C++ STL vector<int> or a Perl array.

On lines 11-12, we push the numbers 1 through 10 onto the vector. The vector_push_back(3) function pushes an element onto the end of the vector. There are also vector_pop_back(3) (removes and returns the last element of a vector), vector_push_front(3) and vector_pop_front(3) operations.

Lines 14-20 show the general pattern for iterating over the elements in a vector. The call to vector_get (line 18) returns the ith element of vector v into variable elem.

Finally, lines 22-24 print out the result. We use the pjoin(3) function again to join the numbers with the string " * " between each pair. Also note the use of the pvitostr(3) function. pjoin(3) expects a vector of strings (i.e. a vector of char *), but we have a vector of int, which is incompatible. The pvitostr(3) function converts a vector of integers into a vector of strings.

The c2lib library stores vectors as arrays and reallocates them using prealloc(3) whenever it needs to expand them. This means that some operations on vectors are efficient and others are less so. Getting an element of a vector or replacing an element in the middle of a vector are both fast O(1) operations, equivalent to the ordinary C index ([]) operator. vector_push_back(3) and vector_pop_back(3) are also fast. However, vector_push_front(3) and vector_pop_front(3) are O(n) operations because they require the library to shift every element in the array by one place. Normally, if your vectors are short (say, fewer than 100 elements), the speed difference will not be noticeable, whereas the productivity gains from using vectors over hand-rolled linked lists or other structures will be large.

The vector type also allows you to insert and remove elements in the middle of the array, as shown in the next example:

   #include <pool.h>
   #include <vector.h>
   #include <pstring.h>
   
5  main ()
   {
     pool pool = global_pool;
     vector v = pvector (pool,
   		         "a", "b", "c", "d", "e",
10 		         "f", "g", "h", "i", "j", 0);
     const char *X = "X";
   
     printf ("Original vector contains: %s\n",
   	  pjoin (pool, v, ", "));
15 
     vector_erase_range (v, 3, 6);
   
     printf ("After erasing elements 3-5, vector contains: %s\n",
   	     pjoin (pool, v, ", "));
20 
     vector_insert (v, 3, X);
     vector_insert (v, 4, X);
     vector_insert (v, 5, X);
   
25   printf ("After inserting 3 Xs, vector contains: %s\n",
   	     pjoin (pool, v, ", "));

     vector_clear (v);
     vector_fill (v, X, 10);
30 
     printf ("After clearing and inserting 10 Xs, vector contains: %s\n",
   	     pjoin (pool, v, ", "));
   }

When run:

Original vector contains: a, b, c, d, e, f, g, h, i, j
After erasing elements 3-5, vector contains: a, b, c, g, h, i, j
After inserting 3 Xs, vector contains: a, b, c, X, X, X, g, h, i, j
After clearing and inserting 10 Xs, vector contains: X, X, X, X, X, X, X, X, X, X

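This example demonstrates the following functions:

pvector(3)             Creates a vector of strings from a list of arguments terminated by 0.
vector_erase_range(3)  Erases a range of elements; here, elements 3 up to (but not including) 6.
vector_insert(3)       Inserts a single element before the given position.
vector_clear(3)        Removes all elements from the vector.
vector_fill(3)         Fills the vector with copies of an element; here, 10 copies of "X".

For more information, see the respective manual pages.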
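To see why inserting or erasing in the middle (or at the front) of an array-backed vector costs O(n) while appending at the end is cheap, here is a minimal sketch in plain C. The type int_array and both functions are hypothetical and are not c2lib's implementation; in particular, doubling the capacity on growth is this sketch's own choice.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable array of int; not c2lib's vector type. */
struct int_array
{
  int *data;
  size_t size;                  /* elements in use */
  size_t capacity;              /* elements allocated */
};

/* Insert value before index pos (0 <= pos <= size).
 * Returns 0 on success, -1 if memory could not be allocated. */
int
int_array_insert (struct int_array *a, size_t pos, int value)
{
  assert (a != NULL && pos <= a->size);

  if (a->size == a->capacity)
    {
      /* Grow by doubling (an assumption of this sketch). */
      size_t new_capacity = a->capacity ? a->capacity * 2 : 8;
      int *new_data = realloc (a->data, new_capacity * sizeof *new_data);

      if (new_data == NULL)
        return -1;
      a->data = new_data;
      a->capacity = new_capacity;
    }

  /* Shift the tail up by one place.  This moves size - pos elements:
   * all of them when pos is 0, none when pos is size. */
  memmove (a->data + pos + 1, a->data + pos,
           (a->size - pos) * sizeof *a->data);
  a->data[pos] = value;
  a->size++;
  return 0;
}

/* Erase the elements in the half-open range [from, to). */
void
int_array_erase_range (struct int_array *a, size_t from, size_t to)
{
  assert (a != NULL && from <= to && to <= a->size);

  if (from == to)
    return;

  /* Shift the tail down over the erased elements. */
  memmove (a->data + from, a->data + to,
           (a->size - to) * sizeof *a->data);
  a->size -= to - from;
}
```

Calling int_array_insert (&a, a.size, x) plays the role of a push onto the back, and int_array_insert (&a, 0, x) the role of a push onto the front.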

You can store just about anything in a vector: strings, pointers, wide integers, complex structures, etc. If you do want to directly store large objects in a vector, you must remember that the vector type actually copies those objects into and out of the vector each time you insert, push, get, pop and so on. For some large structures, you may want to store a pointer instead (in fact with strings you have no choice: you are always storing a pointer in the vector itself).

Strings are just char *

c2lib doesn't have a fancy string type. Instead we just use plain old char *. This is possible because pools (see below) mean that we don't need to worry about when to copy or deallocate specific objects.

The great benefit of using plain char * for strings is that we can continue to use the familiar libc functions such as strcmp(3), strcpy(3), strlen(3), printf(3) and so on, as in the next example.

   #include <assert.h>
   #include <pstring.h>
   
   char *given_name = "Richard";
5  char *family_name = "Jones";
   char *email_address = "rich@annexia.org";
   
   main ()
   {
10   pool pool = global_pool;
     char *email, *s;
     vector v;
   
     email =
15     psprintf (pool, "%s %s <%s>", given_name, family_name, email_address);
   
     printf ("full email address is: %s\n", email);
   
     v = pstrcsplit (pool, email, ' ');
20 
     printf ("split email into %d components\n", vector_size (v));
   
     vector_get (v, 0, s);
     printf ("first component is: %s\n", s);
25   assert (strcmp (s, given_name) == 0);
   
     vector_get (v, 1, s);
     printf ("second component is: %s\n", s);
     assert (strcmp (s, family_name) == 0);
30 
     vector_get (v, 2, s);
     printf ("third component is: %s\n", s);
     s = pstrdup (pool, s);
     s++;
35   s[strlen(s)-1] = '\0';
     assert (strcmp (s, email_address) == 0);
   }

When run:

full email address is: Richard Jones <rich@annexia.org>
split email into 3 components
first component is: Richard
second component is: Jones
third component is: <rich@annexia.org>

Line 15 demonstrates the psprintf(3) function, which is like the ordinary sprintf(3) but (a) is safe, because there is no fixed-size buffer to overflow, and (b) allocates the string in the pool provided, ensuring that it will be deallocated later.
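The idea of a formatting function that allocates its own result can be sketched in plain C as follows. The helper alloc_sprintf is hypothetical and uses malloc(3) rather than a pool; it is not how psprintf(3) is necessarily implemented, only one common way to size the buffer correctly using the C99 behaviour of vsnprintf(3).

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not c2lib's psprintf: format into a freshly
 * malloc'd buffer of exactly the right size.  The caller must free
 * the result.  Returns NULL on error. */
char *
alloc_sprintf (const char *fmt, ...)
{
  va_list args;
  int len;
  char *buf;

  assert (fmt != NULL);

  /* First pass: ask vsnprintf how many characters are needed. */
  va_start (args, fmt);
  len = vsnprintf (NULL, 0, fmt, args);
  va_end (args);
  if (len < 0)
    return NULL;

  buf = malloc ((size_t) len + 1);
  if (buf == NULL)
    return NULL;

  /* Second pass: write into a buffer known to be large enough. */
  va_start (args, fmt);
  vsnprintf (buf, (size_t) len + 1, fmt, args);
  va_end (args);
  return buf;
}
```

With a pool, the only change in spirit is that the buffer comes from the pool instead of malloc(3), so nobody has to free it by hand.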

The pstrcsplit(3) function is similar to the Perl split. It takes a string and splits it into a vector of strings, in this case on the space character. There are also other functions for splitting on a string or on a regular expression.

The final part of the code, lines 21-36, prints out the components of the split string. The vector_get(3) function is used to pull the strings out of the vector object.

Notice on line 33 that before we remove the beginning and end < ... > from around the email address, we first duplicate the string using pstrdup(3). In this case it is not strictly necessary to duplicate the string s because we know that pstrcsplit(3) actually allocates new copies of the strings in the vector which it returns. However in general this is good practice because otherwise we would be modifying the contents of the original vector v.
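As a rough illustration of what splitting on a character involves, here is a sketch in plain C. The helper split_on_char is hypothetical and is not c2lib's pstrcsplit(3); it keeps empty fields and uses malloc(3), and the real function may differ on both points. Like the behaviour described above, it returns fresh copies of each part, so modifying a part does not touch the original string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not c2lib's pstrcsplit: split str on the
 * character c.  Stores a malloc'd array of malloc'd strings in *out
 * and returns the number of parts, or -1 if allocation fails. */
long
split_on_char (const char *str, char c, char ***out)
{
  size_t count = 1, i = 0;
  const char *p, *start;
  char **parts;

  assert (str != NULL && out != NULL);

  /* There is one more part than there are separators. */
  for (p = str; *p != '\0'; ++p)
    if (*p == c)
      count++;

  parts = calloc (count, sizeof *parts);
  if (parts == NULL)
    return -1;

  for (start = p = str; ; ++p)
    if (*p == c || *p == '\0')
      {
        size_t len = (size_t) (p - start);

        /* Copy the part so the caller may modify it freely. */
        parts[i] = malloc (len + 1);
        if (parts[i] == NULL)
          {
            while (i > 0)
              free (parts[--i]);
            free (parts);
            return -1;
          }
        memcpy (parts[i], start, len);
        parts[i][len] = '\0';
        i++;

        if (*p == '\0')
          break;
        start = p + 1;
      }

  *out = parts;
  return (long) count;
}
```

Note how much clean-up code the error path needs here; with a pool, a failed allocation would not require freeing the earlier parts one by one.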

Hashes

Hashes give you all the power of Perl's "%" hashes. In fact the way they work is very similar (but more powerful: unlike Perl's hashes the key does not need to be a string).

In c2lib there are three flavors of hash. However they all work in essentially the same way, and all have exactly the same functionality. The reason for having the three flavors is just to work around an obscure problem with the ANSI C specification!

The three flavors are:

hash    A hash of any non-string type to any non-string type.
sash    A hash of char * to char *.
shash   A hash of char * to any non-string type.

As with vectors, "any non-string type" can mean simple integers or chars, pointers, or large complex structures if you wish.

Here is a short program showing you how to use a sash (but note that the same functions are available for all of the other flavors):

   #include <stdio.h>
   #include <hash.h>
   #include <pstring.h>
   
5  main ()
   {
     pool pool = global_pool;
     sash h = new_sash (pool);
     char *fruit;
10   const char *color;
   
     sash_insert (h, "banana", "yellow");
     sash_insert (h, "orange", "orange");
     sash_insert (h, "apple", "red");
15   sash_insert (h, "kiwi", "green");
     sash_insert (h, "grapefruit", "yellow");
     sash_insert (h, "pear", "green");
     sash_insert (h, "tomato", "red");
     sash_insert (h, "tangerine", "orange");
20 
     for (;;)
       {
         printf ("Please type in the name of a fruit: ");
         fruit = pgetline (pool, stdin, 0);
25 
         if (sash_get (h, fruit, color))
   	printf ("The color of that fruit is %s.\n", color);
         else
   	printf ("Sorry, I don't know anything about that fruit.\n");
30     }
   }

When run:

Please type in the name of a fruit: orange
The color of that fruit is orange.
Please type in the name of a fruit: apple
The color of that fruit is red.
Please type in the name of a fruit: dragon fruit
Sorry, I don't know anything about that fruit.

The sash is allocated on line 8 using the new_sash(3) function.

We populate the sash using the simple sash_insert(3) functions (lines 12-19).

The sash_get(3) function retrieves a value (color) from the sash using the key given (fruit). It returns true if a value was found, or false if there was no matching key.
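The insert and lookup behaviour described above can be sketched with a very small string-to-string hash table in plain C. Everything here (str_hash, str_hash_insert, str_hash_get, the fixed bucket count and the hash function) is hypothetical and chosen for brevity; it is not how sashes are implemented in c2lib. Note also that this sketch's lookup takes a pointer to the result variable, whereas the example above passes color directly.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 16             /* fixed size keeps the sketch short */

struct entry
{
  const char *key, *value;      /* borrowed pointers, not copied */
  struct entry *next;
};

struct str_hash
{
  struct entry *buckets[NBUCKETS];
};

static unsigned
hash_string (const char *s)
{
  unsigned h = 5381;

  while (*s != '\0')
    h = h * 33 + (unsigned char) *s++;
  return h % NBUCKETS;
}

/* Insert or overwrite.  Returns 0 on success, -1 if out of memory. */
int
str_hash_insert (struct str_hash *h, const char *key, const char *value)
{
  unsigned b;
  struct entry *e;

  assert (h != NULL && key != NULL);
  b = hash_string (key);

  for (e = h->buckets[b]; e != NULL; e = e->next)
    if (strcmp (e->key, key) == 0)
      {
        e->value = value;
        return 0;
      }

  e = malloc (sizeof *e);
  if (e == NULL)
    return -1;
  e->key = key;
  e->value = value;
  e->next = h->buckets[b];
  h->buckets[b] = e;
  return 0;
}

/* Returns 1 and stores the value in *value if key is present, else 0. */
int
str_hash_get (const struct str_hash *h, const char *key, const char **value)
{
  const struct entry *e;

  assert (h != NULL && key != NULL && value != NULL);

  for (e = h->buckets[hash_string (key)]; e != NULL; e = e->next)
    if (strcmp (e->key, key) == 0)
      {
        *value = e->value;
        return 1;
      }
  return 0;
}

/* Free all entries.  With a pool this function would not be needed. */
void
str_hash_free (struct str_hash *h)
{
  size_t b;

  for (b = 0; b < NBUCKETS; ++b)
    while (h->buckets[b] != NULL)
      {
        struct entry *e = h->buckets[b];

        h->buckets[b] = e->next;
        free (e);
      }
}
```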

There are many potentially powerful functions available for manipulating hashes, sashes and shashes (below, * stands for either "h", "s" or "sh"):

Advanced use of pools

So far we have only touched upon pools, and it may not be clear why the examples above don't in fact leak memory. There appears to be no deallocation being done, which is quite counter-intuitive to most C programmers!

Pools are collections of related objects (where an "object" is some sort of memory allocation).

In C you are normally responsible for allocating and deallocating every single object, like so:

p = malloc (size);

/* ... use p ... */

free (p);

However in c2lib we first allocate a pool, then use pmalloc(3) and prealloc(3) to allocate lots of related objects in the pool. At the end of the program, all of the objects can be deleted in one go just by calling delete_pool(3).
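To make the idea concrete, here is a minimal sketch of a pool in plain C: it remembers every allocation made through it and frees them all in one call. The names simple_pool, simple_pool_alloc and simple_pool_delete are hypothetical; c2lib's real pool type does considerably more than this.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical minimal pool, for illustration only.  It remembers every
 * allocation so that all of them can be freed in one call. */
struct simple_pool
{
  void **ptrs;                  /* allocations made in this pool */
  size_t used;                  /* number of entries in ptrs */
  size_t capacity;              /* number of entries ptrs has room for */
};

/* Allocate size bytes and record the allocation in the pool.
 * Returns NULL if memory could not be allocated. */
void *
simple_pool_alloc (struct simple_pool *p, size_t size)
{
  void *mem;

  assert (p != NULL && size > 0);

  if (p->used == p->capacity)
    {
      size_t new_capacity = p->capacity ? p->capacity * 2 : 16;
      void **new_ptrs = realloc (p->ptrs, new_capacity * sizeof *new_ptrs);

      if (new_ptrs == NULL)
        return NULL;
      p->ptrs = new_ptrs;
      p->capacity = new_capacity;
    }

  mem = malloc (size);
  if (mem != NULL)
    p->ptrs[p->used++] = mem;
  return mem;
}

/* Free every allocation made in the pool, then reset it to empty. */
void
simple_pool_delete (struct simple_pool *p)
{
  size_t i;

  assert (p != NULL);

  for (i = 0; i < p->used; ++i)
    free (p->ptrs[i]);
  free (p->ptrs);
  p->ptrs = NULL;
  p->used = p->capacity = 0;
}
```

Code that allocates through such a pool never calls free(3) on individual objects; the single delete call at the end releases everything.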

There is one special pool, called global_pool(3). This pool is created for you before main is called, and it is deleted for you after main returns (or if exit(3) is called). You don't ever need to worry about deallocating global_pool(3) (in fact, if you try to, your program might core dump).

Thus most short programs like the ones above should just allocate all objects in global_pool(3), and never need to worry about deallocating the objects or the pool.

For larger programs, and programs that are expected to run for a long time like servers, you will need to learn about pools.

Pools are organised in a hierarchy. This means that you often allocate one pool inside another pool. Here is a common pattern:

main ()
{
  /* ... use global_pool for allocations here ... */

  for (;;) /* for each request: */
    {
      pool pool = new_subpool (global_pool);

      /* ... process the request using pool ... */

      delete_pool (pool);
    }
}

pool is created as a subpool of global_pool(3) for the duration of the request. At the end of the request the pool (and therefore all objects inside it) is deallocated.

The advantage of creating pool as a subpool of global_pool(3) is that if the request processing code calls exit(3) in the middle of the request, then global_pool(3) will be deallocated in the normal way and as a consequence of this pool will also be properly deallocated.
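The parent and child relationship can be sketched as a tree in which deleting a pool first deletes all of its subpools. The type tree_pool and its functions are hypothetical, and the sketch leaves out the memory allocations themselves to focus on the hierarchy; the live_pools counter exists only so the effect can be observed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pool hierarchy; not c2lib's implementation. */
struct tree_pool
{
  struct tree_pool *parent;
  struct tree_pool *first_child;
  struct tree_pool *next_sibling;
};

static int live_pools = 0;      /* for demonstration only */

/* Create a pool.  If parent is not NULL the new pool becomes one of
 * its subpools.  Returns NULL if memory could not be allocated. */
struct tree_pool *
tree_pool_new (struct tree_pool *parent)
{
  struct tree_pool *p = calloc (1, sizeof *p);

  if (p == NULL)
    return NULL;
  p->parent = parent;
  if (parent != NULL)
    {
      p->next_sibling = parent->first_child;
      parent->first_child = p;
    }
  live_pools++;
  return p;
}

/* Delete a pool and, recursively, all of its subpools. */
void
tree_pool_delete (struct tree_pool *p)
{
  assert (p != NULL);

  /* Each child unlinks itself from our list as it is deleted. */
  while (p->first_child != NULL)
    tree_pool_delete (p->first_child);

  /* Unlink this pool from its parent's list of children. */
  if (p->parent != NULL)
    {
      struct tree_pool **link = &p->parent->first_child;

      while (*link != p)
        link = &(*link)->next_sibling;
      *link = p->next_sibling;
    }

  /* A real pool would free its own allocations here. */
  free (p);
  live_pools--;
}
```

In the request loop above, a per-request pool that is deleted normally unlinks itself from its parent, while one that is still alive when the parent goes away is cleaned up along with it.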

You can also use new_pool(3) to create a completely new top-level pool. There are some rare circumstances when you will need to do this, but generally you should avoid creating pools which are not subpools. If in doubt, always create subpools of global_pool(3) or of the pool immediately "above" you.

Pools don't just store memory allocations. You can attach other types of objects to pools, or register functions which are run when the pool is deallocated.

pool_register_fd(3) attaches a file descriptor to a pool, meaning that the file descriptor is closed when the pool is deleted. Note however that there is no way to detach a file descriptor from a pool, so don't call close(3) on the file descriptor once you've attached it to a pool.

pool_register_cleanup_fn(3) registers your own clean-up function which is called when the pool is deleted.

Although you should normally use pmalloc(3) and/or prealloc(3) to allocate objects directly in pools, you can also allocate them normally using malloc(3) and attach them to the pool using pool_register_malloc(3). The object will be freed automatically when the pool is deallocated.
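One way to picture the clean-up mechanism is as a list of (function, argument) pairs that the pool walks when it is deleted. The sketch below is hypothetical throughout (cleanup_pool, cleanup_pool_register, cleanup_pool_delete and count_call are illustrative names), and running the functions in reverse order of registration is this sketch's choice, not a documented property of c2lib.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*cleanup_fn) (void *data);

struct cleanup
{
  cleanup_fn fn;
  void *data;
  struct cleanup *next;
};

/* Hypothetical pool that only tracks clean-up functions. */
struct cleanup_pool
{
  struct cleanup *cleanups;     /* most recently registered first */
};

/* Arrange for fn (data) to be called when the pool is deleted.
 * Returns 0 on success, -1 if memory could not be allocated. */
int
cleanup_pool_register (struct cleanup_pool *p, cleanup_fn fn, void *data)
{
  struct cleanup *c;

  assert (p != NULL && fn != NULL);

  c = malloc (sizeof *c);
  if (c == NULL)
    return -1;
  c->fn = fn;
  c->data = data;
  c->next = p->cleanups;
  p->cleanups = c;
  return 0;
}

/* Run every registered clean-up function, newest first. */
void
cleanup_pool_delete (struct cleanup_pool *p)
{
  assert (p != NULL);

  while (p->cleanups != NULL)
    {
      struct cleanup *c = p->cleanups;

      p->cleanups = c->next;
      c->fn (c->data);
      free (c);
    }
}

/* Example clean-up function: counts how often it has been called. */
void
count_call (void *data)
{
  ++*(int *) data;
}
```

Because free(3) already has the shape of a clean-up function, registering it with a pointer from malloc(3) gives the same effect as attaching a malloc'd object to the pool: cleanup_pool_register (&pool, free, ptr).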

Pools become very important when writing multi-threaded servers using the pthrlib library. Each thread processes a single request or command. A pool is created for every thread, and is automatically deleted when the thread exits. This assumes of course that threads (and hence requests) are short-lived, which is a reasonable assumption for most HTTP-like services.

Links to manual pages

(These manual pages are not always up to date. For the latest documentation, always consult the manual pages supplied with the latest c2lib package!)

Pools

Vectors

Hashes

Strings and miscellaneous

Matrix and vector math


Richard Jones
Last modified: Fri May 2 19:42:09 BST 2002