pthrlib documentation index

pthrlib is a library for writing small, fast and efficient servers in C. It offers a list of advanced features. This library has been used to write a very tiny and fast web server called rws and a closed-source chat server.

The primary aims of pthrlib are:

Tutorial and programming examples

At the heart of pthrlib is a threading library called pseudothreads. This library is a typical lightweight threading library, written from scratch to be as small and fast as possible (it therefore lacks many of the unnecessary features which complicate other lightweight threading libraries, such as the ability to suspend threads).

A small pthrlib server will start off with just a single listener thread, listening for new connections on a socket. When a connection comes in, a new thread is spun off to handle it:

listener thread
processing thread, connected to client #1
processing thread, connected to client #2
processing thread, connected to client #3
...

More complex pthrlib servers may contain several core threads: for example our closed-source chat server has one extra thread called autoannounce which periodically sends out announcement messages to all clients. They may also use more than one thread per client. Since threads are very lightweight, you should be able to create as many threads as necessary for your application.

Simple echo server

To help you create a server with a listener thread spinning off threads for each incoming connection, there is a helper function called pthr_server_main_loop(3). Almost all programs will want to use it, such as the following simple echo program (I have split the program into chunks for readability).

Standard includes for socket programs, and predeclare static functions:

#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>

#include <pool.h>

#include <pthr_pseudothread.h>
#include <pthr_iolib.h>
#include <pthr_server.h>

static void start_processor (int sock, void *data);
static void run (void *);

Recall from the diagram above that we will start one processing thread for each client. The following structure is used to store the per-thread information about that processing thread:

typedef struct processor_thread
{
  pseudothread pth;		/* Pseudothread handle. */
  int sock;			/* Socket. */
} *processor_thread;

main is very simple, since pthr_server_main_loop does all the hard work of opening up a listening socket, forking into the background, parsing command line arguments and so on. Note that we pass a pointer to our start_processor function.

int
main (int argc, char *argv[])
{
  /* Start up the server. */
  pthr_server_main_loop (argc, argv, start_processor);

  exit (0);
}

Whenever a client makes a new connection to our server, the listener thread is going to call start_processor. This allocates the per-thread data structure and starts the new thread. The run function is the body of the new processing thread.

static void
start_processor (int sock, void *data)
{
  pool pool;
  processor_thread p;

  pool = new_pool ();
  p = pmalloc (pool, sizeof *p);

  p->sock = sock;
  p->pth = new_pseudothread (pool, run, p, "processor thread");

  pth_start (p->pth);
}

static void
run (void *vp)
{
  processor_thread p = (processor_thread) vp;
  io_handle io;
  char buffer[256];

  io = io_fdopen (p->sock);

  /* Sit in a loop reading strings and echoing them back. */
  while (io_fgets (buffer, sizeof buffer, io, 1))
    io_fputs (buffer, io);

  io_fclose (io);

  pth_exit ();
}

Here is a typical run with this program. Each line I typed (hello, goodbye) is followed by the server's echo of it:

$ ./eg_echo -p 9000
$ telnet localhost 9000
Trying 127.0.0.1...
Connected to localhost.localnet (127.0.0.1).
Escape character is '^]'.
hello
hello
goodbye
goodbye
^]

telnet> quit
Connection closed.

Simple HTTP server

Note: Although it is possible to write complete mini webservers using just pthrlib, it is often more flexible and just as fast to use rws's shared object scripts. rws provides you with the complete web serving framework. If you don't use rws and you need to, say, serve an image or a static page at some point in your application, then you will need to either link to another web server like Apache, or else write your own static file service code (it can be done -- we did it for the chat server -- but it's unnecessary).

The following code comes from example 1 supplied with pthrlib. You can find the working code in the examples/ directory. I have omitted some parts of the code in order to concentrate on the interesting and relevant bits.

First of all, the main function:

static void start_processor (int sock, void *);

int
main (int argc, char *argv[])
{
  /* Start up the server. */
  pthr_server_main_loop (argc, argv, start_processor);

  exit (0);
}

static void
start_processor (int sock, void *data)
{
  (void) new_eg1_echo_processor (sock);
}

Again, we are using pthr_server_main_loop to do the hard work. start_processor starts the processor thread. The processor thread's run function has the following outline:

static void
run (void *vp)
{
  eg1_echo_processor p = (eg1_echo_processor) vp;
  int close = 0;
  io_handle io;

  io = io_fdopen (p->sock);

  /* Sit in a loop reading HTTP requests. */
  while (!close)
    {
      /* Parse the HTTP request. */
            :    :    :
            :    :    :

      /* Form the HTTP response. */
            :    :    :
            :    :    :
    }

  io_fclose (io);

  pth_exit ();
}

The purpose of this loop is to deal with HTTP keepalives, where a client (or perhaps many different clients through a proxy) makes a series of requests over the same TCP connection. For each request, we'll make an iteration of the while loop. Each request is independent of the previous one.
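The close flag in the loop above comes from pthrlib itself, and the text does not show how it is computed. As a rough sketch of the general HTTP rule only, the decision could look like the following. The function name should_close and its arguments are my own invention for illustration, and real Connection headers can carry several comma-separated tokens, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Decide whether to close the connection after this response.
 * http_minor is 0 for HTTP/1.0 and 1 for HTTP/1.1; connection is the
 * value of the request's Connection header, or NULL if it was absent.
 * Returns 1 to close, 0 to keep the connection open.
 * Hypothetical helper, not part of pthrlib. */
int
should_close (int http_minor, const char *connection)
{
  assert (http_minor == 0 || http_minor == 1);

  if (connection != NULL)
    {
      if (strcasecmp (connection, "close") == 0)
        return 1;
      if (strcasecmp (connection, "keep-alive") == 0)
        return 0;
    }

  /* No usable header: HTTP/1.1 defaults to persistent connections,
   * HTTP/1.0 defaults to closing. */
  return http_minor == 0;
}
```

With this rule an HTTP/1.0 request that sends no Connection header ends the loop after one iteration, while an HTTP/1.1 request keeps it going.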

At the beginning of the thread, the listening thread hands us a socket file descriptor in sock. Doing I/O directly on a file descriptor is inconvenient, and it can't be wrapped up directly in a stdio FILE * because stdio calls block, hanging the entire process (and all other threads). iolib is a replacement for stdio which works with pools and doesn't block. io_fdopen wraps up a file descriptor in a fully buffered io_handle.
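The text does not say how iolib avoids blocking, so the following is only a sketch of the underlying POSIX mechanism that such a layer could build on: put the descriptor into non-blocking mode, and treat EAGAIN as "nothing to do yet" rather than waiting. The names set_nonblocking and try_read are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put a file descriptor into non-blocking mode.
 * Returns 0 on success, -1 on error (errno is set by fcntl). */
int
set_nonblocking (int fd)
{
  int flags;

  assert (fd >= 0);

  flags = fcntl (fd, F_GETFL, 0);
  if (flags == -1)
    return -1;
  return fcntl (fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Try to read without blocking.  Returns the number of bytes read,
 * 0 at end of file, -1 on a real error, or -2 if no data is available
 * yet (the point where a threading library could switch to another
 * thread instead of stalling the process). */
long
try_read (int fd, void *buf, size_t len)
{
  ssize_t r = read (fd, buf, len);

  if (r == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
    return -2;
  return (long) r;
}
```

A blocking stdio call has no equivalent of the -2 case, which is why one slow client would stall every other thread in the process.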

Now let's look at the step which parses the HTTP request:

  http_request http_request;
  cgi cgi;
  pool pool = pth_get_pool (p->pth);
            :    :    :
            :    :    :

      /* ----- HTTP request ----- */
      http_request = new_http_request (pool, io);
      if (http_request == 0)	/* Normal end of file. */
        break;

      cgi = new_cgi (pool, http_request, io);
      if (cgi == 0)		/* XXX Should send an error here. */
	break;

The new_http_request function parses the HTTP headers. It does pretty much the equivalent of what Apache does just before it hands off to a normal CGI script. You can think of new_cgi as being somewhat equivalent to Perl's CGI.pm.
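To give a feel for the kind of work a CGI layer does with the query string, here is a minimal sketch that splits name=value pairs in place. It is not pthrlib's implementation: struct query_param and parse_query are names I have made up, and percent-decoding (which a real CGI library needs) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One name=value pair.  Both strings point into the caller's buffer. */
struct query_param
{
  char *name;
  char *value;
};

/* Split a query string such as "abc=123&def=456" in place.
 * The buffer qs is modified: '&' and '=' are overwritten with '\0'.
 * Stores at most max pairs in out and returns the number stored.
 * Percent-decoding is deliberately left out of this sketch. */
size_t
parse_query (char *qs, struct query_param *out, size_t max)
{
  size_t n = 0;

  assert (qs != NULL && out != NULL);

  while (*qs != '\0' && n < max)
    {
      char *amp = strchr (qs, '&');
      char *eq;

      if (amp != NULL)
        *amp = '\0';

      eq = strchr (qs, '=');
      if (eq != NULL)
        {
          *eq = '\0';
          out[n].value = eq + 1;
        }
      else
        out[n].value = qs + strlen (qs);   /* Empty value. */

      out[n].name = qs;
      n++;

      if (amp == NULL)
        break;
      qs = amp + 1;
    }

  return n;
}
```

Splitting in place avoids any allocation, at the cost of destroying the original string, so the caller should pass a copy if the raw query string is still needed.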

Here's the code which generates the HTTP response:

  http_response http_response;
            :    :    :
            :    :    :

      http_response = new_http_response (pool, http_request,
					 io,
					 200, "OK");
      http_response_send_header (http_response,
                                 "Content-Type", "text/plain");
      close = http_response_end_headers (http_response);

      if (!http_request_is_HEAD (http_request))
	{
	  io_fprintf (io, "Hello. This is your server.\r\n\r\n");
	  io_fprintf (io, "Your browser sent the following headers:\r\n");

	  headers = http_request_get_headers (http_request);
	  for (i = 0; i < vector_size (headers); ++i)
	    {
	      vector_get (headers, i, header);
	      io_fprintf (io, "\t%s: %s\r\n", header.key, header.value);
	    }

	  io_fprintf (io, "----- end of headers -----\r\n");

	  io_fprintf (io, "The URL was: %s\r\n",
		      http_request_get_url (http_request));
	  io_fprintf (io, "The path component was: %s\r\n",
		      http_request_path (http_request));
	  io_fprintf (io, "The query string was: %s\r\n",
		      http_request_query_string (http_request));
	  io_fprintf (io, "The query arguments were:\r\n");

	  params = cgi_params (cgi);
	  for (i = 0; i < vector_size (params); ++i)
	    {
	      vector_get (params, i, name);
	      value = cgi_param (cgi, name);
	      io_fprintf (io, "\t%s=%s\r\n", name, value);
	    }

	  io_fprintf (io, "----- end of parameters -----\r\n");
	}

new_http_response, http_response_send_header and http_response_end_headers generate the HTTP headers for the response back to the client. We'll see those headers in a minute. Notice that we send back an explicit Content-Type: text/plain header.

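The rest of the code actually generates the page. The simplest way to describe it is to show an actual interaction with the server. I typed the GET request, the Host header and the blank line which ends the request; everything from the HTTP/1.1 status line onwards is the server's reply.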

$ ./pthr_eg1_echo -p 9000
$ telnet localhost 9000
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET /path/here?abc=123&def=456 HTTP/1.0
Host: localhost:9000

HTTP/1.1 200 OK
Content-Type: text/plain
Server: pthrlib-httpd/3.0.3
Date: Fri, 30 Aug 2002 17:04:03 GMT
Connection: close

Hello. This is your server.

Your browser sent the following headers:
        host: localhost:9000
----- end of headers -----
The URL was: /path/here?abc=123&def=456
The path component was: /path/here
The query string was: abc=123&def=456
The query arguments were:
        abc=123
        def=456
----- end of parameters -----
Connection closed by foreign host.
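The session above shows the URL being reported as a path component and a query string. How pthrlib splits them is not shown, but the idea can be sketched as cutting the URL at the first question mark. The helper name split_url is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split a request URL in place at the first '?'.
 * On return *path points at the path component and *query at the
 * query string (an empty string if the URL has none).  The url buffer
 * is modified: the '?' is overwritten with '\0'. */
void
split_url (char *url, char **path, char **query)
{
  char *q;

  assert (url != NULL && path != NULL && query != NULL);

  q = strchr (url, '?');
  *path = url;
  if (q != NULL)
    {
      *q = '\0';
      *query = q + 1;
    }
  else
    *query = url + strlen (url);
}
```

Applied to the URL from the session, this yields /path/here as the path and abc=123&def=456 as the query string.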

Static file webserver

The following code is from example 2. You can find the complete working program in the examples/ directory. It's a very minimal webserver which can only serve static files from a single directory. If you start the server up as root, then the server will chroot(2) itself into a configurable directory, and change its user ID to nobody.nobody.

Again the main function uses pthr_server_main_loop for simplicity. However one thing which pthr_server_main_loop can't do (yet) is set up signal handlers, so we have to do those by hand first:

int
main (int argc, char *argv[])
{
  struct sigaction sa;

  /* Intercept signals. */
  memset (&sa, 0, sizeof sa);
  sa.sa_handler = catch_quit_signal;
  sa.sa_flags = SA_RESTART;
  sigaction (SIGINT, &sa, 0);
  sigaction (SIGQUIT, &sa, 0);
  sigaction (SIGTERM, &sa, 0);

  /* ... but ignore SIGPIPE errors. */
  sa.sa_handler = SIG_IGN;
  sa.sa_flags = SA_RESTART;
  sigaction (SIGPIPE, &sa, 0);

  /* Start up the server. */
  pthr_server_chroot (root);
  pthr_server_username (user);
  pthr_server_main_loop (argc, argv, start_processor);

  exit (0);
}

static void
start_processor (int sock, void *data)
{
  (void) new_eg2_server_processor (sock);
}

static void
catch_quit_signal (int sig)
{
  exit (0);
}

Notice that just before we actually call pthr_server_main_loop, we configure the main loop code first by telling it the root directory (where we want to chroot(2) to) and the username (nobody).
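What the main loop does with these two settings is not shown here. As a sketch of the usual technique for this kind of confinement (not pthrlib's own code), a server started as root typically looks up the user, enters the chroot, and only then gives up root. The function name enter_jail is hypothetical, and the ordering of the steps is the important part.

```c
#define _DEFAULT_SOURCE   /* For chroot and initgroups on glibc. */

#include <assert.h>
#include <grp.h>
#include <pwd.h>
#include <sys/types.h>
#include <unistd.h>

/* Confine the process to root_dir and switch to an unprivileged user.
 * Must be called as root.  Returns 0 on success, -1 on any failure.
 * Hypothetical helper: a sketch of the general technique, not
 * pthrlib's code. */
int
enter_jail (const char *root_dir, const char *username)
{
  struct passwd *pw;

  assert (root_dir != NULL && username != NULL);

  /* Look the user up first: /etc/passwd is unreachable after chroot. */
  pw = getpwnam (username);
  if (pw == NULL)
    return -1;

  /* Set supplementary groups while the user database is still
   * reachable and we are still root. */
  if (initgroups (pw->pw_name, pw->pw_gid) == -1)
    return -1;

  if (chroot (root_dir) == -1)
    return -1;
  if (chdir ("/") == -1)
    return -1;

  /* Drop the group first; after setuid we could no longer change it. */
  if (setgid (pw->pw_gid) == -1)
    return -1;
  if (setuid (pw->pw_uid) == -1)
    return -1;

  return 0;
}
```

A caller should treat a -1 return as fatal and exit, since carrying on would leave the server running with more privilege than intended.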

The eg2_server_processor thread structure contains a little more data this time. It contains most of the information about the current request:

struct eg2_server_processor
{
  /* Pseudothread handle. */
  pseudothread pth;

  /* Socket. */
  int sock;

  /* Pool for memory allocations. */
  struct pool *pool;

  /* HTTP request. */
  http_request http_request;

  /* IO handle. */
  io_handle io;
};

The run function has the same basic outline, i.e. a while loop to process each request on the same keep-alive connection, and a call to new_http_request to parse the HTTP headers. The rest of the loop body is the new code which handles the response.

static void
run (void *vp)
{
  eg2_server_processor p = (eg2_server_processor) vp;
  int close = 0;
  const char *path;
  struct stat statbuf;

  p->io = io_fdopen (p->sock);

  /* Sit in a loop reading HTTP requests. */
  while (!close)
    {
      /* ----- HTTP request ----- */
      p->http_request = new_http_request (p->pool, p->io);
      if (p->http_request == 0)	/* Normal end of file. */
        break;

      /* Get the path and locate the file. */
      path = http_request_path (p->http_request);
      if (stat (path, &statbuf) == -1)
	{
	  close = file_not_found_error (p);
	  continue;
	}

      /* File or directory? */
      if (S_ISDIR (statbuf.st_mode))
	{
	  close = serve_directory (p, path, &statbuf);
	  continue;
	}
      else if (S_ISREG (statbuf.st_mode))
	{
	  close = serve_file (p, path, &statbuf);
	  continue;
	}
      else
	{
	  close = file_not_found_error (p);
	  continue;
	}
    }

  io_fclose (p->io);

  pth_exit ();
}

This is a very simple webserver, so all it does is take the path component of the request and use it directly as a filename (note that it relies completely on the chroot(2) environment for security).

Firstly it calls stat to find out if the filename is a directory or a regular file. If it is neither, or if the file doesn't exist, it calls file_not_found_error which sends back a 404 FILE NOT FOUND error.

If the file is a regular file, we call serve_file, which is a simple piece of code:

static int
serve_file (eg2_server_processor p, const char *path,
	    const struct stat *statbuf)
{
  http_response http_response;
  const int n = 4096;
  char *buffer = alloca (n);
  int cl, fd, r;
  char *content_length = pitoa (p->pool, statbuf->st_size);

  fd = open (path, O_RDONLY);
  if (fd < 0)
    return file_not_found_error (p);

  http_response = new_http_response (p->pool, p->http_request, p->io,
				     200, "OK");
  http_response_send_headers (http_response,
			      /* Content type. */
			      "Content-Type", "text/plain",
			      "Content-Length", content_length,
			      /* End of headers. */
			      NULL);
  cl = http_response_end_headers (http_response);

  if (http_request_is_HEAD (p->http_request))
    {
      close (fd);		/* Don't leak the descriptor on HEAD. */
      return cl;
    }

  while ((r = read (fd, buffer, n)) > 0)
    {
      io_fwrite (buffer, r, 1, p->io);
    }

  if (r < 0)
    perror ("read");

  close (fd);

  return cl;
}

Firstly we work out the size of the file, using the statbuf.st_size field. The c2lib function pitoa turns this into a string (all headers must be passed as strings). Next we open the file. If this fails, then the file is inaccessible or has just gone, so we return a 404 instead.
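If you are not using c2lib, the same conversion can be sketched with snprintf. This is my own illustration rather than anything from pthrlib, and the name format_content_length is hypothetical. The one thing worth noticing is that st_size is an off_t, which may be wider than int.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <sys/types.h>

/* Format a file size as a decimal string for a Content-Length header.
 * Returns 0 on success, -1 if buf is too small.
 * off_t can be wider than int, so cast to intmax_t rather than
 * printing it with %d. */
int
format_content_length (off_t size, char *buf, size_t buflen)
{
  int n;

  assert (buf != NULL && size >= 0);

  n = snprintf (buf, buflen, "%jd", (intmax_t) size);
  return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}
```

A buffer of 32 bytes is comfortably large enough for any 64-bit size.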

Next we generate our headers:

Content-Type: text/plain
Content-Length: (size of the file in octets)

pthrlib will generate other standard headers as well.

If the request was a HEAD request, then the client only wants to see the headers, so we stop right there. Otherwise we copy the file back to our user.

Party question: Why is it OK to use read(2) when reading the file, but not OK to use write(2) when writing to the socket? Why will this not cause the whole server process to block (on Linux at least)?

Serving a directory is more complicated, so we'll take it in steps. Recall that to serve a directory, we actually need to create an HTML page which lists the files, with information about those files and links to the files themselves.

Firstly if the user requested the directory as:

http://your.hostname/path/to/directory

then we need to redirect them to:

http://your.hostname/path/to/directory/

(note the trailing slash). The reason for this is that relative links within our page won't work otherwise. The browser will request /path/to/file instead of /path/to/directory/file. This is actually a bit of webserver arcana which is often forgotten. If you don't believe me, Apache does this too: go look at the source!

static int
serve_directory (eg2_server_processor p, const char *path,
		 const struct stat *statbuf)
{
  http_response http_response;
  int close;
  DIR *dir;
  struct dirent *d;

  /* If the path doesn't end with a "/", then we need to send
   * a redirect back to the client so it refetches the page
   * with "/" appended.
   */
  if (path[strlen (path)-1] != '/')
    {
      char *location = psprintf (p->pool, "%s/", path);
      return moved_permanently (p, location);
    }

moved_permanently sends back a 301 MOVED PERMANENTLY page causing the browser to re-request the new location.
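The body of moved_permanently is not shown in this tutorial. To make the redirect concrete, here is a sketch of the bytes such a response boils down to, written into a plain buffer. The helper name format_redirect is hypothetical, and a real response would carry further headers such as Date and Server.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write a minimal 301 response with a Location header into buf.
 * Returns the length of the response, or -1 if buf is too small.
 * Hypothetical helper; a real response would carry more headers
 * (Date, Server and so on). */
int
format_redirect (char *buf, size_t buflen, const char *location)
{
  int n;

  assert (buf != NULL && location != NULL);

  n = snprintf (buf, buflen,
                "HTTP/1.1 301 Moved Permanently\r\n"
                "Location: %s\r\n"
                "Content-Length: 0\r\n"
                "\r\n",
                location);
  return (n < 0 || (size_t) n >= buflen) ? -1 : n;
}
```

The browser reads the Location header and immediately requests that URL, this time with the trailing slash in place.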

The next piece of code should be familiar boilerplate. We open the directory, and send back headers. If the request is HEAD we then drop out.

  dir = opendir (path);
  if (dir == 0)
    return file_not_found_error (p);

  http_response = new_http_response (p->pool, p->http_request, p->io,
				     200, "OK");
  http_response_send_headers (http_response,
			      /* Content type. */
			      "Content-Type", "text/html",
			      NO_CACHE_HEADERS,
			      /* End of headers. */
			      NULL);
  close = http_response_end_headers (http_response);

  if (http_request_is_HEAD (p->http_request))
    {
      closedir (dir);		/* Don't leak the directory handle. */
      return close;
    }

The next piece of code is the complicated bit which generates the HTML page listing the files:

  io_fprintf (p->io,
	      "<html><head><title>Directory: %s</title></head>" CRLF
	      "<body bgcolor=\"#ffffff\">" CRLF
	      "<h1>Directory: %s</h1>" CRLF
	      "<table>" CRLF
	      "<tr><td></td><td></td>"
	      "<td><a href=\"..\">Parent directory</a></td></tr>" CRLF,
	      path, path);

  while ((d = readdir (dir)) != 0)
    {
      if (d->d_name[0] != '.')	/* Ignore hidden files. */
	{
	  const char *filename;
	  struct stat fstatbuf;

	  /* Generate the full pathname to this file. */
	  filename = psprintf (p->pool, "%s/%s", path, d->d_name);

	  /* Stat the file to find out what it is. */
	  if (lstat (filename, &fstatbuf) == 0)
	    {
	      const char *type;
	      int size;

	      if (S_ISDIR (fstatbuf.st_mode))
		type = "dir";
	      else if (S_ISREG (fstatbuf.st_mode))
		type = "file";
	      else if (S_ISLNK (fstatbuf.st_mode))
		type = "link";
	      else
		type = "special";

	      size = fstatbuf.st_size;

	      /* Print the details. */
	      io_fprintf (p->io,
			  "<tr><td>[ %s ]</td><td align=right>%d</td>"
			  "<td><a href=\"%s%s\">%s</a>",
			  type, size,
			  d->d_name,
			  S_ISDIR (fstatbuf.st_mode) ? "/" : "",
			  d->d_name);

	      if (S_ISLNK (fstatbuf.st_mode))
		{
		  char link[NAME_MAX+1];
		  int r;

		  r = readlink (filename, link, NAME_MAX);
		  if (r >= 0) link[r] = '\0';
		  else strcpy (link, "unknown");

		  io_fprintf (p->io, " -&gt; %s", link);
		}

	      io_fputs ("</td></tr>" CRLF, p->io);
	    }
	}
    }

  io_fprintf (p->io,
	      "</table></body></html>" CRLF);

  closedir (dir);

  return close;

We first send the top of the HTML page, and the beginning of the table (the whole page is one large table, of course).

Next we loop over the directory entries using readdir(3) to read each one. Ignoring files which start with a dot (.) we lstat(2) each file to find out if it's a directory, file or symbolic link, or some type of special device node.

Depending on the file type, we generate a different bit of HTML containing a relative link to the file or directory (if it's a directory we need to remember to append a trailing slash to the name to avoid that extra 301 redirect).

Finally, after we reach the end of the directory, we finish off the table and the page and return.

Further examples

That's the end of this pthrlib tutorial. I hope you enjoyed it.

pthrlib isn't just about writing web servers. You can use it to write all sorts of servers, or even clients (it has an FTP client library which I used to load-test Net::FTPServer).

If, however, you feel like using pthrlib to write a web server, I strongly urge you to use rws and shared object scripts. These are described in the rws documentation. (rws uses pthrlib.)

Links to manual pages

(These manual pages are not always up to date. For the latest documentation, always consult the manual pages supplied with the latest pthrlib package!)

Pseudothreads

Server main loop

Buffered I/O library

HTTP server library

CGI library

Thread synchronisation (mutexes, R/W-locks, wait queues)

FTP client library


Richard Jones
Last modified: Sun Dec 1 14:44:00 GMT 2002