pthrlib is a library for writing small, fast and efficient servers in C. It offers a list of advanced features. It has been used to write a tiny, fast web server called rws and a closed-source chat server.
The primary aims of pthrlib are:
A pthrlib server will outperform any other server architecture or language (on non-SMP machines).
c2lib removes many risks of buffer overflows.
At the heart of pthrlib is a threading library called pseudothreads. This library is a typical lightweight threading library, written from scratch to be as small and fast as possible (it therefore lacks many of the unnecessary features which complicate other lightweight threading libraries, such as the ability to suspend threads).
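To make "lightweight threading" a little more concrete, here is a minimal sketch of cooperative user-level threads built on the POSIX ucontext calls. This is not pthrlib code, and pseudothreads may well switch contexts in a different way; the names worker, run_workers and the trace buffer are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <ucontext.h>

#define NR_WORKERS 2
#define STACK_SIZE (64 * 1024)

static ucontext_t sched_ctx;                /* The scheduler's context. */
static ucontext_t worker_ctx[NR_WORKERS];   /* One context per worker. */
static char stacks[NR_WORKERS][STACK_SIZE]; /* One private stack each. */
static int done[NR_WORKERS];
static char trace[16];                      /* Records who ran, in order. */
static size_t trace_len;

/* Hypothetical worker: records its letter twice, yielding after each. */
static void
worker (int id)
{
  int step;

  for (step = 0; step < 2; ++step)
    {
      assert (trace_len + 1 < sizeof trace);
      trace[trace_len++] = (char) ('A' + id);
      /* Yield: save our context and resume the scheduler. */
      swapcontext (&worker_ctx[id], &sched_ctx);
    }
  done[id] = 1;
  /* Returning from here resumes uc_link, i.e. the scheduler. */
}

/* Run all workers round-robin until each has finished.
   Returns the trace of which worker ran at each step. */
const char *
run_workers (void)
{
  int i, remaining = NR_WORKERS;

  trace_len = 0;
  for (i = 0; i < NR_WORKERS; ++i)
    {
      done[i] = 0;
      getcontext (&worker_ctx[i]);
      worker_ctx[i].uc_stack.ss_sp = stacks[i];
      worker_ctx[i].uc_stack.ss_size = STACK_SIZE;
      worker_ctx[i].uc_link = &sched_ctx;
      makecontext (&worker_ctx[i], (void (*) (void)) worker, 1, i);
    }

  while (remaining > 0)
    for (i = 0; i < NR_WORKERS; ++i)
      if (!done[i])
        {
          swapcontext (&sched_ctx, &worker_ctx[i]);
          if (done[i])
            --remaining;
        }

  trace[trace_len] = '\0';
  return trace;
}
```

Calling run_workers() lets the two workers take turns. Nothing preempts a worker: control only moves when it yields, which is why a single blocking call can stall every thread in a library of this kind.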
A small pthrlib server will start off with just a single listener thread, listening for new connections on a socket. When a connection comes in, a new thread is spun off to handle it:
listener thread |
processing thread, connected to client #1 |
processing thread, connected to client #2 |
processing thread, connected to client #3 |
... |
More complex pthrlib
servers may contain
several core threads: for example our closed-source
chat server has one extra thread called autoannounce
which periodically sends out announcement messages to
all clients. They may also use more than one thread
per client. Since threads are very lightweight, you
should be able to create as many threads as necessary
for your application.
echoserver
To help you create a server with a listener thread spinning off threads for each incoming connection, there is a helper function called pthr_server_main_loop(3). Almost all programs will want to use it, including the following simple echo program (I have split the program into chunks for readability).
Standard includes for socket programs, and predeclare static functions:
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/socket.h>

    #include <pool.h>
    #include <pthr_pseudothread.h>
    #include <pthr_iolib.h>
    #include <pthr_server.h>

    static void start_processor (int sock, void *data);
    static void run (void *);
Recall from the diagram above that we will start one processing thread for each client. The following structure is used to store the per-thread information about that processing thread:
    typedef struct processor_thread
    {
      pseudothread pth;             /* Pseudothread handle. */
      int sock;                     /* Socket. */
    } *processor_thread;
main
is very simple, since
pthr_server_main_loop
does all the hard work
of opening up a listening socket, forking into the
background, parsing command line arguments and so on.
Note that we pass a pointer to our
start_processor
function.
    int
    main (int argc, char *argv[])
    {
      /* Start up the server. */
      pthr_server_main_loop (argc, argv, start_processor);

      exit (0);
    }
Whenever a client makes a new connection to our server, the listener thread calls start_processor. This allocates the per-thread data structure and starts the new thread. The run function is the body of the new processing thread.
    static void
    start_processor (int sock, void *data)
    {
      pool pool;
      processor_thread p;

      pool = new_pool ();
      p = pmalloc (pool, sizeof *p);

      p->sock = sock;
      p->pth = new_pseudothread (pool, run, p, "processor thread");

      pth_start (p->pth);
    }

    static void
    run (void *vp)
    {
      processor_thread p = (processor_thread) vp;
      io_handle io;
      char buffer[256];

      io = io_fdopen (p->sock);

      /* Sit in a loop reading strings and echoing them back. */
      while (io_fgets (buffer, sizeof buffer, io, 1))
        io_fputs (buffer, io);

      io_fclose (io);

      pth_exit ();
    }
Here is a typical run with this program. Each line typed at the telnet prompt is echoed straight back by the server:
    $ ./eg_echo -p 9000
    $ telnet localhost 9000
    Trying 127.0.0.1...
    Connected to localhost.localnet (127.0.0.1).
    Escape character is '^]'.
    hello
    hello
    goodbye
    goodbye
    ^]
    telnet> quit
    Connection closed.
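For comparison, here is roughly what the same echo loop looks like with plain blocking read(2) and write(2) and no pthrlib at all. It is only a sketch: echo_loop and write_all are names invented for this example, and a blocking loop like this can serve just one client at a time, which is exactly the limitation the processing threads remove.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>   /* Callers typically pass a socket from accept(2). */
#include <sys/types.h>
#include <unistd.h>

/* Write all n bytes of buf to fd, retrying after short writes.
   Returns 0 on success, -1 on error. */
static int
write_all (int fd, const char *buf, size_t n)
{
  while (n > 0)
    {
      ssize_t w = write (fd, buf, n);

      if (w < 0)
        return -1;
      buf += w;
      n -= (size_t) w;
    }
  return 0;
}

/* Echo everything read from fd back to fd until end of file.
   Returns the number of bytes echoed, or -1 on error. */
long
echo_loop (int fd)
{
  char buffer[256];
  long total = 0;
  ssize_t r;

  assert (fd >= 0);
  while ((r = read (fd, buffer, sizeof buffer)) > 0)
    {
      if (write_all (fd, buffer, (size_t) r) < 0)
        return -1;
      total += r;
    }
  return r < 0 ? -1 : total;
}
```

The function returns once the peer closes its side of the connection, much as the pthrlib version falls out of its while loop when io_fgets reports end of file.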
Note: Although it is possible to write complete mini webservers using just pthrlib, it is often more flexible and just as fast to use rws's shared object scripts. rws provides you with the complete web serving framework. If you don't use rws and you need to, say, serve an image or a static page at some point in your application, then you will need to either link to another web server like Apache, or else write your own static file service code (it can be done -- we did it for the chat server -- but it's unnecessary).
The following code comes from example 1 supplied with pthrlib. You can find the working code in the examples/ directory. I have omitted some parts of the code in order to concentrate on the interesting and relevant bits.
First of all, the main function:
    static void start_processor (int sock, void *);

    int
    main (int argc, char *argv[])
    {
      /* Start up the server. */
      pthr_server_main_loop (argc, argv, start_processor);

      exit (0);
    }

    static void
    start_processor (int sock, void *data)
    {
      (void) new_eg1_echo_processor (sock);
    }
Again, we are using pthr_server_main_loop
to
do the hard work. start_processor
starts the
processor thread. The processor thread's run
function has the following outline:
    static void
    run (void *vp)
    {
      eg1_echo_processor p = (eg1_echo_processor) vp;
      int close = 0;
      io_handle io;

      io = io_fdopen (p->sock);

      /* Sit in a loop reading HTTP requests. */
      while (!close)
        {
          /* Parse the HTTP request. */
          : : :
          : : :

          /* Form the HTTP response. */
          : : :
          : : :
        }

      io_fclose (io);

      pth_exit ();
    }
The purpose of this loop is to deal with HTTP keepalives,
where a client (or perhaps many different clients through a
proxy) makes a series of requests over the same TCP connection.
For each request, we'll make an iteration of the while
loop. Each request is independent of the previous one.
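In the examples the close flag comes back from http_response_end_headers, so the library makes the keep-alive decision for you. As a rough illustration of the standard HTTP rule behind that decision (this is not pthrlib's code, and should_close is a name I made up), a server might decide like this:

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>

/* Decide whether the server should close the connection after the
   current response.  http_minor is 0 for HTTP/1.0 and 1 for HTTP/1.1;
   connection is the value of the Connection request header, or NULL
   if the client did not send one.  Simplification: the header is
   treated as a single token, not a comma-separated list. */
int
should_close (int http_minor, const char *connection)
{
  assert (http_minor == 0 || http_minor == 1);

  if (connection != NULL)
    {
      if (strcasecmp (connection, "close") == 0)
        return 1;
      if (strcasecmp (connection, "keep-alive") == 0)
        return 0;
    }

  /* No usable header: HTTP/1.1 defaults to persistent connections,
     HTTP/1.0 defaults to closing. */
  return http_minor == 0;
}
```

An HTTP/1.0 request with no Connection header gives 1 here, so the while loop would run exactly once for such a client.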
At the beginning of the thread, the listening thread hands us a socket file descriptor in sock. Doing I/O directly on a file descriptor is inconvenient, and it can't be wrapped up directly in a stdio FILE * because stdio calls block, hanging the entire process (and all other threads). iolib is a replacement for stdio which works with pools and doesn't block. io_fdopen wraps up a file descriptor in a fully buffered io_handle.
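To see what "doesn't block" means at the system call level, here is a small POSIX sketch that puts a descriptor into non-blocking mode and reports when a read would otherwise have waited. It is not taken from iolib; set_nonblocking and try_read are hypothetical helpers, and I am only assuming that a library like this does something similar underneath.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Put fd into non-blocking mode.  Returns 0 on success, -1 on error. */
int
set_nonblocking (int fd)
{
  int flags;

  assert (fd >= 0);
  flags = fcntl (fd, F_GETFL);
  if (flags == -1)
    return -1;
  return fcntl (fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Try to read from a non-blocking fd.  Returns the byte count, 0 at
   end of file, -1 on a real error, or -2 if no data is ready yet
   (the point at which a threading library could run another thread
   instead of waiting). */
long
try_read (int fd, char *buf, size_t n)
{
  ssize_t r = read (fd, buf, n);

  if (r == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
    return -2;
  return (long) r;
}
```

With a blocking descriptor the first read on an idle connection would simply hang; here it returns -2 straight away and the caller decides what to do next.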
Now let's look at the step which parses the HTTP request:
      http_request http_request;
      cgi cgi;
      pool pool = pth_get_pool (p->pth);
      : : :
      : : :
          /* ----- HTTP request ----- */
          http_request = new_http_request (pool, io);
          if (http_request == 0)    /* Normal end of file. */
            break;

          cgi = new_cgi (pool, http_request, io);
          if (cgi == 0)             /* XXX Should send an error here. */
            break;
The new_http_request function parses the HTTP headers. It does pretty much the equivalent of what Apache does just before it hands off to a normal CGI script. You can think of new_cgi as being somewhat equivalent to Perl's CGI.pm.
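If you have never looked inside a CGI library, the core job is splitting the query string into name/value pairs. The following standalone sketch shows the idea; it is not how new_cgi is implemented, the struct and function names are invented, and it skips the %XX and '+' decoding which a real CGI library has to do.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct query_param
{
  char name[64];
  char value[64];
};

/* Split "a=1&b=2" into name/value pairs.  Stores at most max pairs in
   out and returns how many were stored.  Over-long names and values
   are truncated; pairs with an empty name are skipped. */
int
parse_query (const char *query, struct query_param *out, int max)
{
  int n = 0;

  assert (query != NULL && out != NULL);
  while (*query != '\0' && n < max)
    {
      size_t len = strcspn (query, "&");      /* Length of this pair. */
      const char *eq = memchr (query, '=', len);
      size_t name_len = eq ? (size_t) (eq - query) : len;
      size_t value_len = eq ? len - name_len - 1 : 0;

      if (name_len > 0)
        {
          snprintf (out[n].name, sizeof out[n].name, "%.*s",
                    (int) name_len, query);
          snprintf (out[n].value, sizeof out[n].value, "%.*s",
                    (int) value_len, eq ? eq + 1 : "");
          ++n;
        }

      query += len;
      if (*query == '&')
        ++query;                              /* Skip the separator. */
    }
  return n;
}
```

Fed the query string abc=123&def=456, this produces two pairs: abc with value 123, and def with value 456.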
Here's the code which generates the HTTP response:
      http_response http_response;
      : : :
      : : :
          http_response = new_http_response (pool, http_request, io,
                                             200, "OK");
          http_response_send_header (http_response,
                                     "Content-Type", "text/plain");
          close = http_response_end_headers (http_response);

          if (!http_request_is_HEAD (http_request))
            {
              io_fprintf (io, "Hello. This is your server.\r\n\r\n");
              io_fprintf (io, "Your browser sent the following headers:\r\n");

              headers = http_request_get_headers (http_request);
              for (i = 0; i < vector_size (headers); ++i)
                {
                  vector_get (headers, i, header);
                  io_fprintf (io, "\t%s: %s\r\n", header.key, header.value);
                }

              io_fprintf (io, "----- end of headers -----\r\n");

              io_fprintf (io, "The URL was: %s\r\n",
                          http_request_get_url (http_request));
              io_fprintf (io, "The path component was: %s\r\n",
                          http_request_path (http_request));
              io_fprintf (io, "The query string was: %s\r\n",
                          http_request_query_string (http_request));
              io_fprintf (io, "The query arguments were:\r\n");

              params = cgi_params (cgi);
              for (i = 0; i < vector_size (params); ++i)
                {
                  vector_get (params, i, name);
                  value = cgi_param (cgi, name);
                  io_fprintf (io, "\t%s=%s\r\n", name, value);
                }

              io_fprintf (io, "----- end of parameters -----\r\n");
            }
new_http_response, http_response_send_header and http_response_end_headers generate the HTTP headers for the response back to the client. We'll see those headers in a minute. Notice that we send back an explicit Content-Type: text/plain header.
The rest of the code actually generates the page. The simplest way to describe it is to show an actual interaction with the server. I typed the two commands and the two request lines (GET and Host) followed by a blank line; everything after that is the server's response.
    $ ./pthr_eg1_echo -p 9000
    $ telnet localhost 9000
    Trying 127.0.0.1...
    Connected to localhost.
    Escape character is '^]'.
    GET /path/here?abc=123&def=456 HTTP/1.0
    Host: localhost:9000

    HTTP/1.1 200 OK
    Content-Type: text/plain
    Server: pthrlib-httpd/3.0.3
    Date: Fri, 30 Aug 2002 17:04:03 GMT
    Connection: close

    Hello. This is your server.

    Your browser sent the following headers:
            host: localhost:9000
    ----- end of headers -----
    The URL was: /path/here?abc=123&def=456
    The path component was: /path/here
    The query string was: abc=123&def=456
    The query arguments were:
            abc=123
            def=456
    ----- end of parameters -----
    Connection closed by foreign host.
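The "path component" and "query string" in that output are just the two halves of the URL either side of the first question mark. Here is a small sketch of that split in plain C; split_url is a hypothetical helper for illustration, not a pthrlib function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Split a request URL such as "/path/here?abc=123" at the first '?'.
   The path goes into path[] and the query string (possibly empty)
   into query[]; both are NUL-terminated and truncated if too long. */
void
split_url (const char *url, char *path, size_t path_size,
           char *query, size_t query_size)
{
  const char *q;

  assert (url != NULL && path_size > 0 && query_size > 0);

  q = strchr (url, '?');
  if (q == NULL)
    {
      /* No query string at all. */
      snprintf (path, path_size, "%s", url);
      query[0] = '\0';
    }
  else
    {
      snprintf (path, path_size, "%.*s", (int) (q - url), url);
      snprintf (query, query_size, "%s", q + 1);
    }
}
```

For the URL in the session above this yields /path/here and abc=123&def=456, matching what the server printed.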
The following code is from example 2. You can find the complete working program in the examples/ directory. It's a very minimal webserver which can only serve static files from a single directory. If you start the server up as root, then the server will chroot(2) itself into a configurable directory, and change its user and group to nobody.nobody.
Again the main
function uses
pthr_server_main_loop
for simplicity.
However one thing which pthr_server_main_loop
can't do (yet) is set up signal handlers, so we have
to do those by hand first:
    int
    main (int argc, char *argv[])
    {
      struct sigaction sa;

      /* Intercept signals. */
      memset (&sa, 0, sizeof sa);
      sa.sa_handler = catch_quit_signal;
      sa.sa_flags = SA_RESTART;
      sigaction (SIGINT, &sa, 0);
      sigaction (SIGQUIT, &sa, 0);
      sigaction (SIGTERM, &sa, 0);

      /* ... but ignore SIGPIPE errors. */
      sa.sa_handler = SIG_IGN;
      sa.sa_flags = SA_RESTART;
      sigaction (SIGPIPE, &sa, 0);

      /* Start up the server. */
      pthr_server_chroot (root);
      pthr_server_username (user);
      pthr_server_main_loop (argc, argv, start_processor);

      exit (0);
    }

    static void
    start_processor (int sock, void *data)
    {
      (void) new_eg2_server_processor (sock);
    }

    static void
    catch_quit_signal (int sig)
    {
      exit (0);
    }
Notice that just before we actually call pthr_server_main_loop, we configure the main loop code by telling it the root directory (where we want to chroot(2) to) and the username (nobody).
The eg2_server_processor
thread
structure contains a little more data this time. It
contains most of the information about the current
request:
    struct eg2_server_processor
    {
      /* Pseudothread handle. */
      pseudothread pth;

      /* Socket. */
      int sock;

      /* Pool for memory allocations. */
      struct pool *pool;

      /* HTTP request. */
      http_request http_request;

      /* IO handle. */
      io_handle io;
    };
The run function has the same basic outline, i.e. a while loop to process each request on the same keep-alive connection, and a call to new_http_request to parse the HTTP headers. The rest of the loop body is the code which handles the response.
    static void
    run (void *vp)
    {
      eg2_server_processor p = (eg2_server_processor) vp;
      int close = 0;
      const char *path;
      struct stat statbuf;

      p->io = io_fdopen (p->sock);

      /* Sit in a loop reading HTTP requests. */
      while (!close)
        {
          /* ----- HTTP request ----- */
          p->http_request = new_http_request (p->pool, p->io);
          if (p->http_request == 0)     /* Normal end of file. */
            break;

          /* Get the path and locate the file. */
          path = http_request_path (p->http_request);
          if (stat (path, &statbuf) == -1)
            {
              close = file_not_found_error (p);
              continue;
            }

          /* File or directory? */
          if (S_ISDIR (statbuf.st_mode))
            {
              close = serve_directory (p, path, &statbuf);
              continue;
            }
          else if (S_ISREG (statbuf.st_mode))
            {
              close = serve_file (p, path, &statbuf);
              continue;
            }
          else
            {
              close = file_not_found_error (p);
              continue;
            }
        }

      io_fclose (p->io);

      pth_exit ();
    }
This is a very simple webserver, so all it does is take the path component of the request and use it directly as a filename (note that it relies completely on the chroot(2) environment for security).
Firstly it calls stat to find out if the filename is a directory or a regular file. If it is neither, or if the file doesn't exist, it calls file_not_found_error which sends back a 404 FILE NOT FOUND error.
If the file is a regular file, we call serve_file, which is a simple piece of code:
    static int
    serve_file (eg2_server_processor p, const char *path,
                const struct stat *statbuf)
    {
      http_response http_response;
      const int n = 4096;
      char *buffer = alloca (n);
      int cl, fd, r;
      char *content_length = pitoa (p->pool, statbuf->st_size);

      fd = open (path, O_RDONLY);
      if (fd < 0)
        return file_not_found_error (p);

      http_response = new_http_response (p->pool, p->http_request, p->io,
                                         200, "OK");
      http_response_send_headers (http_response,
                                  /* Content type. */
                                  "Content-Type", "text/plain",
                                  "Content-Length", content_length,
                                  /* End of headers. */
                                  NULL);
      cl = http_response_end_headers (http_response);

      if (http_request_is_HEAD (p->http_request))
        {
          close (fd);               /* Don't leak the descriptor. */
          return cl;
        }

      while ((r = read (fd, buffer, n)) > 0)
        {
          io_fwrite (buffer, r, 1, p->io);
        }

      if (r < 0)
        perror ("read");

      close (fd);

      return cl;
    }
Firstly we work out the size of the file, using the
statbuf.st_size
field. The
c2lib
function pitoa
turns this into a string (all
headers must be passed as strings). Next we open the
file. If this fails, then the file is inaccessible or
has just gone, so we return a 404 instead.
Next we generate our headers:
    Content-Type: text/plain
    Content-Length: (size of the file in octets)
pthrlib
will generate other standard
headers as well.
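In serve_file the size comes from the stat buffer and pitoa turns it into a string for the header call. If you wanted to produce the same header line by hand with nothing but libc, it might look like the sketch below. content_length_header is a hypothetical helper, and it uses fstat(2) on an open descriptor rather than the statbuf which the example passes around.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Format a Content-Length header line for the regular file open on
   fd.  Returns the length of the line, or -1 if fd is not a regular
   file or the buffer is too small. */
int
content_length_header (int fd, char *buf, size_t n)
{
  struct stat st;
  int len;

  assert (buf != NULL && n > 0);

  if (fstat (fd, &st) == -1 || !S_ISREG (st.st_mode))
    return -1;

  /* st_size is an off_t; cast to long long to print it portably. */
  len = snprintf (buf, n, "Content-Length: %lld\r\n",
                  (long long) st.st_size);
  return (len < 0 || (size_t) len >= n) ? -1 : len;
}
```

The length has to be exact: on a keep-alive connection the client uses it to work out where this response ends and the next one begins.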
If the request was a HEAD
request, then
the client only wants to see the headers, so we stop
right there. Otherwise we copy the file back to our
user.
Party question: Why is it OK to use read(2)
when reading the file, but not OK to use write(2)
when writing to the socket? Why will this not cause
the whole server process to block (on Linux at least)?
Serving a directory is more complicated, so we'll take it in steps. Recall that to serve a directory, we actually need to create an HTML page which lists the files, with information about those files and links to the files themselves.
Firstly if the user requested the directory as:
http://your.hostname/path/to/directory
then we need to redirect them to:
http://your.hostname/path/to/directory/
(note the trailing slash). The reason for this is that relative links within our page won't work otherwise. The browser will request /path/to/file instead of /path/to/directory/file. This is actually a bit of webserver arcana which is often forgotten. If you don't believe me, Apache does this too: go look at the source!
    static int
    serve_directory (eg2_server_processor p, const char *path,
                     const struct stat *statbuf)
    {
      http_response http_response;
      int close;
      DIR *dir;
      struct dirent *d;

      /* If the path doesn't end with a "/", then we need to send
       * a redirect back to the client so it refetches the page
       * with "/" appended.
       */
      if (path[strlen (path)-1] != '/')
        {
          char *location = psprintf (p->pool, "%s/", path);
          return moved_permanently (p, location);
        }
moved_permanently sends back a 301 MOVED PERMANENTLY page, causing the browser to re-request the new location.
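The trailing-slash test is easy to get subtly wrong, because indexing path[strlen (path)-1] reads out of bounds if the path is ever empty. Here is a standalone sketch of the same check with that case handled, using a fixed buffer instead of psprintf. Both function names are invented for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if path lacks a trailing '/', so that a directory request
   for it should be redirected.  An empty path counts as lacking the
   slash. */
int
needs_trailing_slash (const char *path)
{
  size_t len;

  assert (path != NULL);
  len = strlen (path);
  return len == 0 || path[len - 1] != '/';
}

/* Build the redirect target "path/" in buf.  Returns 0 on success or
   -1 if buf is too small. */
int
redirect_location (const char *path, char *buf, size_t n)
{
  int len = snprintf (buf, n, "%s/", path);

  return (len < 0 || (size_t) len >= n) ? -1 : 0;
}
```

A request for /path/to/directory would be redirected to /path/to/directory/, while a request that already ends in a slash goes straight on to the listing code.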
The next piece of code should be familiar boilerplate. We open
the directory, and send back headers. If the request is
HEAD
we then drop out.
      dir = opendir (path);
      if (dir == 0)
        return file_not_found_error (p);

      http_response = new_http_response (p->pool, p->http_request, p->io,
                                         200, "OK");
      http_response_send_headers (http_response,
                                  /* Content type. */
                                  "Content-Type", "text/html",
                                  NO_CACHE_HEADERS,
                                  /* End of headers. */
                                  NULL);
      close = http_response_end_headers (http_response);

      if (http_request_is_HEAD (p->http_request))
        {
          closedir (dir);           /* Don't leak the directory handle. */
          return close;
        }
The next piece of code is the complicated bit which generates the HTML page listing the files:
      io_fprintf (p->io,
                  "<html><head><title>Directory: %s</title></head>" CRLF
                  "<body bgcolor=\"#ffffff\">" CRLF
                  "<h1>Directory: %s</h1>" CRLF
                  "<table>" CRLF
                  "<tr><td></td><td></td>"
                  "<td><a href=\"..\">Parent directory</a></td></tr>" CRLF,
                  path, path);

      while ((d = readdir (dir)) != 0)
        {
          if (d->d_name[0] != '.')  /* Ignore hidden files. */
            {
              const char *filename;
              struct stat fstatbuf;

              /* Generate the full pathname to this file. */
              filename = psprintf (p->pool, "%s/%s", path, d->d_name);

              /* Stat the file to find out what it is. */
              if (lstat (filename, &fstatbuf) == 0)
                {
                  const char *type;
                  int size;

                  if (S_ISDIR (fstatbuf.st_mode))
                    type = "dir";
                  else if (S_ISREG (fstatbuf.st_mode))
                    type = "file";
                  else if (S_ISLNK (fstatbuf.st_mode))
                    type = "link";
                  else
                    type = "special";

                  size = fstatbuf.st_size;

                  /* Print the details. */
                  io_fprintf (p->io,
                              "<tr><td>[ %s ]</td><td align=right>%d</td>"
                              "<td><a href=\"%s%s\">%s</a>",
                              type, size,
                              d->d_name,
                              S_ISDIR (fstatbuf.st_mode) ? "/" : "",
                              d->d_name);

                  if (S_ISLNK (fstatbuf.st_mode))
                    {
                      char link[NAME_MAX+1];
                      int r;

                      r = readlink (filename, link, NAME_MAX);
                      if (r >= 0) link[r] = '\0';
                      else strcpy (link, "unknown");

                      io_fprintf (p->io, " -> %s", link);
                    }

                  io_fputs ("</td></tr>" CRLF, p->io);
                }
            }
        }

      io_fprintf (p->io,
                  "</table></body></html>" CRLF);

      closedir (dir);               /* Release the directory handle. */

      return close;
    }
We first send the top of the HTML page, and the beginning of the table (the whole page is one large table, of course).
Next we loop over the directory entries using readdir(3)
to read each one. Ignoring files which start with a dot (.) we
lstat(2)
each file to find out if it's a directory,
file or symbolic link, or some type of special device node.
Depending on the file type, we generate a different bit of HTML containing a relative link to the file or directory (if it's a directory we need to remember to append a trailing slash to the name to avoid that extra 301 redirect).
Finally, after we reach the end of the directory, we finish off the table and the page and return.
That's the end of this pthrlib tutorial. I hope you enjoyed it.
pthrlib isn't just about writing web servers. You can use it to write all sorts of servers, or even clients (it has an FTP client library which I used to load-test Net::FTPServer).
If, however, you feel like using pthrlib to write a web server, I strongly urge you to use rws and shared object scripts. These are described in the rws documentation. (rws uses pthrlib.)
(These manual pages are not always up to date. For the
latest documentation, always consult the manual pages
supplied with the latest pthrlib
package!)
new_pseudothread(3)
pseudothread_count_threads(3)
pseudothread_get_stack_size(3)
pseudothread_get_threads(3)
pseudothread_set_stack_size(3)
pth_accept(3)
pth_catch(3)
pth_connect(3)
pth_die(3)
pth_exit(3)
pth_get_data(3)
pth_get_language(3)
pth_get_name(3)
pth_get_PC(3)
pth_get_pool(3)
pth_get_run(3)
pth_get_SP(3)
pth_get_stack(3)
pth_get_stack_size(3)
pth_get_thread_num(3)
pth_get_tz(3)
pth_millisleep(3)
pth_nanosleep(3)
pth_poll(3)
pth_read(3)
pth_recv(3)
pth_recvfrom(3)
pth_recvmsg(3)
pth_select(3)
pth_send(3)
pth_sendmsg(3)
pth_sendto(3)
pth_set_language(3)
pth_set_name(3)
pth_set_tz(3)
pth_sleep(3)
pth_start(3)
pth_timeout(3)
pth_write(3)
pthr_server_main_loop(3)
pthr_server_default_port(3)
pthr_server_port_option_name(3)
pthr_server_disable_syslog(3)
pthr_server_package_name(3)
pthr_server_disable_fork(3)
pthr_server_disable_chdir(3)
pthr_server_disable_close(3)
pthr_server_chroot(3)
pthr_server_username(3)
pthr_server_stderr_file(3)
pthr_server_startup_fn(3)
io_copy(3)
io_fclose(3)
io_fdopen(3)
io_fflush(3)
io_fgetc(3)
io_fgets(3)
io_fileno(3)
io_fprintf(3)
io_fputc(3)
io_fputs(3)
io_fread(3)
io_fwrite(3)
io_get_inbufcount(3)
io_get_outbufcount(3)
io_pclose(3)
io_popen(3)
io_setbufmode(3)
io_ungetc(3)
http_get_log_file(3)
http_get_servername(3)
http_request_get_header(3)
http_request_get_headers(3)
http_request_is_HEAD(3)
http_request_method(3)
http_request_method_string(3)
http_request_nr_headers(3)
http_request_path(3)
http_request_query_string(3)
http_request_time(3)
http_request_url(3)
http_request_version(3)
http_response_end_headers(3)
http_response_send_header(3)
http_response_send_headers(3)
http_set_log_file(3)
http_set_servername(3)
new_http_request(3)
new_http_response(3)
cgi_erase(3)
cgi_escape(3)
cgi_get_post_max(3)
cgi_param(3)
cgi_param_list(3)
cgi_params(3)
cgi_set_post_max(3)
cgi_unescape(3)
copy_cgi(3)
new_cgi(3)
mutex_enter(3)
mutex_leave(3)
mutex_try_enter(3)
new_mutex(3)
new_rwlock(3)
new_wait_queue(3)
rwlock_enter_read(3)
rwlock_enter_write(3)
rwlock_leave(3)
rwlock_readers_have_priority(3)
rwlock_try_enter_read(3)
rwlock_try_enter_write(3)
rwlock_writers_have_priority(3)
wq_nr_sleepers(3)
wq_sleep_on(3)
wq_wake_up(3)
wq_wake_up_one(3)
ftpc_ascii(3)
ftpc_binary(3)
ftpc_cdup(3)
ftpc_cwd(3)
ftpc_delete(3)
ftpc_dir(3)
ftpc_get(3)
ftpc_login(3)
ftpc_ls(3)
ftpc_mkdir(3)
ftpc_put(3)
ftpc_pwd(3)
ftpc_quit(3)
ftpc_quote(3)
ftpc_rmdir(3)
ftpc_set_passive_mode(3)
ftpc_type(3)
new_ftpc(3)