doc/index.html

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>
    <title>rws documentation index</title>
    <style type="text/css"><!--
      h1 {
      text-align: center;
      }
      pre {
      background-color: #eeeeff;
      }
      code {
      color: green;
      font-weight: bold;
      }
      --></style>
  </head>

  <body bgcolor="#ffffff">
    <h1>rws documentation index</h1>

    <h2>Shared object scripts</h2>

    <p>
      Shared object scripts are a possibly unique feature of <code>rws</code>.
      A shared object script is a CGI script, written in C, which
      is loaded into the address space of the server at runtime.
      Thus shared object scripts are very fast because they are
      written in C, loaded just once, and able to run without
      needing a <code>fork(2)</code>.
    </p>

    <p>
      On the other hand, the price of this speed is reduced security, although
      competent C programmers who are using all the features of
      <a href="http://www.annexia.org/freeware/c2lib/">c2lib</a> and
      <a href="http://www.annexia.org/freeware/pthrlib/">pthrlib</a>
      should be able to write code which is free of buffer overflows
      and some other common security issues. (However, if you allow
      your server to run shared object scripts from untrusted
      third parties, then you have essentially no security at all, since
      shared object scripts can interfere with the internal workings
      of the webserver in arbitrary ways.)
    </p>

    <h3>The anatomy of a shared object script</h3>

    <p>
      A shared object script is a <q><code>.so</code></q>
      file (in other words, a shared library or <q>DLL</q>).
      It should contain a single external symbol called
      <code>handle_request</code>, prototyped as:
    </p>

<pre>
int handle_request (rws_request rq);
</pre>

    <p>
      The <code>rws_request</code> object is defined in
      <code>rws_request.h</code>.
    </p>

    <p>
      The first time that any client requests the shared
      object script, <code>rws</code> calls <code>dlopen(3)</code>
      on the file. As noted in the <code>dlopen(3)</code>
      manual page, this will cause <code>_init</code> and any
      constructor functions in the file to be run.
      Then <code>rws</code> creates the <code>rws_request</code>
      object (see below) and calls <code>handle_request</code>.
      The shared object script remains loaded in memory
      after <code>handle_request</code> has returned, ready
      for the next invocation.
    </p>

    <p>
      On subsequent invocations, <code>dlopen(3)</code> is
      <em>not</em> called, so constructors only run once.
    </p>
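
    <p>
      The loading step described above can be sketched with the
      standard <code>dlopen(3)</code> and <code>dlsym(3)</code> calls.
      This is only an illustration and is not taken from the
      <code>rws</code> source: the helper name <code>load_handler</code>,
      the <code>RTLD_NOW</code> flag, and the use of <code>void *</code>
      in place of <code>rws_request</code> are all assumptions, made so
      that the fragment compiles without the <code>rws</code> headers.
    </p>

<pre>
#include &lt;dlfcn.h&gt;
#include &lt;stdio.h&gt;

/* Stand-in for int handle_request (rws_request rq); void * is used
   only so that this sketch compiles without the rws headers. */
typedef int (*handler_fn) (void *rq);

/* Hypothetical helper, not part of rws.  Opens the shared object and
   looks up handle_request.  Returns NULL on failure; on success
   *handle receives the handle to pass to dlclose(3) later. */
handler_fn
load_handler (const char *path, void **handle)
{
  handler_fn fn = NULL;
  const char *err;

  /* Constructors in the file run inside dlopen.  RTLD_NOW is an
     assumption; the flags rws really passes are not stated here. */
  *handle = dlopen (path, RTLD_NOW);
  if (*handle != NULL)
    {
      /* The cast through void ** is the idiom from dlopen(3). */
      *(void **) &amp;fn = dlsym (*handle, "handle_request");
      if (fn != NULL)
        return fn;
    }

  /* Either dlopen or dlsym failed: report it and clean up. */
  err = dlerror ();
  fprintf (stderr, "%s: %s\n", path, err != NULL ? err : "unknown error");
  if (*handle != NULL)
    {
      dlclose (*handle);
      *handle = NULL;
    }
  return NULL;
}
</pre>

    <p>
      Calling <code>load_handler</code> on a path which does not exist
      returns <code>NULL</code> and prints the <code>dlerror(3)</code>
      message. Outside the server, a real script is also likely to fail
      to load with <code>RTLD_NOW</code>, because its
      <code>rws_request_*</code> symbols are only resolved when the
      library is loaded into <code>rws</code>.
    </p>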

    <p>
      However, on each invocation, <code>rws</code> checks the
      modification time of the file on disk, and if it has
      changed, then it will attempt to reload the file. To
      do this, it calls <code>dlclose(3)</code> first, which
      will cause <code>_fini</code> and destructors in the
      library to run, and unloads the library from memory. It
      then reopens (<code>dlopen(3)</code>) the new file on
      disk, as above. Beware that there are some occasions when
      <code>rws</code> cannot reload a shared object
      script, even though it notices that the file has changed
      on disk. <code>rws</code> keeps a use count of the number
      of threads currently using the shared object script, and
      for safety reasons it cannot reload the file until this
      use count drops to zero. This means that in some cases
      (e.g. under very heavy load) a shared object script might
      never be reloaded, even if it changes on disk.
    </p>
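
    <p>
      The reload rule in the previous paragraph amounts to a small
      decision function. The following is a sketch under stated
      assumptions, not the code inside <code>rws</code>: the structure,
      its field names, and the <code>check_reload</code> function are
      all invented for illustration, and the use count is assumed not to
      include the request which triggered the check.
    </p>

<pre>
#include &lt;time.h&gt;

/* Hypothetical bookkeeping for one loaded script.  The field names
   are illustrative and are not taken from the rws source. */
struct loaded_script
{
  time_t mtime;     /* modification time when the file was dlopen'ed */
  int use_count;    /* number of threads currently inside the script */
};

enum reload_action { KEEP_LOADED, RELOAD_NOW, RELOAD_DEFERRED };

/* Decide what to do before an invocation, given the modification
   time of the file as it is on disk right now. */
enum reload_action
check_reload (const struct loaded_script *s, time_t mtime_on_disk)
{
  if (mtime_on_disk == s-&gt;mtime)
    return KEEP_LOADED;         /* file unchanged */
  if (s-&gt;use_count &gt; 0)
    return RELOAD_DEFERRED;     /* changed, but still in use */
  return RELOAD_NOW;            /* changed and idle: dlclose, then dlopen */
}
</pre>

    <p>
      In this model a busy server can keep returning
      <code>RELOAD_DEFERRED</code> indefinitely, which is exactly the
      case where a changed script is never reloaded.
    </p>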

    <h3>Configuring rws to recognise shared object scripts</h3>

    <p>
      <code>rws</code> will not try to run shared object scripts
      unless the <code>exec so</code> flag has been set on the
      alias, and the shared object script itself is executable (mode 0755).
      Here is an example shared object scripts directory:
    </p>

<pre>
alias /so-bin/
        path: /usr/share/rws/so-bin
        exec so: 1
end alias
</pre>

    <p>
      Make sure that the <code>so-bin</code> directory is only
      writable by trusted users, and make sure each shared object
      script is executable, mode 0755.
    </p>

    <p>
      If you can't make your shared object scripts run, then here
      is a checklist to go through before you email me:
    </p>

    <ul>
      <li> Have you put the above alias section into
        the correct host file?
      <li> Is the <code>exec so</code> option set?
      <li> Have you restarted <code>rwsd</code>?
      <li> Is the directory world readable and executable (mode 0755)?
      <li> Is the shared object script world readable and executable (mode 0755)?
      <li> Are there any unresolved symbols (<code>ldd -r script.so</code>), apart
        from the <code>rws_request_*</code> symbols which will be resolved
        when the library is loaded into <code>rws</code>?
      <li> Is the <code>handle_request</code> function missing?
      <li> Is <code>handle_request</code> exported in the dynamic
        symbol table (<code>nm -D script.so</code>)?
      <li> Does your <code>error_log</code> file contain
        any error messages?
    </ul>
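
    <p>
      The two permission items in the checklist can be tested from C
      with <code>stat(2)</code>. This is a small sketch which mirrors
      the checklist only; how <code>rws</code> itself decides that a
      script is executable is not described here, and both function
      names are made up for this example.
    </p>

<pre>
#include &lt;sys/types.h&gt;
#include &lt;sys/stat.h&gt;

/* Return 1 if the permission bits let user, group and others
   read and execute (as mode 0755 does), otherwise 0. */
int
mode_is_world_rx (mode_t mode)
{
  mode_t want = S_IRUSR | S_IXUSR | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH;

  return (mode &amp; want) == want;
}

/* Return 1 or 0 as above for the file or directory at path,
   or -1 if stat(2) fails (for example, the path does not exist). */
int
path_is_world_rx (const char *path)
{
  struct stat st;

  if (stat (path, &amp;st) == -1)
    return -1;
  return mode_is_world_rx (st.st_mode);
}
</pre>

    <p>
      For example, <code>path_is_world_rx ("/usr/share/rws/so-bin")</code>
      checks the directory from the alias above, and the same call on
      the <code>.so</code> file checks the script itself.
    </p>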

    <p>
      I have quite successfully used <code>gdb</code> on a running
      server to debug and diagnose problems in shared object
      scripts. However, note that by default <code>gdb</code> may
      have trouble loading the symbol table for your script. Use
      the <code>sharedlibrary script.so</code>
      command to load symbols instead.
    </p>

    <h3>Shared object scripts vs. Monolith applications</h3>

    <p>
      If you've been looking at the
      <a href="http://www.annexia.org/freeware/monolith/">Monolith
        application framework</a> pages, then you may be confused
      about how shared object scripts relate to Monolith.
    </p>

    <p>
      Shared object scripts are the direct analogue of CGI scripts,
      the differences being that CGI scripts are usually written
      in very high level languages like Perl and PHP, whereas shared
      object scripts are written in C and loaded into the server
      process for efficiency.
      (Perl CGI scripts can also be loaded into the Apache
      server process using <code>mod_perl</code>, and this is done
      for similar reasons of efficiency.)
    </p>

    <p>
      Monolith programs are entire applications, the sort of
      thing which normally would be written using dozens of
      cooperating CGI scripts. In the case of Monolith, however,
      the entire application compiles down to a single <code>.so</code>
      file which happens to be (you guessed it) a shared object script.
    </p>

    <p>
      Imagine that you are going to write yet another web-based email
      client. For some reason you want to write this in C (please
      don't try this at home: I wrote one in Perl at my last job and
      that was hard enough). Here are three possible approaches
      using C and <code>rws</code>:
    </p>

    <ol>
      <li>
        <p>
          Write forty or so shared object scripts. Each displays
          a single frame of the application: one might generate
          the frameset, and a couple of dozen might implement specific
          operations like emptying trash or moving a message between
          folders.
        </p>
        <p>
          This is very much the normal way of writing CGI-based
          applications.
        </p>
      <li> Write a Monolith application. This will probably be
        in lots of C files, but will compile down and be linked
        into a single <code>.so</code> file (e.g. <code>email.so</code>)
        which is dropped into the <code>so-bin</code> directory.
      <li>
        <p>
          Write a Monolith email super-widget. This is going
          to exist in a shared library called
          <code>/usr/lib/libmyemail.so</code>
          with a corresponding header file defining the interface
          called <code>myemail.h</code>.
        </p>
        <p>
          Then write a tiny Monolith application which just instantiates
          a window and an email widget, and embeds the email widget
          in the window. This will compile into <code>email.so</code>
          (it'll be very tiny) which is dropped into <code>so-bin</code>.
        </p>
        <p>
          The advantage of this final approach is that you can
          reuse the email widget in other places, or indeed sell
          it to other Monolith users.
        </p>
    </ol>

    <p>
      So Monolith is good when you want to build applications
      from widgets, as you would if you were building a
      Java/Swing, Windows MFC, GTK or Tcl/Tk graphical application.
      It's also good if code re-use is important to you.
      Shared object scripts are good when you are familiar with
      CGI-based techniques for building websites.
    </p>

    <p>
      Of course, the same <code>rws</code> server can serve
      shared object scripts, multiple Monolith applications,
      flat files, and directory listings, all at the same time.
    </p>

    <h3>Tutorial on writing shared object scripts</h3>

    <p>
      In this tutorial I will explain how the two shared object
      script examples supplied with <code>rws</code> work. You
      will also need to have read the tutorials for
      <a href="http://www.annexia.org/freeware/c2lib/">c2lib</a> and
      <a href="http://www.annexia.org/freeware/pthrlib/">pthrlib</a>
      which you can find by going to their respective web pages.
    </p>

    <p>
      The first example, <code>hello.c</code>, is very simple indeed.
      It's just a "hello world" program. The program starts by
      including <code>rws_request.h</code>:
    </p>

<pre>
#include &lt;rws_request.h&gt;
</pre>

    <p>
      Following this is the <code>handle_request</code>
      function. This is the function which <code>rws</code>
      will call every time a user requests the script:
    </p>

<pre>
int
handle_request (rws_request rq)
{
  pseudothread pth = rws_request_pth (rq);
  http_request http_request = rws_request_http_request (rq);
  io_handle io = rws_request_io (rq);

  int close;
  http_response http_response;

  /* Begin response. */
  http_response = new_http_response (pth, http_request, io,
                                     200, "OK");
  http_response_send_headers (http_response,
                              /* Content type. */
                              "Content-Type", "text/plain",
                              /* End of headers. */
                              NULL);
  close = http_response_end_headers (http_response);

  if (http_request_is_HEAD (http_request)) return close;

  io_fprintf (io, "hello, world!");

  return close;
}
</pre>

    <p>
      We first extract some fields from the <code>rws_request</code>
      object. <code>rws</code> has already taken the time to
      parse the HTTP headers from the client, but we need to
      generate the reply headers (shared object scripts
      are always "nph", that is, they use non-parsed headers). The
      <code>pthrlib</code> functions
      <code>new_http_response</code>,
      <code>http_response_send_headers</code> and
      <code>http_response_end_headers</code> do this. Note
      that we send a <code>Content-Type: text/plain</code>
      header. You must always generate a correct
      <code>Content-Type</code> header.
    </p>
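
    <p>
      To make "nph" concrete: the script is responsible for the whole
      response head, starting with the status line. The sketch below
      formats a minimal head into a buffer. It is <em>not</em> how
      <code>pthrlib</code> is implemented, and the real functions can be
      expected to send more than this; the function name
      <code>format_response_head</code>, the <code>HTTP/1.1</code>
      version string and the single header are assumptions chosen for
      illustration.
    </p>

<pre>
#include &lt;stdio.h&gt;

/* Illustrative only: write a status line, a Content-Type header and
   the blank line which ends the headers into buf.  Returns the number
   of bytes written, or -1 if buf is too small. */
int
format_response_head (char *buf, size_t size,
                      int code, const char *reason,
                      const char *content_type)
{
  int n = snprintf (buf, size,
                    "HTTP/1.1 %d %s\r\n"
                    "Content-Type: %s\r\n"
                    "\r\n",
                    code, reason, content_type);

  if (n &lt; 0 || (size_t) n &gt;= size)
    return -1;
  return n;
}
</pre>

    <p>
      With the same arguments as <code>hello.c</code> uses (200,
      <code>"OK"</code> and <code>"text/plain"</code>), the buffer holds
      the status line, one header, and the empty line after which the
      body begins.
    </p>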

    <p>
      If the original request was a <code>HEAD</code> request, then
      the client only wants to see the headers, so we stop here.
    </p>

    <p>
      Otherwise we generate our message and return.
    </p>

    <p>
      NB. Don't call <code>io_fclose</code> on the I/O handle! If you
      really want to force the connection to close, set the
      <code>close</code> variable to 1 and return it. This is
      because the client (or proxy) might be issuing several
      separate HTTP requests over the same kept-alive TCP connection.
    </p>
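
    <p>
      The meaning of the return value can be pictured with a toy model
      of the server side of the contract. This is an assumption-laden
      sketch and not the <code>rws</code> request loop: every name in it
      is invented, and <code>void *</code> stands in for
      <code>rws_request</code>. It only shows why the handler returns a
      flag instead of closing the handle itself.
    </p>

<pre>
/* Stand-in for int handle_request (rws_request rq). */
typedef int (*handler_fn) (void *rq);

/* Example handlers: one leaves the connection open, one asks for
   it to be closed by returning 1. */
int keep_alive_handler (void *rq) { (void) rq; return 0; }
int closing_handler (void *rq) { (void) rq; return 1; }

/* Toy model of one kept-alive connection: call the handler once per
   pending request, and stop as soon as it asks for the connection to
   be closed.  Returns the number of requests which were handled. */
int
serve_connection (handler_fn handler, void **requests, int nr_requests)
{
  int i;

  for (i = 0; i &lt; nr_requests; ++i)
    if (handler (requests[i]) != 0)
      return i + 1;     /* the caller closes the connection now */

  return nr_requests;   /* connection stays open for more requests */
}
</pre>

    <p>
      With three pending requests, <code>keep_alive_handler</code> is
      called three times, whereas <code>closing_handler</code> is called
      once and the remaining requests are never served on that
      connection.
    </p>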

    <p>
      The second example, <code>show_params.c</code>, is just slightly
      more complex, but demonstrates how to do parameter parsing.
      After reading this you should have enough knowledge to
      go away and write your own shared object scripts that
      actually do useful stuff.
    </p>

    <p>
      As before, we start by including a few useful headers:
    </p>

<pre>
#include &lt;pool.h&gt;
#include &lt;vector.h&gt;
#include &lt;pthr_cgi.h&gt;

#include &lt;rws_request.h&gt;
</pre>

    <p>
      The <code>handle_request</code> function starts the same way
      as before:
    </p>

<pre>
int
handle_request (rws_request rq)
{
  pool pool = rws_request_pool (rq);
  pseudothread pth = rws_request_pth (rq);
  http_request http_request = rws_request_http_request (rq);
  io_handle io = rws_request_io (rq);
</pre>

    <p>
      Then we define some variables that we're going to use:
    </p>

<pre>
  cgi cgi;
  int close, i;
  http_response http_response;
  vector headers, params;
  struct http_header header;
  const char *name, *value;
</pre>

    <p>
      The actual job of parsing out the CGI parameters is simplified
      because <code>pthrlib</code> contains a CGI library
      (similar to Perl's <code>CGI.pm</code>):
    </p>

<pre>
  /* Parse CGI parameters. */
  cgi = new_cgi (pool, http_request, io);
</pre>

    <p>
      The response phase begins by sending the HTTP
      headers as before:
    </p>

<pre>
  /* Begin response. */
  http_response = new_http_response (pth, http_request, io,
                                     200, "OK");
  http_response_send_headers (http_response,
                              /* Content type. */
                              "Content-Type", "text/plain",
                              /* End of headers. */
                              NULL);
  close = http_response_end_headers (http_response);

  if (http_request_is_HEAD (http_request)) return close;
</pre>

    <p>
      Now we print out the actual contents of both the
      <code>http_request</code> object and the <code>cgi</code>
      object. HTTP headers first:
    </p>

<pre>
  io_fprintf (io, "This is the show_params shared object script.\r\n\r\n");
  io_fprintf (io, "Your browser sent the following headers:\r\n\r\n");

  headers = http_request_get_headers (http_request);
  for (i = 0; i &lt; vector_size (headers); ++i)
    {
      vector_get (headers, i, header);
      io_fprintf (io, "\t%s: %s\r\n", header.key, header.value);
    }

  io_fprintf (io, "----- end of headers -----\r\n");
</pre>

    <p>
      Next come the full URL (including the query string), the path
      alone, and the query string:
    </p>

<pre>
  io_fprintf (io, "The URL was: %s\r\n",
              http_request_get_url (http_request));
  io_fprintf (io, "The path component was: %s\r\n",
              http_request_path (http_request));
  io_fprintf (io, "The query string was: %s\r\n",
              http_request_query_string (http_request));
  io_fprintf (io, "The query arguments were:\r\n");
</pre>

    <p>
      Finally we print out the CGI parameters from the <code>cgi</code>
      object:
    </p>

<pre>
  params = cgi_params (cgi);
  for (i = 0; i &lt; vector_size (params); ++i)
    {
      vector_get (params, i, name);
      value = cgi_param (cgi, name);
      io_fprintf (io, "\t%s=%s\r\n", name, value);
    }

  io_fprintf (io, "----- end of parameters -----\r\n");

  return close;
}
</pre>
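
    <p>
      To appreciate what the <code>cgi</code> object does for you,
      here is a deliberately naive sketch of splitting a query string
      such as <code>a=1&amp;b=2</code> by hand. It is not how
      <code>pthrlib</code> works internally. The structure and function
      names are invented for this example, it does no percent-decoding,
      and it silently truncates long names and values, all of which a
      real CGI library has to handle properly.
    </p>

<pre>
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;

/* Illustrative fixed-size storage for one name=value pair. */
struct param
{
  char name[64];
  char value[64];
};

/* Split query into at most max pairs, stored in out.  Pairs with an
   empty name are skipped; a pair without '=' gets an empty value.
   Returns the number of pairs stored. */
int
split_query (const char *query, struct param *out, int max)
{
  int n = 0;

  while (*query != '\0' &amp;&amp; n &lt; max)
    {
      size_t len = strcspn (query, "&amp;");    /* length of this pair */
      const char *eq = memchr (query, '=', len);
      size_t name_len = eq != NULL ? (size_t) (eq - query) : len;
      size_t value_len = eq != NULL ? len - name_len - 1 : 0;

      if (name_len &gt; 0)
        {
          /* snprintf truncates anything longer than the buffer. */
          snprintf (out[n].name, sizeof out[n].name,
                    "%.*s", (int) name_len, query);
          snprintf (out[n].value, sizeof out[n].value,
                    "%.*s", (int) value_len, eq != NULL ? eq + 1 : "");
          n++;
        }

      query += len;
      if (*query == '&amp;')
        query++;                /* step over the separator */
    }

  return n;
}
</pre>

    <p>
      Even this simplified version needs care over buffer sizes, which
      is a good reason to use <code>new_cgi</code> and
      <code>cgi_param</code> in real scripts.
    </p>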

    <h2>Further examples</h2>

    <p>
      That's the end of this tutorial. I hope you enjoyed it. Please
      contact the author about corrections or to obtain more information.
    </p>

    <h2>Links to manual pages</h2>

    <ul>
      <li> <a href="rws_request_canonical_path.3.html"><code>rws_request_canonical_path(3)</code></a> </li>
      <li> <a href="rws_request_file_path.3.html"><code>rws_request_file_path(3)</code></a> </li>
      <li> <a href="rws_request_host_header.3.html"><code>rws_request_host_header(3)</code></a> </li>
      <li> <a href="rws_request_http_request.3.html"><code>rws_request_http_request(3)</code></a> </li>
      <li> <a href="rws_request_io.3.html"><code>rws_request_io(3)</code></a> </li>
      <li> <a href="rws_request_pool.3.html"><code>rws_request_pool(3)</code></a> </li>
      <li> <a href="rws_request_pth.3.html"><code>rws_request_pth(3)</code></a> </li>
    </ul>

    <hr>
    <address><a href="mailto:rich@annexia.org">Richard Jones</a></address>
<!-- Created: Wed May  1 19:36:16 BST 2002 -->
<!-- hhmts start -->
Last modified: Wed Oct  9 20:02:40 BST 2002
<!-- hhmts end -->
  </body>
</html>