1 <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
4 <title>DLIFE machine language reference</title>
7 <body bgcolor="#ffffff">
8 <h1>DLIFE machine language reference</h1>
10 <h2>General overview</h2>
13 I'm going to assume that you're mostly familiar
14 with <a href="http://www.hip.atr.co.jp/~ray/tierra/">Tierra</a>,
15 but I'll also give you a short introduction.
19 In DLIFE/Tierra, life forms consist of small
20 machine language programs called ``cells''.
21 Each cell has its own program counter and hence
22 its own thread of execution, and runs independently
23 of any other cell. Cells live in a virtual machine
24 (VM) called the ``soup''. The soup is just a big
25 piece of byte-addressed memory, 128 Kbytes in size
30 The soup contains an interpreter which interprets
31 the machine language. For DLIFE cells, the soup
32 is their entire universe. They cannot address or
33 access anything outside their own soup. (Although
34 they can occasionally jump from one soup to another
35 -- but more of that later).
39 All the cells sharing the soup are run timesliced
40 in a simple round-robin scheme, and hence each
41 cell will get roughly the same amount of actual CPU
46 The cell machine language is not 8086 assembler. In
47 fact, it's a special purpose machine language written
48 just to run cells. It has some similarities with
49 more traditional machine languages -- for example,
50 registers, indirect addressing, jumps, a stack,
51 and so on -- but also some strange features that
52 you wouldn't expect to find in a real machine
53 language. The most important feature of the
54 language is termed (in a term coined by Tom
55 Ray) its ``non-brittleness''. This means that
56 the language is robust against small corruptions,
57 changes, and the insertion or deletion of a few
58 bytes here and there. In a real machine language,
59 inserting a few bytes into a piece of code requires
60 that you relink the code so that all jump and call
61 target addresses change. If you didn't relink the code,
62 then the insertion would be fatal -- the code wouldn't
63 function at all. Real machine languages are said to
64 be ``brittle''. In the DLIFE machine language, insertions
65 and deletions are tolerated -- the code will continue to
66 work much as before. The main way in which this works
67 is that absolute addresses have been replaced by
68 explicit labels in the code:
86 Suppose in the example above that an accidental
87 insertion or deletion occurs at the point marked by the arrow.
88 The chances are that the code will continue to work,
89 because the ``jump'' command will still be able to
90 find and jump to the correct label.
94 Another important and unusual feature of the machine
95 language is that it allows cells to reproduce. Through
96 the use of the <tt>MALLOC</tt> and <tt>DIVIDE</tt>
97 instructions, a cell may produce a daughter cell,
98 populate it with code, and then divide (set it running
99 as an independent cell).
103 The soup interpreter is error-prone. Most instructions
104 will run as intended, but occasionally instructions
105 fail in certain ways. For example, an instruction to
106 increment the accumulator might, one in a hundred thousand times,
107 increment the accumulator <i>twice</i>, or even not at
108 all. The soup memory is also error-prone. So-called
109 cosmic rays hit the soup very occasionally, flipping
110 bits at random. By these means, cells may occasionally
111 mutate. Note that because the machine language is
112 very deliberately chosen to be non-brittle, mutations
113 often result in cells which still work, but perhaps
114 work in slightly different (better?) ways. Hence
119 There is one initial cell, which was written by hand. This
120 cell is called the ``god cell'' (although perhaps Adam
121 or Eve might have been a better biblical parallel). This
122 cell simply copies itself repeatedly.
126 Cells which execute instructions incorrectly (for
127 example, they do not fill the accumulator register
128 with a suitable value before calling <tt>MALLOC</tt>)
129 accumulate errors. As the soup becomes more and more
130 full, a special process called the ``grim reaper''
131 is invoked. The grim reaper kills off the cells
132 with the largest error count until the soup pressure
137 The DLIFE model is one of pure natural selection. There
138 are no artificial pressures placed on cells to become,
139 for example, good at executing some particular algorithm.
140 Instead, cells compete simply for space in the soup
141 and time on the virtual CPU.
144 <h2>Cell virtual machine</h2>
147 The cell VM consists of 4 registers, some local
148 storage, a stack and stack pointer, an error counter
149 and possibly an attached daughter cell.
155 The 4 visible registers are:
160 <th> Register </th> <th> Purpose </th>
163 <td> A </td> <td> The accumulator, for mathematical
164 operations and general purpose use. </td>
167 <td> B </td> <td> Base address, for addressing local variables. </td>
170 <td> I </td> <td> Index address. </td>
173 <td> P </td> <td> Program counter. </td>
178 All registers are 16 bit signed integers, storing
179 any value between -32768 and 32767 inclusive.
183 All addresses are stored relative to the start
184 address of the cell. Hence if P is set to zero,
185 then execution jumps to the beginning of the cell's
190 When a cell is created, all registers are initialized
194 <h3>Local storage</h3>
197 The cell has a fragment of soup memory. Because addresses
198 are relative, to the cell the memory appears to stretch from
199 address 0 to address N-1, where N is the total number
200 of bytes allocated to the cell by her mother's call
201 to <tt>MALLOC</tt>. It is not possible for
202 the cell to find out the value of N (we assume that
203 the mother allocated enough memory for the cell).
207 A cell may store code or data at any memory address
208 within itself. There is no distinction made in the
209 VM between code and data.
213 A cell may not write to memory outside the range 0 to N-1.
214 Doing so will incur an error. Cells may, however, read
215 any memory outside this range.
219 After calling <tt>MALLOC</tt>, the I register is
220 initialized with the start address of the daughter
221 cell. Between the <tt>MALLOC</tt> and the next call
222 to <tt>DIVIDE</tt>, the cell may additionally write
223 to addresses I through I + A - 1, in order to initialize
230 The cell has a stack of 16 words. Each stack entry is
231 a 16 bit signed integer.
235 A stack pointer stores the current position in the
240 The stack can only be accessed through one of the <tt>PUSH</tt>
241 or <tt>POP</tt> instructions. <tt>PUSH x</tt> (where ``x'' is
242 the name of a register) increments the value of the
243 stack pointer and copies the value in register x onto
244 the stack at the position addressed by the stack pointer.
245 <tt>POP x</tt> copies the value on the stack addressed
246 by the stack pointer into register x and decrements the
247 value of the stack pointer.
251 The stack is circular, in that pushing more than 16 numbers
252 onto the stack causes the stack to wrap around and overwrite
253 the top entry. Thus it is not possible to underflow or
258 The stack pointer register is not visible to the cell
259 VM. In other words, the cell cannot directly read or write
260 to the stack pointer (except implicitly through use of
261 the <tt>PUSH</tt> or <tt>POP</tt> instructions). Because
262 of this, it is not possible for the cell to tell if the
263 stack is implemented as grows-down or grows-up, nor
264 whether the implementation pushes first and changes the
265 stack pointer second or vice versa.
269 When the cell is created, all stack entries and the stack
270 pointer are initialized to zero.
273 <h3>Error counter</h3>
276 The error counter starts off at zero, and is incremented
277 by one every time the cell makes a mistake when
278 executing an instruction. Typical mistakes are:
282 <li> Incorrectly initializing the A register before
283 calling <tt>MALLOC</tt>.
284 <li> Not calling <tt>MALLOC</tt> before calling <tt>DIVIDE</tt>.
285 <li> Not calling <tt>DIVIDE</tt> before calling <tt>MALLOC</tt>.
286 <li> Executing <tt>FINDF</tt> or <tt>FINDB</tt> and not
287 finding a matching label.
290 <h2>Basic instruction set</h2>
293 The basic instruction set consists of just 17 commands
294 encoded in just 38 different numeric values.
298 Each instruction is encoded in a single byte.
302 Two instructions, <tt>FINDB</tt> and <tt>FINDF</tt>
303 might be considered to be multi-byte instructions.
304 See the description below for details.
308 There is one instruction prefix, <tt>IFZ</tt>, but
309 in the discussion below, we will just consider it
310 as if it was a single byte instruction (which, in
315 The VM makes no distinction between code and data.
316 Any byte which is addressable as part of the mother
317 cell can be considered to be either executable
318 code or a data byte (or both).
322 Instructions are encoded into soup bytes as follows:
327 significant significant
329 +----+----+----+----+----+----+----+----+
331 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
333 +----+----+----+----+----+----+----+----+
334 \_______/ \___________________________/
336 Ignored Instruction encoding
337 (values 0-4,7-39 correspond to
338 instructions, values 40-63
339 are unknown and result in
344 The lower 6 bits are interpreted as follows:
356 <td> 0 </td> <td> 00 </td> <td> 00 0000 </td>
357 <td> <tt>NOP0</tt> </td>
358 <td> No-op. This instruction does nothing when executed
359 but is used as a pattern matching label by <tt>FINDB</tt>
360 and <tt>FINDF</tt>. </td>
363 <td> 1 </td> <td> 01 </td> <td> 00 0001 </td>
364 <td> <tt>NOP1</tt> </td>
365 <td> No-op. This instruction does nothing when executed
366 but is used as a pattern matching label by <tt>FINDB</tt>
367 and <tt>FINDF</tt>. </td>
370 <td> 2 </td> <td> 02 </td> <td> 00 0010 </td>
371 <td> <tt>INC A</tt> </td>
372 <td> A <- A + 1 </td>
375 <td> 3 </td> <td> 03 </td> <td> 00 0011 </td>
376 <td> <tt>DEC A</tt> </td>
377 <td> A <- A - 1 </td>
380 <td> 4 </td> <td> 04 </td> <td> 00 0100 </td>
381 <td> <tt>SHL A</tt> </td>
382 <td> A <- A << 1 </td>
385 <td> 7 </td> <td> 07 </td> <td> 00 0111 </td>
386 <td> <tt>IFZ</tt> </td>
387 <td> <tt>IFZ</tt> is normally used as an instruction
388 prefix, ie. <tt>IFZ</tt> <instruction>.
389 If A == 0, execute the next instruction,
390 otherwise skip the next instruction. </td>
393 <td> 8 </td> <td> 08 </td> <td> 00 1000 </td>
394 <td> <tt>FINDB</tt> </td>
395 <td> <tt>FINDB</tt> should be followed by a pattern
396 (a series of one or more NOP0 and NOP1 bytes). The
397 <tt>FINDB</tt> instruction searches backwards
398 from the current program counter until it finds
399 the <tt>inverse</tt> pattern of NOP0 and NOP1 bytes.
400 It returns the relative address of the found pattern
401 in register I. If the pattern is not found, it
406 <td> 9 </td> <td> 09 </td> <td> 00 1001 </td>
407 <td> <tt>FINDF</tt> </td>
408 <td> <tt>FINDF</tt> should be followed by a pattern
409 (a series of one or more NOP0 and NOP1 bytes). The
410 <tt>FINDF</tt> instruction searches forwards
411 from the current program counter until it finds
412 the <tt>inverse</tt> pattern of NOP0 and NOP1 bytes.
413 It returns the relative address of the found pattern
414 in register I. If the pattern is not found, it
419 <td> 10 </td> <td> 0A </td> <td> 00 1010 </td>
420 <td> <tt>MALLOC</tt> </td>
421 <td> <tt>MALLOC</tt> allocates a daughter
422 cell of size A bytes. It returns the address
423 of the daughter cell in register I, or if the
424 memory could not be allocated, it sets I <- 0.
425 Only one daughter cell may be allocated at a
426 time. Therefore, calling <tt>MALLOC</tt> twice
427 without calling <tt>DIVIDE</tt> in between is an
428 error. Register A must be between 10 and 512.
432 <td> 11 </td> <td> 0B </td> <td> 00 1011 </td>
433 <td> <tt>DIVIDE</tt> </td>
434 <td> <tt>DIVIDE</tt> splits off the current daughter
435 cell and starts it running as a fully independent
436 cell. The daughter cell must first have been allocated
437 by <tt>MALLOC</tt> and populated with code, so
438 calling <tt>DIVIDE</tt> twice without calling
439 <tt>MALLOC</tt> in between is an error.
443 <td> 12 </td> <td> 0C </td> <td> 00 1100 </td>
444 <td> <tt>MOVE [I],A</tt> </td>
445 <td> Load the unsigned byte at relative address I into
450 <td> 13 </td> <td> 0D </td> <td> 00 1101 </td>
451 <td> <tt>MOVE A,[I]</tt> </td>
452 <td> Store the value of register A in the soup byte
453 at relative address I. If A is less than 0 or
454 greater than 255 then the effects are undefined.
455 If I lies outside the boundaries of the mother
456 or daughter cells, then this is a cell error.
460 <td> 14 </td> <td> 0E </td> <td> 00 1110 </td>
461 <td> <tt>DMOVE [I],A</tt> </td>
462 <td> Load the signed word (two bytes) at relative address I
463 into register A. The VM is big endian (Motorola ordering).
467 <td> 15 </td> <td> 0F </td> <td> 00 1111 </td>
468 <td> <tt>DMOVE A,[I]</tt> </td>
469 <td> Store the value of register A in the soup
470 bytes at relative address I and I+1. If (I, I+1)
471 lies outside the boundaries of the mother or
472 daughter cells, then this is a cell error.
473 The VM is big endian (Motorola ordering).
477 <td> 16-31 </td> <td> 10-1F </td> <td> 01 r2r1 </td>
478 <td> <tt>XOR <reg1>,<reg2> </tt> </td>
479 <td> <reg2> <- <reg1> XOR <reg2>
480 for any registers in A (00), B (01), I (10) or P (11).
484 <td> 32-35 </td> <td> 20-23 </td> <td> 10 00rr </td>
485 <td> <tt>PUSH <reg> </tt> </td>
486 <td> Push the register onto the stack for any
487 registers in A (00), B (01), I (10) or P (11).
491 <td> 36-39 </td> <td> 24-27 </td> <td> 10 01rr </td>
492 <td> <tt>POP <reg> </tt> </td>
493 <td> Pop the top of the stack into the register for
494 any registers in A (00), B (01), I (10) or P (11).
500 Note that instruction encodings 5, 6 and 40-63 are
501 not used. Attempting to execute one of these unknown
502 instructions results in a cell error.
505 <h2>Instruction set timings</h2>
508 This table shows the number of (virtual) CPU cycles
509 taken to execute each instruction.
514 <th> Instruction </th>
518 <tr> <td> <tt>NOP0</tt> </td> <td> 1 </td> </tr>
519 <tr> <td> <tt>NOP1</tt> </td> <td> 1 </td> </tr>
520 <tr> <td> <tt>INC A</tt> </td> <td> 1 </td> </tr>
521 <tr> <td> <tt>DEC A</tt> </td> <td> 1 </td> </tr>
522 <tr> <td> <tt>SHL A</tt> </td> <td> 1 </td> </tr>
523 <tr> <td> <tt>IFZ</tt> </td> <td> 1 </td> </tr>
524 <tr> <td> <tt>FINDB</tt> </td> <td> 1 + n </td>
525 <td> ``n'' is the distance between the current
526 program counter and the start of the matched
529 <tr> <td> <tt>FINDF</tt> </td> <td> 1 + n</td>
530 <td> ``n'' is the distance between the current
531 program counter and the start of the matched
534 <tr> <td> <tt>MALLOC</tt> </td> <td> 1 </td> </tr>
535 <tr> <td> <tt>DIVIDE</tt> </td> <td> 1 </td> </tr>
536 <tr> <td> <tt>MOVE [I],A</tt> </td> <td> 1 </td> </tr>
537 <tr> <td> <tt>MOVE A,[I]</tt> </td> <td> 1 </td> </tr>
538 <tr> <td> <tt>DMOVE [I],A</tt> </td> <td> 1 </td> </tr>
539 <tr> <td> <tt>DMOVE A,[I]</tt> </td> <td> 1 </td> </tr>
540 <tr> <td> <tt>XOR <reg1>,<reg2></tt> </td> <td> 1 </td> </tr>
541 <tr> <td> <tt>PUSH <reg></tt> </td> <td> 1 </td> </tr>
542 <tr> <td> <tt>POP <reg></tt> </td> <td> 1 </td> </tr>
545 <h2>Instruction set macros</h2>
548 The basic instruction set is augmented with macros
549 which allow common programming paradigms to be
550 written. Macros are just a short-hand used in
551 the assembler / disassembler, and are not implemented
552 in any way by the VM.
558 <th> Expands to </th>
560 <th> Side effects? </th>
563 <th colspan=4> Basic register operations ... </th>
566 <td> <tt>MOVE <reg1>,<reg2></tt> </td>
567 <td> <tt>PUSH <reg1><br>POP <reg2></tt> </td>
568 <td> Move the contents of register 1 into register 2. </td>
572 <td> <tt>SWAP <reg1>,<reg2></tt> </td>
573 <td> <tt>XOR <reg1>,<reg2><br>
574 XOR <reg2>,<reg1><br>
575 XOR <reg1>,<reg2><br>
576 XOR <reg2>,<reg1></tt> </td>
577 <td> This has the effect of swapping the contents of
578 registers 1 and 2. </td>
582 <td> <tt>ZERO <reg></tt> </td>
583 <td> <tt>XOR <reg>,<reg></tt> </td>
588 <td> <tt>ADD <n>, A</tt> </td>
589 <td> <tt>INC A</tt> repeated n times. This is only
590 really suitable for small values of n. </td>
591 <td> A <- A + n </td>
595 <td> <tt>MOVE <n>, A</tt> </td>
596 <td> <tt>ZERO A</tt> followed by a series of
597 <tt>SHL A</tt> and <tt>INC A</tt> operations
598 required to initialize A to n. (n >= 0) </td>
603 <th colspan=4> Access local (word-sized) variables, relative
604 to the base address register B ... </th>
607 <td> <tt>LOAD <n>, A</tt> </td>
608 <td> <tt>PUSH I<br>MOVE B,A<br>ADD n*2,A<br>MOVE A,I<br>
609 DMOVE [I],A<br>POP I</tt> </td>
610 <td> Load variable number n into register A. </td>
611 <td> Destroys contents of I register. </td>
614 <td> <tt>STORE A,<n></tt> </td>
615 <td> <tt>PUSH I<br>PUSH A<br>MOVE B,A<br>ADD n*2,A<br>MOVE A,I<br>
616 POP A<br>DMOVE A,[I]<br>POP I</tt> </td>
617 <td> Store register A in variable number n. </td>
618 <td> Destroys contents of I register. </td>
621 <th colspan=4> Jumps and subroutines ... </th>
624 <td> <tt>JMP I</tt> </td>
625 <td> <tt>PUSH I<br>POP P</tt> </td>
626 <td> Jump to address I. </td>
630 <td> <tt>JMPF <pattern></tt> </td>
631 <td> <tt>FINDF <pattern><br>PUSH I<br>POP P</tt> </td>
632 <td> Jump forwards to pattern. </td>
633 <td> Destroys contents of I register. </td>
636 <td> <tt>JMPB <pattern></tt> </td>
637 <td> <tt>FINDB <pattern><br>PUSH I<br>POP P</tt> </td>
638 <td> Jump backwards to pattern. </td>
639 <td> Destroys contents of I register. </td>
642 <td> <tt>JMPZF <pattern></tt> </td>
643 <td> <tt>FINDF <pattern><br>PUSH I<br>IFZ POP P<br>
645 <td> If A == 0, jump forwards to pattern. </td>
646 <td> Destroys contents of I register. </td>
649 <td> <tt>JMPZB <pattern></tt> </td>
650 <td> <tt>FINDB <pattern><br>PUSH I<br>IFZ POP P<br>
652 <td> If A == 0, jump backwards to pattern. </td>
653 <td> Destroys contents of I register. </td>
656 <td> <tt>CALLF <pattern></tt> </td>
657 <td> <tt>PUSH P<br>FINDF <pattern><br>PUSH I<br>POP P</tt> </td>
658 <td> Call forwards to a subroutine at pattern. The current
659 address is saved on the stack. </td>
660 <td> Destroys contents of I register. </td>
663 <td> <tt>CALLB <pattern></tt> </td>
664 <td> <tt>PUSH P<br>FINDB <pattern><br>PUSH I<br>POP P</tt> </td>
665 <td> Call backwards to a subroutine at pattern. The current
666 address is saved on the stack. </td>
667 <td> Destroys contents of I register. </td>
670 <td> <tt>RET <n></tt> </td>
671 <td> <tt>POP A<br>ADD n+3,A<br>MOVE A,P</tt> </td>
672 <td> Return from subroutine. n is the length of the
673 pattern in the corresponding <tt>CALLF</tt>/<tt>CALLB</tt>
674 instruction. The return address should be the top
675 address on the stack. </td>
676 <td> Destroys contents of A register. </td>
679 <th colspan=4> Assembler directives ... </th>
682 <td> <tt>DB <n></tt> </td>
684 <td> Define n bytes of scratch space. The scratch
685 space is initialized with 0xFF bytes (<strong>not</strong>
686 with zero bytes, since those clash with pattern symbols). </td>
694 The assembler recognizes patterns and writes out
695 a series of <tt>NOP0</tt> and <tt>NOP1</tt> bytes.
700 <th> Mnemonic </th> <th> Expands to </th>
703 <td> <tt><pattern>:</tt> </td>
704 <td> Each 0 and 1 in the pattern is translated into
705 a NOP0 and NOP1 on output. </td>
708 <td> <tt>~<pattern>:</tt> </td>
709 <td> Each 0 and 1 in the pattern is translated into
710 a NOP1 and NOP0 on output (note the inversion). </td>
713 <td> instruction <tt><pattern>:</tt> </td>
714 <td> Each 0 and 1 in the pattern is translated into
715 a NOP0 and NOP1 on output. </td>
718 <td> instruction <tt>~<pattern>:</tt> </td>
719 <td> Each 0 and 1 in the pattern is translated into
720 a NOP1 and NOP0 on output (note the inversion). </td>
725 Since the <tt>FINDB</tt> and <tt>FINDF</tt> patterns
726 search for the <i>inverse</i> of a pattern, a typical
727 instruction sequence will look like this:
736 ; I contains address of 0011 pattern.
740 <address><a href="mailto:rich@annexia.org">Richard Jones</a></address>
741 <!-- Created: Sat Oct 7 10:09:40 BST 2000 -->
743 Last modified: Sat Oct 7 13:49:42 BST 2000