DLIFE machine language reference

General overview

I'm going to assume that you're mostly familiar with Tierra, but I'll also give you a short introduction.

In DLIFE/Tierra, life forms consist of small machine language programs called ``cells''. Each cell has its own program counter and hence its own thread of execution, and runs independently of any other cell. Cells live in a virtual machine (VM) called the ``soup''. The soup is just a big piece of byte-addressed memory, 128 Kbytes in size by default.

The soup contains an interpreter which interprets the machine language. For DLIFE cells, the soup is their entire universe. They cannot address or access anything outside their own soup. (Although they can occasionally jump from one soup to another -- but more of that later).

All the cells sharing the soup are run timesliced in a simple round-robin scheme, and hence each cell will get roughly the same amount of actual CPU dedicated to it.

The cell machine language is not 8086 assembler. In fact, it's a special purpose machine language written just to run cells. It has some similarities with more traditional machine languages -- for example, registers, indirect addressing, jumps, a stack, and so on -- but also some strange features that you wouldn't expect to find in a real machine language. The most important feature of the language is termed (in a term coined by Tom Ray) its ``non-brittleness''. This means that the language is robust against small corruptions, changes, and the insertion or deletion of a few bytes here and there. In a real machine language, inserting a few bytes into a piece of code requires that you relink the code so that all jump and call target addresses change. If you didn't relink the code, then the insertion would be fatal -- the code wouldn't function at all. Real machine languages are said to be ``brittle''. In the DLIFE machine language, insertions and deletions are tolerated -- the code will continue to work much as before. The main way in which this works is that absolute addresses have been replaced by explicit labels in the code:

  code
  : :
  code
  label 123
  code
  : :   <---
  code
  jump to label 123
  code
  : :
  code

Suppose in the example above that an accidental insertion or deletion occurs at the point marked by the arrow. The chances are that the code will continue to work, because the ``jump'' command will still be able to find and jump to the correct label.

Another important and unusual feature of the machine language is that it allows cells to reproduce. Through the use of the MALLOC and DIVIDE instructions, a cell may produce a daughter cell, populate it with code, and then divide (set it running as an independent cell).

The soup interpreter is error-prone. Most instructions will run as intended, but occasionally instructions fail in certain ways. For example, an instruction to increment the accumulator might, one in a hundred thousand times, increment the accumulator twice, or even not at all. The soup memory is also error-prone. So-called cosmic rays hit the soup very occasionally, flipping bits at random. By these means, cells may occasionally mutate. Note that because the machine language is very deliberately chosen to be non-brittle, mutations often result in cells which still work, but perhaps work in slightly different (better?) ways. Hence cells evolve.

There is one initial cell, which was written by hand. This cell is called the ``god cell'' (although perhaps Adam or Eve might have been a better biblical parallel). This cell simply copies itself repeatedly.

Cells which execute instructions incorrectly (for example, they do not fill the accumulator register with a suitable value before calling MALLOC) accumulate errors. As the soup becomes more and more full, a special process called the ``grim reaper'' is invoked. The grim reaper kills off the cells with the largest error count until the soup pressure is eased.

The DLIFE model is one of pure natural selection. There are no artificial pressures placed on cells to become, for example, good at executing some particular algorithm. Instead, cells compete simply for space in the soup and time on the virtual CPU.

Cell virtual machine

The cell VM consists of 4 registers, some local storage, a stack and stack pointer, an error counter and possibly an attached daughter cell.

Registers

The 4 visible registers are:

Register Purpose
A The accumulator, for mathematical operations and general purpose use.
B Base address, for addressing local variables.
I Index address.
P Program counter.

All registers are 16 bit signed integers, storing any value between -32768 and 32767 inclusive.

All addresses are stored relative to the start address of the cell. Hence if P is set to zero, then execution jumps to the beginning of the cell's code.

When a cell is created, all registers are initialized to zero.

Local storage

The cell has a fragment of soup memory. Because addresses are relative, to the cell the memory appears to stretch from address 0 to address N-1, where N is the total number of bytes allocated to the cell by her mother's call to MALLOC. It is not possible for the cell to find out the value of N (we assume that the mother allocated enough memory for the cell).

A cell may store code or data at any memory address within itself. There is no distinction made in the VM between code and data.

A cell may not write to memory outside the range 0 to N-1. Doing so will incur an error. Cells may, however, read any memory outside this range.

After calling MALLOC, the I register is initialized with the start address of the daughter cell. Between the MALLOC and the next call to DIVIDE, the cell may additionally write to addresses I through I + A - 1, in order to initialize the daughter.

Stack

The cell has a stack of 16 words. Each stack entry is a 16 bit signed integer.

A stack pointer stores the current position in the stack.

The stack can only be accessed through one of the PUSH or POP instructions. PUSH x (where ``x'' is the name of a register) increments the value of the stack pointer and copies the value in register x onto the stack at the position addressed by the stack pointer. POP x copies the value on the stack addressed by the stack pointer into register x and decrements the value of the stack pointer.

The stack is circular, in that pushing more than 16 numbers onto the stack causes the stack to wrap around and overwrite the top entry. Thus it is not possible to underflow or overflow the stack.

The stack pointer register is not visible to the cell VM. In other words, the cell cannot directly read or write to the stack pointer (except implicitly through use of the PUSH or POP instructions). Because of this, it is not possible for the cell to tell if the stack is implemented as grows-down or grows-up, nor whether the implementation pushes first and changes the stack pointer second or vice versa.

When the cell is created, all stack entries and the stack pointer are initialized to zero.

Error counter

The error counter starts off at zero, and is incremented by one every time the cell makes a mistake when executing an instruction. Typical mistakes are:

Basic instruction set

The basic instruction set consists of just 17 commands encoded in just 38 different numeric values.

Each instruction is encoded in a single byte.

Two instructions, FINDB and FINDF might be considered to be multi-byte instructions. See the description below for details.

There is one instruction prefix, IFZ, but in the discussion below, we will just consider it as if it was a single byte instruction (which, in fact, it is).

The VM makes no distinction between code and data. Any byte which is addressable as part of the mother cell can be considered to be either executable code or a data byte (or both).

Instructions are encoded into soup bytes as follows:

 Most                              Least
 significant                 significant
 bit                                 bit
+----+----+----+----+----+----+----+----+
|    |    |    |    |    |    |    |    |
| 7  | 6  | 5  | 4  | 3  | 2  | 1  | 0  |
|    |    |    |    |    |    |    |    |
+----+----+----+----+----+----+----+----+
 \_______/ \___________________________/
     |                   |
  Ignored       Instruction encoding
             (values 0-4,7-39 correspond to
              instructions, values 40-63
              are unknown and result in
              cell errors)

The lower 6 bits are interpreted as follows:

Dec Hex Bin Mnemonic Meaning
0 00 00 0000 NOP0 No-op. This instruction does nothing when executed but is used as a pattern matching label by FINDB and FINDF.
1 01 00 0001 NOP1 No-op. This instruction does nothing when executed but is used as a pattern matching label by FINDB and FINDF.
2 02 00 0010 INC A A <- A + 1
3 03 00 0011 DEC A A <- A - 1
4 04 00 0100 SHL A A <- A << 1
7 07 00 0111 IFZ IFZ is normally used as an instruction prefix, ie. IFZ <instruction>. If A == 0, execute the next instruction, otherwise skip the next instruction.
8 08 00 1000 FINDB FINDB should be followed by a pattern (a series of one or more NOP0 and NOP1 bytes). The FINDB instruction searches backwards from the current program counter until it finds the inverse pattern of NOP0 and NOP1 bytes. It returns the relative address of the found pattern in register I. If the pattern is not found, it sets I == 0.
9 09 00 1001 FINDF FINDF should be followed by a pattern (a series of one or more NOP0 and NOP1 bytes). The FINDF instruction searches forwards from the current program counter until it finds the inverse pattern of NOP0 and NOP1 bytes. It returns the relative address of the found pattern in register I. If the pattern is not found, it sets I <- 0.
10 0A 00 1010 MALLOC MALLOC allocates a daughter cell of size A bytes. It returns the address of the daughter cell in register I, or if the memory could not be allocated, it sets I <- 0. Only one daughter cell may be allocated at a time. Therefore, calling MALLOC twice without calling DIVIDE in between is an error. Register A must be between 10 and 512.
11 0B 00 1011 DIVIDE DIVIDE splits off the current daughter cell and starts it running as a fully independent cell. The daughter cell must first have been allocated by MALLOC and populated with code, so calling DIVIDE twice without calling MALLOC in between is an error.
12 0C 00 1100 MOVE [I],A Load the unsigned byte at relative address I into register A.
13 0D 00 1101 MOVE A,[I] Store the value of register A in the soup byte at relative address I. If A is less than 0 or greater than 255 then the effects are undefined. If I lies outside the boundaries of the mother or daughter cells, then this is a cell error.
14 0E 00 1110 DMOVE [I],A Load the signed word (two bytes) at relative address I into register A. The VM is big endian (Motorola ordering).
15 0F 00 1111 DMOVE A,[I] Store the value of register A in the soup bytes at relative address I and I+1. If (I, I+1) lies outside the boundaries of the mother or daughter cells, then this is a cell error. The VM is big endian (Motorola ordering).
16-31 10-1F 01 r2r1 XOR <reg1>,<reg2> <reg2> <- <reg1> XOR <reg2> for any registers in A (00), B (01), I (10) or P (11).
32-35 20-23 10 00rr PUSH <reg> Push the register onto the stack for any registers in A (00), B (01), I (10) or P (11).
36-39 24-27 10 01rr POP <reg> Pop the top of the stack into the register for any registers in A (00), B (01), I (10) or P (11).

Note that instruction encodings 5, 6 and 40-63 are not used. Attempting to execute one of these unknown instructions results in a cell error.

Instruction set timings

This table shows the number of (virtual) CPU cycles taken to execute each instruction.

Instruction Cycles Notes
NOP0 1
NOP1 1
INC A 1
DEC A 1
SHL A 1
IFZ 1
FINDB 1 + n ``n'' is the distance between the current program counter and the start of the matched pattern.
FINDF 1 + n ``n'' is the distance between the current program counter and the start of the matched pattern.
MALLOC 1
DIVIDE 1
MOVE [I],A 1
MOVE A,[I] 1
DMOVE [I],A 1
DMOVE A,[I] 1
XOR <reg1>,<reg2> 1
PUSH <reg> 1
POP <reg> 1

Instruction set macros

The basic instruction set is augmented with macros which allow common programming paradigms to be written. Macros are just a short-hand used in the assembler / disassembler, and are not implemented in any way by the VM.

Macro Expands to Meaning Side effects?
Basic register operations ...
MOVE <reg1>,<reg2> PUSH <reg1>
POP <reg2>
Move the contents of register 1 into register 2. None.
SWAP <reg1>,<reg2> XOR <reg1>,<reg2>
XOR <reg2>,<reg1>
XOR <reg1>,<reg2>
XOR <reg2>,<reg1>
This has the effect of swapping the contents of registers 1 and 2. None.
ZERO <reg> XOR <reg>,<reg> A <- 0 None.
ADD <n>, A INC A repeated n times. This is only really suitable for small values of n. A <- A + n None.
MOVE <n>, A ZERO A followed by a series of SHL A and INC A operations required to initialize A to n. (n >= 0) A <- n None.
Access local (word-sized) variables, relative to the base address register B ...
LOAD <n>, A PUSH I
MOVE B,A
ADD n*2,A
MOVE A,I
DMOVE [I],A
POP I
Load variable number n into register A. Destroys contents of I register.
STORE A,<n> PUSH I
PUSH A
MOVE B,A
ADD n*2,A
MOVE A,I
POP A
DMOVE A,[I]
POP I
Store register A in variable number n. Destroys contents of I register.
Jumps and subroutines ...
JMP I PUSH I
POP P
Jump to address I. None.
JMPF <pattern> FINDF <pattern>
PUSH I
POP P
Jump forwards to pattern. Destroys contents of I register.
JMPB <pattern> FINDB <pattern>
PUSH I
POP P
Jump backwards to pattern. Destroys contents of I register.
JMPZF <pattern> FINDF <pattern>
PUSH I
IFZ POP P
POP I
If A == 0, jump forwards to pattern. Destroys contents of I register.
JMPZB <pattern> FINDB <pattern>
PUSH I
IFZ POP P
POP I
If A == 0, jump backwards to pattern. Destroys contents of I register.
CALLF <pattern> PUSH P
FINDF <pattern>
PUSH I
POP P
Call forwards to a subroutine at pattern. The current address is saved on the stack. Destroys contents of I register.
CALLB <pattern> PUSH P
FINDB <pattern>
PUSH I
POP P
Call backwards to a subroutine at pattern. The current address is saved on the stack. Destroys contents of I register.
RET <n> POP A
ADD n+3,A
MOVE A,P
Return from subroutine. n is the length of the pattern in the corresponding CALLF/CALLB instruction. The return address should be the top address on the stack. Destroys contents of A register.
Assembler directives ...
DB <n>   Define n bytes of scratch space. The scratch space is initialized with 0xFF bytes (not with zero bytes, since those clash with pattern symbols). None.

Patterns

The assembler recognizes patterns and writes out a series of NOP0 and NOP1 bytes.

Mnemonic Expands to
<pattern>: Each 0 and 1 in the pattern is translated into a NOP0 and NOP1 on output.
~<pattern>: Each 0 and 1 in the pattern is translated into a NOP1 and NOP0 on output (note the inversion).
instruction <pattern>: Each 0 and 1 in the pattern is translated into a NOP0 and NOP1 on output.
instruction ~<pattern>: Each 0 and 1 in the pattern is translated into a NOP1 and NOP0 on output (note the inversion).

Since the FINDB and FINDF patterns search for the inverse of a pattern, a typical instruction sequence will look like this:

0011:
  ; : :
  ; : :

  FINDB ~0011
  ; I contains address of 0011 pattern.

Richard Jones
Last modified: Sat Oct 7 13:49:42 BST 2000