+ SYSTEM CALLS AND FILES ----------------------------------------------------------------------
+
+ Miscellaneous words related to system calls, and standard access to files.
+)
+
+( BYE exits by calling the Linux exit(2) syscall. )
+: BYE ( -- )
+ 0 ( return code (0) )
+ SYS_EXIT ( system call number )
+ SYSCALL1
+;
+
+(
+ UNUSED returns the number of cells remaining in the user memory (data segment).
+
+ For our implementation we will use Linux brk(2) system call to find out the end
+ of the data segment and subtract HERE from it.
+)
+: GET-BRK ( -- brkpoint )
+ 0 SYS_BRK SYSCALL1 ( call brk(0) )
+;
+
+: UNUSED ( -- n )
+ GET-BRK ( get end of data segment according to the kernel )
+ HERE @ ( get current position in data segment )
+ -
+ 4 / ( returns number of cells )
+;
+
+(
+ MORECORE increases the data segment by the specified number of (4 byte) cells.
+
+ NB. The number of cells requested should normally be a multiple of 1024. The
+ reason is that Linux can't extend the data segment by less than a single page
+ (4096 bytes or 1024 cells).
+
+ This FORTH doesn't automatically increase the size of the data segment "on demand"
+ (ie. when , (COMMA), ALLOT, CREATE, and so on are used). Instead the programmer
+ needs to be aware of how much space a large allocation will take, check UNUSED, and
+ call MORECORE if necessary. A simple programming exercise is to change the
+ implementation of the data segment so that MORECORE is called automatically if
+ the program needs more memory.
+)
+: BRK ( brkpoint -- )
+ SYS_BRK SYSCALL1
+;
+
+: MORECORE ( cells -- )
+ CELLS GET-BRK + BRK
+;
+
+(
+ Standard FORTH provides some simple file access primitives which we model on
+ top of Linux syscalls.
+
+ The main complication is converting FORTH strings (address & length) into C
+ strings for the Linux kernel.
+
+ Notice there is no buffering in this implementation.
+)
+
+: R/O ( -- fam ) O_RDONLY ;
+: R/W ( -- fam ) O_RDWR ;
+
+: OPEN-FILE ( addr u fam -- fd 0 (if successful) | c-addr u fam -- fd errno (if there was an error) )
+ -ROT ( fam addr u )
+ CSTRING ( fam cstring )
+ SYS_OPEN SYSCALL2 ( open (filename, flags) )
+ DUP ( fd fd )
+ DUP 0< IF ( errno? )
+ NEGATE ( fd errno )
+ ELSE
+ DROP 0 ( fd 0 )
+ THEN
+;
+
+: CREATE-FILE ( addr u fam -- fd 0 (if successful) | c-addr u fam -- fd errno (if there was an error) )
+ O_CREAT OR
+ O_TRUNC OR
+ -ROT ( fam addr u )
+ CSTRING ( fam cstring )
+ 420 -ROT ( 0644 fam cstring )
+ SYS_OPEN SYSCALL3 ( open (filename, flags|O_TRUNC|O_CREAT, 0644) )
+ DUP ( fd fd )
+ DUP 0< IF ( errno? )
+ NEGATE ( fd errno )
+ ELSE
+ DROP 0 ( fd 0 )
+ THEN
+;
+
+: CLOSE-FILE ( fd -- 0 (if successful) | fd -- errno (if there was an error) )
+ SYS_CLOSE SYSCALL1
+ NEGATE
+;
+
+: READ-FILE ( addr u fd -- u2 0 (if successful) | addr u fd -- 0 0 (if EOF) | addr u fd -- u2 errno (if error) )
+ >R SWAP R> ( u addr fd )
+ SYS_READ SYSCALL3
+
+ DUP ( u2 u2 )
+ DUP 0< IF ( errno? )
+ NEGATE ( u2 errno )
+ ELSE
+ DROP 0 ( u2 0 )
+ THEN
+;
+
+(
+ PERROR prints a message for an errno, similar to C's perror(3) but we don't have the extensive
+ list of strerror strings available, so all we can do is print the errno.
+)
+: PERROR ( errno addr u -- )
+ TELL
+ ':' EMIT SPACE
+ ." ERRNO="
+ . CR
+;
+
+(
+ ASSEMBLER CODE ----------------------------------------------------------------------
+
+ This is just the outline of a simple assembler, allowing you to write FORTH primitives
+ in assembly language.
+
+ Assembly primitives begin ': NAME' in the normal way, but are ended with ;CODE. ;CODE
+ updates the header so that the codeword isn't DOCOL, but points instead to the assembled
+ code (in the DFA part of the word).
+
+ We provide a convenience macro NEXT (you guessed what it does). However you don't need to
+ use it because ;CODE will put a NEXT at the end of your word.
+
+ The rest consists of some immediate words which expand into machine code appended to the
+ definition of the word. Only a very tiny part of the i386 assembly space is covered, just
+ enough to write a few assembler primitives below.
+)
+
+HEX
+
+( Equivalent to the NEXT macro )
+: NEXT IMMEDIATE AD C, FF C, 20 C, ;
+
+: ;CODE IMMEDIATE
+ [COMPILE] NEXT ( end the word with NEXT macro )
+ ALIGN ( machine code is assembled in bytes so isn't necessarily aligned at the end )
+ LATEST @ DUP
+ HIDDEN ( unhide the word )
+ DUP >DFA SWAP >CFA ! ( change the codeword to point to the data area )
+ [COMPILE] [ ( go back to immediate mode )
+;
+
+( The i386 registers )
+: EAX IMMEDIATE 0 ;
+: ECX IMMEDIATE 1 ;
+: EDX IMMEDIATE 2 ;
+: EBX IMMEDIATE 3 ;
+: ESP IMMEDIATE 4 ;
+: EBP IMMEDIATE 5 ;
+: ESI IMMEDIATE 6 ;
+: EDI IMMEDIATE 7 ;
+
+( i386 stack instructions )
+: PUSH IMMEDIATE 50 + C, ;
+: POP IMMEDIATE 58 + C, ;
+
+( RDTSC instruction )
+: RDTSC IMMEDIATE 0F C, 31 C, ;
+
+DECIMAL
+
+(
+ RDTSC is an assembler primitive which reads the Pentium timestamp counter (a very fine-
+ grained counter which counts processor clock cycles). Because the TSC is 64 bits wide
+ we have to push it onto the stack in two slots.
+)
+: RDTSC ( -- lsb msb )
+ RDTSC ( writes the result in %edx:%eax )
+ EAX PUSH ( push lsb )
+ EDX PUSH ( push msb )
+;CODE
+
+(
+ INLINE can be used to inline an assembler primitive into the current (assembler)
+ word.
+
+ For example:
+
+ : 2DROP INLINE DROP INLINE DROP ;CODE
+
+ will build an efficient assembler word 2DROP which contains the inline assembly code
+ for DROP followed by DROP (eg. two 'pop %eax' instructions in this case).
+
+ Another example. Consider this ordinary FORTH definition:
+
+ : C@++ ( addr -- addr+1 byte ) DUP 1+ SWAP C@ ;
+
+ (it is equivalent to the C operation '*p++' where p is a pointer to char). If we
+ notice that all of the words used to define C@++ are in fact assembler primitives,
+ then we can write a faster (but equivalent) definition like this: