X-Git-Url: http://git.annexia.org/?a=blobdiff_plain;ds=sidebyside;f=jonesforth.f;h=b2cf98ac55483c03709a21bb11f16ae231f13745;hb=e8aaec0eb1676422d1cda0338405c0cae8bd42a7;hp=cb13593a62dbe402b57ad1b6d74ec72b23876985;hpb=371cf1731235a8c596858fd2859a6fe537eed424;p=jonesforth.git diff --git a/jonesforth.f b/jonesforth.f index cb13593..b2cf98a 100644 --- a/jonesforth.f +++ b/jonesforth.f @@ -2,7 +2,7 @@ \ A sometimes minimal FORTH compiler and tutorial for Linux / i386 systems. -*- asm -*- \ By Richard W.M. Jones http://annexia.org/forth \ This is PUBLIC DOMAIN (see public domain release statement below). -\ $Id: jonesforth.f,v 1.4 2007-09-25 21:48:20 rich Exp $ +\ $Id: jonesforth.f,v 1.8 2007-09-28 19:39:21 rich Exp $ \ \ The first part of this tutorial is in jonesforth.S. Get if from http://annexia.org/forth \ @@ -100,12 +100,14 @@ ; \ A few more character constants defined the same way as above. +: ';' [ CHAR ; ] LITERAL ; : '(' [ CHAR ( ] LITERAL ; : ')' [ CHAR ) ] LITERAL ; : '"' [ CHAR " ] LITERAL ; : 'A' [ CHAR A ] LITERAL ; : '0' [ CHAR 0 ] LITERAL ; : '-' [ CHAR - ] LITERAL ; +: '.' [ CHAR . ] LITERAL ; \ While compiling, '[COMPILE] word' compiles 'word' if it would otherwise be IMMEDIATE. : [COMPILE] IMMEDIATE @@ -240,6 +242,27 @@ ( -- ) means the word has no effect on the stack ) +( Some more complicated stack examples, showing the stack notation. ) +: NIP ( x y -- y ) SWAP DROP ; +: TUCK ( x y -- y x y ) DUP ROT ; +: PICK ( x_u ... x_1 x_0 u -- x_u ... x_1 x_0 x_u ) + 1+ ( add one because of 'u' on the stack ) + 4 * ( multiply by the word size ) + DSP@ + ( add to the stack pointer ) + @ ( and fetch ) +; + +( With the looping constructs, we can now write SPACES, which writes n spaces to stdout. ) +: SPACES ( n -- ) + BEGIN + DUP 0> ( while n > 0 ) + WHILE + SPACE ( print a space ) + 1- ( until we count down to 0 ) + REPEAT + DROP +; + ( Standard words for manipulating BASE. ) : DECIMAL ( -- ) 10 BASE ! ; : HEX ( -- ) 16 BASE ! ; @@ -266,9 +289,13 @@ Another wrinkle of . and friends is that they obey the current base in the variable BASE. BASE can be anything in the range 2 to 36. + + While we're defining . &c we can also define .S which is a useful debugging tool. This + word prints the current stack (non-destructively) from top to bottom. ) -: U.R ( u width -- ) - ( DROP XXX ) + +( This is the underlying recursive definition of U. ) +: U. ( u -- ) BASE @ /MOD ( width rem quot ) DUP 0<> IF ( if quotient <> 0 then ) RECURSE ( print the quotient ) @@ -287,39 +314,88 @@ EMIT ; -( U. is easy to define in terms of U.R Note the trailing space. ) -: U. 0 U.R SPACE ; +( + FORTH word .S prints the contents of the stack. It doesn't alter the stack. + Very useful for debugging. +) +: .S ( -- ) + DSP@ ( get current stack pointer ) + BEGIN + DUP S0 @ < + WHILE + DUP @ U. ( print the stack element ) + SPACE + 4+ ( move up ) + REPEAT + DROP +; -( .R is easy, we just need to print the sign and then call U.R ) +( This word returns the width (in characters) of an unsigned number in the current base ) +: UWIDTH ( u -- width ) + BASE @ / ( rem quot ) + DUP 0<> IF ( if quotient <> 0 then ) + RECURSE 1+ ( return 1+recursive call ) + ELSE + DROP ( drop the zero quotient ) + 1 ( return 1 ) + THEN +; + +: U.R ( u width -- ) + SWAP ( width u ) + DUP ( width u u ) + UWIDTH ( width u uwidth ) + -ROT ( u uwidth width ) + SWAP - ( u width-uwidth ) + ( At this point if the requested width is narrower, we'll have a negative number on the stack. + Otherwise the number on the stack is the number of spaces to print. But SPACES won't print + a negative number of spaces anyway, so it's now safe to call SPACES ... ) + SPACES + ( ... and then call the underlying implementation of U. ) + U. +; + +( + .R prints a signed number, padded to a certain width. We can't just print the sign + and call U.R because we want the sign to be next to the number ('-123' instead of '- 123'). +) : .R ( n width -- ) SWAP ( width n ) DUP 0< IF - '-' EMIT ( print the sign ) - NEGATE ( negate the number so we can use U.R ) - SWAP 1- ( n width-1 ) + NEGATE ( width u ) + 1 ( save a flag to remember that it was negative | width n 1 ) + ROT ( 1 width u ) + SWAP ( 1 u width ) + 1- ( 1 u width-1 ) ELSE - SWAP ( n width ) + 0 ( width u 0 ) + ROT ( 0 width u ) + SWAP ( 0 u width ) THEN - DROP ( XXX ) - U.R + SWAP ( flag width u ) + DUP ( flag width u u ) + UWIDTH ( flag width u uwidth ) + -ROT ( flag u uwidth width ) + SWAP - ( flag u width-uwidth ) + + SPACES ( flag u ) + SWAP ( u flag ) + + IF ( was it negative? print the - character ) + '-' EMIT + THEN + + U. ; ( Finally we can define word . in terms of .R, with a trailing space. ) : . 0 .R SPACE ; -( ? fetches the integer at an address and prints it. ) -: ? @ . ; +( The real U., note the trailing space. ) +: U. U. SPACE ; -( With the looping constructs, we can now write SPACES, which writes n spaces to stdout. ) -: SPACES ( n -- ) - BEGIN - DUP 0> ( while n > 0 ) - WHILE - SPACE ( print a space ) - 1- ( until we count down to 0 ) - REPEAT - DROP -; +( ? fetches the integer at an address and prints it. ) +: ? ( addr -- ) @ . ; ( c a b WITHIN returns true if a <= c and c < b ) : WITHIN @@ -337,18 +413,6 @@ THEN ; -( .S prints the contents of the stack. Very useful for debugging. ) -: .S ( -- ) - DSP@ ( get current stack pointer ) - BEGIN - DUP S0 @ < - WHILE - DUP @ . ( print the stack element ) - 4+ ( move up ) - REPEAT - DROP -; - ( DEPTH returns the depth of the stack. ) : DEPTH ( -- n ) S0 @ DSP@ - @@ -369,7 +433,7 @@ ( S" string" is used in FORTH to define strings. It leaves the address of the string and - its length on the stack, with the address at the top. The space following S" is the normal + its length on the stack, (length at the top of stack). The space following S" is the normal space between FORTH words and is not a part of the string. This is tricky to define because it has to do different things depending on whether @@ -385,7 +449,7 @@ case we put the string at HERE (but we _don't_ change HERE). This is meant as a temporary location, likely to be overwritten soon after. ) -: S" IMMEDIATE ( -- len addr ) +: S" IMMEDIATE ( -- addr len ) STATE @ IF ( compiling? ) ' LITSTRING , ( compile LITSTRING ) HERE @ ( save the address of the length word on the stack ) @@ -415,6 +479,7 @@ DROP ( drop the final " character ) HERE @ - ( calculate the length ) HERE @ ( push the start address ) + SWAP ( addr len ) THEN ; @@ -426,8 +491,8 @@ In immediate mode we just keep reading characters and printing them until we get to the next double quote. - In compile mode we use S" to store the string, then add EMITSTRING afterwards: - LITSTRING EMITSTRING + In compile mode we use S" to store the string, then add TELL afterwards: + LITSTRING TELL It may be interesting to note the use of [COMPILE] to turn the call to the immediate word S" into compilation of that word. It compiles it into the definition of .", @@ -437,7 +502,7 @@ : ." IMMEDIATE ( -- ) STATE @ IF ( compiling? ) [COMPILE] S" ( read the string, and compile LITSTRING, etc. ) - ' EMITSTRING , ( compile the final EMITSTRING ) + ' TELL , ( compile the final TELL ) ELSE ( In immediate mode, just read characters and print them until we get to the ending double quote. ) @@ -547,7 +612,7 @@ is the natural size for integers on this machine architecture. On this 32 bit machine therefore CELLS just multiplies the top of stack by 4. ) -: CELLS ( n -- n ) 4 * ; +: CELLS ( n -- n ) 4* ; ( So now we can define VARIABLE easily in much the same way as CONSTANT above. Refer to the @@ -731,6 +796,9 @@ ( DUMP is used to dump out the contents of memory, in the 'traditional' hexdump format. + + Notice that the parameters to DUMP (address, length) are compatible with string words + such as WORD and S". ) : DUMP ( addr len -- ) BASE @ ROT ( save the current BASE at the bottom of the stack ) @@ -765,7 +833,7 @@ DUP 32 128 WITHIN IF ( 32 <= c < 128? ) EMIT ELSE - DROP [ CHAR ? ] LITERAL EMIT + DROP '.' EMIT THEN 1+ SWAP 1- ( addr len linelen addr -- addr len addr+1 linelen-1 ) REPEAT @@ -785,6 +853,227 @@ BASE ! ( restore saved BASE ) ; +( + CASE...ENDCASE is how we do switch statements in FORTH. There is no generally + agreed syntax for this, so I've gone for the syntax mandated by the ISO standard + FORTH (ANS-FORTH). + + ( some value on the stack ) + CASE + test1 OF ... ENDOF + test2 OF ... ENDOF + testn OF ... ENDOF + ... ( default case ) + ENDCASE + + The CASE statement tests the value on the stack by comparing it for equality with + test1, test2, ..., testn and executes the matching piece of code within OF ... ENDOF. + If none of the test values match then the default case is executed. Inside the ... of + the default case, the value is still at the top of stack (it is implicitly DROP-ed + by ENDCASE). When ENDOF is executed it jumps after ENDCASE (ie. there is no "fall-through" + and no need for a break statement like in C). + + The default case may be omitted. In fact the tests may also be omitted so that you + just have a default case, although this is probably not very useful. + + An example (assuming that 'q', etc. are words which push the ASCII value of the letter + on the stack): + + 0 VALUE QUIT + 0 VALUE SLEEP + KEY CASE + 'q' OF 1 TO QUIT ENDOF + 's' OF 1 TO SLEEP ENDOF + ( default case: ) + ." Sorry, I didn't understand key <" DUP EMIT ." >, try again." CR + ENDCASE + + (In some versions of FORTH, more advanced tests are supported, such as ranges, etc. + Other versions of FORTH need you to write OTHERWISE to indicate the default case. + As I said above, this FORTH tries to follow the ANS FORTH standard). + + The implementation of CASE...ENDCASE is somewhat non-trivial. I'm following the + implementations from here: + http://www.uni-giessen.de/faq/archiv/forthfaq.case_endcase/msg00000.html + + The general plan is to compile the code as a series of IF statements: + + CASE (push 0 on the immediate-mode parameter stack) + test1 OF ... ENDOF test1 OVER = IF DROP ... ELSE + test2 OF ... ENDOF test2 OVER = IF DROP ... ELSE + testn OF ... ENDOF testn OVER = IF DROP ... ELSE + ... ( default case ) ... + ENDCASE DROP THEN [THEN [THEN ...]] + + The CASE statement pushes 0 on the immediate-mode parameter stack, and that number + is used to count how many THEN statements we need when we get to ENDCASE so that each + IF has a matching THEN. The counting is done implicitly. If you recall from the + implementation above of IF, each IF pushes a code address on the immediate-mode stack, + and these addresses are non-zero, so by the time we get to ENDCASE the stack contains + some number of non-zeroes, followed by a zero. The number of non-zeroes is how many + times IF has been called, so how many times we need to match it with THEN. + + This code uses [COMPILE] so that we compile calls to IF, ELSE, THEN instead of + actually calling them while we're compiling the words below. + + As is the case with all of our control structures, they only work within word + definitions, not in immediate mode. +) +: CASE IMMEDIATE + 0 ( push 0 to mark the bottom of the stack ) +; + +: OF IMMEDIATE + ' OVER , ( compile OVER ) + ' = , ( compile = ) + [COMPILE] IF ( compile IF ) + ' DROP , ( compile DROP ) +; + +: ENDOF IMMEDIATE + [COMPILE] ELSE ( ENDOF is the same as ELSE ) +; + +: ENDCASE IMMEDIATE + ' DROP , ( compile DROP ) + + ( keep compiling THEN until we get to our zero marker ) + BEGIN + ?DUP + WHILE + [COMPILE] THEN + REPEAT +; + +( + CFA> is the opposite of >CFA. It takes a codeword and tries to find the matching + dictionary definition. + + In this FORTH this is not so easy. In fact we have to search through the dictionary + because we don't have a convenient back-pointer (as is often the case in other versions + of FORTH). + + This word returns 0 if it doesn't find a match. +) +: CFA> + LATEST @ ( start at LATEST dictionary entry ) + BEGIN + DUP 0<> ( while link pointer is not null ) + WHILE + DUP >CFA ( cfa curr curr-cfa ) + 2 PICK ( cfa curr curr-cfa cfa ) + = IF ( found a match? ) + NIP ( leave curr dictionary entry on the stack ) + EXIT ( and return from the function ) + THEN + @ ( follow link pointer back ) + REPEAT + 2DROP ( restore stack ) + 0 ( sorry, nothing found ) +; + +( + SEE disassembles a FORTH word. + + We search for the dictionary entry of the word, then search again for the next + word (effectively, the end of the compiled word). This results in two pointers: + + +---------+---+---+---+---+------------+------------+------------+------------+ + | LINK | 3 | T | E | N | DOCOL | LIT | 10 | EXIT | + +---------+---+---+---+---+------------+------------+------------+------------+ + ^ ^ + | | + Start of word End of word + + With this information we can have a go at decompiling the word. We need to + recognise "meta-words" like LIT, LITSTRING, BRANCH, etc. and treat those separately. +) +: SEE + WORD FIND ( find the dictionary entry to decompile ) + + ( now we search again, looking for the next word ) + HERE @ ( address of the end of the last compiled word ) + LATEST @ ( word last curr ) + BEGIN + 2 PICK ( word last curr word ) + OVER ( word last curr word curr ) + <> ( word last curr word<>curr? ) + WHILE ( word last curr ) + NIP ( word curr ) + DUP @ ( word curr prev (which becomes: word last curr) ) + REPEAT + + DROP ( at this point, the stack is: start-of-word end-of-word ) + SWAP ( end-of-word start-of-word ) + + ( begin the definition with : NAME [IMMEDIATE] ) + ':' EMIT SPACE DUP ID. SPACE + DUP ?IMMEDIATE IF ." IMMEDIATE " THEN + + >DFA ( get the data address, ie. points after DOCOL | end-of-word start-of-data ) + + ( now we start decompiling until we hit the end of the word ) + BEGIN ( end start ) + 2DUP > + WHILE + DUP @ ( end start codeword ) + + CASE + ' LIT OF ( is it LIT ? ) + 4 + DUP @ ( get next word which is the integer constant ) + . ( and print it ) + ENDOF + ' LITSTRING OF ( is it LITSTRING ? ) + [ CHAR S ] LITERAL EMIT '"' EMIT SPACE ( print S" ) + 4 + DUP @ ( get the length word ) + SWAP 4 + SWAP ( end start+4 length ) + 2DUP TELL ( print the string ) + '"' EMIT SPACE ( finish the string with a final quote ) + + ALIGNED ( end start+4+len, aligned ) + 4 - ( because we're about to add 4 below ) + ENDOF + ' 0BRANCH OF ( is it 0BRANCH ? ) + ." 0BRANCH ( " + 4 + DUP @ ( print the offset ) + . + ')' EMIT SPACE + ENDOF + ' BRANCH OF ( is it BRANCH ? ) + ." BRANCH ( " + 4 + DUP @ ( print the offset ) + . + ')' EMIT SPACE + ENDOF + ' ' OF ( is it ' (TICK) ? ) + [ CHAR ' ] LITERAL EMIT SPACE + 4 + DUP @ ( get the next codeword ) + CFA> ( and force it to be printed as a dictionary entry ) + ID. SPACE + ENDOF + ' EXIT OF ( is it EXIT? ) + ( We expect the last word to be EXIT, and if it is then we don't print it + because EXIT is normally implied by ;. EXIT can also appear in the middle + of words, and then it needs to be printed. ) + 2DUP ( end start end start ) + 4 + ( end start end start+4 ) + <> IF ( end start | we're not at the end ) + ." EXIT " + THEN + ENDOF + ( default case: ) + DUP ( in the default case we always need to DUP before using ) + CFA> ( look up the codeword to get the dictionary entry ) + ID. SPACE ( and print it ) + ENDCASE + + 4 + ( end start+4 ) + REPEAT + + ';' EMIT CR + + 2DROP ( restore stack ) +; + ( Finally print the welcome prompt. ) ." JONESFORTH VERSION " VERSION . CR ." OK "