A Lisp compiler to ARM written in Lisp
21st August 2024
This is a simple experimental Lisp compiler, written in uLisp, that will compile a Lisp function into ARM machine code. My aim was to make the compiler simple enough so that its listing will fit on a single sheet of A4 paper.
Updates
21st August 2024: This article describes Version 2 of the compiler which has the following improvements over the original, while keeping the size of the program unchanged:
- It compiles expressions to leave the result in a register (r0) rather than push it on the stack, which results in more efficient code.
- It includes support for the following additional Lisp functions: 1+, 1-, car, and cdr.
- There is better error checking.
I've also included some additional example programs.
22nd August 2024: Added an appendix showing the code generated by compiling the rec example.
Introduction
When I added the facility of executing machine code to uLisp I had in mind the eventual goal of being able to compile uLisp functions into machine code, and this is a first step in that direction.
The nice thing about compiling Lisp is that you don't have to write a tokeniser or parser, because Lisp programs are already in a consistent structure that can be processed by another Lisp program.
The compiler program is written in the subset of Common Lisp supported by uLisp, and will run on an ARM board with at least 5000 objects of workspace. I used a Seeed Studio XIAO nRF52840. You can also run it using Common Lisp on a laptop or desktop computer, and display the code it generates, but of course you won't be able to run the ARM machine code because Common Lisp doesn't have uLisp's defcode command.
Although I wrote this for ARM boards it should be relatively straightforward to convert it to generate AVR or RISC-V machine code; see AVR assembler overview or RISC-V assembler overview.
I got my initial inspiration for this compiler from Peter Norvig's book "Paradigms of Artificial Intelligence Programming" [1].
Note: The compiler uses only Thumb-1 instructions and so will run on ARM processors down to M0, such as used on the Arduino Zero and BBC Micro:bit.
Get the full source of the compiler here: Lisp compiler for ARM.
Or from GitHub here: https://github.com/technoblogy/lisp-arm-compiler.
Using the compiler
To use this compiler you simply call compile on a Lisp function; for example:
(compile 'fibonacci)
The function will be compiled into a machine code function, replacing the original Lisp code, so that calling fibonacci will now execute the ARM machine-code version.
You can also display the code generated for an expression by calling comp on the expression; for example:
(pprint (comp '(* 13 17))) (:integer ($mov 'r0 13) ($push '(r0)) ($mov 'r0 17) ($pop '(r1)) ($mul 'r0 'r1))
I give examples of several simple Lisp programs that it will successfully compile later in this article, together with a comparison of the speed of the Lisp and machine-code versions.
Specification
The compiler understands the following Lisp commands:
Defining variables and functions: defun, setq
Symbols: nil, t
List functions: car, cdr
Arithmetic functions: +, -, *, 1+, 1-, logand, logior, logxor
Arithmetic comparisons: =, <, <=, >, >=, /=
Conditionals: if, and, or
Evaluation: progn
Limitations
This first version of the compiler has a few limitations, including:
- It can only compile functions of up to four integer arguments, and returning an integer result.
- It assumes that the function you are compiling is small enough so that branch instructions can be used for jumps within the function.
- There is relatively little error checking.
- It generates inefficient code compared to what you could write by hand.
Some of these limitations would be simple to remedy; some of them are more fundamental.
To use the compiler you first need to load the ARM assembler, written in Lisp, from: ARM assembler in uLisp.
For more information about the assembler see ARM assembler overview.
How the compiler works
Register usage
To avoid needing to keep track of register usage the compiler makes use of the stack to pass values to an expression, and store the value returned by an expression.
The following table shows how the ARM registers are used within the compiler:
Registers | Use |
r0 r1 r2 r3 | Used to pass the parameters to the main function's arguments. |
r0 | Contains the value returned by the main function. |
r4 r5 r6 r7 | Contain copies of the function arguments within the function. |
r0 r1 | Used to pass the arguments to each operator. |
r0 | Used to return the value from each operator. |
r2 | Used to return the true/nil value from comparisons. |
Compiling an expression
The following steps show the sequence of compiling an expression, such as:
(* x 13)
- Code is generated to evaluate each of the arguments, in this case x and 13, and each result is pushed onto the stack, apart from the last which is left in r0.
- The first value is popped from the stack into register r1.
- The function, in this case *, is then evaluated for r1 and r0, with the result in r0.
This stack-based approach ensures that a more complex expression, such as:
(* (- x 1) (+ x 13))
will also compile into correct code, without conflicts between registers.
Calling the function recursively
The compiler supports calling a function recursively from within the function itself. Because the registers corresponding to the parameters and local variables would be overwritten by the recursive call they are stored on the stack around the function call.
There are several recursive functions in the examples below.
Types
For boolean operations I decided to represent nil as zero, and t as 1. A problem I hadn't anticipated was that I would need to keep track of what type of object each function returned, integer or boolean. For example, consider the problem of compiling the statement:
(and x y)
If x has the value 0 and y has the value 7 this should return 7. However, if x has the value nil and y has the value 7 this should return nil. Representing nil as zero leads to an ambiguity.
I solved this by returning a type, :integer or :boolean, with each compiled expression, according to the following rules:
- Predicates, and t or nil, always return a :boolean.
- Arithmetic operations always return an :integer.
- An if form requires a :boolean test form and returns an :integer.
- A progn or let block returns the type of its last expression.
An item with an ambiguous type returns the type nil.
Examples
I used the following simple examples to test the compiler. Before compiling a new function you might want to remove the previous one from memory using makunbound to free up the code memory before compiling the next function; for example:
(makunbound 'fibonacci)
Alternatively you could increase the amount of memory available for machine code by editing the directive such as:
#define CODESIZE 256
before uploading uLisp to your board.
You can compare the times for the compiled versions with my hand-coded versions of some of these functions in ARM assembler examples.
Integer examples
The following examples take integer arguments and return an integer result.
Takeuchi function
This is a version of the highly-recursive benchmark I use for comparing versions of Lisp [2]:
(defun tak (x y z) (if (>= y x) z (tak (tak (1- x) y z) (tak (1- y) z x) (tak (1- z) x y))))
Lisp version:
> (time (tak 18 12 6)) 7 Time: 8.9 s
Compiled version
> (time (tak 18 12 6)) 7 Time: 54 ms
Factorial
This is a recursive implementation of the factorial function:
(defun fact (n) (if (<= n 1) 1 (* n (fact (- n 1)))))
Lisp version:
> (time (fact 12)) 479001600 Time: 2 ms
Compiled version
> (time (fact 12)) 479001600 Time: 0 ms
Fibonacci
This is a recursive implementation of the Fibonacci series:
(defun fibonacci (n) (if (< n 3) 1 (+ (fibonacci (- n 1)) (fibonacci (- n 2)))))
Lisp version:
> (time (fibonacci 27))
196418 Time: 50.5 s
Compiled version
> (time (fibonacci 27))
196418 Time: 287 ms
Greatest Common Divisor
A recursive algorithm to calculate the greatest common divisor of two integers.
(defun gcd (x y) (if (= x y) x (if (> x y) (gcd (- x y) y) (gcd x (- y x)))))
Lisp version:
> (time (gcd 1464441269 1280682121))
307 Time: 11 ms
Compiled version
> (time (gcd 1464441269 1280682121)) 307 Time: 1 ms
Hofstadter Q sequence
This is one of several recursive sequences described in Douglas Hofstadter's book "Gödel, Escher, Bach: an Eternal Golden Braid". It is defined as follows:
(defun q (n) (if (<= n 2) 1 (+ (q (- n (q (- n 1)))) (q (- n (q (- n 2)))))))
It is related to the Fibonacci sequence, except that in this case the two preceding terms specify how far to go back in the sequence to find the two terms to be summed.
Lisp version:
> (time (q 21)) 12 Time: 14.5 s
Compiled version
> (time (q 21)) 12 Time: 85 ms
Two-dimensional recursive function Q2
This function Q2 is my two-dimensional extension of the Hofstadter Q sequence [3]:
(defun q2 (x y) (if (or (< x 1) (< y 1)) 1 (+ (q2 (- x (q2 (1- x) y)) y) (q2 x (- y (q2 x (1- y)))))))
Lisp version:
> (time (q2 7 8)) 31 Time: 26.3 s
Compiled version
> (time (q2 7 8)) 31 Time: 177 ms
A simple recursive function - rec
This is related to the factorial function:
(defun rec (n) (1+ (* n (if (= n 0) 0 (rec (1- n))))))
Lisp version:
> (time (rec 12)) 1302061345 Time: 2 ms
Compiled version
> (time (rec 12)) 1302061345 Time: 0 ms
Number of combinations - nCr
This is a simple but inefficient way of recursively calculating nCr, based on Pascal's Triangle:
(defun ncr (n r) (if (or (= r 0) (= r n)) 1 (+ (ncr (1- n) (1- r)) (ncr (1- n) r))))
For example, to calculate the number of possible poker hands from a pack of cards:
Lisp version:
> (time (ncr 52 5)) 2598960 Time: 985.2 s
Compiled version
> (time (ncr 52 5)) 2598960 Time: 5.8 s
List examples
Any of the arguments to a machine-code function can be a list, in which case the address of the list is passed to the routine in the corresponding parameter. You can then use the functions car and cdr to process the elements in the list.
Dot product
This recursive function calculates the dot product of two vectors:
(defun dot-product (a b) (if (and a b) (+ (* (car a) (car b)) (dot-product (cdr a) (cdr b)))
0))
For example, to calculate the following dot product:
(987 654 321) • (963 852 741) = 987 × 963 + 654 × 852 + 321 × 741 = 1745550
Lisp version:
> (time (dot-product '(987 654 321) '(963 852 741)))
1745550 Time: 1 ms
Compiled version
> (time (dot-product '(987 654 321) '(963 852 741)))
1745550 Time: 1 ms
Compiler source
Here's the source of the compiler, with comments.
Invoking the compiler
To compile a Lisp function you simply give the command compile followed by the name of the function; for example, to compile the fibonacci function:
(compile 'fibonacci)
Here's the definition of the command compile:
(defun compile (name) (if (eq (car (eval name)) 'lambda) (eval (comp (cons 'defun (cons name (cdr (eval name)))))) "Not a Lisp function"))
Main compiler function
The main function comp returns the compiled code for an expression or form, as a list of assembler instructions prefixed by the type, :integer or :boolean:
(defun comp (x &optional env) (cond ((null x) (type-code :boolean '(($mov 'r0 0)))) ((eq x t) (type-code :boolean '(($mov 'r0 1)))) ((symbolp x) (comp-symbol x env)) ((atom x) (type-code :integer (list (list '$mov ''r0 x)))) (t (let ((fn (first x)) (args (rest x))) (case fn (defun (setq *label-num* 0) (setq env (mapcar #'(lambda (x y) (cons x y)) (second args) *locals*)) (comp-defun (first args) (second args) (cddr args) env)) (progn (comp-progn args env)) (if (comp-if (first args) (second args) (third args) env)) (setq (comp-setq args env)) (t (comp-funcall fn args env)))))))
Each of the different types of form are handled by separate functions such as comp-defun, comp-if, and comp-progn.
Utilities
The compiler uses the following utility functions:
The function mappend applies a function to the elements of a list, and the results, which should be lists, are appended together:
(defun mappend (fn lst) (apply #'append (mapcar fn lst)))
The function type-code adds a code type label to the front of a list of assembler instructions, and the functions code-type and code return the code type label, and the code list, respectively:
(defun type-code (type code) (cons type code)) (defun code-type (type-code) (car type-code)) (defun code (type-code) (cdr type-code))
The function checktype gives an error if the value returned is not the correct type:
(defun checktype (fn type check) (unless (or (null type) (null check) (eq type check)) (error "Argument to '~a' must be ~a not ~a~%" fn check type)))
The lists *params* and *locals* list the registers available for use in the compiler:
(defvar *params* '(r0 r1 r2 r3)) (defvar *locals* '(r4 r5 r6 r7))
Finally, gen-label generates a new label:
(defvar *label-num* 0) (defun gen-label () (read-from-string (format nil "lab~d" (incf *label-num*))))
The remaining functions handle the compiling of specific types of Lisp form:
Symbols
The environment is represented by an association list giving the register associated with each variable, such as:
((y . r5) (x . r4))
The function comp-symbol looks up a symbol in the association list and returns the appropriate register:
(defun comp-symbol (x env) (let ((reg (cdr (assoc x env)))) (type-code nil (list (list '$mov ''r0 (list 'quote reg))))))
Assignment
The function comp-setq handles assignment to a variable:
(defun comp-setq (args env) (let ((value (comp (second args) env)) (reg (cdr (assoc (first args) env)))) (type-code (code-type value) (append (code value) (list '($pop '(r0)) (list '$mov (list 'quote reg) ''r0))))))
Function definition
The definition of the function being compiled is handled by comp-defun:
(defun comp-defun (name args body env) (let ((used (subseq *locals* 0 (length args)))) (append (list 'defcode name args) (list name (list '$push (list 'quote (cons 'lr (reverse used))))) (apply #'append (mapcar #'(lambda (x y) (list (list '$mov (list 'quote x) (list 'quote y)))) used *params*)) (code (comp-progn body env)) (list (list '$pop (list 'quote (append used (list 'pc))))))))
Progn special form
The function comp-progn compiles a progn form:
(defun comp-progn (exps env) (let* ((len (1- (length exps))) (nlast (subseq exps 0 len)) (last1 (nth len exps)) (start (mappend #'(lambda (x) (append (code (comp x env)))) nlast)) (end (comp last1 env))) (type-code (code-type end) (append start (code end)))))
It compiles code to evaluate each expression in the body of the progn, discarding all but the last results, and returns the type of the last form as the type of the whole block.
If special form
The function comp-if compiles the code for an if special form:
(defun comp-if (pred then else env) (let ((lab1 (gen-label)) (lab2 (gen-label)) (test (comp pred env))) (checktype 'if (car test) :boolean) (type-code :integer (append (code test) (list '($cmp 'r0 0) (list '$beq lab1)) (code (comp then env)) (list (list '$b lab2) lab1) (code (comp else env)) (list lab2)))))
Function calls
Finally, comp-funcall compiles code for function calls to the built-in functions, or a recursive call to the main function:
(defun comp-funcall (f args env) (let ((test (assoc f '((> . $bgt) (>= . $bge) (= . $beq) (<= . $ble) (< . $blt) (/= . $bne)))) (logical (assoc f '((and . $and) (or . $orr)))) (arith1 (assoc f '((1+ . $add) (1- . $sub)))) (arith+- (assoc f '((+ . $add) (- . $sub)))) (arith2 (assoc f '((* . $mul) (logand . $and) (logior . $orr) (logxor . $eor))))) (cond (test (let ((label (gen-label))) (type-code :boolean (append (comp-args f args 2 :integer env) (list '($pop '(r1)) '($mov 'r2 1) '($cmp 'r1 'r0) (list (cdr test) label) '($mov 'r2 0) label '($mov 'r0 'r2)))))) (logical (type-code :boolean (append (comp-args f args 2 :boolean env) (list '($pop '(r1)) (list (cdr logical) ''r0 ''r1))))) (arith1 (type-code :integer (append (comp-args f args 1 :integer env) (list (list (cdr arith1) ''r0 1))))) (arith+- (type-code :integer (append (comp-args f args 2 :integer env) (list '($pop '(r1)) (list (cdr arith+-) ''r0 ''r1 ''r0))))) (arith2 (type-code :integer (append (comp-args f args 2 :integer env) (list '($pop '(r1)) (list (cdr arith2) ''r0 ''r1))))) ((member f '(car cdr)) (type-code :integer (append (comp-args f args 1 :integer env) (if (eq f 'cdr) (list '($ldr 'r0 '(r0 4))) (list '($ldr 'r0 '(r0 0)) '($ldr 'r0 '(r0 4))))))) (t ; function call (type-code :integer (append (comp-args f args nil :integer env) (when (> (length args) 1) (append (list (list '$mov (list 'quote (nth (1- (length args)) *params*)) ''r0)) (mappend #'(lambda (x) (list (list '$pop (list 'quote (list x))))) (reverse (subseq *params* 0 (1- (length args))))))) (list (list '$bl f))))))))
It uses the routine comp-args to generate code to compile each of the arguments to a function:
(defun comp-args (fn args n type env) (unless (or (null n) (= (length args) n)) (error "Incorrect number of arguments to '~a'~$" fn)) (let ((n (length args))) (mappend #'(lambda (y) (let ((c (comp y env))) (decf n) (checktype fn type (code-type c)) (if (zerop n) (code c) (append (code c) '(($push '(r0))))))) args)))
Get the full source of the compiler here: Lisp compiler for ARM.
Or from GitHub here: https://github.com/technoblogy/lisp-arm-compiler.
Appendix
As an example, here's the code generated by compiling the rec example:
> (defun rec (n) (1+ (* n (if (= n 0) 0 (rec (1- n)))))) rec > (compile 'rec) 0000 rec 0000 b510 ($push '(lr r4)) 0002 0004 ($mov 'r4 'r0) 0004 0020 ($mov 'r0 'r4) 0006 b401 ($push '(r0)) 0008 0020 ($mov 'r0 'r4) 000a b401 ($push '(r0)) 000c 2000 ($mov 'r0 0) 000e bc02 ($pop '(r1)) 0010 2201 ($mov 'r2 1) 0012 4281 ($cmp 'r1 'r0) 0014 d000 ($beq lab3) 0016 2200 ($mov 'r2 0) 0018 lab3 0018 0010 ($mov 'r0 'r2) 001a 2800 ($cmp 'r0 0) 001c d001 ($beq lab1) 001e 2000 ($mov 'r0 0) 0020 e003 ($b lab2) 0022 lab1 0022 0020 ($mov 'r0 'r4) 0024 3801 ($sub 'r0 1) 0026 f7ff ($bl rec) 0028 ffeb 002a lab2 002a bc02 ($pop '(r1)) 002c 4348 ($mul 'r0 'r1) 002e 3001 ($add 'r0 1) 0030 bd10 ($pop '(r4 pc)) rec > (rec 12) 1302061345
- ^ Norvig, Peter "Paradigms of Artificial Intelligence Programming" Morgan Kaufmann Publishers, Inc, San Francisco, 1992, pp 784-833, available as a PDF paip-lisp on GitHub.
- ^ Benchmarks - Takeuchi function.
- ^ Benchmarks - Two-dimensional recursive function Q2.