RISC-V assembler overview
The RISC-V version of uLisp lets you write machine-code functions in RISC-V assembler and call them from Lisp like ordinary functions. It supports the RISC-V core on the Raspberry Pi Pico 2 board, and the Sipeed MAiX RISC-V boards.
The RISC-V uLisp assembler has the following features:
- You can create multiple named machine-code functions, limited only by the amount of code memory available.
- Machine-code functions are created with a defcode special form, which has a similar syntax to defun.
- You can include labels in your assembler listing simply by including them as symbols in the body of the defcode form. The defcode form creates these as local variables.
- The defcode form automatically does a two-pass assembly to resolve forward references, used in branches and memory references.
- The defcode form generates an assembler listing, showing the mnemonics and the machine-code generated from them.
- The machine-code functions are saved with save-image, and restored with load-image.
The assembler itself is written in Lisp to make it easy to extend it or add new instructions. For example, you could add support for RISC-V floating-point instructions.
Get the assembler here: RISC-V assembler in uLisp.
To add the assembler to uLisp, select all of the assembler source, copy it, paste it into the field at the top of the Arduino IDE Serial Monitor window, and press Return. Alternatively, you can load it from an SD card.
For a summary of the RISC-V assembler instructions see RISC-V assembler instructions.
For some more complex examples see RISC-V assembler examples.
Saving an image
Once you have loaded the assembler, you can save the uLisp image to an SD card using:
(save-image)
In future you can then simply reload it using:
(load-image)
References
For the RISC-V Instruction Set Manual see The RISC-V Instruction Set Manual on riscv.org.
The defcode form
The assembler uses a special defcode form to generate machine-code functions.
defcode special form
Syntax: (defcode name (parameters) form*)
The defcode form is similar in syntax to defun. It creates a named machine-code function from a series of 16-bit integers given in the body of the form. These are written into RAM, and can be executed by calling the function in the same way as a normal Lisp function.
For example:
(defcode mul13 (x) #x45b5 #x0533 #x02b5 #x8082)
This creates a machine-code routine called mul13 that takes one parameter and multiplies its integer argument by 13. The four 16-bit values encode three instructions, because the mul instruction is 32 bits long and occupies two of them. For example:
> (mul13 10)
130
If you specify the machine code instructions as constants, as in the above example, you don't need to load the RISC‑V assembler.
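To see how the four constants in the mul13 example relate to its three instructions, here is a minimal sketch in plain Lisp that splits a 32-bit instruction into two 16-bit values, low half first. The helper name halfwords is hypothetical and is not part of uLisp or the assembler. The value #x02b50533 is the standard RISC-V encoding of mul a0, a0, a1.

```lisp
#| Hypothetical helper: split a 32-bit instruction into two 16-bit values, low half first |#
(defun halfwords (word)
  (list (logand word #xffff)
        (logand (ash word -16) #xffff)))

#| #x02b50533 encodes mul a0, a0, a1 |#
(halfwords #x02b50533)
```

The call returns the list (1331 693), that is #x0533 followed by #x02b5, which are the second and third constants in the mul13 example.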
Calling convention
Functions defined with defcode can take up to four parameters. These are passed to the machine-code routine in the registers a0 to a3 respectively. The symbols used for the four parameters can be used as synonyms for the corresponding register a0 to a3 in the body of the defcode form.
If a parameter is an integer its value is passed in the corresponding register; otherwise the address of the parameter is passed in the corresponding register. For examples showing how to access a list in a machine-code routine see RISC-V assembler examples - List examples.
The machine-code function should return the result back to uLisp in a0. This is returned as an integer.
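As a sketch of this calling convention, the following hypothetical two-parameter function adds its arguments. It needs the assembler to be loaded, and it assumes the assembler provides an $add instruction with the usual RISC-V operand order (destination, source, source). The name add2 is made up for illustration.

```lisp
#| Sketch: x arrives in a0 and y in a1; the sum is returned in a0 |#
(defcode add2 (x y)
  ($add 'a0 'a0 'a1)
  ($ret))
```

Under those assumptions, (add2 3 4) should return 7, because both arguments are integers and so are passed by value.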
Saved registers
The best registers to use in assembler functions are a0 to a7 and s0 to s11. These are saved across function calls.
Assembler
Although you can supply machine-code instructions as hexadecimal op-codes, the assembler is more convenient as it allows you to write machine-code functions in RISC-V mnemonics. It is written in uLisp.
Assembler syntax
Where possible the syntax is very similar to RISC-V assembler syntax, with the following differences:
- The mnemonics are prefixed by '$' (because some mnemonics such as push and pop are already in use as Lisp functions).
- Registers are represented as symbols, prefixed with a quote. Constants are just numbers.
Assembler instructions are just Lisp functions, so you can see the code they generate in hexadecimal by writing:
> (format t "~x" ($li 'a1 13))
45b5
The following table shows typical RISC-V assembler formats, and the equivalent in this Lisp assembler:
Examples | RISC-V assembler | uLisp assembler |
Registers | mv a1, a2 | ($mv 'a1 'a2) |
Immediate | li a0,2 | ($li 'a0 2) |
Load | lw a0, 8(sp) | ($lw 'a0 8 '(sp)) |
Load in-line constant | ldr r0, label | ($ldr 'r0 label) |
Branch | ble a0, a1, label | ($ble 'a0 'a1 label) |
Jump to subroutine | jal label | ($jal label) |
Simple example
Here's a simple example consisting of three RISC-V instructions that multiplies its parameter by 13 and returns the result:
(defcode mul13 (x)
  ($li 'a1 13)
  ($mul 'a0 'a0 'a1)
  ($ret))
Evaluating this generates an assembler listing as follows:
0000 45b5 ($li 'a1 13)
0002 0533 ($mul 'a0 'a0 'a1)
0004 02b5
0006 8082 ($ret)
For example:
> (mul13 11)
143
The result is the number returned in the a0 register.
Note that functions defined with defcode are not guaranteed to have a fixed position in memory, so they should be position independent: use only relative branches and memory references within the machine-code function.
Labels
You can include symbols in the body of the defcode form to create labels. The defcode assembler automatically creates these as local variables, and then does a two-pass assembly to resolve forward references. The assembler can then access these variables to calculate the offsets in branches and pc-relative addressing.
Note that because the Arduino Serial Monitor removes all line-break characters, uLisp requires a comment starting with a semicolon to be terminated by an open parenthesis. This means you can't put such a comment immediately before a label, because the label would be treated as part of the comment. You can use bracketing comments instead:
#| This is a comment |#
For example, here's a simple routine to calculate the Greatest Common Divisor, which uses two labels:
; Greatest Common Divisor
(defcode gcd (a b)
  swap
  ($mv 'a2 'a1)
  ($mv 'a1 'a0)
  again
  ($mv 'a0 'a2)
  ($sub 'a2 'a2 'a1)
  ($bltz 'a2 swap)
  ($bnez 'a2 again)
  ($ret))
Evaluating this form generates the following assembler listing:
0000 swap
0000 862e ($mv 'a2 'a1)
0002 85aa ($mv 'a1 'a0)
0004 again
0004 8532 ($mv 'a0 'a2)
0006 8e0d ($sub 'a2 'a2 'a1)
0008 4ce3 ($bltz 'a2 swap)
000a fe06
000c fe65 ($bnez 'a2 again)
000e 8082 ($ret)
For example, to find the GCD of 3287 and 3460:
> (gcd 3287 3460)
173
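The gcd routine only branches backwards, to labels that have already been seen. As a sketch of a forward reference, the following hypothetical routine sums the integers from 1 to n, for n of zero or more, and branches ahead to a label defined later in the body. It assumes the assembler provides $beqz, $add and $addi with the usual RISC-V operand order, and the name sum-to is made up for illustration. The comments use the bracketing form so that none of them sits directly before a label.

```lisp
#| Hypothetical example: sum of the integers 1 to n, for n >= 0 |#
(defcode sum-to (n)
  ($mv 'a1 'a0)
  ($li 'a0 0)
  #| Forward reference: finished is defined further down |#
  ($beqz 'a1 finished)
  again
  ($add 'a0 'a0 'a1)
  ($addi 'a1 'a1 -1)
  #| Backward reference to again |#
  ($bnez 'a1 again)
  finished
  ($ret))
```

Under those assumptions, (sum-to 10) should return 55. The offset for the branch to finished cannot be known when that branch is first assembled, which is the case the two-pass assembly exists to handle.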
For more examples see RISC-V assembler examples.