ARM assembler overview

The latest ARM version of uLisp allows you to generate machine-code functions, integrated with Lisp, written in ARM Thumb code. It has the following features:

  • You can create multiple named machine-code functions, limited only by the amount of code memory available.
  • Machine-code functions are created with a defcode special form, which has a similar syntax to defun.
  • You can include labels in your assembler listing simply by including them as symbols in the body of the defcode form. The defcode form creates these as local variables.
  • The defcode form automatically does a two-pass assembly to resolve forward references, used in branches and memory references.
  • The defcode form generates an assembler listing, showing the mnemonics and the machine-code generated from them.
  • The machine-code functions are saved with save-image, and restored with load-image.

The assembler itself is written in Lisp to make it easy to extend it or add new instructions. For example, you could write assembler macros in Lisp. It will fit on most ARM boards, including SAMD21 and SAMD51 boards. The assembler supports only Thumb-1 instructions, and so is compatible with M0 ARM processors or higher.

Get the latest version of the assembler here: ARM assembler in uLisp.

To add it to uLisp, Select All and Copy the listing, Paste it into the field at the top of the Arduino IDE Serial Monitor window, and press Return. Alternatively, you could load it from an SD card.

References

For a summary of the ARM Thumb assembler instructions see ARM assembler instructions.

For some more complex examples see ARM assembler examples.

For an explanation of how the ARM version of the assembler works see ARM assembler written in Lisp.

For the RISC-V Instruction Set Manual see The RISC-V Instruction Set Manual on riscv.org.

The defcode form

The assembler uses a special defcode form to generate machine-code functions.

defcode special form

Syntax: (defcode name (parameters) form*)

The defcode form is similar in syntax to defun. It creates a named machine-code function from a series of 16-bit integers given in the body of the form. These are written into RAM, and can be executed by calling the function in the same way as a normal Lisp function.

For example:

(defcode mul13 (x) #x210d #x4348 #x4770)

creates a machine-code routine called mul13, with one parameter, consisting of three instructions that multiply its single integer argument by 13. For example:

> (mul13 10)
130

If you specify the machine code instructions as constants, as in the above example, you don't need to load the ARM assembler.

Calling convention

Functions defined with defcode can take up to four parameters. These are passed to the machine-code routine in the registers r0 to r3 respectively. The symbols used for the four parameters can be used as synonyms for the corresponding register r0 to r3 in the body of the defcode form.

If a parameter is an integer its value is passed in the corresponding register; otherwise the address of the parameter is passed in the corresponding register. For examples showing how to access a list in a machine-code routine see ARM assembler examples - List examples.

The machine-code function should return the result back to uLisp in r0. This is returned as an integer.
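As a minimal sketch of the parameter synonyms described above, the following variant of the multiply-by-13 routine refers to its argument by name instead of as r0. It assumes that the parameter symbol is written without a quote, so that it evaluates to the corresponding register; the function name mul13p is hypothetical and chosen only for illustration.

```lisp
(defcode mul13p (x)
  #| x is the first parameter, so it stands for r0 (assumed to be written unquoted) |#
  ($mov 'r1 13)
  ($mul x 'r1)
  ($bx 'lr))
```

If the assumption holds, this assembles to the same three instructions as the version that names r0 explicitly, and the product is returned to uLisp in r0.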

Call-clobbered registers

The best registers to use in assembler functions are r0 to r3, r12, and r14 (lr) if you are not calling another subroutine. These are call-clobbered: a function may use them without restoring their contents.

Call-saved registers

If you use r4 to r11 you must restore their original contents before returning.
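One way to meet this requirement is to save the register on the stack on entry and restore it before returning. The sketch below does this for r4, using the $push and $pop mnemonics of the assembler; the function name mul13s is hypothetical, and the routine uses r4 only to illustrate the save and restore.

```lisp
(defcode mul13s (x)
  #| save the call-saved register before using it |#
  ($push '(r4))
  ($mov 'r4 13)
  ($mul 'r0 'r4)
  #| restore r4 to its original contents |#
  ($pop '(r4))
  ($bx 'lr))
```

It can then be called like any other function:

```lisp
> (mul13s 10)
130
```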

Assembler

Although you can supply machine-code instructions as hexadecimal op-codes, the assembler is more convenient as it allows you to write machine-code functions in ARM Thumb mnemonics. It is written in uLisp.

Assembler syntax

Where possible the syntax is very similar to ARM assembler syntax, with the following differences:

  • The mnemonics are prefixed by '$' (because some mnemonics such as push and pop are already in use as Lisp functions).
  • For simplicity the mnemonics don't include the 'S' suffix, added to the Thumb assembler syntax on the release of Thumb-2 to indicate whether an instruction affects the condition codes. For details of which instructions affect the condition codes see ARM assembler instructions.
  • Registers are represented as symbols, prefixed with a quote. Constants are just numbers.
  • Lists of registers, as used in the $push and $pop mnemonics, are represented as a Lisp list.

Assembler instructions are just Lisp functions, so you can see the code they generate:

> ($mov 'r1 13)
8461

The assembler includes a function x16 to print a 16-bit value in hexadecimal, so you can see the result in hexadecimal by writing:

> (x16 ($mov 'r1 13))
#x210d
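The same check works for other instructions. For instance, the following calls show the op-codes for a multiply and a return; the values match the second and third constants in the mul13 definition given earlier:

```lisp
> (x16 ($mul 'r0 'r1))
#x4348

> (x16 ($bx 'lr))
#x4770
```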

The following table shows typical ARM assembler formats, and the equivalent in this Lisp assembler:

Examples               ARM assembler             uLisp assembler
Push and pop           push {r4, r5, r6, lr}     ($push '(r4 r5 r6 lr))
Registers              subs r1, r2, r3           ($sub 'r1 'r2 'r3)
Immediate              mov r2, #3                ($mov 'r2 3)
Load relative          ldr r0, [r3, #0]          ($ldr 'r0 '(r3 0))
Load in-line constant  ldr r0, label             ($ldr 'r0 label)
Branch                 bne label                 ($bne label)
Constant               .word 0x0f0f0f0f          ($word #x0f0f0f0f)

Note that the order of the registers in the list supplied to $push and $pop is irrelevant; the registers are always pushed in the order highest number first to lowest last, and popped in the order lowest number first to highest last.
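A quick way to see this is to compare the op-codes generated for the same registers given in two different orders. The values shown are those expected from the Thumb encoding of push, in which the register list is stored as a bit mask, so both calls should give the same result:

```lisp
> (x16 ($push '(r4 r5 r6 lr)))
#xb570

> (x16 ($push '(lr r6 r5 r4)))
#xb570
```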

Simple example

Here's a simple example consisting of three ARM Thumb instructions that multiplies its parameter by 13 and returns the result:

(defcode mul13 (x)
  ($mov 'r1 13)
  ($mul 'r0 'r1)
  ($bx 'lr))

Evaluating this generates an assembler listing as follows:

0000 210d ($mov 'r1 13)
0002 4348 ($mul 'r0 'r1)
0004 4770 ($bx 'lr)

We can then call the function as follows:

> (mul13 11)
143

The result is the number returned in the r0 register.

Note that functions written using defcode can't be relied upon to have a fixed position in memory and so should be position independent, and use only relative branches and memory references within the machine-code function.

Labels

You can include symbols in the body of the defcode form to create labels. The defcode assembler automatically creates these as local variables, and then does a two-pass assembly to resolve forward references. The assembler can then access these variables to calculate the offsets in branches and pc-relative addressing.

Note that because uLisp requires comments starting with a semicolon to be terminated by an open parenthesis, you can't put such a comment immediately before a label. This limitation arises because the Arduino Serial Monitor removes all line-break characters. You can use bracketing comments instead:

#| This is a comment |#

For example, here's a simple routine to calculate the Greatest Common Divisor of its two arguments, which uses two labels:

; Greatest Common Divisor
(defcode gcd (x y)
  swap
  ($mov 'r2 'r1)
  ($mov 'r1 'r0)
  again
  ($mov 'r0 'r2)
  ($sub 'r2 'r2 'r1)
  ($blt swap)
  ($bne again)
  ($bx 'lr))

Evaluating this form generates the following assembler listing:

0000      swap
0000 000a ($mov 'r2 'r1)
0002 0001 ($mov 'r1 'r0)
0004      again
0004 0010 ($mov 'r0 'r2)
0006 1a52 ($sub 'r2 'r2 'r1)
0008 dbfa ($blt swap)
000a d1fb ($bne again)
000c 4770 ($bx 'lr)

For example, to find the GCD of 3287 and 3460:

> (gcd 3287 3460)
173
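The gcd routine only branches backwards. As a sketch of a forward reference, the following routine returns the absolute difference of its two arguments, branching to a label that appears later in the listing. The function name absdiff and the label negative are hypothetical, and bracketing comments are used throughout so that no comment needs to be terminated by an open parenthesis before a label.

```lisp
(defcode absdiff (x y)
  #| r2 = x - y, setting the condition codes |#
  ($sub 'r2 'r0 'r1)
  #| forward reference, resolved on the second pass |#
  ($blt negative)
  ($mov 'r0 'r2)
  ($bx 'lr)
  negative
  #| x is less than y, so return y - x |#
  ($sub 'r0 'r1 'r0)
  ($bx 'lr))
```

For example:

```lisp
> (absdiff 3 10)
7

> (absdiff 10 3)
7
```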

In-line constants

You can insert an in-line 32-bit constant with the $word function. This is often used in conjunction with the $ldr mnemonic to load a 32-bit constant into a register. The assembler automatically inserts a $nop mnemonic, if necessary, to align the constant on a four-byte boundary as required by the ARM processor.

The following example loads 1234567890 into r0 and returns it:

(defcode constant ()
  ($ldr 'r0 const)
  ($bx 'lr)
  const
  ($word 1234567890))

The result:

> (constant)
1234567890

For more examples see ARM assembler examples.