RISC-V assembler RP2350 extensions
21st October 2024
This article describes functions to add support to the Lisp RISC-V assembler for additional RISC-V instructions provided in the RP2350.
Introduction
The RP2350 Hazard3 RISC-V core designed by Luke Wren extends the base 32-bit RISC-V instruction set with a number of RISC-V extensions. To my mind, the most interesting of these for uLisp users are the Zbb, Zbs, and Zbkb extensions, which provide basic bit-manipulation (Zbb), single-bit (Zbs), and cryptography-oriented bit-manipulation (Zbkb) instructions that could be particularly useful in embedded and electronics applications. I've defined an additional RISC-V extensions file to allow you to add support for these to the RISC-V assembler.
Loading the RISC-V extensions
To add the extensions, load the standard assembler file first, followed by the extensions file. The order matters, because some of the extensions add compressed versions of instructions defined in the main file.
Get the standard assembler file here: RISC-V assembler in uLisp.
Get the extensions here: RISC-V RP2350 extensions.
Note that these extensions won't work on the Kendryte K210 RISC-V processor used on the Sipeed MAiX boards, which is also supported by the RISC-V assembler in uLisp.
Examples
It's not obvious what some of these extensions might be useful for, so the following examples demonstrate some possible applications:
Reverse bits – brev8 and rev8
This is a function to efficiently reverse the order of bits in a 32-bit number. The reverse-bits operation could be useful when transforming bitmap images, or when interfacing between protocols that work MSB first and LSB first. It takes advantage of the brev8 instruction that reverses the bits within each byte, and the rev8 instruction that reverses the order of the bytes:
(defcode reverse-bits (n)
  ; Reverse bits within each byte
  ($brev8 'a0 'a0)
  ; Reverse all bytes
  ($rev8 'a0 'a0)
  ($ret))
For example:
> (format t "~b" (reverse-bits #b10110011100011110000111110000011))
11000001111100001111000111001101
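To check the result without the RP2350 hardware, here's a minimal pure-Lisp model of the two instructions and of how reverse-bits combines them. It's a sketch assuming a Lisp with integers wider than 32 bits, such as a desktop Common Lisp, and a non-negative 32-bit input; the function names are hypothetical and aren't part of the extensions file.

```lisp
;; Hypothetical pure-Lisp models of brev8 and rev8.
;; Assumes integers wider than 32 bits and 0 <= x < 2^32.

(defun reverse-byte (b)
  ;; Reverse the order of the 8 bits in byte b
  (let ((r 0))
    (dotimes (i 8 r)
      (setq r (logior (ash r 1) (logand (ash b (- i)) 1))))))

(defun brev8-model (x)
  ;; Reverse the bits within each of the four bytes of x
  (let ((result 0))
    (dotimes (i 4 result)
      (setq result
            (logior result
                    (ash (reverse-byte (logand (ash x (* -8 i)) #xff))
                         (* 8 i)))))))

(defun rev8-model (x)
  ;; Reverse the order of the four bytes of x
  (let ((result 0))
    (dotimes (i 4 result)
      (setq result
            (logior (ash result 8) (logand (ash x (* -8 i)) #xff))))))

(defun reverse-bits-model (x)
  ;; Same composition as the defcode version: brev8, then rev8
  (rev8-model (brev8-model x)))
```

Reversing the bits inside each byte and then reversing the order of the bytes moves every bit to the mirror-image position, which is why the two single-cycle instructions are enough.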
Maximum number in a list - max
The following example demonstrates the use of the max instruction that returns the maximum of two signed integers. It finds the largest integer in a list of arbitrary length:
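(defcode maximum (x)
  ; a2 = #x80000000, the most negative 32-bit integer
  ($lui 'a2 #x80000)
 repeat
  ; Finished when a0 is nil, the end of the list
  ($beqz 'a0 finished)
  ; a1 = car of the list cell, then the integer stored in it
  ($lw 'a1 0 '(a0))
  ($lw 'a1 4 '(a1))
  ($max 'a2 'a1 'a2)
  ; a0 = cdr of the list cell
  ($lw 'a0 4 '(a0))
  ($j repeat)
 finished
  ($mv 'a0 'a2)
  ($ret))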
For example:
> (maximum '(23 -91 47 -73 11))
47
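It iterates through the list, keeping track of the largest value found so far in a2. This starts at #x80000000, the most negative 32-bit integer, so the first element always replaces it. You can use min in the same way to find the smallest value, starting instead from the largest positive integer, #x7fffffff.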
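Here's a pure-Lisp model of the same loop, generalized so that it can use either max or min. It's a sketch with hypothetical function names, assuming 32-bit signed integers: the starting value must be the most negative integer for max, which is what ($lui 'a2 #x80000) loads, and the most positive integer for min.

```lisp
;; Hypothetical pure-Lisp model of the maximum loop.
;; pick is #'max or #'min; start is the value a2 begins with.
(defun list-extreme (lst pick start)
  (let ((best start))
    (dolist (x lst best)
      ;; Like ($max 'a2 'a1 'a2): keep the better of the two
      (setq best (funcall pick x best)))))

(defun list-maximum (lst)
  ;; -2147483648 is #x80000000 as a signed 32-bit value
  (list-extreme lst #'max -2147483648))

(defun list-minimum (lst)
  ;; 2147483647 is #x7fffffff
  (list-extreme lst #'min 2147483647))
```

An empty list returns the starting value unchanged, just as the machine-code version returns #x80000000 for nil.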
Integer square root - clz
The new clz instruction counts the number of leading zeros in a register. It provides an easy way of getting upper and lower bounds for the integer part of the square root of a number. These are useful for applications such as finding prime numbers, where the upper bound gives the largest factor you need to test. If a more accurate result is needed, these bounds can be used as the starting point for Newton's method, or a binary search.
The algorithm takes advantage of the fact that the length of the binary representation of a number's integer square root is approximately half that of the original number.
Here is the upper bound routine, upper-sqrt:
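(defcode upper-sqrt (x)
  ($li 'a1 33)
  ($li 'a2 1)
  ; a0 = number of leading zeros in x
  ($clz 'a0 'a0)
  ; a0 = (33 - clz) / 2, half the bit length of x rounded up
  ($sub 'a0 'a1 'a0)
  ($srli 'a0 'a0 1)
  ; a0 = 2^a0 - 1
  ($sll 'a0 'a2 'a0)
  ($addi 'a0 'a0 -1)
  ($ret))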
It's equivalent to this Lisp function (assuming you defined clz):
(defun upper-sqrt (x) (1- (ash 1 (truncate (- 33 (clz x)) 2))))
For example:
> (upper-sqrt 9)
3
> (upper-sqrt 1000000)
1023
> (upper-sqrt 1600000000)
65535
To get the lower bound of the integer square root you could use the following Lisp function, lower-sqrt:
(defun lower-sqrt (x) (1- (truncate (+ (upper-sqrt x) 3) 2)))
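The Lisp version of upper-sqrt assumes a clz function. Here's a minimal sketch of one for non-negative integers below 2^32, together with the two bounds and a binary search between them, one of the refinements mentioned above. This definition of clz and the name isqrt-search are my own hypothetical choices. The search compares mid with x divided by mid rather than squaring mid, to avoid overflowing a 32-bit uLisp integer.

```lisp
;; Count leading zeros in a 32-bit value; assumes 0 <= x < 2^32
(defun clz (x)
  (let ((n 32))
    (loop
     (when (zerop x) (return n))
     (setq x (ash x -1))
     (decf n))))

(defun upper-sqrt (x) (1- (ash 1 (truncate (- 33 (clz x)) 2))))

(defun lower-sqrt (x) (1- (truncate (+ (upper-sqrt x) 3) 2)))

;; Hypothetical binary search for the integer square root
(defun isqrt-search (x)
  (let ((lo (lower-sqrt x))
        (hi (upper-sqrt x)))
    ;; Invariant: lo <= integer square root of x <= hi
    (loop
     (when (>= lo hi) (return lo))
     ;; Round mid up so that the search always makes progress
     (let ((mid (ash (+ lo hi 1) -1)))
       ;; mid <= x/mid is the same test as mid*mid <= x
       (if (<= mid (truncate x mid))
           (setq lo mid)
           (setq hi (1- mid)))))))
```

Because the bounds are within a factor of about four of each other, the search only needs around half as many steps as there are bits in the result.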
A compact representation for unsigned integers - clz and ror
A 32-bit unsigned integer has a range of 0 to 2^32-1 and a precision of 1 in 2^32. Is it possible to devise a more compact 16-bit floating-point format that will represent the same range, but with reduced precision? This might be useful, for example, to log the values from an analogue-to-digital converter with limited storage.
The solution is to normalize the 32-bit unsigned integer, by shifting it left until the most significant bit is a '1'. Then store the number in a 16-bit halfword with the top five bits (E, the exponent) giving the amount of the shift, and the bottom 11 bits (F, the fractional part) giving the top 11 bits of the normalized number [1].
A number N is then: N = F × 2^(21-E), with F treated as an 11-bit integer.
The range is still 0 to 2^32-1, but the precision is 1 in 2^11. I've called this format ufloat16.
Here's the routine to encode a 32-bit unsigned integer, which is another application of the clz (count leading zeros) instruction:
(defcode to-ufloat16 (n)
  ; Normalize
  ($clz 'a1 'a0)
  ; clz gives 32 for zero, so keep only the bottom 5 bits
  ($andi 'a1 'a1 #x1f)
  ($sll 'a0 'a0 'a1)
  ; Shift back down to bottom 11 bits
  ($srli 'a0 'a0 21)
  ; Shift result of clz to top 5 bits
  ($slli 'a1 'a1 11)
  ; Pack into 16 bits
  ($or 'a0 'a1 'a0)
  ($ret))
Here's the routine to unpack an integer in ufloat16 notation, which uses the new ror (rotate right) instruction:
(defcode from-ufloat16 (n)
  ; Get the exponent from top 5 bits
  ($srli 'a1 'a0 11)
  ; Get the fraction from bottom 11 bits
  ($li 'a2 #x7ff)
  ($and 'a0 'a0 'a2)
  ; Rotate right by E + 11 (mod 32): the same as shifting left
  ; by 21 - E, or right by E - 21 when E > 21
  ($addi 'a1 'a1 11)
  ($ror 'a0 'a0 'a1)
  ($ret))
Here are some examples (using a Lisp format statement to print the results in hexadecimal where appropriate).
Numbers up to 2048 are encoded without loss of precision:
> (format t "#x~4,'0x" (to-ufloat16 1))
#xfc00
> (from-ufloat16 #xfc00)
1

> (format t "#x~4,'0x" (to-ufloat16 2048))
#xa400
> (from-ufloat16 #xa400)
2048
Numbers over 2048 have 11 bits of precision:
> (format t "#x~4,'0x" (to-ufloat16 4661))
#x9c8d
> (from-ufloat16 #x9c8d)
4660
up to the maximum unsigned 32-bit number #xffffffff:
> (format t "#x~4,'0x" (to-ufloat16 #xffffffff))
#x07ff
> (format t "#x~8,'0x" (from-ufloat16 #x07ff))
#xffe00000
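Here's a pure-Lisp model of both routines, which could be handy for checking logged values on a desktop computer. It's a sketch assuming a Lisp with integers wider than 32 bits, so that values up to #xffffffff stay positive, which isn't the case in 32-bit uLisp; the function names are hypothetical. Decoding uses N = F × 2^(21-E), which is what the rotate in from-ufloat16 amounts to.

```lisp
;; Leading zeros in a 32-bit value; 32 for zero
(defun clz32 (x)
  (let ((n 32))
    (loop
     (when (zerop x) (return n))
     (setq x (ash x -1))
     (decf n))))

;; Hypothetical model of to-ufloat16
(defun ufloat16-encode (n)
  ;; E = leading zeros, masked to 5 bits like ($andi 'a1 'a1 #x1f)
  (let* ((e (logand (clz32 n) #x1f))
         ;; F = top 11 bits of n after normalizing: (n << E) >> 21
         (f (ash n (- e 21))))
    (logior (ash e 11) f)))

;; Hypothetical model of from-ufloat16
(defun ufloat16-decode (u)
  ;; N = F * 2^(21 - E)
  (let ((e (ash u -11))
        (f (logand u #x7ff)))
    (ash f (- 21 e))))
```

For example, 4661 has 19 leading zeros, so E = 19 and F = #x48d, which decodes to #x48d × 2^2 = 4660.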
You could do something similar to represent signed 32-bit integers in 16 bits by using one of the bits as a sign bit.
Interleaving two integers - zip and unzip
The following example is a way to encode two small integers, such as a pair of coordinates, as a single compact integer. The encoding technique involves expressing the two numbers in binary, and then interleaving the bitstrings, right-aligned, so their bits alternate.
The new zip and unzip instructions are ideal for this application. The zip instruction interleaves the upper and lower half of a register into the odd and even bits of the result, and unzip does the reverse operation.
The function encode takes two integers of 16 bits or less, and interleaves them into a single integer:
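(defcode encode (x y)
  ; a2 = y in the upper halfword, x in the lower halfword
  ($pack 'a2 'a0 'a1)
  ; Interleave: bits of x go to the even bits, bits of y to the odd bits
  ($zip 'a0 'a2)
  ($ret))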
For example:
> (encode 137 73)
24771
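The pack and zip combination is equivalent to the following bit-by-bit interleave, a form of Morton encoding, shown here as a pure-Lisp sketch. The name encode-model is hypothetical, and x and y are assumed to be non-negative 16-bit integers.

```lisp
;; Hypothetical model of ($pack 'a2 'a0 'a1) followed by ($zip 'a0 'a2)
(defun encode-model (x y)
  (let ((result 0))
    (dotimes (i 16 result)
      ;; zip sends bit i of the lower half (x) to bit 2i
      (when (logbitp i x)
        (setq result (logior result (ash 1 (* 2 i)))))
      ;; and bit i of the upper half (y) to bit 2i+1
      (when (logbitp i y)
        (setq result (logior result (ash 1 (1+ (* 2 i)))))))))
```

The loop takes 16 iterations, which is what the single zip instruction replaces.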
The function decode takes a single integer and decodes it into a list of the original two numbers. It uses a machine-code function unzip:
(defcode unzip (x)
  ($unzip 'a0 'a0)
  ($ret))

(defun decode (x)
  (let ((u (unzip x)))
    (list (logand u #xffff) (logand (ash u -16) #xffff))))
For example:
> (decode 24771)
(137 73)
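Similarly, here's a pure-Lisp sketch of what unzip does for decode, gathering the even bits into the first number and the odd bits into the second. The name decode-model is hypothetical, and the argument is assumed to be a non-negative 32-bit integer.

```lisp
;; Hypothetical model of ($unzip 'a0 'a0) plus the split in decode
(defun decode-model (z)
  (let ((x 0) (y 0))
    (dotimes (i 16 (list x y))
      ;; unzip gathers the even bits into the lower half (x)
      (when (logbitp (* 2 i) z)
        (setq x (logior x (ash 1 i))))
      ;; and the odd bits into the upper half (y)
      (when (logbitp (1+ (* 2 i)) z)
        (setq y (logior y (ash 1 i)))))))
```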
The zip instruction is also useful for making double-width characters from bitmap fonts, by doubling each column of pixels.
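As a sketch of the idea, this pure-Lisp function doubles each pixel in an 8-pixel font row, assuming each row is stored as a byte with one bit per pixel; double-width is a hypothetical name. In assembler, ($pack 'a0 'a0 'a0) followed by ($zip 'a0 'a0) should have the same effect on a row of up to 16 pixels, because it zips the row with a copy of itself, though that sequence is an untested suggestion.

```lisp
;; Hypothetical model: double the width of an 8-pixel font row
(defun double-width (row)
  (let ((result 0))
    (dotimes (i 8 result)
      (when (logbitp i row)
        ;; Pixel i becomes pixels 2i and 2i+1
        (setq result (logior result (ash 3 (* 2 i))))))))
```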
Binomial random number generator - cpop
The next example shows how to generate random numbers with a binomial distribution. It uses the new cpop instruction (standing for population count) which counts the number of '1' bits in a register.
For example, suppose you tossed 20 coins and counted the number of heads. If you repeated this 2^20 times you would expect to get:
- No heads C(20,0) times, or once.
- 1 head C(20,1), or 20 times.
- 10 heads C(20,10), or 184756 times.
- 20 heads C(20,20) times, or once.
This is a binomial distribution.
To get a random number from 0 to 20 with a binomial distribution you can simulate the coin tossing by generating a 20-bit random number, and then counting the number of '1' bits. The cpop instruction will do this:
(defcode popcount (n) ($cpop 'a0 'a0) ($ret))
For example:
> (popcount #b10101010101010101010)
10
The final binomial random number generator is then:
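(defun binomial-random () (popcount (random #x100000)))
The argument to random is #x100000 rather than #xfffff, because (random n) returns a number from 0 to n-1; this gives every 20-bit value from 0 to #xfffff, including the one with all 20 bits set.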
Trying it out:
> (dotimes (x 20) (format t "~a " (binomial-random)))
11 9 10 10 9 10 10 8 10 7 11 13 10 15 6 9 11 13 14 13
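To check the shape of the distribution, here's a sketch that tallies many samples into a histogram, using a pure-Lisp stand-in for cpop that repeatedly clears the lowest '1' bit. It assumes a desktop Common Lisp and non-negative arguments, and the function names are hypothetical. It calls (random #x100000), because random returns a value below its argument, and a uniform 20-bit number must range from 0 to #xfffff.

```lisp
;; Hypothetical model of cpop; assumes n >= 0
(defun popcount-model (n)
  (let ((count 0))
    (loop
     (when (zerop n) (return count))
     ;; n & (n - 1) clears the lowest set bit
     (setq n (logand n (1- n)))
     (incf count))))

(defun binomial-random-model ()
  ;; 20 fair coin tosses: a uniform 20-bit number from 0 to #xfffff
  (popcount-model (random #x100000)))

(defun binomial-histogram (samples)
  ;; Count how many times each result from 0 to 20 occurs
  (let ((counts (make-array 21 :initial-element 0)))
    (dotimes (i samples counts)
      (incf (aref counts (binomial-random-model))))))
```

With enough samples the counts should peak at 10 and fall away symmetrically towards 0 and 20, in proportion to the binomial coefficients listed above.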
Summary of the extensions
Here's a summary of the extensions defined in the RISC-V RP2350 extensions file:
| Operation | Example | Action | Notes |
|---|---|---|---|
| Basic bit | | | |
| AND inverted operand | ($andn 'a0 'a1 'a2) | a0 = a1 & ~a2 | |
| Count leading zeros | ($clz 'a0 'a1) | | Number of leading zeros |
| Count set bits | ($cpop 'a0 'a1) | | Number of 1s; popcount |
| Count trailing zeros | ($ctz 'a0 'a1) | | Number of trailing zeros |
| Maximum | ($max 'a0 'a1 'a2) | a0 = max(a1, a2) | Signed integers |
| Unsigned maximum | ($maxu 'a0 'a1 'a2) | a0 = max(a1, a2) | Unsigned integers |
| Minimum | ($min 'a0 'a1 'a2) | a0 = min(a1, a2) | Signed integers |
| Unsigned minimum | ($minu 'a0 'a1 'a2) | a0 = min(a1, a2) | Unsigned integers |
| Bitwise OR-combine | ($orc.b 'a0 'a1) | | Byte is #xff if any bit set |
| OR inverted operand | ($orn 'a0 'a1 'a2) | a0 = a1 \| ~a2 | |
| Byte-reverse register | ($rev8 'a0 'a1) | a0 = a1 byte reversed | |
| Rotate left | ($rol 'a0 'a1 'a2) | a0 = a1 rotate left by a2 | Only lower 5 bits of a2 |
| Rotate right | ($ror 'a0 'a1 'a2) | a0 = a1 rotate right by a2 | Only lower 5 bits of a2 |
| Rotate right immed. | ($rori 'a0 'a1 11) | a0 = a1 rotate right by imm | Only lower 5 bits of imm |
| Sign-extend byte | ($sext.b 'a0 'a1) | a0 = a1[7..0] sign extended | |
| Sign-extend halfword | ($sext.h 'a0 'a1) | a0 = a1[15..0] sign extended | |
| Exclusive NOR | ($xnor 'a0 'a1 'a2) | a0 = ~(a1 ^ a2) | |
| Zero-extend byte | ($zext.b 'a0 'a1) | a0 = a1[7..0] zero extended | |
| Zero-extend halfword | ($zext.h 'a0 'a1) | a0 = a1[15..0] zero extended | |
| Single bit | | | |
| Single-bit clear | ($bclr 'a0 'a1 'a2) | a0 = a1 & ~(1<<a2) | Only lower 5 bits of a2 |
| Single-bit clear immed. | ($bclri 'a0 'a1 8) | a0 = a1 & ~(1<<imm) | Only lower 5 bits of imm |
| Single-bit extract | ($bext 'a0 'a1 'a2) | a0 = (a1>>a2) & 1 | Only lower 5 bits of a2 |
| Single-bit extract immed. | ($bexti 'a0 'a1 8) | a0 = (a1>>imm) & 1 | Only lower 5 bits of imm |
| Single-bit invert | ($binv 'a0 'a1 'a2) | a0 = a1 ^ (1<<a2) | Only lower 5 bits of a2 |
| Single-bit invert immed. | ($binvi 'a0 'a1 8) | a0 = a1 ^ (1<<imm) | Only lower 5 bits of imm |
| Single-bit set | ($bset 'a0 'a1 'a2) | a0 = a1 \| (1<<a2) | Only lower 5 bits of a2 |
| Single-bit set immed. | ($bseti 'a0 'a1 8) | a0 = a1 \| (1<<imm) | Only lower 5 bits of imm |
| Cryptography | | | |
| Bit-reverse each byte | ($brev8 'a0 'a1) | a0 = a1 with the bits in each byte reversed | |
| Pack 2 halfwords | ($pack 'a0 'a1 'a2) | a0 = (a2<<16) \| a1 | Lower 16 bits of a1, a2 |
| Pack 2 bytes into halfword | ($packh 'a0 'a1 'a2) | a0 = (a2<<8) \| a1 | Lower 8 bits of a1, a2 |
| Deinterleave odd/even bits | ($unzip 'a0 'a1) | a0 = even bits of a1 in lower half, odd bits in upper half | |
| Interleave upper/lower half | ($zip 'a0 'a1) | a0 = lower half of a1 in even bits, upper half in odd bits | |
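Two of the less familiar entries, the rotates and orc.b, are easiest to understand from a model. Here's a pure-Lisp sketch of each, assuming integers wider than 32 bits and a 32-bit unsigned argument; the function names are hypothetical. Note how ror-model uses only the lower 5 bits of the shift amount, so rotating #x400 right by 42 gives 1, as happens in from-ufloat16 when decoding #xfc00.

```lisp
;; Hypothetical model of ror: rotate a 32-bit value right by k
(defun ror-model (x k)
  ;; Only the lower 5 bits of the shift amount are used
  (let ((k (logand k 31)))
    (logand #xffffffff
            (logior (ash x (- k)) (ash x (- 32 k))))))

;; Hypothetical model of orc.b
(defun orc-b-model (x)
  ;; Each byte becomes #xff if any of its bits is set, else #x00
  (let ((result 0))
    (dotimes (i 4 result)
      (let ((mask (ash #xff (* 8 i))))
        (unless (zerop (logand x mask))
          (setq result (logior result mask)))))))
```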
[1] Since the top bit of F will always be a '1' (except in the case of zero) it can be omitted, to increase the precision to 12 bits. However, one value then has to be used to represent zero, which makes the routines more complicated.