RISC-V assembler RP2350 extensions

21st October 2024

This article describes functions that extend the uLisp RISC-V assembler to support the additional RISC-V instructions provided by the RP2350.

Introduction

The RP2350's Hazard3 RISC-V core, designed by Luke Wren, extends the base 32-bit RISC-V instruction set with a number of RISC-V extensions. To my mind, the most interesting of these for uLisp users are the Zbb, Zbs, and Zbkb extensions, which provide bit-manipulation and single-bit instructions that could be particularly useful in embedded and electronics applications. I've defined an additional RISC-V extensions file that adds support for these to the RISC-V assembler.

Loading the RISC-V extensions

To add the extensions, load the standard assembler file first and then the extensions file, because some of the extensions add compressed versions of the instructions in the main file.

Get the standard assembler file here: RISC-V assembler in uLisp.

Get the extensions here: RISC-V RP2350 extensions.

Note that these extensions won't work on the Kendryte K210 RISC-V processor used on the Sipeed MAiX boards, which is also supported by the RISC-V assembler in uLisp.

Examples

It's not obvious what some of these extensions might be useful for, so the following examples demonstrate some possible applications:

Reverse bits – brev8 and rev8

This is a function to efficiently reverse the order of bits in a 32-bit number. The reverse-bits operation could be useful when transforming bitmap images, or when interfacing between protocols that work MSB first and LSB first. It takes advantage of the brev8 instruction that reverses the bits within each byte, and the rev8 instruction that reverses the order of the bytes:

(defcode reverse-bits (n)
  ; Reverse bits within each byte
  ($brev8 'a0 'a0)
  ; Reverse all bytes
  ($rev8 'a0 'a0)
  ($ret))

For example:

> (format t "~b" (reverse-bits #b10110011100011110000111110000011))
11000001111100001111000111001101
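
As a cross-check, the same reversal can be written one bit at a time in plain Lisp. This is only a sketch for comparison; the name reverse-bits-lisp is made up for this illustration and is not part of the extensions file:

```lisp
(defun reverse-bits-lisp (n)
  ; Build the result by shifting in bit i of n, for i = 0 to 31,
  ; so bit 0 of n ends up as bit 31 of the result
  (let ((r 0))
    (dotimes (i 32 r)
      (setq r (logior (ash r 1) (logand (ash n (- i)) 1))))))
```

It should give the same bit pattern as the two-instruction machine-code version, but it needs 32 iterations to do it.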

Maximum number in a list - max

The following example demonstrates the use of the max instruction that returns the maximum of two signed integers. It finds the largest integer in a list of arbitrary length:

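(defcode maximum (x)
  ($lui 'a2 #x80000)     ; a2 = #x80000000, the most negative integer
  repeat
  ($beqz 'a0 finished)   ; end of the list (nil)?
  ($lw 'a1 0 '(a0))      ; a1 = car of the cell, a pointer to a number
  ($lw 'a1 4 '(a1))      ; a1 = the integer value of that number
  ($max 'a2 'a1 'a2)     ; keep the larger of a1 and the running maximum
  ($lw 'a0 4 '(a0))      ; a0 = cdr of the cell, the rest of the list
  ($j repeat)
  finished
  ($mv 'a0 'a2)          ; return the maximum
  ($ret))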

For example:

> (maximum '(23 -91 47 -73 11))
47

It iterates through the list, keeping track of the largest value found so far. You can use the min instruction in the same way to find the smallest value.
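
As a sketch of the min variant, the routine below follows the same pattern but starts from the largest signed integer and uses the min instruction. The name minimum is just chosen for this illustration, and the routine has not been taken from the extensions file:

```lisp
(defcode minimum (x)
  ($lui 'a2 #x80000)     ; a2 = #x80000000
  ($addi 'a2 'a2 -1)     ; a2 = #x7fffffff, the largest signed integer
  repeat
  ($beqz 'a0 finished)
  ($lw 'a1 0 '(a0))
  ($lw 'a1 4 '(a1))
  ($min 'a2 'a1 'a2)     ; keep the smaller value
  ($lw 'a0 4 '(a0))
  ($j repeat)
  finished
  ($mv 'a0 'a2)
  ($ret))
```

With the list from the example above, (minimum '(23 -91 47 -73 11)) should return -91.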

Integer square root - clz

The new clz instruction counts the number of leading zeros in a register. It provides an easy way of getting upper and lower bounds for the integer part of the square root of a number. These are useful for applications such as finding prime numbers, where the upper bound gives the largest factor you need to test. If a more accurate result is needed, these bounds can be used as the starting point for Newton's method, or a binary search.

The algorithm takes advantage of the fact that the length of the binary representation of a number's integer square root is approximately half that of the original number.

Here is the upper bound routine, upper-sqrt:

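(defcode upper-sqrt (x)
  ($li 'a1 33)
  ($li 'a2 1)
  ($clz 'a0 'a0)         ; a0 = number of leading zeros in x
  ($sub 'a0 'a1 'a0)     ; a0 = 33 - clz, the bit length of x plus 1
  ($srli 'a0 'a0 1)      ; halve it, rounding down
  ($sll 'a0 'a2 'a0)     ; a0 = 1 shifted left by that amount
  ($addi 'a0 'a0 -1)     ; subtract 1 to give the upper bound
  ($ret))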

It's equivalent to this Lisp function (assuming you have defined a Lisp function clz that returns the number of leading zeros in a 32-bit integer):

(defun upper-sqrt (x) (1- (ash 1 (truncate (- 33 (clz x)) 2))))
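
If you don't have a clz function to hand, a minimal Lisp version might look like the following. It is a sketch that assumes 32-bit integers, and treats a negative argument as having its top bit set:

```lisp
(defun clz (x)
  (if (< x 0)
      0                  ; top bit set, so no leading zeros
      (let ((n 32))
        ; Shift x right until it is zero; each shift
        ; accounts for one significant bit
        (loop
          (when (zerop x) (return n))
          (setq x (ash x -1))
          (decf n)))))
```

For example, (clz 9) gives 28, because 9 occupies four bits.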

For example:

> (upper-sqrt 9)
3
> (upper-sqrt 1000000)
1023
> (upper-sqrt 1600000000)
65535

To get the lower bound of the integer square root you could use the following Lisp function, lower-sqrt:

(defun lower-sqrt (x) (1- (truncate (+ (upper-sqrt x) 3) 2)))
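
As an illustration of using the two bounds as the starting point for a binary search, here is a minimal sketch. The function name isqrt-between is hypothetical; it takes the bounds as arguments, and compares mid with x divided by mid rather than squaring mid, to avoid overflowing a 32-bit integer:

```lisp
(defun isqrt-between (x lo hi)
  ; Invariant: lo squared is <= x, and hi+1 squared is > x
  (loop
    (when (>= lo hi) (return lo))
    (let ((mid (truncate (+ lo hi 1) 2)))
      (if (<= mid (truncate x mid))   ; is mid squared <= x?
          (setq lo mid)
          (setq hi (1- mid))))))
```

You would call it as (isqrt-between x (lower-sqrt x) (upper-sqrt x)). For example, (isqrt-between 1000000 512 1023) gives 1000.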

A compact representation for unsigned integers - clz and ror

A 32-bit unsigned integer has a range of 0 to 2^32-1 and a precision of 1 in 2^32. Is it possible to devise a more compact 16-bit floating-point format that will represent the same range, but with reduced precision? This might be useful, for example, to log the values from an analogue-to-digital converter with limited storage.

The solution is to normalize the 32-bit unsigned integer, by shifting it left until the most significant bit is a '1'. Then store the number in a 16-bit halfword with the top five bits (E, the exponent) giving the amount of the shift, and the bottom 11 bits (F, the fractional part) giving the top 11 bits of the normalized number [1].

A number N is then: N = F × 2^(21-E), where F includes the leading '1' bit of the normalized number.

The range is still 0 to 2^32-1 but the precision is 1 in 2^11. I've called this format ufloat16.
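
To make the encoding concrete, here is the arithmetic for the value 4661, worked by hand using ordinary Lisp expressions. 4661 is #x1235, which has 13 significant bits, so it has 19 leading zeros and E is 19. Keeping the top 11 of those 13 bits is the same as shifting right by 2:

```lisp
; F: the top 11 significant bits of 4661
(ash 4661 -2)                          ; 1165, or #x48d
; Pack E = 19 into the top 5 bits and F into the bottom 11 bits
(logior (ash 19 11) (ash 4661 -2))     ; 40077, or #x9c8d
; Decode: shift F left by 21 - E = 2 places
(ash (logand #x9c8d #x7ff) (- 21 19))  ; 4660
```

The two bits discarded by the right shift are why 4661 decodes as 4660.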

Here's the routine to encode a 32-bit unsigned integer, which is another application of the clz (count leading zeros) instruction:

(defcode to-ufloat16 (n)
   ; Normalize
   ($clz 'a1 'a0)
   ($andi 'a1 'a1 #x1f)
   ($sll 'a0 'a0 'a1)
   ; Shift back down to bottom 11 bits
   ($srli 'a0 'a0 21)
   ; Shift result of clz to top 5 bits
   ($slli 'a1 'a1 11)
   ; Pack into 16 bits
   ($or 'a0 'a1 'a0)
   ($ret))

Here's the routine to unpack an integer in ufloat16 notation, which uses the new ror (rotate right) instruction:

(defcode from-ufloat16 (n)
   ; Get the exponent from top 5 bits
   ($srli 'a1 'a0 11)
   ; Get the fraction from bottom 11 bits
   ($li 'a2 #x7ff)
   ($and 'a0 'a0 'a2)
   ; Shift up/down by the exponent
   ($addi 'a1 'a1 11)
   ($ror 'a0 'a0 'a1)
   ($ret))

Here are some examples (using a Lisp format statement to print the results in hexadecimal where appropriate).

Numbers up to 2048 are encoded without loss of precision:

> (format t "#x~4,'0x" (to-ufloat16 1))
#xfc00

> (from-ufloat16 #xfc00)
1
> (format t "#x~4,'0x" (to-ufloat16 2048))
#xa400

> (from-ufloat16 #xa400)
2048

Numbers over 2048 are stored with 11 bits of precision:

> (format t "#x~4,'0x" (to-ufloat16 4661))
#x9c8d

> (from-ufloat16 #x9c8d)
4660

up to the maximum unsigned 32-bit number #xffffffff:

> (format t "#x~4,'0x" (to-ufloat16 #xffffffff))
#x07ff

> (format t "#x~8,'0x" (from-ufloat16 #x07ff))
#xffe00000

You could do something similar to represent signed 32-bit integers in 16 bits by using one of the bits as a sign bit.

Interleaving two integers - zip and unzip

The following example is a way to encode two small integers, such as a pair of coordinates, as a single compact integer. The encoding technique involves expressing the two numbers in binary, and then interleaving the bitstrings, right-aligned, so their bits alternate.

The new zip and unzip instructions are ideal for this application. The zip instruction interleaves the upper and lower half of a register into the odd and even bits of the result, and unzip does the reverse operation.

The function encode takes two integers of 16 bits or less, and interleaves them into a single integer:

(defcode encode (x y)
  ($pack 'a2 'a0 'a1)
  ($zip 'a0 'a2)
  ($ret))

For example:

> (encode 137 73)
24771
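
For comparison, the interleaving can be described bit by bit in plain Lisp: bit i of x goes to bit 2i of the result, and bit i of y goes to bit 2i+1. This is only a sketch, and the name interleave is chosen just for this illustration:

```lisp
(defun interleave (x y)
  (let ((r 0))
    (dotimes (i 16 r)
      (setq r (logior r
                      ; bit i of x goes to even position 2i
                      (ash (logand (ash x (- i)) 1) (* 2 i))
                      ; bit i of y goes to odd position 2i+1
                      (ash (logand (ash y (- i)) 1) (1+ (* 2 i))))))))
```

Evaluating (interleave 137 73) gives 24771, matching the result from encode.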

The function decode takes a single integer and decodes it into a list of the original two numbers. It uses a machine-code function unzip:

(defcode unzip (x)
  ($unzip 'a0 'a0)
  ($ret))

(defun decode (x)
  (let ((u (unzip x)))
    (list (logand u #xffff) (logand (ash u -16) #xffff)))) 

For example:

> (decode 24771)
(137 73)

The zip instruction is also useful for making double-width characters from bitmap fonts, by doubling each column of pixels.
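
As a sketch of that idea, the following routine doubles each of the bottom 16 bits of its argument by packing two copies of the value into one register and then interleaving them. The name double-bits is hypothetical, and the routine assumes that each row of the character is supplied as an integer of 16 bits or less:

```lisp
(defcode double-bits (x)
  ; Copy the lower halfword into both halves of a0
  ($pack 'a0 'a0 'a0)
  ; Interleave the two copies, so each bit appears twice
  ($zip 'a0 'a0)
  ($ret))
```

For example, (double-bits #b1011) should return #b11001111, or 207.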

Binomial random number generator - cpop

The next example shows how to generate random numbers with a binomial distribution. It uses the new cpop instruction (standing for population count) which counts the number of '1' bits in a register.

For example, suppose you tossed 20 coins and counted the number of heads. If you repeated this 2^20 times you would expect to get:

  • No heads C(20,0) times, or once.
  • 1 head C(20,1) times, or 20 times.
  • 10 heads C(20,10) times, or 184756 times.
  • 20 heads C(20,20) times, or once.

This is a binomial distribution.
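
The counts in the list above are binomial coefficients. As a rough sketch, they could be checked with a short Lisp function; the name choose is just for this illustration:

```lisp
(defun choose (n k)
  ; After step i, r holds the number of ways of choosing
  ; i+1 items from n, so each division is exact
  (let ((r 1))
    (dotimes (i k r)
      (setq r (truncate (* r (- n i)) (1+ i))))))
```

For example, (choose 20 10) gives 184756, and (choose 20 1) gives 20.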

To get a random number from 0 to 20 with a binomial distribution you can simulate the coin tossing by generating a 20-bit random number, and then counting the number of '1' bits. The cpop instruction will do this:

(defcode popcount (n)
  ($cpop 'a0 'a0)
  ($ret))

For example:

> (popcount #b10101010101010101010)
10

The final binomial random number generator is then:

(defun binomial-random ()
  ; (random n) returns 0 to n-1, so #x100000 covers every 20-bit value
  (popcount (random #x100000)))

Trying it out:

> (dotimes (x 20) (format t "~a " (binomial-random)))
11 9 10 10 9 10 10 8 10 7 11 13 10 15 6 9 11 13 14 13 

Summary of the extensions

Here's a summary of the extensions defined in the RISC-V RP2350 extensions file:

Operation                    Example               Action                      Notes

Basic bit manipulation

AND inverted operand         ($andn 'a0 'a1 'a2)   a0 = a1 & ~a2
Count leading zeros          ($clz 'a0 'a1)                                    Number of leading zeros
Count set bits               ($cpop 'a0 'a1)                                   Number of 1s; popcount
Count trailing zeros         ($ctz 'a0 'a1)                                    Number of trailing zeros
Maximum                      ($max 'a0 'a1 'a2)    a0 = max(a1, a2)            Signed integers
Unsigned maximum             ($maxu 'a0 'a1 'a2)   a0 = max(a1, a2)            Unsigned integers
Minimum                      ($min 'a0 'a1 'a2)    a0 = min(a1, a2)            Signed integers
Unsigned minimum             ($minu 'a0 'a1 'a2)   a0 = min(a1, a2)            Unsigned integers
Bitwise OR-combine           ($orc.b 'a0 'a1)                                  Byte is #xff if any bit set
OR inverted operand          ($orn 'a0 'a1 'a2)    a0 = a1 | ~a2
Byte-reverse register        ($rev8 'a0 'a1)       a0 = a1 byte reversed
Rotate left                  ($rol 'a0 'a1 'a2)    a0 = a1 rotate left by a2   Only lower 5 bits of a2
Rotate right                 ($ror 'a0 'a1 'a2)    a0 = a1 rotate right by a2  Only lower 5 bits of a2
Rotate right immed.          ($rori 'a0 'a1 11)    a0 = a1 rotate right imm    Only lower 5 bits of imm
Sign-extend byte             ($sext.b 'a0 'a1)     a0 = a1[7..0] sign extend
Sign-extend halfword         ($sext.h 'a0 'a1)     a0 = a1[15..0] sign extend
Exclusive NOR                ($xnor 'a0 'a1 'a2)   a0 = ~(a1 ^ a2)
Zero-extend byte             ($zext.b 'a0 'a1)     a0 = a1[7..0] zero extend
Zero-extend halfword         ($zext.h 'a0 'a1)     a0 = a1[15..0] zero extend

Single bit instructions

Single-bit clear             ($bclr 'a0 'a1 'a2)   a0 = a1 & ~(1<<a2)          Only lower 5 bits of a2
Single-bit clear immed.      ($bclri 'a0 'a1 8)    a0 = a1 & ~(1<<imm)         Only lower 5 bits of imm
Single-bit extract           ($bext 'a0 'a1 'a2)   a0 = (a1>>a2) & 1           Only lower 5 bits of a2
Single-bit extract immed.    ($bexti 'a0 'a1 8)    a0 = (a1>>imm) & 1          Only lower 5 bits of imm
Single-bit invert            ($binv 'a0 'a1 'a2)   a0 = a1 ^ (1<<a2)           Only lower 5 bits of a2
Single-bit invert immed.     ($binvi 'a0 'a1 8)    a0 = a1 ^ (1<<imm)          Only lower 5 bits of imm
Single-bit set               ($bset 'a0 'a1 'a2)   a0 = a1 | (1<<a2)           Only lower 5 bits of a2
Single-bit set immed.        ($bseti 'a0 'a1 8)    a0 = a1 | (1<<imm)          Only lower 5 bits of imm

Cryptography

Bit-reverse each byte        ($brev8 'a0 'a1)      a0 = a1 bit reversed
Pack 2 halfwords             ($pack 'a0 'a1 'a2)   a0 = (a2<<16) | a1          Lower 16 bits of a1, a2
Pack 2 bytes into halfword   ($packh 'a0 'a1 'a2)  a0 = (a2<<8) | a1           Lower 8 bits of a1, a2
Deinterleave odd/even bits   ($unzip 'a0 'a1)
Interleave upper/lower half  ($zip 'a0 'a1)


  1. Since the top bit of F will always be a '1' (except in the case of zero), it can be omitted to increase the precision to 12 bits. However, one value then has to be used to represent zero, which makes the routines more complicated.