Embedding assembly into Golang

Go is a very powerful language, but even with all this power we occasionally need to go to a lower level and write code that is hyper-optimized for either memory usage or performance.

Go gives us two interfaces for this, depending on the level at which we want to write our hyper-performant code:

  • CGO, if we want to write code in C/C++ or another programming language that can produce shared libraries (.dll on Windows, .so on Linux and Unix).
  • The built-in Go assembly interface.

In this example I want to show how to write code in assembly, the lowest level available from Go.

Pros & cons

Pros

  • Performant code that depends heavily on the hardware used. It works directly with CPU registers and instructions, at the lowest level: the hardware itself.

Cons

  • Assembly code is architecture dependent, and can also be specific to concrete hardware. In our example we’re using the RDRAND instruction, which reads from the hardware random number generator on Intel/AMD chips. On Intel this instruction was not available before Ivy Bridge chips were introduced. It is a chip feature and should be used carefully.
    The same applies to every assembly instruction, except of course the basic ones shared by all chips of an architecture.
  • Assembly code works on raw, unsafe pointers, which completely disables Go’s safety features. If you don’t know what you’re doing, you should avoid the unsafe package.
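
Because the instruction may be missing on a given chip, it is worth checking for it before calling the assembly. The following is a minimal sketch, assuming Linux, where /proc/cpuinfo lists the "rdrand" flag on x86 and the "rng" feature on ARM64. The helper name hasCPUFlag is hypothetical. The golang.org/x/sys/cpu package offers a portable alternative for x86 through cpu.X86.HasRDRAND.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// hasCPUFlag (hypothetical helper) reports whether flag appears in a
// "flags" (x86) or "Features" (arm64) line of /proc/cpuinfo-style text.
func hasCPUFlag(cpuinfo, flag string) bool {
	for _, line := range strings.Split(cpuinfo, "\n") {
		key, value, found := strings.Cut(line, ":")
		if !found {
			continue
		}
		key = strings.TrimSpace(key)
		if key != "flags" && key != "Features" {
			continue
		}
		for _, f := range strings.Fields(value) {
			if f == flag {
				return true
			}
		}
	}
	return false
}

func main() {
	// Assumption: Linux. Other systems expose CPU features differently.
	data, err := os.ReadFile("/proc/cpuinfo")
	if err != nil {
		fmt.Println("cannot read /proc/cpuinfo:", err)
		return
	}
	info := string(data)
	fmt.Println("rdrand (x86):", hasCPUFlag(info, "rdrand"))
	fmt.Println("rng (arm64):", hasCPUFlag(info, "rng"))
}
```

If neither flag is reported, the assembly routine should not be called and a pure-Go generator is the safer choice.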

The Code

We will write code for two architectures: x86-64 (modern AMD and Intel CPUs), using the RDRAND instruction (RDRANDL in Go assembly), and ARM64 (specifically Armv8.5-A, Apple Silicon), using the RNDR random number register, which implements a similar feature on Apple M-series chips. RNDR belongs to the optional FEAT_RNG extension, so not every ARM64 chip provides it.

X86 version utilizing RDRANDL

// +build amd64,!appengine,!gccgo

#include "textflag.h"

// func GenerateRandomBytesASM(b *byte, n int) int
TEXT ·GenerateRandomBytesASM(SB), NOSPLIT, $0-24
	MOVQ b+0(FP), DI		// Address of the buffer to DI register
	MOVQ n+8(FP), SI		// Number of bytes to SI register

	TESTQ SI, SI			// Check if the length is zero or negative
	JLE zero_or_negative		// Jump if there is nothing to generate

generate_loop:
	RDRANDL AX			// CPU needs to support the RDRAND
					// instruction - Ivy Bridge and newer
					// on Intel, recent AMD chips

	JC store_byte			// Jump if the carry flag is set
					// (random value is valid)

	JMP generate_loop		// No value available yet: retry without
					// touching the counter (real code
					// should cap the number of retries)

store_byte:
	MOVB AX, (DI)			// Store the low byte of the random
					// value generated in generate_loop
	INCQ DI				// Increment pointer
	DECQ SI				// Decrement length counter
	JNZ generate_loop		// Continue loop for more random bytes

	MOVQ n+8(FP), AX		// All bytes written:
	MOVQ AX, ret+16(FP)		// return n through the result slot
	RET

zero_or_negative:
	MOVQ $0, ret+16(FP)		// Return 0 through the result slot
	RET

Please note the special character in the assembly symbol ·GenerateRandomBytesASM(SB). The name is prefixed with the middle dot character · (U+00B7), which is encoded in UTF-8 as the bytes [c2 b7].

The code is prefixed with a build constraint, so it builds only for AMD64 (the x86_64 architecture) and is not built for Google App Engine or with the gccgo compiler.

// +build amd64,!appengine,!gccgo

The file is stored as randomgenerator_amd64.s. Unlike .go files, .s files contain assembly code for the Go toolchain to assemble.
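
The numbers after the argument names in the assembly, such as b+0(FP), are byte offsets from the FP pseudo-register. Here is a small sketch of where they come from for a function with one pointer argument, one int argument and one int result. The frame struct is a hypothetical mirror of that layout for illustration only, not something the toolchain requires.

```go
package main

import (
	"fmt"
	"unsafe"
)

// frame mirrors, for illustration only, how the arguments and the result of
//   func GenerateRandomBytesASM(b *byte, n int) int
// are laid out on the stack for a stack-based assembly function.
type frame struct {
	b   *byte
	n   int
	ret int
}

// layout returns the offsets of b, n and ret, plus the total size in bytes.
func layout() (bOff, nOff, retOff, total uintptr) {
	var f frame
	return unsafe.Offsetof(f.b), unsafe.Offsetof(f.n),
		unsafe.Offsetof(f.ret), unsafe.Sizeof(f)
}

func main() {
	b, n, ret, total := layout()
	fmt.Printf("b+%d(FP) n+%d(FP) ret+%d(FP), argument frame: %d bytes\n",
		b, n, ret, total)
}
```

On a 64-bit platform this reports offsets 0, 8 and 16 and a total of 24 bytes, which is the argument size a TEXT directive declares as $0-24. go vet checks that the names and offsets used in the assembly match the Go declaration.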

Armv8.5-A version utilizing RNDR

// +build arm64,!appengine,!gccgo

#include "textflag.h"

// func GenerateRandomBytesASM(b *byte, n int) int
// Go assembly syntax: registers are R0-R30, destination operand last.
TEXT ·GenerateRandomBytesASM(SB), NOSPLIT, $0-24
	MOVD b+0(FP), R0		// buffer address to R0 register
	MOVD n+8(FP), R1		// number of bytes to R1 register

	CMP $0, R1			// check if R1 is zero or negative
	BLE zero_or_negative		// and finish if so

generate_loop:
	WORD $0xd53b2402		// mrs x2, rndr (encoded by hand):
					// read a random number into R2

	CBNZ R2, store_byte		// check if generation was successful
					// (non-zero) and jump to store_byte

	B generate_loop			// a failed read returns zero: retry
					// without touching the counter

store_byte:
	MOVB.P R2, 1(R0)		// Store the low byte of R2 into
					// the buffer, post-increment R0
					// (single instruction)

	SUB $1, R1			// subtract (decrease) length counter
	CBNZ R1, generate_loop		// if non-zero, continue loop

	MOVD n+8(FP), R3		// all bytes written:
	MOVD R3, ret+16(FP)		// return n through the result slot
	RET

zero_or_negative:
	MOVD ZR, ret+16(FP)		// return 0 through the result slot
	RET

In this example we also prefix the assembly TEXT entry that names the Go function with the UTF-8 character · [c2 b7].

The code is prefixed with a build constraint, so it builds only for ARM64 (the 64-bit ARM architecture) and is not built for Google App Engine or with the gccgo compiler.

// +build arm64,!appengine,!gccgo

The file is stored as randomgenerator_arm64.s. Unlike .go files, .s files contain assembly code for the Go toolchain to assemble.

The interfacing

With both files in place, we need to create an interface for Go to use. We need a .go file that declares this ASM function. In my case I called it randomgenerator.go:

// The +build line is obsolete since Go 1.17; the //go:build line is enough.
//go:build amd64 || arm64
// +build amd64 arm64

package goasmblogpost

// GenerateRandomBytesASM has no body here; it is implemented in the
// assembly files above.
func GenerateRandomBytesASM(b *byte, n int) int

With the interface in place, we can test our functionality by writing some unit tests.
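
Since the declaration takes a raw pointer and a length, callers usually want a thin wrapper that works on a slice. Here is a minimal sketch, assuming the routine returns a positive value on success, which is what the test below checks. The names fillFunc, fillSlice and countingFill are hypothetical. countingFill only stands in for the assembly routine so the example runs on its own.

```go
package main

import (
	"errors"
	"fmt"
	"unsafe"
)

// fillFunc has the same shape as GenerateRandomBytesASM.
type fillFunc func(b *byte, n int) int

// fillSlice (hypothetical helper) fills buf using fill and guards the
// empty-slice case, where taking &buf[0] would panic.
func fillSlice(fill fillFunc, buf []byte) error {
	if len(buf) == 0 {
		return nil
	}
	if got := fill(&buf[0], len(buf)); got <= 0 {
		return errors.New("random generator reported failure")
	}
	return nil
}

// countingFill is a stand-in for the assembly routine: it writes 1, 2, 3...
// so the result is predictable. It is NOT a random generator.
func countingFill(b *byte, n int) int {
	out := unsafe.Slice(b, n)
	for i := range out {
		out[i] = byte(i + 1)
	}
	return n
}

func main() {
	buf := make([]byte, 4)
	fmt.Println(fillSlice(countingFill, buf), buf) // prints "<nil> [1 2 3 4]"
}
```

In the real package the wrapper would be called with GenerateRandomBytesASM in place of countingFill.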

This functionality will only work on the ARM64 and AMD64 architectures. With proper build constraints, as shown above, you can also provide a slower pure-Go implementation for other platforms using the following constraints:

//go:build !amd64 && !arm64
// +build !amd64,!arm64

package goasmblogpost

func GenerateRandomBytesASM(b *byte, n int) int {
...your pure go code for other architectures...
}

Of course a better naming convention would be appropriate here, because this is no longer ASM code, but the name has to match our ASM interface declaration.
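
One possible body for that fallback is sketched below, assuming crypto/rand is an acceptable source on the other platforms. It is written as package main so it runs on its own. In the real project it would live in package goasmblogpost behind the build constraints shown above. The nil check and the return of n on success are assumptions about the contract, not something the assembly defines.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"unsafe"
)

// GenerateRandomBytesASM is a pure-Go stand-in with the same signature as
// the assembly-backed function. It returns n on success and 0 otherwise
// (assumed contract).
func GenerateRandomBytesASM(b *byte, n int) int {
	if b == nil || n <= 0 {
		return 0
	}
	// View the n bytes starting at b as a slice (needs Go 1.17+).
	buf := unsafe.Slice(b, n)
	if _, err := rand.Read(buf); err != nil {
		return 0
	}
	return n
}

func main() {
	buf := make([]byte, 8)
	fmt.Println(GenerateRandomBytesASM(&buf[0], len(buf)), buf)
}
```

The caller must guarantee that b really points to at least n writable bytes, exactly as with the assembly version.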

Testing

To test our implementation we can write a simple test:

package goasmblogpost

import (
	"testing"
	"unsafe"
)

const randomBytesToGenerate = 1000

func TestGenerateRandomBytesASM(t *testing.T) {
	targetSlice := make([]byte, randomBytesToGenerate)
	result := GenerateRandomBytesASM(
		(*byte)(unsafe.Pointer(&targetSlice[0])), randomBytesToGenerate)
	
	if result <= 0 {
		// Errorf already marks the test as failed.
		t.Errorf("GenerateRandomBytesASM failed!")
	}
	t.Log(targetSlice)
}

This test checks whether GenerateRandomBytesASM executed correctly (result > 0); otherwise it fails.

The result of running the test above is:

=== RUN   TestGenerateRandomBytesASM
    generatebytes_test.go:17: [5 109 151 126 167 235 52 122 110 40 255 16 163 122 111 71 254 200 193 100 120 103 196 146 203 190 3 0 8 209 52 143 142 237 223 
<cut for readability>
 202 64 91 132 202 34 17 176 88 24 245 25 24 186 197 118 59 109 102 73 107 134 200 182 172 99 148]
--- PASS: TestGenerateRandomBytesASM (0.00s)
PASS

Summary

This example demonstrates a basic use of assembly within Go. Writing assembly code is typically reserved for addressing complex performance issues where the benefits outweigh the risks, such as breaking Go’s safety checks.

This post is more about showing how it can be done and does not justify using the unsafe package, but it is still fun to play with :).