The embedonomicon

The embedonomicon walks you through the process of creating a #![no_std] application from scratch and through the iterative process of building architecture-specific functionality for Cortex-M microcontrollers.

Objectives

By reading this book you will learn

  • How to build a #![no_std] application. This is much more complex than building a #![no_std] library because the target system may not be running an OS (or you could be aiming to build an OS!) and the program could be the only process running in the target (or the first one). In that case, the program may need to be customized for the target system.

  • Tricks to finely control the memory layout of a Rust program. You'll learn about linkers, linker scripts and about the Rust features that let you control a bit of the ABI of Rust programs.

  • A trick to implement default functionality that can be statically overridden (no runtime cost).

Target audience

This book mainly targets to two audiences:

  • People that wish to bootstrap bare metal support for an architecture that the ecosystem doesn't yet support (e.g. Cortex-R as of Rust 1.28), or for an architecture that Rust just gained support for (e.g. maybe Xtensa some time in the future).

  • People that are curious about the unusual implementation of runtime crates like cortex-m-rt, msp430-rt and riscv-rt.

Requirements

This book is self contained. The reader doesn't need to be familiar with the Cortex-M architecture, nor is access to a Cortex-M microcontroller needed -- all the examples included in this book can be tested in QEMU. You will, however, need to install the following tools to run and inspect the examples in this book:

  • All the code in this book uses the 2018 edition. If you are not familiar with the 2018 features and idioms check the edition guide.

  • Rust 1.31 or a newer toolchain PLUS ARM Cortex-M compilation support.

  • cargo-binutils. v0.1.4 or newer.

  • cargo-edit.

  • QEMU with support for ARM emulation. The qemu-system-arm program must be installed on your computer.

  • GDB with ARM support.

Example setup

Instructions common to all OSes

$ # Rust toolchain
$ # If you start from scratch, get rustup from https://rustup.rs/
$ rustup default stable

$ # toolchain should be newer than this one
$ rustc -V
rustc 1.31.0 (abe02cefd 2018-12-04)

$ rustup target add thumbv7m-none-eabi

$ # cargo-binutils
$ cargo install cargo-binutils

$ rustup component add llvm-tools-preview

macOS

$ # arm-none-eabi-gdb
$ # you may need to run `brew tap Caskroom/tap` first
$ brew cask install gcc-arm-embedded

$ # QEMU
$ brew install qemu

Ubuntu 16.04

$ # arm-none-eabi-gdb
$ sudo apt install gdb-arm-none-eabi

$ # QEMU
$ sudo apt install qemu-system-arm

Ubuntu 18.04 or Debian

$ # gdb-multiarch -- use `gdb-multiarch` when you wish to invoke gdb
$ sudo apt install gdb-multiarch

$ # QEMU
$ sudo apt install qemu-system-arm

Windows

Installing a toolchain bundle from ARM (optional step) (tested on Ubuntu 18.04)

  • With the late 2018 switch from GCC's linker to LLD for Cortex-M microcontrollers, gcc-arm-none-eabi is no longer required. But for those wishing to use the toolchain anyway, install from here and follow the steps outlined below:
$ tar xvjf gcc-arm-none-eabi-8-2018-q4-major-linux.tar.bz2
$ mv gcc-arm-none-eabi-<version_downloaded> <your_desired_path> # optional
$ export PATH=${PATH}:<path_to_arm_none_eabi_folder>/bin # add this line to .bashrc to make persistent

The smallest #![no_std] program

In this section we'll write the smallest #![no_std] program that compiles.

What does #![no_std] mean?

#![no_std] is a crate level attribute that indicates that the crate will link to the core crate instead of the std crate, but what does this mean for applications?

The std crate is Rust's standard library. It contains functionality that assumes that the program will run on an operating system rather than directly on the metal. std also assumes that the operating system is a general purpose operating system, like the ones one would find in servers and desktops. For this reason, std provides a standard API over functionality one usually finds in such operating systems: Threads, files, sockets, a filesystem, processes, etc.

On the other hand, the core crate is a subset of the std crate that makes zero assumptions about the system the program will run on. As such, it provides APIs for language primitives like floats, strings and slices, as well as APIs that expose processor features like atomic operations and SIMD instructions. However it lacks APIs for anything that involves heap memory allocations and I/O.

For an application, std does more than just providing a way to access OS abstractions. std also takes care of, among other things, setting up stack overflow protection, processing command line arguments and spawning the main thread before a program's main function is invoked. A #![no_std] application lacks all that standard runtime, so it must initialize its own runtime, if any is required.

Because of these properties, a #![no_std] application can be the first and / or the only code that runs on a system. It can be many things that a standard Rust application can never be, for example:

  • The kernel of an OS.
  • Firmware.
  • A bootloader.

The code

With that out of the way, we can move on to the smallest #![no_std] program that compiles:

$ cargo new --edition 2018 --bin app

$ cd app
$ # modify main.rs so it has these contents
$ cat src/main.rs

# #![allow(unused_variables)]
#![no_main]
#![no_std]

#fn main() {
use core::panic::PanicInfo;

#[panic_handler]
fn panic(_panic: &PanicInfo<'_>) -> ! {
    loop {}
}

#}

This program contains some things that you won't see in standard Rust programs:

The #![no_std] attribute which we have already extensively covered.

The #![no_main] attribute which means that the program won't use the standard main function as its entry point. At the time of writing, Rust's main interface makes some assumptions about the environment the program executes in: For example, it assumes the existence of command line arguments, so in general, it's not appropriate for #![no_std] programs.

The #[panic_handler] attribute. The function marked with this attribute defines the behavior of panics, both library level panics (core::panic!) and language level panics (out of bounds indexing).

This program doesn't produce anything useful. In fact, it will produce an empty binary.

$ # equivalent to `size target/thumbv7m-none-eabi/debug/app`
$ cargo size --target thumbv7m-none-eabi --bin app
   text	   data	    bss	    dec	    hex	filename
      0	      0	      0	      0	      0	app

Before linking the crate does contain the panicking symbol.

$ cargo rustc --target thumbv7m-none-eabi -- --emit=obj

$ cargo nm -- target/thumbv7m-none-eabi/debug/deps/app-*.o | grep '[0-9]* [^n] '
00000000 T rust_begin_unwind

However, it's our starting point. In the next section, we'll build something useful. But before continuing, let's set a default build target to avoid having to pass the --target flag to every Cargo invocation.

$ mkdir .cargo

$ # modify .cargo/config so it has these contents
$ cat .cargo/config
[build]
target = "thumbv7m-none-eabi"

Memory layout

The next step is to ensure the program has the right memory layout so that the target system will be able to execute it. In our example, we'll be working with a virtual Cortex-M3 microcontroller: the LM3S6965. Our program will be the only process running on the device so it must also take care of initializing the device.

Background information

Cortex-M devices require a vector table to be present at the start of their code memory region. The vector table is an array of pointers; the first two pointers are required to boot the device; the rest of pointers are related to exceptions -- we'll ignore them for now.

Linkers decide the final memory layout of programs, but we can use linker scripts to have some control over it. The control granularity that linker scripts give us over the layout is at the level of sections. A section is a collection of symbols laid out in contiguous memory. Symbols, in turn, can be data (a static variable), or instructions (a Rust function).

Every symbol has a name assigned by the compiler. As of Rust 1.28 , the Rust compiler assigns to symbols names of the form: _ZN5krate6module8function17he1dfc17c86fe16daE, which demangles to krate::module::function::he1dfc17c86fe16da where krate::module::function is the path of the function or variable and he1dfc17c86fe16da is some sort of hash. The Rust compiler will place each symbol into its own and unique section; for example the symbol mentioned before will be placed in a section named .text._ZN5krate6module8function17he1dfc17c86fe16daE.

These compiler generated symbol and section names are not guaranteed to remain constant across different releases of the Rust compiler. However, the language lets us control symbol names and section placement via these attributes:

  • #[export_name = "foo"] sets the symbol name to foo.
  • #[no_mangle] means: use the function or variable name (not its full path) as its symbol name. #[no_mangle] fn bar() will produce a symbol named bar.
  • #[link_section = ".bar"] places the symbol in a section named .bar.

With these attributes, we can expose a stable ABI of the program and use it in the linker script.

The Rust side

Like mentioned before, for Cortex-M devices, we need to populate the first two entries of the vector table. The first one, the initial value for the stack pointer, can be populated using only the linker script. The second one, the reset vector, needs to be created in Rust code and placed correctly using the linker script.

The reset vector is a pointer into the reset handler. The reset handler is the function that the device will execute after a system reset, or after it powers up for the first time. The reset handler is always the first stack frame in the hardware call stack; returning from it is undefined behavior as there's no other stack frame to return to. We can enforce that the reset handler never returns by making it a divergent function, which is a function with signature fn(/* .. */) -> !.


# #![allow(unused_variables)]
#fn main() {
#[no_mangle]
pub unsafe extern "C" fn Reset() -> ! {
    let _x = 42;

    // can't return so we go into an infinite loop here
    loop {}
}

// The reset vector, a pointer into the reset handler
#[link_section = ".vector_table.reset_vector"]
#[no_mangle]
pub static RESET_VECTOR: unsafe extern "C" fn() -> ! = Reset;

#}

The hardware expects a certain format here, to which we adhere by using extern "C" to tell the compiler to lower the function using the C ABI, instead of the Rust ABI, which is unstable.

To refer to the reset handler and reset vector from the linker script, we need them to have a stable symbol name so we use #[no_mangle]. We need fine control over the location of RESET_VECTOR, so we place it in a known section, .vector_table.reset_vector. The exact location of the reset handler itself, Reset, is not important. We just stick to the default compiler generated section.

Also, the linker will ignore symbols with internal linkage, AKA internal symbols, while traversing the list of input object files, so we need our two symbols to have external linkage. The only way to make a symbol external in Rust is to make its corresponding item public (pub) and reachable (no private module between the item and the root of the crate).

The linker script side

Below is shown a minimal linker script that places the vector table in the right location. Let's walk through it.

$ cat link.x
/* Memory layout of the LM3S6965 microcontroller */
/* 1K = 1 KiBi = 1024 bytes */
MEMORY
{
  FLASH : ORIGIN = 0x00000000, LENGTH = 256K
  RAM : ORIGIN = 0x20000000, LENGTH = 64K
}

/* The entry point is the reset handler */
ENTRY(Reset);

EXTERN(RESET_VECTOR);

SECTIONS
{
  .vector_table ORIGIN(FLASH) :
  {
    /* First entry: initial Stack Pointer value */
    LONG(ORIGIN(RAM) + LENGTH(RAM));

    /* Second entry: reset vector */
    KEEP(*(.vector_table.reset_vector));
  } > FLASH

  .text :
  {
    *(.text .text.*);
  } > FLASH

  /DISCARD/ :
  {
    *(.ARM.exidx.*);
  }
}

MEMORY

This section of the linker script describes the location and size of blocks of memory in the target. Two memory blocks are defined: FLASH and RAM; they correspond to the physical memory available in the target. The values used here correspond to the LM3S6965 microcontroller.

ENTRY

Here we indicate to the linker that the reset handler -- whose symbol name is Reset -- is the entry point of the program. Linkers aggressively discard unused sections. Linkers consider the entry point and functions called from it as used so they won't discard them. Without this line, the linker would discard the Reset function and all subsequent functions called from it.

EXTERN

Linkers are lazy; they will stop looking into the input object files once they have found all the symbols that are recursively referenced from the entry point. EXTERN forces the linker to look for EXTERN's argument even after all other referenced symbols have been found. As a rule of thumb, if you need a symbol that's not called from the entry point to always be present in the output binary, you should use EXTERN in conjunction with KEEP.

SECTIONS

This part describes how sections in the input object files, AKA input sections, are to be arranged in the sections of the output object file, AKA output sections; or if they should be discarded. Here we define two output sections:

  .vector_table ORIGIN(FLASH) : { /* .. */ } > FLASH

.vector_table, which contains the vector table and is located at the start of FLASH memory,

  .text : { /* .. */ } > FLASH

and .text, which contains the program subroutines and is located somewhere in FLASH. Its start address is not specified, but the linker will place it after the previous output section, .vector_table.

The output .vector_table section contains:

    /* First entry: initial Stack Pointer value */
    LONG(ORIGIN(RAM) + LENGTH(RAM));

We'll place the (call) stack at the end of RAM (the stack is full descending; it grows towards smaller addresses) so the end address of RAM will be used as the initial Stack Pointer (SP) value. That address is computed in the linker script itself using the information we entered for the RAM memory block.

    /* Second entry: reset vector */
    KEEP(*(.vector_table.reset_vector));

Next, we use KEEP to force the linker to insert all input sections named .vector_table.reset_vector right after the initial SP value. The only symbol located in that section is RESET_VECTOR, so this will effectively place RESET_VECTOR second in the vector table.

The output .text section contains:

    *(.text .text.*);

This includes all the input sections named .text and .text.*. Note that we don't use KEEP here to let the linker discard unused sections.

Finally, we use the special /DISCARD/ section to discard

    *(.ARM.exidx.*);

input sections named .ARM.exidx.*. These sections are related to exception handling but we are not doing stack unwinding on panics and they take up space in Flash memory, so we just discard them.

Putting it all together

Now we can link the application. For reference, here's the complete Rust program:


# #![allow(unused_variables)]
#![no_main]
#![no_std]

#fn main() {
use core::panic::PanicInfo;

// The reset handler
#[no_mangle]
pub unsafe extern "C" fn Reset() -> ! {
    let _x = 42;

    // can't return so we go into an infinite loop here
    loop {}
}

// The reset vector, a pointer into the reset handler
#[link_section = ".vector_table.reset_vector"]
#[no_mangle]
pub static RESET_VECTOR: unsafe extern "C" fn() -> ! = Reset;

#[panic_handler]
fn panic(_panic: &PanicInfo<'_>) -> ! {
    loop {}
}

#}

We have to tweak linker process to make it use our linker script. This is done passing the -C link-arg flag to rustc but there are two ways to do it: you can use the cargo-rustc subcommand instead of cargo-build as shown below:

IMPORTANT: Make sure you have the .cargo/config file that was added at the end of the last section before running this command.

$ cargo rustc -- -C link-arg=-Tlink.x

Or you can set the rustflags in .cargo/config and continue using the cargo-build subcommand. We'll do the latter because it better integrates with cargo-binutils.

# modify .cargo/config so it has these contents
$ cat .cargo/config
[target.thumbv7m-none-eabi]
rustflags = ["-C", "link-arg=-Tlink.x"]

[build]
target = "thumbv7m-none-eabi"

The [target.thumbv7m-none-eabi] part says that these flags will only be used when cross compiling to that target.

Inspecting it

Now let's inspect the output binary to confirm the memory layout looks the way we want:

$ cargo objdump --bin app -- -d -no-show-raw-insn

app:	file format ELF32-arm-little

Disassembly of section .text:
Reset:
       8:	sub	sp, #4
       a:	movs	r0, #42
       c:	str	r0, [sp]
       e:	b	#-2 <Reset+0x8>
      10:	b	#-4 <Reset+0x8>

This is the disassembly of the .text section. We see that the reset handler, named Reset, is located at address 0x8.

$ cargo objdump --bin app -- -s -section .vector_table

app:	file format ELF32-arm-little

Contents of section .vector_table:
 0000 00000120 09000000                    ... ....

This shows the contents of the .vector_table section. We can see that the section starts at address 0x0 and that the first word of the section is 0x2001_0000 (the objdump output is in little endian format). This is the initial SP value and matches the end address of RAM. The second word is 0x9; this is the thumb mode address of the reset handler. When a function is to be executed in thumb mode the first bit of its address is set to 1.

Testing it

This program is a valid LM3S6965 program; we can execute it in a virtual microcontroller (QEMU) to test it out.

$ # this program will block
$ qemu-system-arm \
      -cpu cortex-m3 \
      -machine lm3s6965evb \
      -gdb tcp::3333 \
      -S \
      -nographic \
      -kernel target/thumbv7m-none-eabi/debug/app
$ # on a different terminal
$ arm-none-eabi-gdb -q target/thumbv7m-none-eabi/debug/app
Reading symbols from target/thumbv7m-none-eabi/debug/app...done.

(gdb) target remote :3333
Remote debugging using :3333
Reset () at src/main.rs:8
8       pub unsafe extern "C" fn Reset() -> ! {

(gdb) # the SP has the initial value we programmed in the vector table
(gdb) print/x $sp
$1 = 0x20010000

(gdb) step
9           let _x = 42;

(gdb) step
12          loop {}

(gdb) # next we inspect the stack variable `_x`
(gdb) print _x
$2 = 42

(gdb) print &_x
$3 = (i32 *) 0x2000fffc

(gdb) quit

A main interface

We have a minimal working program now, but we need to package it in a way that the end user can build safe programs on top of it. In this section, we'll implement a main interface like the one standard Rust programs use.

First, we'll convert our binary crate into a library crate:

$ mv src/main.rs src/lib.rs

And then rename it to rt which stands for "runtime".

$ sed -i s/app/rt/ Cargo.toml

$ head -n4 Cargo.toml
[package]
edition = "2018"
name = "rt" # <-
version = "0.1.0"

The first change is to have the reset handler call an external main function:

$ head -n13 src/lib.rs
#![no_std]

use core::panic::PanicInfo;

// CHANGED!
#[no_mangle]
pub unsafe extern "C" fn Reset() -> ! {
    extern "Rust" {
        fn main() -> !;
    }

    main()
}

We also drop the #![no_main] attribute has it has no effect on library crates.

There's an orthogonal question that arises at this stage: Should the rt library provide a standard panicking behavior, or should it not provide a #[panic_handler] function and leave the end user choose the panicking behavior? This document won't delve into that question and for simplicity will leave the dummy #[panic_handler] function in the rt crate. However, we wanted to inform the reader that there are other options.

The second change involves providing the linker script we wrote before to the application crate. You see the linker will search for linker scripts in the library search path (-L) and in the directory from which it's invoked. The application crate shouldn't need to carry around a copy of link.x so we'll have the rt crate put the linker script in the library search path using a build script.

$ # create a build.rs file in the root of `rt` with these contents
$ cat build.rs
use std::{env, error::Error, fs::File, io::Write, path::PathBuf};

fn main() -> Result<(), Box<Error>> {
    // build directory for this crate
    let out_dir = PathBuf::from(env::var_os("OUT_DIR").unwrap());

    // extend the library search path
    println!("cargo:rustc-link-search={}", out_dir.display());

    // put `link.x` in the build directory
    File::create(out_dir.join("link.x"))?.write_all(include_bytes!("link.x"))?;

    Ok(())
}

Now the user can write an application that exposes the main symbol and link it to the rt crate. The rt will take care of giving the program the right memory layout.

$ cd ..

$ cargo new --edition 2018 --bin app

$ cd app

$ # modify Cargo.toml to include the `rt` crate as a dependency
$ tail -n2 Cargo.toml
[dependencies]
rt = { path = "../rt" }
$ # copy over the config file that sets a default target and tweaks the linker invocation
$ cp -r ../rt/.cargo .

$ # change the contents of `main.rs` to
$ cat src/main.rs
#![no_std]
#![no_main]

extern crate rt;

#[no_mangle]
pub fn main() -> ! {
    let _x = 42;

    loop {}
}

The disassembly will be similar but will now include the user main function.

$ cargo objdump --bin app -- -d -no-show-raw-insn

app:	file format ELF32-arm-little

Disassembly of section .text:
main:
       8:	sub	sp, #4
       a:	movs	r0, #42
       c:	str	r0, [sp]
       e:	b	#-2 <main+0x8>
      10:	b	#-4 <main+0x8>

Reset:
      12:	bl	#-14
      16:	trap

Making it type safe

The main interface works, but it's easy to get it wrong: For example, the user could write main as a non-divergent function, and they would get no compile time error and undefined behavior (the compiler will misoptimize the program).

We can add type safety by exposing a macro to the user instead of the symbol interface. In the rt crate, we can write this macro:

$ tail -n12 ../rt/src/lib.rs

# #![allow(unused_variables)]
#fn main() {
#[macro_export]
macro_rules! entry {
    ($path:path) => {
        #[export_name = "main"]
        pub unsafe fn __main() -> ! {
            // type check the given path
            let f: fn() -> ! = $path;

            f()
        }
    }
}
#}

Then the application writers can invoke it like this:

$ cat src/main.rs
#![no_std]
#![no_main]

use rt::entry;

entry!(main);

fn main() -> ! {
    let _x = 42;

    loop {}
}

Now the author will get an error if they change the signature of main to be non divergent function, e.g. fn().

Life before main

rt is looking good but it's not feature complete! Applications written against it can't use static variables or string literals because rt's linker script doesn't define the standard .bss, .data and .rodata sections. Let's fix that!

The first step is to define these sections in the linker script:

$ # showing just a fragment of the file
$ sed -n 25,46p ../rt/link.x
  .text :
  {
    *(.text .text.*);
  } > FLASH

  /* NEW! */
  .rodata :
  {
    *(.rodata .rodata.*);
  } > FLASH

  .bss :
  {
    *(.bss .bss.*);
  } > RAM

  .data :
  {
    *(.data .data.*);
  } > RAM

  /DISCARD/ :

They just re-export the input sections and specify in which memory region each output section will go.

With these changes, the following program will compile:

#![no_std]
#![no_main]

use rt::entry;

entry!(main);

static RODATA: &[u8] = b"Hello, world!";
static mut BSS: u8 = 0;
static mut DATA: u16 = 1;

fn main() -> ! {
    let _x = RODATA;
    let _y = unsafe { &BSS };
    let _z = unsafe { &DATA };

    loop {}
}

However if you run this program on real hardware and debug it, you'll observe that the static variables BSS and DATA don't have the values 0 and 1 by the time main has been reached. Instead, these variables will have junk values. The problem is that the contents of RAM are random after powering up the device. You won't be able to observe this effect if you run the program in QEMU.

As things stand if your program reads any static variable before performing a write to it then your program has undefined behavior. Let's fix that by initializing all static variables before calling main.

We'll need to tweak the linker script a bit more to do the RAM initialization:

$ # showing just a fragment of the file
$ sed -n 25,52p ../rt/link.x
  .text :
  {
    *(.text .text.*);
  } > FLASH

  /* CHANGED! */
  .rodata :
  {
    *(.rodata .rodata.*);
  } > FLASH

  .bss :
  {
    _sbss = .;
    *(.bss .bss.*);
    _ebss = .;
  } > RAM

  .data : AT(ADDR(.rodata) + SIZEOF(.rodata))
  {
    _sdata = .;
    *(.data .data.*);
    _edata = .;
  } > RAM

  _sidata = LOADADDR(.data);

  /DISCARD/ :

Let's go into the details of these changes:

    _sbss = .;
    _ebss = .;
    _sdata = .;
    _edata = .;

We associate symbols to the start and end addresses of the .bss and .data sections, which we'll later use from Rust code.

  .data : AT(ADDR(.rodata) + SIZEOF(.rodata))

We set the Load Memory Address (LMA) of the .data section to the end of the .rodata section. The .data contains static variables with a non-zero initial value; the Virtual Memory Address (VMA) of the .data section is somewhere in RAM -- this is where the static variables are located. The initial values of those static variables, however, must be allocated in non volatile memory (Flash); the LMA is where in Flash those initial values are stored.

  _sidata = LOADADDR(.data);

Finally, we associate a symbol to the LMA of .data.

On the Rust side, we zero the .bss section and initialize the .data section. We can reference the symbols we created in the linker script from the Rust code. The addresses1 of these symbols are the boundaries of the .bss and .data sections.

The updated reset handler is shown below:

$ head -n32 ../rt/src/lib.rs
#![no_std]

use core::panic::PanicInfo;
use core::ptr;

#[no_mangle]
pub unsafe extern "C" fn Reset() -> ! {
    // NEW!
    // Initialize RAM
    extern "C" {
        static mut _sbss: u8;
        static mut _ebss: u8;

        static mut _sdata: u8;
        static mut _edata: u8;
        static _sidata: u8;
    }

    let count = &_ebss as *const u8 as usize - &_sbss as *const u8 as usize;
    ptr::write_bytes(&mut _sbss as *mut u8, 0, count);

    let count = &_edata as *const u8 as usize - &_sdata as *const u8 as usize;
    ptr::copy_nonoverlapping(&_sidata as *const u8, &mut _sdata as *mut u8, count);

    // Call user entry point
    extern "Rust" {
        fn main() -> !;
    }

    main()
}

Now end users can directly and indirectly make use of static variables without running into undefined behavior!

In the code above we performed the memory initialization in a bytewise fashion. It's possible to force the .bss and .data sections to be aligned to, say, 4 bytes. This fact can then be used in the Rust code to perform the initialization wordwise while omitting alignment checks. If you are interested in learning how this can be achieved check the cortex-m-rt crate.

1

The fact that the addresses of the linker script symbols must be used here can be confusing and unintuitive. An elaborate explanation for this oddity can be found here.

Exception handling

During the "Memory layout" section, we decided to start out simple and leave out handling of exceptions. In this section, we'll add support for handling them; this serves as an example of how to achieve compile time overridable behavior in stable Rust (i.e. without relying on the unstable #[linkage = "weak"] attribute, which makes a symbol weak).

Background information

In a nutshell, exceptions are a mechanism the Cortex-M and other architectures provide to let applications respond to asynchronous, usually external, events. The most prominent type of exception, that most people will know, is the classical (hardware) interrupt.

The Cortex-M exception mechanism works like this: When the processor receives a signal or event associated to a type of exception, it suspends the execution of the current subroutine (by stashing the state in the call stack) and then proceeds to execute the corresponding exception handler, another subroutine, in a new stack frame. After finishing the execution of the exception handler (i.e. returning from it), the processor resumes the execution of the suspended subroutine.

The processor uses the vector table to decide what handler to execute. Each entry in the table contains a pointer to a handler, and each entry corresponds to a different exception type. For example, the second entry is the reset handler, the third entry is the NMI (Non Maskable Interrupt) handler, and so on.

As mentioned before, the processor expects the vector table to be at some specific location in memory, and each entry in it can potentially be used by the processor at runtime. Hence, the entries must always contain valid values. Furthermore, we want the rt crate to be flexible so the end user can customize the behavior of each exception handler. Finally, the vector table resides in read only memory, or rather in not easily modified memory, so the user has to register the handler statically, rather than at runtime.

To satisfy all these constraints, we'll assign a default value to all the entries of the vector table in the rt crate, but make these values kind of weak to let the end user override them at compile time.

Rust side

Let's see how all this can be implemented. For simplicity, we'll only work with the first 16 entries of the vector table; these entries are not device specific so they have the same function on any kind of Cortex-M microcontroller.

The first thing we'll do is create an array of vectors (pointers to exception handlers) in the rt crate's code:

$ sed -n 56,91p ../rt/src/lib.rs

# #![allow(unused_variables)]
#fn main() {
pub union Vector {
    reserved: u32,
    handler: unsafe extern "C" fn(),
}

extern "C" {
    fn NMI();
    fn HardFault();
    fn MemManage();
    fn BusFault();
    fn UsageFault();
    fn SVCall();
    fn PendSV();
    fn SysTick();
}

#[link_section = ".vector_table.exceptions"]
#[no_mangle]
pub static EXCEPTIONS: [Vector; 14] = [
    Vector { handler: NMI },
    Vector { handler: HardFault },
    Vector { handler: MemManage },
    Vector { handler: BusFault },
    Vector {
        handler: UsageFault,
    },
    Vector { reserved: 0 },
    Vector { reserved: 0 },
    Vector { reserved: 0 },
    Vector { reserved: 0 },
    Vector { handler: SVCall },
    Vector { reserved: 0 },
    Vector { reserved: 0 },
    Vector { handler: PendSV },
    Vector { handler: SysTick },
];
#}

Some of the entries in the vector table are reserved; the ARM documentation states that they should be assigned the value 0 so we use a union to do exactly that. The entries that must point to a handler make use of external functions; this is important because it lets the end user provide the actual function definition.

Next, we define a default exception handler in the Rust code. Exceptions that have not been assigned a handler by the end user will make use of this default handler.

$ tail -n4 ../rt/src/lib.rs

# #![allow(unused_variables)]
#fn main() {
#[no_mangle]
pub extern "C" fn DefaultExceptionHandler() {
    loop {}
}
#}

Linker script side

On the linker script side, we place these new exception vectors right after the reset vector.

$ sed -n 12,25p ../rt/link.x
EXTERN(RESET_VECTOR);
EXTERN(EXCEPTIONS); /* <- NEW */

SECTIONS
{
  .vector_table ORIGIN(FLASH) :
  {
    /* First entry: initial Stack Pointer value */
    LONG(ORIGIN(RAM) + LENGTH(RAM));

    /* Second entry: reset vector */
    KEEP(*(.vector_table.reset_vector));

    /* The next 14 entries are exception vectors */
    KEEP(*(.vector_table.exceptions)); /* <- NEW */
  } > FLASH

And we use PROVIDE to give a default value to the handlers that we left undefined in rt (NMI and the others above):

$ tail -n8 ../rt/link.x
PROVIDE(NMI = DefaultExceptionHandler);
PROVIDE(HardFault = DefaultExceptionHandler);
PROVIDE(MemManage = DefaultExceptionHandler);
PROVIDE(BusFault = DefaultExceptionHandler);
PROVIDE(UsageFault = DefaultExceptionHandler);
PROVIDE(SVCall = DefaultExceptionHandler);
PROVIDE(PendSV = DefaultExceptionHandler);
PROVIDE(SysTick = DefaultExceptionHandler);

PROVIDE only takes effect when the symbol to the left of the equal sign is still undefined after inspecting all the input object files. This is the scenario where the user didn't implement the handler for the respective exception.

Testing it

That's it! The rt crate now has support for exception handlers. We can test it out with following application:

NOTE: Turns out it's hard to generate an exception in QEMU. On real hardware a read to an invalid memory address (i.e. outside of the Flash and RAM regions) would be enough but QEMU happily accepts the operation and returns zero. A trap instruction works on both QEMU and hardware but unfortunately it's not available on stable so you'll have to temporarily switch to nightly to run this and the next example.

#![feature(core_intrinsics)]
#![no_main]
#![no_std]

use core::intrinsics;

use rt::entry;

entry!(main);

fn main() -> ! {
    // this executes the undefined instruction (UDF) and causes a HardFault exception
    unsafe { intrinsics::abort() }
}

(gdb) target remote :3333
Remote debugging using :3333
Reset () at ../rt/src/lib.rs:7
7       pub unsafe extern "C" fn Reset() -> ! {

(gdb) b DefaultExceptionHandler
Breakpoint 1 at 0xec: file ../rt/src/lib.rs, line 95.

(gdb) continue
Continuing.

Breakpoint 1, DefaultExceptionHandler ()
    at ../rt/src/lib.rs:95
95          loop {}

(gdb) list
90          Vector { handler: SysTick },
91      ];
92
93      #[no_mangle]
94      pub extern "C" fn DefaultExceptionHandler() {
95          loop {}
96      }

And for completeness, here's the disassembly of the optimized version of the program:

$ cargo objdump --bin app --release -- -d -no-show-raw-insn -print-imm-hex

app:	file format ELF32-arm-little

Disassembly of section .text:
main:
      40:	trap
      42:	trap

Reset:
      44:	movw	r1, #0x0
      48:	movw	r0, #0x0
      4c:	movt	r1, #0x2000
      50:	movt	r0, #0x2000
      54:	subs	r1, r1, r0
      56:	bl	#0xd2
      5a:	movw	r1, #0x0
      5e:	movw	r0, #0x0
      62:	movt	r1, #0x2000
      66:	movt	r0, #0x2000
      6a:	subs	r2, r1, r0
      6c:	movw	r1, #0x0
      70:	movt	r1, #0x0
      74:	bl	#0x8
      78:	bl	#-0x3c
      7c:	trap

DefaultExceptionHandler:
      7e:	b	#-0x4 <DefaultExceptionHandler>
$ cargo objdump --bin app --release -- -s -j .vector_table

app:	file format ELF32-arm-little

Contents of section .vector_table:
 0000 00000120 45000000 7f000000 7f000000  ... E...........
 0010 7f000000 7f000000 7f000000 00000000  ................
 0020 00000000 00000000 00000000 7f000000  ................
 0030 00000000 00000000 7f000000 7f000000  ................

The vector table now resembles the results of all the code snippets in this book so far. To summarize:

  • In the Inspecting it section of the earlier memory chapter, we learned that:
    • The first entry in the vector table contains the initial value of the stack pointer.
    • Objdump prints in little endian format, so the stack starts at 0x2001_0000.
    • The second entry points to address 0x0000_0045, the Reset handler.
      • The address of the Reset handler can be seen in the disassembly above, being 0x44.
      • The first bit being set to 1 does not alter the address due to alignment requirements. Instead, it causes the function to be executed in thumb mode.
  • Afterwards, a pattern of addresses alternating between 0x7f and 0x00 is visible.
    • Looking at the disassembly above, it is clear that 0x7f refers to the DefaultExceptionHandler (0x7e executed in thumb mode).
    • Cross referencing the pattern to the vector table that was set up earlier in this chapter (see the definition of pub static EXCEPTIONS) with the vector table layout for the Cortex-M, it is clear that the address of the DefaultExceptionHandler is present each time a respective handler entry is present in the table.
    • In turn, it is also visible that the layout of the vector table data structure in the Rust code is aligned with all the reserved slots in the Cortex-M vector table. Hence, all reserved slots are correctly set to a value of zero.

Overriding a handler

To override an exception handler, the user has to provide a function whose symbol name exactly matches the name we used in EXCEPTIONS.

#![feature(core_intrinsics)]
#![no_main]
#![no_std]

use core::intrinsics;

use rt::entry;

entry!(main);

fn main() -> ! {
    unsafe { intrinsics::abort() }
}

#[no_mangle]
pub extern "C" fn HardFault() -> ! {
    // do something interesting here
    loop {}
}

You can test it in QEMU

(gdb) target remote :3333
Remote debugging using :3333
Reset () at /home/japaric/rust/embedonomicon/ci/exceptions/rt/src/lib.rs:7
7       pub unsafe extern "C" fn Reset() -> ! {

(gdb) b HardFault
Breakpoint 1 at 0x44: file src/main.rs, line 18.

(gdb) continue
Continuing.

Breakpoint 1, HardFault () at src/main.rs:18
18          loop {}

(gdb) list
13      }
14
15      #[no_mangle]
16      pub extern "C" fn HardFault() -> ! {
17          // do something interesting here
18          loop {}
19      }

The program now executes the user defined HardFault function instead of the DefaultExceptionHandler in the rt crate.

Like our first attempt at a main interface, this first implementation has the problem of having no type safety. It's also easy to mistype the name of the exception, but that doesn't produce an error or warning. Instead the user defined handler is simply ignored. Those problems can be fixed using a macro like the exception! macro defined in cortex-m-rt v0.5.x or the exception attribute in cortex-m-rt v0.6.x.

Assembly on stable

So far we have managed to boot the device and handle interrupts without a single line of assembly. That's quite a feat! But depending on the architecture you are targeting you may need some assembly to get to this point. There are also some operations like context switching that require assembly, etc.

The problem is that both inline assembly (asm!) and free form assembly (global_asm!) are unstable, and there's no estimate for when they'll be stabilized, so you can't use them on stable . This is not a showstopper because there are some workarounds which we'll document here.

To motivate this section we'll tweak the HardFault handler to provide information about the stack frame that generated the exception.

Here's what we want to do:

Instead of letting the user directly put their HardFault handler in the vector table we'll make the rt crate put a trampoline to the user-defined HardFault handler in the vector table.

$ tail -n36 ../rt/src/lib.rs

# #![allow(unused_variables)]
#fn main() {
extern "C" {
    fn NMI();
    fn HardFaultTrampoline(); // <- CHANGED!
    fn MemManage();
    fn BusFault();
    fn UsageFault();
    fn SVCall();
    fn PendSV();
    fn SysTick();
}

#[link_section = ".vector_table.exceptions"]
#[no_mangle]
pub static EXCEPTIONS: [Vector; 14] = [
    Vector { handler: NMI },
    Vector { handler: HardFaultTrampoline }, // <- CHANGED!
    Vector { handler: MemManage },
    Vector { handler: BusFault },
    Vector {
        handler: UsageFault,
    },
    Vector { reserved: 0 },
    Vector { reserved: 0 },
    Vector { reserved: 0 },
    Vector { reserved: 0 },
    Vector { handler: SVCall },
    Vector { reserved: 0 },
    Vector { reserved: 0 },
    Vector { handler: PendSV },
    Vector { handler: SysTick },
];

#[no_mangle]
pub extern "C" fn DefaultExceptionHandler() {
    loop {}
}
#}

This trampoline will read the stack pointer and then call the user HardFault handler. The trampoline will have to be written in assembly:

  mrs r0, MSP
  b HardFault

Due to how the ARM ABI works this sets the Main Stack Pointer (MSP) as the first argument of the HardFault function / routine. This MSP value also happens to be a pointer to the registers pushed to the stack by the exception. With these changes the user HardFault handler must now have signature fn(&StackedRegisters) -> !.

.s files

One approach to stable assembly is to write the assembly in an external file:

$ cat ../rt/asm.s
  .section .text.HardFaultTrampoline
  .global HardFaultTrampoline
  .thumb_func
HardFaultTrampoline:
  mrs r0, MSP
  b HardFault

And use the cc crate in the build script of the rt crate to assemble that file into an object file (.o) and then into an archive (.a).

$ cat ../rt/build.rs
use std::{env, error::Error, fs::File, io::Write, path::PathBuf};

use cc::Build;

fn main() -> Result<(), Box<Error>> {
    // build directory for this crate
    let out_dir = PathBuf::from(env::var_os("OUT_DIR").unwrap());

    // extend the library search path
    println!("cargo:rustc-link-search={}", out_dir.display());

    // put `link.x` in the build directory
    File::create(out_dir.join("link.x"))?.write_all(include_bytes!("link.x"))?;

    // assemble the `asm.s` file
    Build::new().file("asm.s").compile("asm"); // <- NEW!

    Ok(())
}

$ tail -n2 ../rt/Cargo.toml
[build-dependencies]
cc = "1.0.25"

And that's it!

We can confirm that the vector table contains a pointer to HardFaultTrampoline by writing a very simple program.

#![no_main]
#![no_std]

use rt::entry;

entry!(main);

fn main() -> ! {
    loop {}
}

#[allow(non_snake_case)]
#[no_mangle]
pub fn HardFault(_ef: *const u32) -> ! {
    loop {}
}

Here's the disassembly. Look at the address of HardFaultTrampoline.

$ cargo objdump --bin app --release -- -d -no-show-raw-insn -print-imm-hex

app:	file format ELF32-arm-little

Disassembly of section .text:
HardFault:
      40:	b	#-0x4 <HardFault>

main:
      42:	trap

Reset:
      44:	bl	#-0x6
      48:	trap

DefaultExceptionHandler:
      4a:	b	#-0x4 <DefaultExceptionHandler>

UsageFault:
      4b: <unknown>

HardFaultTrampoline:
      4c:	mrs	r0, msp
      50:	b	#-0x14 <HardFault>

NOTE: To make this disassembly smaller I commented out the initialization of RAM

Now look at the vector table. The 4th entry should be the address of HardFaultTrampoline plus one.

$ cargo objdump --bin app --release -- -s -j .vector_table

app:	file format ELF32-arm-little

Contents of section .vector_table:
 0000 00000120 45000000 4b000000 4d000000  ... E...K...M...
 0010 4b000000 4b000000 4b000000 00000000  K...K...K.......
 0020 00000000 00000000 00000000 4b000000  ............K...
 0030 00000000 00000000 4b000000 4b000000  ........K...K...

.o / .a files

The downside of using the cc crate is that it requires some assembler program on the build machine. For example when targeting ARM Cortex-M the cc crate uses arm-none-eabi-gcc as the assembler.

Instead of assembling the file on the build machine we can ship a pre-assembled file with the rt crate. That way no assembler program is required on the build machine. However, you would still need an assembler on the machine that packages and publishes the crate.

There's not much difference between an assembly (.s) file and its compiled version: the object (.o) file. The assembler doesn't do any optimization; it simply chooses the right object file format for the target architecture.

Cargo provides support for bundling archives (.a) with crates. We can package object files into an archive using the ar command and then bundle the archive with the crate. In fact, this what the cc crate does; you can see the commands it invoked by searching for a file named output in the target directory.

$ grep running $(find target -name output)
running: "arm-none-eabi-gcc" "-O0" "-ffunction-sections" "-fdata-sections" "-fPIC" "-g" "-fno-omit-frame-pointer" "-mthumb" "-march=armv7-m" "-Wall" "-Wextra" "-o" "/tmp/app/target/thumbv7m-none-eabi/debug/build/rt-6ee84e54724f2044/out/asm.o" "-c" "asm.s"
running: "ar" "crs" "/tmp/app/target/thumbv7m-none-eabi/debug/build/rt-6ee84e54724f2044/out/libasm.a" "/home/japaric/rust-embedded/embedonomicon/ci/asm/app/target/thumbv7m-none-eabi/debug/build/rt-6ee84e54724f2044/out/asm.o"
$ grep cargo $(find target -name output)
cargo:rustc-link-search=/tmp/app/target/thumbv7m-none-eabi/debug/build/rt-6ee84e54724f2044/out
cargo:rustc-link-lib=static=asm
cargo:rustc-link-search=native=/tmp/app/target/thumbv7m-none-eabi/debug/build/rt-6ee84e54724f2044/out

We'll do something similar to produce an archive.

$ # most of flags `cc` uses have no effect when assembling so we drop them
$ arm-none-eabi-as -march=armv7-m asm.s -o asm.o

$ ar crs librt.a asm.o

$ arm-none-eabi-objdump -Cd librt.a
In archive librt.a:

asm.o:     file format elf32-littlearm


Disassembly of section .text.HardFaultTrampoline:

00000000 <HardFaultTrampoline>:
   0:	f3ef 8008 	mrs	r0, MSP
   4:	e7fe      	b.n	0 <HardFault>

Next we modify the build script to bundle this archive with the rt rlib.

$ cat ../rt/build.rs
use std::{
    env,
    error::Error,
    fs::{self, File},
    io::Write,
    path::PathBuf,
};

fn main() -> Result<(), Box<Error>> {
    // build directory for this crate
    let out_dir = PathBuf::from(env::var_os("OUT_DIR").unwrap());

    // extend the library search path
    println!("cargo:rustc-link-search={}", out_dir.display());

    // put `link.x` in the build directory
    File::create(out_dir.join("link.x"))?.write_all(include_bytes!("link.x"))?;

    // link to `librt.a`
    fs::copy("librt.a", out_dir.join("librt.a"))?; // <- NEW!
    println!("cargo:rustc-link-lib=static=rt"); // <- NEW!

    Ok(())
}

Now we can test this new version against the simple program from before and we'll get the same output.

$ cargo objdump --bin app --release -- -d -no-show-raw-insn -print-imm-hex

app:	file format ELF32-arm-little

Disassembly of section .text:
HardFault:
      40:	b	#-0x4 <HardFault>

main:
      42:	trap

Reset:
      44:	bl	#-0x6
      48:	trap

DefaultExceptionHandler:
      4a:	b	#-0x4 <DefaultExceptionHandler>

UsageFault:
      4b: <unknown>

HardFaultTrampoline:
      4c:	mrs	r0, msp
      50:	b	#-0x14 <HardFault>

NOTE: As before I have commented out the RAM initialization to make the disassembly smaller.

$ cargo objdump --bin app --release -- -s -j .vector_table

app:	file format ELF32-arm-little

Contents of section .vector_table:
 0000 00000120 45000000 4b000000 4d000000  ... E...K...M...
 0010 4b000000 4b000000 4b000000 00000000  K...K...K.......
 0020 00000000 00000000 00000000 4b000000  ............K...
 0030 00000000 00000000 4b000000 4b000000  ........K...K...

The downside of shipping pre-assembled archives is that, in the worst case scenario, you'll need to ship one build artifact for each compilation target your library supports.

Logging with symbols

This section will show you how to utilize symbols and the ELF format to achieve super cheap logging.

Arbitrary symbols

Whenever we needed a stable symbol interface between crates we have mainly used the no_mangle attribute and sometimes the export_name attribute. The export_name attribute takes a string which becomes the name of the symbol whereas #[no_mangle] is basically sugar for #[export_name = <item-name>].

Turns out we are not limited to single word names; we can use arbitrary strings, e.g. sentences, as the argument of the export_name attribute. As least when the output format is ELF anything that doesn't contain a null byte is fine.

Let's check that out:

$ cargo new --lib foo

$ cat foo/src/lib.rs

# #![allow(unused_variables)]
#fn main() {
#[export_name = "Hello, world!"]
#[used]
static A: u8 = 0;

#[export_name = "こんにちは"]
#[used]
static B: u8 = 0;
#}
$ ( cd foo && cargo nm --lib )
foo-d26a39c34b4e80ce.3lnzqy0jbpxj4pld.rcgu.o:
0000000000000000 r Hello, world!
0000000000000000 V __rustc_debug_gdb_scripts_section__
0000000000000000 r こんにちは

Can you see where this is going?

Encoding

Here's what we'll do: we'll create one static variable per log message but instead of storing the messages in the variables we'll store the messages in the variables' symbol names. What we'll log then will not be the contents of the static variables but their addresses.

As long as the static variables are not zero sized each one will have a different address. What we're doing here is effectively encoding each message into a unique identifier, which happens to be the variable address. Some part of the log system will have to decode this id back into the message.

Let's write some code to illustrate the idea.

In this example we'll need some way to do I/O so we'll use the cortex-m-semihosting crate for that. Semihosting is a technique for having a target device borrow the host I/O capabilities; the host here usually refers to the machine that's debugging the target device. In our case, QEMU supports semihosting out of the box so there's no need for a debugger. On a real device you'll have other ways to do I/O like a serial port; we use semihosting in this case because it's the easiest way to do I/O on QEMU.

Here's the code

#![no_main]
#![no_std]

use core::fmt::Write;
use cortex_m_semihosting::{debug, hio};

use rt::entry;

entry!(main);

fn main() -> ! {
    let mut hstdout = hio::hstdout().unwrap();

    #[export_name = "Hello, world!"]
    static A: u8 = 0;

    writeln!(hstdout, "{:#x}", &A as *const u8 as usize);

    #[export_name = "Goodbye"]
    static B: u8 = 0;

    writeln!(hstdout, "{:#x}", &B as *const u8 as usize);

    debug::exit(debug::EXIT_SUCCESS);

    loop {}
}

We also make use of the debug::exit API to have the program terminate the QEMU process. This is a convenience so we don't have to manually terminate the QEMU process.

And here's the dependencies section of the Cargo.toml:

[dependencies]
cortex-m-semihosting = "0.3.1"
rt = { path = "../rt" }

Now we can build the program

$ cargo build

To run it we'll have to add the --semihosting-config flag to our QEMU invocation:

$ qemu-system-arm \
      -cpu cortex-m3 \
      -machine lm3s6965evb \
      -nographic \
      -semihosting-config enable=on,target=native \
      -kernel target/thumbv7m-none-eabi/debug/app
0x1fe0
0x1fe1

NOTE: These addresses may not be the ones you get locally because addresses of static variable are not guaranteed to remain the same when the toolchain is changed (e.g. optimizations may have improved).

Now we have two addresses printed to the console.

Decoding

How do we convert these addresses into strings? The answer is in the symbol table of the ELF file.

$ cargo objdump --bin app -- -t | grep '\.rodata\s*0*1\b'
00001fe1 g       .rodata		 00000001 Goodbye
00001fe0 g       .rodata		 00000001 Hello, world!

$ # first column is the symbol address; last column is the symbol name

objdump -t prints the symbol table. This table contains all the symbols but we are only looking for the ones in the .rodata section and whose size is one byte (our variables have type u8).

It's important to note that the address of the symbols will likely change when optimizing the program. Let's check that.

PROTIP You can set target.thumbv7m-none-eabi.runner to the long QEMU command from before (qemu-system-arm -cpu (..) -kernel) in the Cargo configuration file (.cargo/conifg) to have cargo run use that runner to execute the output binary.

$ head -n2 .cargo/config
[target.thumbv7m-none-eabi]
runner = "qemu-system-arm -cpu cortex-m3 -machine lm3s6965evb -nographic -semihosting-config enable=on,target=native -kernel"
$ cargo run --release
     Running `qemu-system-arm -cpu cortex-m3 -machine lm3s6965evb -nographic -semihosting-config enable=on,target=native -kernel target/thumbv7m-none-eabi/release/app`
0xb9c
0xb9d

$ cargo objdump --bin app --release -- -t | grep '\.rodata\s*0*1\b'
00000b9d g     O .rodata	00000001 Goodbye
00000b9c g     O .rodata	00000001 Hello, world!

So make sure to always look for the strings in the ELF file you executed.

Of course, the process of looking up the strings in the ELF file can be automated using a tool that parses the symbol table (.symtab section) contained in the ELF file. Implementing such tool is out of scope for this book and it's left as an exercise for the reader.

Making it zero cost

Can we do better? Yes, we can!

The current implementation places the static variables in .rodata, which means they occupy size in Flash even though we never use their contents. Using a little bit of linker script magic we can make them occupy zero space in Flash.

$ cat log.x
SECTIONS
{
  .log 0 (INFO) : {
    *(.log);
  }
}

We'll place the static variables in this new output .log section. This linker script will collect all the symbols in the .log sections of input object files and put them in an output .log section. We have seen this pattern in the Memory layout chapter.

The new bit here is the (INFO) part; this tells the linker that this section is a non-allocatable section. Non-allocatable sections are kept in the ELF binary as metadata but they are not loaded onto the target device.

We also specified the start address of this output section: the 0 in .log 0 (INFO).

The other improvement we can do is switch from formatted I/O (fmt::Write) to binary I/O, that is send the addresses to the host as bytes rather than as strings.

Binary serialization can be hard but we'll keep things super simple by serializing each address as a single byte. With this approach we don't have to worry about endianness or framing. The downside of this format is that a single byte can only represent up to 256 different addresses.

Let's make those changes:

#![no_main]
#![no_std]

use cortex_m_semihosting::{debug, hio};

use rt::entry;

entry!(main);

fn main() -> ! {
    let mut hstdout = hio::hstdout().unwrap();

    #[export_name = "Hello, world!"]
    #[link_section = ".log"] // <- NEW!
    static A: u8 = 0;

    let address = &A as *const u8 as usize as u8;
    hstdout.write_all(&[address]).unwrap(); // <- CHANGED!

    #[export_name = "Goodbye"]
    #[link_section = ".log"] // <- NEW!
    static B: u8 = 0;

    let address = &B as *const u8 as usize as u8;
    hstdout.write_all(&[address]).unwrap(); // <- CHANGED!

    debug::exit(debug::EXIT_SUCCESS);

    loop {}
}

Before you run this you'll have to append -Tlog.x to the arguments passed to the linker. That can be done in the Cargo configuration file.

$ cat .cargo/config
[target.thumbv7m-none-eabi]
runner = "qemu-system-arm -cpu cortex-m3 -machine lm3s6965evb -nographic -semihosting-config enable=on,target=native -kernel"
rustflags = [
  "-C", "link-arg=-Tlink.x",
  "-C", "link-arg=-Tlog.x", # <- NEW!
]

[build]
target = "thumbv7m-none-eabi"

Now you can run it! Since the output now has a binary format we'll pipe it through the xxd command to reformat it as a hexadecimal string.

$ cargo run | xxd -p
0001

The addresses are 0x00 and 0x01. Let's now look at the symbol table.

$ cargo objdump --bin app -- -t | grep '\.log'
00000001 g       .log		 00000001 Goodbye
00000000 g       .log		 00000001 Hello, world!

There are our strings. You'll notice that their addresses now start at zero; this is because we set a start address for the output .log section.

Each variable is 1 byte in size because we are using u8 as their type. If we used something like u16 then all address would be even and we would not be able to efficiently use all the address space (0...255).

Packaging it up

You've noticed that the steps to log a string are always the same so we can refactor them into a macro that lives in its own crate. Also, we can make the logging library more reusable by abstracting the I/O part behind a trait.

$ cargo new --lib log

$ cat log/src/lib.rs

# #![allow(unused_variables)]
#![no_std]

#fn main() {
pub trait Log {
    type Error;

    fn log(&mut self, address: u8) -> Result<(), Self::Error>;
}

#[macro_export]
macro_rules! log {
    ($logger:expr, $string:expr) => {{
        #[export_name = $string]
        #[link_section = ".log"]
        static SYMBOL: u8 = 0;

        $crate::Log::log(&mut $logger, &SYMBOL as *const u8 as usize as u8)
    }};
}

#}

Given that this library depends on the .log section it should be its responsibility to provide the log.x linker script so let's make that happen.

$ mv log.x ../log/
$ cat ../log/build.rs
use std::{env, error::Error, fs::File, io::Write, path::PathBuf};

fn main() -> Result<(), Box<Error>> {
    // Put the linker script somewhere the linker can find it
    let out = PathBuf::from(env::var("OUT_DIR")?);

    File::create(out.join("log.x"))?.write_all(include_bytes!("log.x"))?;

    println!("cargo:rustc-link-search={}", out.display());

    Ok(())
}

Now we can refactor our application to use the log! macro:

$ cat src/main.rs
#![no_main]
#![no_std]

use cortex_m_semihosting::{
    debug,
    hio::{self, HStdout},
};

use log::{log, Log};
use rt::entry;

struct Logger {
    hstdout: HStdout,
}

impl Log for Logger {
    type Error = ();

    fn log(&mut self, address: u8) -> Result<(), ()> {
        self.hstdout.write_all(&[address])
    }
}

entry!(main);

fn main() -> ! {
    let hstdout = hio::hstdout().unwrap();
    let mut logger = Logger { hstdout };

    log!(logger, "Hello, world!");

    log!(logger, "Goodbye");

    debug::exit(debug::EXIT_SUCCESS);

    loop {}
}

Don't forget to update the Cargo.toml file to depend on the new log crate.

$ tail -n4 Cargo.toml
[dependencies]
cortex-m-semihosting = "0.3.1"
log = { path = "../log" }
rt = { path = "../rt" }
$ cargo run | xxd -p
0001

$ cargo objdump --bin app -- -t | grep '\.log'
00000001 g       .log		 00000001 Goodbye
00000000 g       .log		 00000001 Hello, world!

Same output as before!

Bonus: Multiple log levels

Many logging frameworks provide ways to log messages at different log levels. These log levels convey the severity of the message: "this is an error", "this is just a warning", etc. These log levels can be used to filter out unimportant messages when searching for e.g. error messages.

We can extend our logging library to support log levels without increasing its footprint. Here's how we'll do that:

We have a flat address space for the messages: from 0 to 255 (inclusive). To keep things simple let's say we only want to differentiate between error messages and warning messages. We can place all the error messages at the beginning of the address space, and all the warning messages after the error messages. If the decoder knows the address of the first warning message then it can classify the messages. This idea can be extended to support more than two log levels.

Let's test the idea by replacing the log macro with two new macros: error! and warn!.

$ cat ../log/src/lib.rs

# #![allow(unused_variables)]
#![no_std]

#fn main() {
pub trait Log {
    type Error;

    fn log(&mut self, address: u8) -> Result<(), Self::Error>;
}

/// Logs messages at the ERROR log level
#[macro_export]
macro_rules! error {
    ($logger:expr, $string:expr) => {{
        #[export_name = $string]
        #[link_section = ".log.error"] // <- CHANGED!
        static SYMBOL: u8 = 0;

        $crate::Log::log(&mut $logger, &SYMBOL as *const u8 as usize as u8)
    }};
}

/// Logs messages at the WARNING log level
#[macro_export]
macro_rules! warn {
    ($logger:expr, $string:expr) => {{
        #[export_name = $string]
        #[link_section = ".log.warning"] // <- CHANGED!
        static SYMBOL: u8 = 0;

        $crate::Log::log(&mut $logger, &SYMBOL as *const u8 as usize as u8)
    }};
}

#}

We distinguish errors from warnings by placing the messages in different link sections.

The next thing we have to do is update the linker script to place error messages before the warning messages.

$ cat ../log/log.x
SECTIONS
{
  .log 0 (INFO) : {
    *(.log.error);
    __log_warning_start__ = .;
    *(.log.warning);
  }
}

We also give a name, __log_warning_start__, to the boundary between the errors and the warnings. The address of this symbol will be the address of the first warning message.

We can now update the application to make use of these new macros.

$ cat src/main.rs
#![no_main]
#![no_std]

use cortex_m_semihosting::{
    debug,
    hio::{self, HStdout},
};

use log::{error, warn, Log};
use rt::entry;

entry!(main);

fn main() -> ! {
    let hstdout = hio::hstdout().unwrap();
    let mut logger = Logger { hstdout };

    warn!(logger, "Hello, world!"); // <- CHANGED!

    error!(logger, "Goodbye"); // <- CHANGED!

    debug::exit(debug::EXIT_SUCCESS);

    loop {}
}

struct Logger {
    hstdout: HStdout,
}

impl Log for Logger {
    type Error = ();

    fn log(&mut self, address: u8) -> Result<(), ()> {
        self.hstdout.write_all(&[address])
    }
}

The output won't change much:

$ cargo run | xxd -p
0100

We still get two bytes in the output but the error is given the address 0 and the warning is given the address 1 even though the warning was logged first.

Now look at the symbol table.

$ cargo objdump --bin app -- -t | grep '\.log'
00000000 g       .log		 00000001 Goodbye
00000001 g       .log		 00000001 Hello, world!
00000001         .log		 00000000 __log_warning_start__

There's now an extra symbol, __log_warning_start__, in the .log section. The address of this symbol is the address of the first warning message. Symbols with addresses lower than this value are errors, and the rest of symbols are warnings.

With an appropriate decoder you could get the following human readable output from all this information:

WARNING Hello, world!
ERROR Goodbye

If you liked this section check out the stlog logging framework which is a complete implementation of this idea.

Global singletons

In this section we'll cover how to implement a global, shared singleton. The embedded Rust book covered local, owned singletons which are pretty much unique to Rust. Global singletons are essentially the singleton pattern you see in C and C++; they are not specific to embedded development but since they involve symbols they seemed a good fit for the embedonomicon.

TODO(resources team) link "the embedded Rust book" to the singletons section when it's up

To illustrate this section we'll extend the logger we developed in the last section to support global logging. The result will be very similar to the #[global_allocator] feature covered in the embedded Rust book.

TODO(resources team) link #[global_allocator] to the collections chapter of the book when it's in a more stable location.

Here's the summary of what we want to:

In the last section we created a log! macro to log messages through a specific logger, a value that implements the Log trait. The syntax of the log! macro is log!(logger, "String"). We want to extend the macro such that log!("String") also works. Using the logger-less version should log the message through a global logger; this is how std::println! works. We'll also need a mechanism to declare what the global logger is; this is the part that's similar to #[global_allocator].

It could be that the global logger is declared in the top crate and it could also be that the type of the global logger is defined in the top crate. In this scenario the dependencies can not know the exact type of the global logger. To support this scenario we'll need some indirection.

Instead of hardcoding the type of the global logger in the log crate we'll declare only the interface of the global logger in that crate. That is we'll add a new trait, GlobalLog, to the log crate. The log! macro will also have to make use of that trait.

$ cat ../log/src/lib.rs

# #![allow(unused_variables)]
#![no_std]

#fn main() {
// NEW!
pub trait GlobalLog: Sync {
    fn log(&self, address: u8);
}

pub trait Log {
    type Error;

    fn log(&mut self, address: u8) -> Result<(), Self::Error>;
}

#[macro_export]
macro_rules! log {
    // NEW!
    ($string:expr) => {
        unsafe {
            extern "Rust" {
                static LOGGER: &'static dyn $crate::GlobalLog;
            }

            #[export_name = $string]
            #[link_section = ".log"]
            static SYMBOL: u8 = 0;

            $crate::GlobalLog::log(LOGGER, &SYMBOL as *const u8 as usize as u8)
        }
    };

    ($logger:expr, $string:expr) => {{
        #[export_name = $string]
        #[link_section = ".log"]
        static SYMBOL: u8 = 0;

        $crate::Log::log(&mut $logger, &SYMBOL as *const u8 as usize as u8)
    }};
}

// NEW!
#[macro_export]
macro_rules! global_logger {
    ($logger:expr) => {
        #[no_mangle]
        pub static LOGGER: &dyn $crate::GlobalLog = &$logger;
    };
}

#}

There's quite a bit to unpack here.

Let's start with the trait.


# #![allow(unused_variables)]
#fn main() {
pub trait GlobalLog: Sync {
    fn log(&self, address: u8);
}
#}

Both GlobalLog and Log have a log method. The difference is that GlobalLog.log takes a shared reference to the receiver (&self). This is necessary because the global logger will be a static variable. More on that later.

The other difference is that GlobalLog.log doesn't return a Result. This means that it can not report errors to the caller. This is not a strict requirement for traits used to implement global singletons. Error handling in global singletons is fine but then all users of the global version of the log! macro have to agree on the error type. Here we are simplifying the interface a bit by having the GlobalLog implementer deal with the errors.

Yet another difference is that GlobalLog requires that the implementer is Sync, that is that it can be shared between threads. This is a requirement for values placed in static variables; their types must implement the Sync trait.

At this point it may not be entirely clear why the interface has to look this way. The other parts of the crate will make this clearer so keep reading.

Next up is the log! macro:


# #![allow(unused_variables)]
#fn main() {
    ($string:expr) => {
        unsafe {
            extern "Rust" {
                static LOGGER: &'static dyn $crate::GlobalLog;
            }

            #[export_name = $string]
            #[link_section = ".log"]
            static SYMBOL: u8 = 0;

            $crate::GlobalLog::log(LOGGER, &SYMBOL as *const u8 as usize as u8)
        }
    };
#}

When called without a specific $logger the macros uses an extern static variable called LOGGER to log the message. This variable is the global logger that's defined somewhere else; that's why we use the extern block. We saw this pattern in the main interface chapter.

We need to declare a type for LOGGER or the code won't type check. We don't know the concrete type of LOGGER at this point but we know, or rather require, that it implements the GlobalLog trait so we can use a trait object here.

The rest of the macro expansion looks very similar to the expansion of the local version of the log! macro so I won't explain it here as it's explained in the previous chapter.

Now that we know that LOGGER has to be a trait object it's clearer why we omitted the associated Error type in GlobalLog. If we had not omitted then we would have need to pick a type for Error in the type signature of LOGGER. This is what I earlier meant by "all users of log! would need to agree on the error type".

Now the final piece: the global_logger! macro. It could have been a proc macro attribute but it's easier to write a macro_rules! macro.


# #![allow(unused_variables)]
#fn main() {
#[macro_export]
macro_rules! global_logger {
    ($logger:expr) => {
        #[no_mangle]
        pub static LOGGER: &dyn $crate::GlobalLog = &$logger;
    };
}
#}

This macro creates the LOGGER variable that log! uses. Because we need a stable ABI interface we use the no_mangle attribute. This way the symbol name of LOGGER will be "LOGGER" which is what the log! macro expects.

The other important bit is that the type of this static variable must exactly match the type used in the expansion of the log! macro. If they don't match Bad Stuff will happen due to ABI mismatch.

Let's write an example that uses this new global logger functionality.

$ cat src/main.rs
#![no_main]
#![no_std]

use cortex_m::interrupt;
use cortex_m_semihosting::{
    debug,
    hio::{self, HStdout},
};

use log::{global_logger, log, GlobalLog};
use rt::entry;

struct Logger;

global_logger!(Logger);

entry!(main);

fn main() -> ! {
    log!("Hello, world!");

    log!("Goodbye");

    debug::exit(debug::EXIT_SUCCESS);

    loop {}
}

impl GlobalLog for Logger {
    fn log(&self, address: u8) {
        // we use a critical section (`interrupt::free`) to make the access to the
        // `static mut` variable interrupt safe which is required for memory safety
        interrupt::free(|_| unsafe {
            static mut HSTDOUT: Option<HStdout> = None;

            // lazy initialization
            if HSTDOUT.is_none() {
                HSTDOUT = Some(hio::hstdout()?);
            }

            let hstdout = HSTDOUT.as_mut().unwrap();

            hstdout.write_all(&[address])
        }).ok(); // `.ok()` = ignore errors
    }
}

TODO(resources team) use cortex_m::Mutex instead of a static mut variable when const fn is stabilized.

We had to add cortex-m to the dependencies.

$ tail -n5 Cargo.toml
[dependencies]
cortex-m = "0.5.7"
cortex-m-semihosting = "0.3.1"
log = { path = "../log" }
rt = { path = "../rt" }

This is a port of one of the examples written in the previous section. The output is the same as what we got back there.

$ cargo run | xxd -p
0001

$ cargo objdump --bin app -- -t | grep '\.log'
00000001 g       .log		 00000001 Goodbye
00000000 g       .log		 00000001 Hello, world!


Some readers may be concerned about this implementation of global singletons not being zero cost because it uses trait objects which involve dynamic dispatch, that is method calls are performed through a vtable lookup.

However, it appears that LLVM is smart enough to eliminate the dynamic dispatch when compiling with optimizations / LTO. This can be confirmed by searching for LOGGER in the symbol table.

$ cargo objdump --bin app --release -- -t | grep LOGGER

If the static is missing that means that there is no vtable and that LLVM was capable of transforming all the LOGGER.log calls into Logger.log calls.

Direct Memory Access (DMA)

This section covers the core requirements for building a memory safe API around DMA transfers.

The DMA peripheral is used to perform memory transfers in parallel to the work of the processor (the execution of the main program). A DMA transfer is more or less equivalent to spawning a thread (see thread::spawn) to do a memcpy. We'll use the fork-join model to illustrate the requirements of a memory safe API.

Consider the following DMA primitives:


# #![allow(unused_variables)]
#fn main() {
/// A singleton that represents a single DMA channel (channel 1 in this case)
///
/// This singleton has exclusive access to the registers of the DMA channel 1
pub struct Dma1Channel1 {
    // ..
}

impl Dma1Channel1 {
    /// Data will be written to this `address`
    ///
    /// `inc` indicates whether the address will be incremented after every byte transfer
    ///
    /// NOTE this performs a volatile write
    pub fn set_destination_address(&mut self, address: usize, inc: bool) {
        // ..
    }

    /// Data will be read from this `address`
    ///
    /// `inc` indicates whether the address will be incremented after every byte transfer
    ///
    /// NOTE this performs a volatile write
    pub fn set_source_address(&mut self, address: usize, inc: bool) {
        // ..
    }

    /// Number of bytes to transfer
    ///
    /// NOTE this performs a volatile write
    pub fn set_transfer_length(&mut self, len: usize) {
        // ..
    }

    /// Starts the DMA transfer
    ///
    /// NOTE this performs a volatile write
    pub fn start(&mut self) {
        // ..
    }

    /// Stops the DMA transfer
    ///
    /// NOTE this performs a volatile write
    pub fn stop(&mut self) {
        // ..
    }

    /// Returns `true` if there's a transfer in progress
    ///
    /// NOTE this performs a volatile read
    pub fn in_progress() -> bool {
        // ..
    }
}
#}

Assume that the Dma1Channel1 is statically configured to work with serial port (AKA UART or USART) #1, Serial1, in one-shot mode (i.e. not circular mode). Serial1 provides the following blocking API:


# #![allow(unused_variables)]
#fn main() {
/// A singleton that represents serial port #1
pub struct Serial1 {
    // ..
}

impl Serial1 {
    /// Reads out a single byte
    ///
    /// NOTE: blocks if no byte is available to be read
    pub fn read(&mut self) -> Result<u8, Error> {
        // ..
    }

    /// Sends out a single byte
    ///
    /// NOTE: blocks if the output FIFO buffer is full
    pub fn write(&mut self, byte: u8) -> Result<(), Error> {
        // ..
    }
}
#}

Let's say we want to extend Serial1 API to (a) asynchronously send out a buffer and (b) asynchronously fill a buffer.

We'll start with a memory unsafe API and we'll iterate on it until it's completely memory safe. On each step we'll show you how the API can be broken to make you aware of the issues that need to be addressed when dealing with asynchronous memory operations.

A first stab

For starters, let's try to use the Write::write_all API as a reference. To keep things simple let's ignore all error handling.


# #![allow(unused_variables)]
#fn main() {
/// A singleton that represents serial port #1
pub struct Serial1 {
    // NOTE: we extend this struct by adding the DMA channel singleton
    dma: Dma1Channel1,
    // ..
}

impl Serial1 {
    /// Sends out the given `buffer`
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn write_all<'a>(mut self, buffer: &'a [u8]) -> Transfer<&'a [u8]> {
        self.dma.set_destination_address(USART1_TX, false);
        self.dma.set_source_address(buffer.as_ptr() as usize, true);
        self.dma.set_transfer_length(buffer.len());

        self.dma.start();

        Transfer { buffer }
    }
}

/// A DMA transfer
pub struct Transfer<B> {
    buffer: B,
}

impl<B> Transfer<B> {
    /// Returns `true` if the DMA transfer has finished
    pub fn is_done(&self) -> bool {
        !Dma1Channel1::in_progress()
    }

    /// Blocks until the transfer is done and returns the buffer
    pub fn wait(self) -> B {
        // Busy wait until the transfer is done
        while !self.is_done() {}

        self.buffer
    }
}
#}

NOTE: Transfer could expose a futures or generator based API instead of the API shown above. That's an API design question that has little bearing on the memory safety of the overall API so we won't delve into it in this text.

We can also implement an asynchronous version of Read::read_exact.


# #![allow(unused_variables)]
#fn main() {
impl Serial1 {
    /// Receives data into the given `buffer` until it's filled
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn read_exact<'a>(&mut self, buffer: &'a mut [u8]) -> Transfer<&'a mut [u8]> {
        self.dma.set_source_address(USART1_RX, false);
        self.dma
            .set_destination_address(buffer.as_mut_ptr() as usize, true);
        self.dma.set_transfer_length(buffer.len());

        self.dma.start();

        Transfer { buffer }
    }
}
#}

Here's how to use the write_all API:


# #![allow(unused_variables)]
#fn main() {
fn write(serial: Serial1) {
    // fire and forget
    serial.write_all(b"Hello, world!\n");

    // do other stuff
}
#}

And here's an example of using the read_exact API:


# #![allow(unused_variables)]
#fn main() {
fn read(mut serial: Serial1) {
    let mut buf = [0; 16];
    let t = serial.read_exact(&mut buf);

    // do other stuff

    t.wait();

    match buf.split(|b| *b == b'\n').next() {
        Some(b"some-command") => { /* do something */ }
        _ => { /* do something else */ }
    }
}
#}

mem::forget

mem::forget is a safe API. If our API is truly safe then we should be able to use both together without running into undefined behavior. However, that's not the case; consider the following example:


# #![allow(unused_variables)]
#fn main() {
fn unsound(mut serial: Serial1) {
    start(&mut serial);
    bar();
}

#[inline(never)]
fn start(serial: &mut Serial1) {
    let mut buf = [0; 16];

    // start a DMA transfer and forget the returned `Transfer` value
    mem::forget(serial.read_exact(&mut buf));
}

#[inline(never)]
fn bar() {
    // stack variables
    let mut x = 0;
    let mut y = 0;

    // use `x` and `y`
}
#}

Here we start a DMA transfer, in foo, to fill an array allocated on the stack and then mem::forget the returned Transfer value. Then we proceed to return from foo and execute the function bar.

This series of operations results in undefined behavior. The DMA transfer writes to stack memory but that memory is released when foo returns and then reused by bar to allocate variables like x and y. At runtime this could result in variables x and y changing their value at random times. The DMA transfer could also overwrite the state (e.g. link register) pushed onto the stack by the prologue of function bar.

Note that if we had not use mem::forget, but mem::drop, it would have been possible to make Transfer's destructor stop the DMA transfer and then the program would have been safe. But one can not rely on destructors running to enforce memory safety because mem::forget and memory leaks (see RC cycles) are safe in Rust.

We can fix this particular problem by changing the lifetime of the buffer from 'a to 'static in both APIs.


# #![allow(unused_variables)]
#fn main() {
impl Serial1 {
    /// Receives data into the given `buffer` until it's filled
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn read_exact(&mut self, buffer: &'static mut [u8]) -> Transfer<&'static mut [u8]> {
        // .. same as before ..
    }

    /// Sends out the given `buffer`
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn write_all(mut self, buffer: &'static [u8]) -> Transfer<&'static [u8]> {
        // .. same as before ..
    }
}
#}

If we try to replicate the previous problem we note that mem::forget no longer causes problems.


# #![allow(unused_variables)]
#fn main() {
#[allow(dead_code)]
fn sound(mut serial: Serial1, buf: &'static mut [u8; 16]) {
    // NOTE `buf` is moved into `foo`
    foo(&mut serial, buf);
    bar();
}

#[inline(never)]
fn foo(serial: &mut Serial1, buf: &'static mut [u8]) {
    // start a DMA transfer and forget the returned `Transfer` value
    mem::forget(serial.read_exact(buf));
}

#[inline(never)]
fn bar() {
    // stack variables
    let mut x = 0;
    let mut y = 0;

    // use `x` and `y`
}
#}

As before, the DMA transfer continues after mem::forget-ing the Transfer value. This time that's not an issue because buf is statically allocated (e.g. static mut variable) and not on the stack.

Overlapping use

Our API doesn't prevent the user from using the Serial interface while the DMA transfer is in progress. This could lead the transfer to fail or data to be lost.

There are several ways to prevent overlapping use. One way is to have Transfer take ownership of Serial1 and return it back when wait is called.


# #![allow(unused_variables)]
#fn main() {
/// A DMA transfer
pub struct Transfer<B> {
    buffer: B,
    // NOTE: added
    serial: Serial1,
}

impl<B> Transfer<B> {
    /// Blocks until the transfer is done and returns the buffer
    // NOTE: the return value has changed
    pub fn wait(self) -> (B, Serial1) {
        // Busy wait until the transfer is done
        while !self.is_done() {}

        (self.buffer, self.serial)
    }

    // ..
}

impl Serial1 {
    /// Receives data into the given `buffer` until it's filled
    ///
    /// Returns a value that represents the in-progress DMA transfer
    // NOTE we now take `self` by value
    pub fn read_exact(mut self, buffer: &'static mut [u8]) -> Transfer<&'static mut [u8]> {
        // .. same as before ..

        Transfer {
            buffer,
            // NOTE: added
            serial: self,
        }
    }

    /// Sends out the given `buffer`
    ///
    /// Returns a value that represents the in-progress DMA transfer
    // NOTE we now take `self` by value
    pub fn write_all(mut self, buffer: &'static [u8]) -> Transfer<&'static [u8]> {
        // .. same as before ..

        Transfer {
            buffer,
            // NOTE: added
            serial: self,
        }
    }
}
#}

The move semantics statically prevent access to Serial1 while the transfer is in progress.


# #![allow(unused_variables)]
#fn main() {
fn read(serial: Serial1, buf: &'static mut [u8; 16]) {
    let t = serial.read_exact(buf);

    // let byte = serial.read(); //~ ERROR: `serial` has been moved

    // .. do stuff ..

    let (serial, buf) = t.wait();

    // .. do more stuff ..
}
#}

There are other ways to prevent overlapping use. For example, a (Cell) flag that indicates whether a DMA transfer is in progress could be added to Serial1. When the flag is set read, write, read_exact and write_all would all return an error (e.g. Error::InUse) at runtime. The flag would be set when write_all / read_exact is used and cleared in Transfer.wait.

Compiler (mis)optimizations

The compiler is free to re-order and merge non-volatile memory operations to better optimize a program. With our current API, this freedom can lead to undefined behavior. Consider the following example:


# #![allow(unused_variables)]
#fn main() {
fn reorder(serial: Serial1, buf: &'static mut [u8]) {
    // zero the buffer (for no particular reason)
    buf.iter_mut().for_each(|byte| *byte = 0);

    let t = serial.read_exact(buf);

    // ... do other stuff ..

    let (buf, serial) = t.wait();

    buf.reverse();

    // .. do stuff with `buf` ..
}
#}

Here the compiler is free to move buf.reverse() before t.wait(), which would result in a data race: both the processor and the DMA would end up modifying buf at the same time. Similarly the compiler can move the zeroing operation to after read_exact, which would also result in a data race.

To prevent these problematic reorderings we can use a compiler_fence


# #![allow(unused_variables)]
#fn main() {
impl Serial1 {
    /// Receives data into the given `buffer` until it's filled
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn read_exact(mut self, buffer: &'static mut [u8]) -> Transfer<&'static mut [u8]> {
        self.dma.set_source_address(USART1_RX, false);
        self.dma
            .set_destination_address(buffer.as_mut_ptr() as usize, true);
        self.dma.set_transfer_length(buffer.len());

        // NOTE: added
        atomic::compiler_fence(Ordering::Release);

        // NOTE: this is a volatile *write*
        self.dma.start();

        Transfer {
            buffer,
            serial: self,
        }
    }

    /// Sends out the given `buffer`
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn write_all(mut self, buffer: &'static [u8]) -> Transfer<&'static [u8]> {
        self.dma.set_destination_address(USART1_TX, false);
        self.dma.set_source_address(buffer.as_ptr() as usize, true);
        self.dma.set_transfer_length(buffer.len());

        // NOTE: added
        atomic::compiler_fence(Ordering::Release);

        // NOTE: this is a volatile *write*
        self.dma.start();

        Transfer {
            buffer,
            serial: self,
        }
    }
}

impl<B> Transfer<B> {
    /// Blocks until the transfer is done and returns the buffer
    pub fn wait(self) -> (B, Serial1) {
        // NOTE: this is a volatile *read*
        while !self.is_done() {}

        // NOTE: added
        atomic::compiler_fence(Ordering::Acquire);

        (self.buffer, self.serial)
    }

    // ..
}
#}

We use Ordering::Release in read_exact and write_all to prevent all preceding memory operations from being moved after self.dma.start(), which performs a volatile write.

Likewise, we use Ordering::Acquire in Transfer.wait to prevent all subsequent memory operations from being moved before self.is_done(), which performs a volatile read.

To better visualize the effect of the fences here's a slightly tweaked version of the example from the previous section. We have added the fences and their orderings in the comments.


# #![allow(unused_variables)]
#fn main() {
fn reorder(serial: Serial1, buf: &'static mut [u8], x: &mut u32) {
    // zero the buffer (for no particular reason)
    buf.iter_mut().for_each(|byte| *byte = 0);

    *x += 1;

    let t = serial.read_exact(buf); // compiler_fence(Ordering::Release) ▲

    // NOTE: the processor can't access `buf` between the fences
    // ... do other stuff ..
    *x += 2;

    let (buf, serial) = t.wait(); // compiler_fence(Ordering::Acquire) ▼

    *x += 3;

    buf.reverse();

    // .. do stuff with `buf` ..
}
#}

The zeroing operation can not be moved after read_exact due to the Release fence. Similarly, the reverse operation can not be moved before wait due to the Acquire fence. The memory operations between both fences can be freely reordered across the fences but none of those operations involves buf so such reorderings do not result in undefined behavior.

Note that compiler_fence is a bit stronger than what's required. For example, the fences will prevent the operations on x from being merged even though we know that buf doesn't overlap with x (due to Rust aliasing rules). However, there exist no intrinsic that's more fine grained than compiler_fence.

Don't we need a memory barrier?

That depends on the target architecture. In the case of Cortex M0 to M4F cores, AN321 says:

3.2 Typical usages

(..)

The use of DMB is rarely needed in Cortex-M processors because they do not reorder memory transactions. However, it is needed if the software is to be reused on other ARM processors, especially multi-master systems. For example:

  • DMA controller configuration. A barrier is required between a CPU memory access and a DMA operation.

(..)

4.18 Multi-master systems

(..)

Omitting the DMB or DSB instruction in the examples in Figure 41 on page 47 and Figure 42 would not cause any error because the Cortex-M processors:

  • do not re-order memory transfers
  • do not permit two write transfers to be overlapped.

Where Figure 41 shows a DMB (memory barrier) instruction being used before starting a DMA transaction.

In the case of Cortex-M7 cores you'll need memory barriers (DMB/DSB) if you are using the data cache (DCache), unless you manually invalidate the buffer used by the DMA.

If your target is a multi-core system then it's very likely that you'll need memory barriers.

If you do need the memory barrier then you need to use atomic::fence instead of compiler_fence. That should generate a DMB instruction on Cortex-M devices.

Generic buffer

Our API is more restrictive that it needs to be. For example, the following program won't be accepted even though it's valid.


# #![allow(unused_variables)]
#fn main() {
fn reuse(serial: Serial1, msg: &'static mut [u8]) {
    // send a message
    let t1 = serial.write_all(msg);

    // ..

    let (msg, serial) = t1.wait(); // `msg` is now `&'static [u8]`

    msg.reverse();

    // now send it in reverse
    let t2 = serial.write_all(msg);

    // ..

    let (buf, serial) = t2.wait();

    // ..
}
#}

To accept such program we can make the buffer argument generic.


# #![allow(unused_variables)]
#fn main() {
// as-slice = "0.1.0"
use as_slice::{AsMutSlice, AsSlice};

impl Serial1 {
    /// Receives data into the given `buffer` until it's filled
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn read_exact<B>(mut self, mut buffer: B) -> Transfer<B>
    where
        B: AsMutSlice<Element = u8>,
    {
        // NOTE: added
        let slice = buffer.as_mut_slice();
        let (ptr, len) = (slice.as_mut_ptr(), slice.len());

        self.dma.set_source_address(USART1_RX, false);

        // NOTE: tweaked
        self.dma.set_destination_address(ptr as usize, true);
        self.dma.set_transfer_length(len);

        atomic::compiler_fence(Ordering::Release);
        self.dma.start();

        Transfer {
            buffer,
            serial: self,
        }
    }

    /// Sends out the given `buffer`
    ///
    /// Returns a value that represents the in-progress DMA transfer
    fn write_all<B>(mut self, buffer: B) -> Transfer<B>
    where
        B: AsSlice<Element = u8>,
    {
        // NOTE: added
        let slice = buffer.as_slice();
        let (ptr, len) = (slice.as_ptr(), slice.len());

        self.dma.set_destination_address(USART1_TX, false);

        // NOTE: tweaked
        self.dma.set_source_address(ptr as usize, true);
        self.dma.set_transfer_length(len);

        atomic::compiler_fence(Ordering::Release);
        self.dma.start();

        Transfer {
            buffer,
            serial: self,
        }
    }
}

#}

NOTE: AsRef<[u8]> (AsMut<[u8]>) could have been used instead of AsSlice<Element = u8> (AsMutSlice<Element = u8).

Now the reuse program will be accepted.

Immovable buffers

With this modification the API will also accept arrays by value (e.g. [u8; 16]). However, using arrays can result in pointer invalidation. Consider the following program.


# #![allow(unused_variables)]
#fn main() {
fn invalidate(serial: Serial1) {
    let t = start(serial);

    bar();

    let (buf, serial) = t.wait();
}

#[inline(never)]
fn start(serial: Serial1) -> Transfer<[u8; 16]> {
    // array allocated in this frame
    let buffer = [0; 16];

    serial.read_exact(buffer)
}

#[inline(never)]
fn bar() {
    // stack variables
    let mut x = 0;
    let mut y = 0;

    // use `x` and `y`
}
#}

The read_exact operation will use the address of the buffer local to the start function. That local buffer will be freed when start returns and the pointer used in read_exact will become invalidated. You'll end up with a situation similar to the unsound example.

To avoid this problem we require that the buffer used with our API retains its memory location even when it's moved. The Pin newtype provides such guarantee. We can update our API to required that all buffers are "pinned" first.

NOTE: To compile all the programs below this point you'll need Rust >=1.33.0. As of time of writing (2019-01-04) that means using the nightly channel.


# #![allow(unused_variables)]
#fn main() {
/// A DMA transfer
pub struct Transfer<B> {
    // NOTE: changed
    buffer: Pin<B>,
    serial: Serial1,
}

impl Serial1 {
    /// Receives data into the given `buffer` until it's filled
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn read_exact<B>(mut self, mut buffer: Pin<B>) -> Transfer<B>
    where
        // NOTE: bounds changed
        B: DerefMut,
        B::Target: AsMutSlice<Element = u8> + Unpin,
    {
        // .. same as before ..
    }

    /// Sends out the given `buffer`
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn write_all<B>(mut self, buffer: Pin<B>) -> Transfer<B>
    where
        // NOTE: bounds changed
        B: Deref,
        B::Target: AsSlice<Element = u8>,
    {
        // .. same as before ..
    }
}
#}

NOTE: We could have used the StableDeref trait instead of the Pin newtype but opted for Pin since it's provided in the standard library.

With this new API we can use &'static mut references, Box-ed slices, Rc-ed slices, etc.


# #![allow(unused_variables)]
#fn main() {
fn static_mut(serial: Serial1, buf: &'static mut [u8]) {
    let buf = Pin::new(buf);

    let t = serial.read_exact(buf);

    // ..

    let (buf, serial) = t.wait();

    // ..
}

fn boxed(serial: Serial1, buf: Box<[u8]>) {
    let buf = Pin::new(buf);

    let t = serial.read_exact(buf);

    // ..

    let (buf, serial) = t.wait();

    // ..
}
#}

'static bound

Does pinning let us safely use stack allocated arrays? The answer is no. Consider the following example.


# #![allow(unused_variables)]
#fn main() {
fn unsound(serial: Serial1) {
    start(serial);

    bar();
}

// pin-utils = "0.1.0-alpha.4"
use pin_utils::pin_mut;

#[inline(never)]
fn start(serial: Serial1) {
    let buffer = [0; 16];

    // pin the `buffer` to this stack frame
    // `buffer` now has type `Pin<&mut [u8; 16]>`
    pin_mut!(buffer);

    mem::forget(serial.read_exact(buffer));
}

#[inline(never)]
fn bar() {
    // stack variables
    let mut x = 0;
    let mut y = 0;

    // use `x` and `y`
}
#}

As seen many times before, the above program runs into undefined behavior due to stack frame corruption.

The API is unsound for buffers of type Pin<&'a mut [u8]> where 'a is not 'static. To prevent the problem we have to add a 'static bound in some places.


# #![allow(unused_variables)]
#fn main() {
impl Serial1 {
    /// Receives data into the given `buffer` until it's filled
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn read_exact<B>(mut self, mut buffer: Pin<B>) -> Transfer<B>
    where
        // NOTE: added 'static bound
        B: DerefMut + 'static,
        B::Target: AsMutSlice<Element = u8> + Unpin,
    {
        // .. same as before ..
    }

    /// Sends out the given `buffer`
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn write_all<B>(mut self, buffer: Pin<B>) -> Transfer<B>
    where
        // NOTE: added 'static bound
        B: Deref + 'static,
        B::Target: AsSlice<Element = u8>,
    {
        // .. same as before ..
    }
}
#}

Now the problematic program will be rejected.

Destructors

Now that the API accepts Box-es and other types that have destructors we need to decide what to do when Transfer is early-dropped.

Normally, Transfer values are consumed using the wait method but it's also possible to, implicitly or explicitly, drop the value before the transfer is over. For example, dropping a Transfer<Box<[u8]>> value will cause the buffer to be deallocated. This can result in undefined behavior if the transfer is still in progress as the DMA would end up writing to deallocated memory.

In such scenario one option is to make Transfer.drop stop the DMA transfer. The other option is to make Transfer.drop wait for the transfer to finish. We'll pick the former option as it's cheaper.


# #![allow(unused_variables)]
#fn main() {
/// A DMA transfer
pub struct Transfer<B> {
    // NOTE: always `Some` variant
    inner: Option<Inner<B>>,
}

// NOTE: previously named `Transfer<B>`
struct Inner<B> {
    buffer: Pin<B>,
    serial: Serial1,
}

impl<B> Transfer<B> {
    /// Blocks until the transfer is done and returns the buffer
    pub fn wait(mut self) -> (Pin<B>, Serial1) {
        while !self.is_done() {}

        atomic::compiler_fence(Ordering::Acquire);

        let inner = self
            .inner
            .take()
            .unwrap_or_else(|| unsafe { hint::unreachable_unchecked() });
        (inner.buffer, inner.serial)
    }
}

impl<B> Drop for Transfer<B> {
    fn drop(&mut self) {
        if let Some(inner) = self.inner.as_mut() {
            // NOTE: this is a volatile write
            inner.serial.dma.stop();

            // we need a read here to make the Acquire fence effective
            // we do *not* need this if `dma.stop` does a RMW operation
            unsafe {
                ptr::read_volatile(&0);
            }

            // we need a fence here for the same reason we need one in `Transfer.wait`
            atomic::compiler_fence(Ordering::Acquire);
        }
    }
}

impl Serial1 {
    /// Receives data into the given `buffer` until it's filled
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn read_exact<B>(mut self, mut buffer: Pin<B>) -> Transfer<B>
    where
        B: DerefMut + 'static,
        B::Target: AsMutSlice<Element = u8> + Unpin,
    {
        // .. same as before ..

        Transfer {
            inner: Some(Inner {
                buffer,
                serial: self,
            }),
        }
    }

    /// Sends out the given `buffer`
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn write_all<B>(mut self, buffer: Pin<B>) -> Transfer<B>
    where
        B: Deref + 'static,
        B::Target: AsSlice<Element = u8>,
    {
        // .. same as before ..

        Transfer {
            inner: Some(Inner {
                buffer,
                serial: self,
            }),
        }
    }
}
#}

Now the DMA transfer will be stopped before the buffer is deallocated.


# #![allow(unused_variables)]
#fn main() {
fn reuse(serial: Serial1) {
    let buf = Pin::new(Box::new([0; 16]));

    let t = serial.read_exact(buf); // compiler_fence(Ordering::Release) ▲

    // ..

    // this stops the DMA transfer and frees memory
    mem::drop(t); // compiler_fence(Ordering::Acquire) ▼

    // this likely reuses the previous memory allocation
    let mut buf = Box::new([0; 16]);

    // .. do stuff with `buf` ..
}
#}

Summary

To sum it up, we need to consider all the following points to achieve memory safe DMA transfers:

  • Use immovable buffers plus indirection: Pin<B>. Alternatively, you can use the StableDeref trait.

  • The ownership of the buffer must be passed to the DMA : B: 'static.

  • Do not rely on destructors running for memory safety. Consider what happens if mem::forget is used with your API.

  • Do add a custom destructor that stops the DMA transfer, or waits for it to finish. Consider what happens if mem::drop is used with your API.


This text leaves out up several details required to build a production grade DMA abstraction, like configuring the DMA channels (e.g. streams, circular vs one-shot mode, etc.), alignment of buffers, error handling, how to make the abstraction device-agnostic, etc. All those aspects are left as an exercise for the reader / community (:P).

Concurrency

This section discusses no_std concurrency as usually found on microcontrollers, and memory safe patterns for sharing memory with / between interrupt handlers. The focus of this text is on uses of unsafe code that are memory safety rather than building safe abstractions.

NOTE: Unlike other chapters, this text has been written assuming that the reader is not familiar with the interrupt mechanism commonly found in microcontrollers. The motivation is making this text accessible to more people who then can audit our unsafe code.

Interrupts

In bare metal systems, systems without an OS (operating system), usually the only form of concurrency available are hardware interrupts. An interrupt is a preemption mechanism that works as follows: when an interrupt signal arrives the processor suspends the execution of the current subroutine, (maybe) saves some registers (the current state of the program) to the stack and then jumps to another subroutine called the interrupt handler. When the processor returns from the interrupt handler, it restores the registers that it previously saved on the stack (if any) and then resumes the subroutine that was interrupted. (If you are familiar with POSIX signal handling, the semantics are pretty much the same)

Interrupt signals usually come from peripherals and are fired asynchronously. Some examples of interrupt signals are: a counter reaching zero, an input pin changing its electrical / logical state, and the arrival of a new byte of data. In some multi-core devices a core can send an interrupt signal to a different core.

How the processor locates the right interrupt handler to execute depends on the architecture. In the ARM Cortex-M architecture, there's one handler per interrupt signal and there's a table somewhere in memory that holds function pointers to all interrupt handlers. Each interrupt is given an index in this table. For example, a timer interrupt could be interrupt #0 and an input pin interrupt could be interrupt #1. If we were to depict this as Rust code it would look as follows:


# #![allow(unused_variables)]
#fn main() {
// `link_section` places this in some known memory location
#[link_section = ".interrupt_table"]
static INTERRUPT_TABLE: [extern "C" fn(); 32] = [
    // entry 0: timer 0
    on_timer0_interrupt,

    // entry 1: pin 0
    on_pin0_interrupt,

    // .. 30 more entries ..
];

// provided by the application author
extern "C" fn on_timer0_interrupt() {
    // ..
}

extern "C" fn on_pin0_interrupt() {
    // ..
}
#}

In another common interrupt model all interrupts signals map to the same interrupt handler (subroutine) and there's a hardware register that the software has to read when it enters the handler to figure out which interrupt signal triggered the interrupt. In this text, we'll focus on the ARM Cortex-M architecture which follows the one handler per interrupt signal model.

Interrupt handling API

The most basic interrupt handling API lets the programmer statically register a function for each interrupt handler only once. On top of this basic API it's possible to implement APIs to dynamically register closures as interrupt handlers. In this text we'll focus on the former, simpler API.

To illustrate this kind of API let's look at the cortex-m-rt crate (v0.6.7). It provides two attributes to statically register interrupts: #[exception] and #[interrupt]. The former is for device agnostic interrupts, whose number and names are the same for all Cortex-M devices; the latter is for device specific interrupts, whose number and names vary per device / vendor. We'll stick to the device agnostic interrupts ("exceptions") in our examples.

The following example showcases the system timer (SysTick) interrupt, which fires periodically. The interrupt is handled using the SysTick handler (function), which prints a dot to the console.

NOTE: The code for the following example and all other examples can be found in the ci/concurrency directory at the root of this repository.

// source: examples/systick.rs

#![no_main]
#![no_std]

extern crate panic_halt;

use cortex_m::{asm, peripheral::syst::SystClkSource, Peripherals};
use cortex_m_rt::{entry, exception};
use cortex_m_semihosting::hprint;

// program entry point
#[entry]
fn main() -> ! {
    let mut syst = Peripherals::take().unwrap().SYST;

    // configures the system timer to trigger a SysTick interrupt every second
    syst.set_clock_source(SystClkSource::Core);
    syst.set_reload(12_000_000); // period = 1s
    syst.enable_counter();
    syst.enable_interrupt();

    loop {
        asm::nop();
    }
}

// interrupt handler
// NOTE: the function name must match the name of the interrupt
#[exception]
fn SysTick() {
    hprint!(".").unwrap();
}

If you are not familiar with embedded / Cortex-M programs the most important thing to point note here is that the function marked with the entry attribute is the entry point of the user program. When the device (re)boots (e.g. it's first powered) the "runtime" (the cortex-m-rt crate) initializes static variables (the content of RAM is random on power on) and then calls the user program entry point. As the user program is the only process running it is not allowed to end / exit; this is enforced in the signature of the entry function: fn() -> ! -- a divergent function can't return.

You can run this example on an x86 machine using QEMU. Make sure you have qemu-system-arm installed and run the following command

$ cargo run --example systick
(..)
     Running `qemu-system-arm -cpu cortex-m3 -machine lm3s6965evb -nographic -semihosting-config enable=on,target=native -kernel target/thumbv7m-none-eabi/debug/examples/systick`
.................

static variables: what is safe and what's not

As interrupt handlers have their own (call) stack they can't refer to (access) local variables in main or in functions called by main. The only way main and an interrupt handler can share state is through static variables, which have statically known addresses.

To really drive this point I find it useful to visualize the call stack of the program in the presence of interrupts. Consider the following example:

#[entry]
fn main() -> ! {

    loop {
        {
            let x = 42;
            foo();
        }

        {
            let w = 66;
            bar();
        }
    }
}

fn foo() {
   let y = 24;

   // ..
}

fn bar() {
    let z = 33;

    // ..

    foo();

    // ..
}

#[exception]
fn SysTick() {
    // can't access `x` or `y` because their addresses are not statically known
}

If we take snapshots of the call stack every time the SysTick interrupt handler is called we'll observe something like this:

                                                          +---------+
                                                          | SysTick |
                                                          |         |
            +---------+            +---------+            +#########+
            | SysTick |            | SysTick |            |   foo   |
            |         |            |         |            | y = 24  |
            +#########+            +#########+            +---------+
            |   foo   |            |   bar   |            |   bar   |
            | y = 24  |            | z = 33  |            | z = 33  |
            +---------+            +---------+            +---------+
            |   main  |            |   main  |            |   main  |
            | x = 42  |            | w = 66  |            | w = 66  |
            +---------+            +---------+            +---------+
              t = 1ms                t = 2ms                t = 3ms

From the call stack SysTick looks like a normal function since it's contiguous in memory to main and the functions called from it. However, that's not the case: SysTick is invoked asynchronously. At time t = 1ms SysTick could, in theory, access y since it's in the previous stack frame; however, at time t = 2ms y doesn't exist; and at time t = 3ms y exists but has a different location in memory (address).

I hope that explains why SysTick can't safely access the stack frames that belong to main.

Let's now go over all the unsafe and safe ways in which main and interrupt handlers can share state (memory). We'll start assuming the program will run on a single core device, then we'll revisit our safe patterns in the context of a multi-core device.

static mut

Unsynchronized access to static mut variables is undefined behavior (UB). The compiler will mis-optimize all those accesses.

Consider the following unsound program:

//! THIS PROGRAM IS UNSOUND!
// source: examples/static-mut.rs

#![no_main]
#![no_std]

extern crate panic_halt;

use cortex_m::asm;
use cortex_m_rt::{entry, exception};

static mut X: u32 = 0;

#[inline(never)]
#[entry]
fn main() -> ! {
    // omitted: configuring and enabling the `SysTick` interrupt

    let x: &mut u32 = unsafe { &mut X };

    loop {
        *x = 0;

        // <~ preemption could occur here and change the value behind `x`

        if *x != 0 {
            // the compiler may optimize away this branch
            panic!();
        } else {
            asm::nop();
        }
    }
}

#[exception]
fn SysTick() {
    unsafe {
        X = 1;

        asm::nop();
    }
}

This program compiles: both main and SysTick can refer to the static variable X, which has a known, fixed location in memory. However, the program is mis-optimized to the following machine code:

00000400 <main>:
 400:   bf00            nop
 402:   e7fd            b.n     400 <main>

00000404 <SysTick>:
 404:   bf00            nop
 406:   4770            bx      lr

As you can see all accesses to X were optimized away changing the intended semantics.

Volatile

Using volatile operations to access static mut variables does not prevent UB. Volatile operations will prevent the compiler from mis-optimizing accesses to the variables but they don't help with torn reads and writes which lead to UB.

//! THIS PROGRAM IS UNSOUND!
// source: examples/volatile.rs

#![no_main]
#![no_std]

extern crate panic_halt;

use core::ptr;

use cortex_m::asm;
use cortex_m_rt::{entry, exception};

#[repr(u64)]
enum Enum {
    A = 0x0000_0000_ffff_ffff,
    B = 0xffff_ffff_0000_0000,
}

static mut X: Enum = Enum::A;

#[entry]
fn main() -> ! {
    // omitted: configuring and enabling the `SysTick` interrupt

    loop {
        // this write operation is not atomic: it's performed in two moves
        unsafe { ptr::write_volatile(&mut X, Enum::A) } // <~ preemption

        unsafe { ptr::write_volatile(&mut X, Enum::B) }
    }
}

#[exception]
fn SysTick() {
    unsafe {
        // here we may observe `X` having the value `0x0000_0000_0000_0000`
        // or `0xffff_ffff_ffff_ffff` which are not valid `Enum` variants
        match X {
            Enum::A => asm::nop(),
            Enum::B => asm::bkpt(),
        }
    }
}

In this program the interrupt handler could preempt the 2-step write operation that changes X from variant A to variant B (or vice versa) mid way. If that happens the handler could observe X having the value 0x0000_0000_0000_0000 or 0xffff_ffff_ffff_ffff, neither of which are valid values for the enum.

Let me say that again: Relying only on volatile operations for memory safety is likely wrong. The only semantics that volatile operations provide are: "tell the compiler to not remove this operation, or merge it with another operation" and "tell the compiler to not reorder this operation with respect to other volatile operations"; neither is directly related to synchronized access to memory.

Atomics

Accessing atomics stored in static variables is memory safe. If you are building abstractions like channels on top of them (which likely will require unsafe code to access some shared buffer) make sure you use the right Ordering or your abstraction will be unsound.

Here's an example of using a static variable for synchronization (a delay in this case).

NOTE: not all embedded targets have atomic CAS instructions in their ISA. MSP430 and ARMv6-M are prime examples. API like AtomicUsize.fetch_add is not available in core for those targets.

static X: AtomicBool = AtomicBool::new(false);

#[entry]
fn main() -> ! {
    // omitted: configuring and enabling the `SysTick` interrupt

    // wait until `SysTick` returns before starting the main logic
    while !X.load(Ordering::Relaxed) {}

    loop {
        // main logic
    }
}

#[exception]
fn SysTick() {
    X.store(true, Ordering::Relaxed);
}

State and re-entrancy

A common pattern in embedded C is to use a static variable to preserve state between invocations of an interrupt handler.

void handler() {
    static int counter = 0;

    counter += 1;

    // ..
}

This makes the function non-reentrant, meaning that calling this function from itself, from main or an interrupt handler is UB (it breaks mutable aliasing rules).

We can make this C pattern safe in Rust if we make the non-reentrant function unsafe to call or impossible to call. cortex-m-rt v0.5.x supports this pattern and uses the latter approach to prevent calling non-reentrant functions from safe code.

Consider this example:

// source: examples/state.rs

#![no_main]
#![no_std]

extern crate panic_halt;

use cortex_m::asm;
use cortex_m_rt::{entry, exception};

#[inline(never)]
#[entry]
fn main() -> ! {
    loop {
        // SysTick(); //~ ERROR: cannot find function `SysTick` in this scope

        asm::nop();
    }
}

#[exception]
fn SysTick() {
    static mut COUNTER: u64 = 0;

    // user code
    *COUNTER += 1;

    // SysTick(); //~ ERROR: cannot find function `SysTick` in this scope
}

The #[exception] attribute performs the following source-level transformation:


# #![allow(unused_variables)]
#fn main() {
#[link_name = "SysTick"] // places this function in the vector table
fn randomly_generated_identifier() {
    let COUNTER: &mut u64 = unsafe {
        static mut COUNTER: u64 = 0;

        &mut COUNTER
    };

    // user code
    *COUNTER += 1;

    // ..
}
#}

Placing the static mut variable inside a block makes it impossible to create more references to it from user code.

This transformation ensures that the software can't call the interrupt handler from safe code, but could the hardware invoke the interrupt handler in a way that breaks memory safety? The answer is: it depends, on the target architecture.

In the ARM Cortex-M architecture once an instance of an interrupt handler starts another one won't start until the first one ends (if the same interrupt signal arrives again it is withheld). On the other hand, in the ARM Cortex-R architecture there's a single handler for all interrupts; receiving two different interrupt signals can cause the handler (function) to be invoked twice and that would break the memory safety of the source level transformation we presented above.

Critical sections

When it's necessary to share state between main and an interrupt handler a critical section can be used to synchronize access. The simplest critical section implementation consists of temporarily disabling all interrupts while main accesses the shared static variable. Example below:

// source: examples/cs1.rs

#![no_main]
#![no_std]

extern crate panic_halt;

use cortex_m::interrupt;
use cortex_m_rt::{entry, exception};

static mut COUNTER: u64 = 0;

#[inline(never)]
#[entry]
fn main() -> ! {
    loop {
        // `SysTick` can preempt `main` at this point

        // start of critical section: disable interrupts
        interrupt::disable(); // = `asm!("CPSID I" : : : "memory" : "volatile")`
                              //                         ^^^^^^^^

        // `SysTick` can not preempt this block
        {
            let counter: &mut u64 = unsafe { &mut COUNTER };

            *counter += 1;
        }

        // end of critical section: re-enable interrupts
        unsafe { interrupt::enable() }
        //^= `asm!("CPSIE I" : : : "memory" : "volatile")`
        //                         ^^^^^^^^

        // `SysTick` can start at this point
    }
}

#[exception]
fn SysTick() {
    // exclusive access to `COUNTER`
    let counter: &mut u64 = unsafe { &mut COUNTER };

    *counter += 1;
}

Note the use of the "memory" clobber; this acts as a compiler barrier that prevents the compiler from reordering the operation on COUNTER to outside the critical section. It's also important to not access COUNTER in main outside a critical section; thus references to COUNTER should not escape the critical section. With these two restrictions in place, the mutable reference to COUNTER created in SysTick is guaranteed to be unique for the whole execution of the handler.

Disabling all the interrupt is not the only way to create a critical section; other ways include masking interrupts (disabling one or a subset of all interrupts) and increasing the running priority (see next section).

Masking interrupts to create a critical section deserves an example because it doesn't use inline asm! and thus requires explicit compiler barriers (atomic::compiler_fence) for memory safety.

// source: examples/cs2.rs

#![no_main]
#![no_std]

extern crate panic_halt;

use core::sync::atomic::{self, Ordering};

use cortex_m_rt::{entry, exception};

static mut COUNTER: u64 = 0;

#[inline(never)]
#[entry]
fn main() -> ! {
    let mut syst = cortex_m::Peripherals::take().unwrap().SYST;

    // omitted: configuring and enabling the `SysTick` interrupt

    loop {
        // `SysTick` can preempt `main` at this point

        // start of critical section: disable the `SysTick` interrupt
        syst.disable_interrupt();
        // ^ this method is implemented as shown in the comment below
        //
        // ```
        // let csr = ptr::read_volatile(0xE000_E010);`
        // ptr::write_volatile(0xE000_E010, csr & !(1 << 1));
        // ```

        // a compiler barrier equivalent to the "memory" clobber
        atomic::compiler_fence(Ordering::SeqCst);

        // `SysTick` can not preempt this block
        {
            let counter: &mut u64 = unsafe { &mut COUNTER };

            *counter += 1;
        }

        atomic::compiler_fence(Ordering::SeqCst);

        // end of critical section: re-enable the `SysTick` interrupt
        syst.enable_interrupt();
        // ^ this method is implemented as shown in the comment below
        //
        // ```
        // let csr = ptr::read_volatile(0xE000_E010);`
        // ptr::write_volatile(0xE000_E010, csr | (1 << 1));
        // ```

        // `SysTick` can start at this point
    }
}

#[exception]
fn SysTick() {
    // exclusive access to `COUNTER`
    let counter: &mut u64 = unsafe { &mut COUNTER };

    *counter += 1;
}

The code is very similar to the one that disabled all interrupts except for the start and end of the critical section, which now include a compiler_fence (compiler barrier).

Priorities

Architectures like ARM Cortex-M allow interrupt prioritization, meaning that an interrupt that's given high priority can preempt a lower priority interrupt handler. Priorities must be considered when sharing state between interrupt handlers.

When two interrupt handlers, say A and B, have the same priority no preemption can occur. Meaning that when signals for both interrupts arrive around the same time then the handlers will be executed sequentially: that is first A and then B, or vice versa. In this scenario, both handlers can access the same static mut variable without using a critical section; each handler will "take turns" at getting exclusive access (&mut-) to the static variable. Example below.

// source: examples/coop.rs

#![no_main]
#![no_std]

extern crate panic_halt;

use cortex_m::asm;
use cortex_m_rt::{entry, exception};

// priority = 0 (lowest)
#[inline(never)]
#[entry]
fn main() -> ! {
    // omitted: enabling interrupts and setting their priorities

    loop {
        asm::nop();
    }
}

static mut COUNTER: u64 = 0;

// priority = 1
#[exception]
fn SysTick() {
    // exclusive access to `COUNTER`
    let counter: &mut u64 = unsafe { &mut COUNTER };

    *counter += 1;
}

// priority = 1
#[exception]
fn SVCall() {
    // exclusive access to `COUNTER`
    let counter: &mut u64 = unsafe { &mut COUNTER };

    *counter *= 2;
}

When two interrupt handlers have different priorities then one can preempt the other. Safely sharing state between these two interrupts requires a critical section in the lower priority handler -- just like in the case of main and an interrupt handler. However, one more constraint is required: the priority of the interrupts must remain fixed at runtime; reversing the priorities at runtime, for example, would result in a data race.

The following example showcases safe state sharing between two interrupt handlers using a priority-based critical section.

// source: examples/cs3.rs

#![no_main]
#![no_std]

extern crate panic_halt;

use cortex_m::{asm, register::basepri};
use cortex_m_rt::{entry, exception};

// priority = 0 (lowest)
#[inline(never)]
#[entry]
fn main() -> ! {
    // omitted: enabling interrupts and setting up their priorities

    loop {
        asm::nop();
    }
}

static mut COUNTER: u64 = 0;

// priority = 2
#[exception]
fn SysTick() {
    // exclusive access to `COUNTER`
    let counter: &mut u64 = unsafe { &mut COUNTER };

    *counter += 1;
}

// priority = 1
#[exception]
fn SVCall() {
    // `SysTick` can preempt `SVCall` at this point

    // start of critical section: raise the running priority to 2
    raise(2);

    // `SysTick` can *not* preempt this block because it has a priority of 2 (equal)
    // `PendSV` *can* preempt this block because it has a priority of 3 (higher)
    {
        // exclusive access to `COUNTER`
        let counter: &mut u64 = unsafe { &mut COUNTER };

        *counter *= 2;
    }

    // start of critical section: lower the running priority to its original value
    unsafe { lower() }

    // `SysTick` can preempt `SVCall` again
}

// priority = 3
#[exception]
fn PendSV() {
    // .. does not access `COUNTER` ..
}

fn raise(priority: u8) {
    const PRIO_BITS: u8 = 3;

    // (priority is encoded in hardware in the higher order bits of a byte)
    // (also in this encoding a bigger number means lower priority)
    let p = ((1 << PRIO_BITS) - priority) << (8 - PRIO_BITS);

    unsafe { basepri::write(p) }
    //^= `asm!("MSR BASEPRI, $0" : "=r"(p) : : "memory" : "volatile")`
    //                                         ^^^^^^^^
}

unsafe fn lower() {
    basepri::write(0)
}

Runtime initialization

A common need in embedded Rust programs is moving, at runtime, a value from main into an interrupt handler. This can be accomplished at zero cost by enforcing sequential access to static mut variables.

// source: examples/init.rs

#![feature(maybe_uninit)]
#![no_main]
#![no_std]

extern crate panic_halt;

use core::mem::MaybeUninit;

use cortex_m::{asm, interrupt};
use cortex_m_rt::{entry, exception};

struct Thing {
    _state: (),
}

impl Thing {
    // NOTE the constructor is not `const`
    fn new() -> Self {
        Thing { _state: () }
    }

    fn do_stuff(&mut self) {
        // ..
    }
}

// uninitialized static variable
static mut THING: MaybeUninit<Thing> = MaybeUninit::uninitialized();

#[entry]
fn main() -> ! {
    // # Initialization phase

    // done as soon as the device boots
    interrupt::disable();

    // critical section that can't be preempted by any interrupt
    {
        // initialize the static variable at runtime
        unsafe { THING.set(Thing::new()) };

        // omitted: configuring and enabling the `SysTick` interrupt
    }

    // reminder: this is a compiler barrier
    unsafe { interrupt::enable() }

    // # main loop

    // `SysTick` can preempt `main` at this point

    loop {
        asm::nop();
    }
}

#[exception]
fn SysTick() {
    // this handler always observes the variable as initialized
    let thing: &mut Thing = unsafe { &mut *THING.as_mut_ptr() };

    thing.do_stuff();
}

In this pattern is important to disable interrupts before yielding control to the user program and enforcing that the end user initializes all the uninitialized static variables before interrupts are re-enabled. Failure to do so would result in interrupt handlers observing uninitialized static variables.

Redefining Send and Sync

The core / standard library defines these two marker traits as:

Sync: types for which it is safe to share references between threads.

Send: types that can be transferred across thread boundaries

Threads are an OS abstraction so they don't exist "out of the box" in bare metal context, though they can be implemented on top of interrupts. We'll broaden the definition of these two marker traits to include bare metal code:

  • Sync: types for which it is safe to share references between execution contexts.

  • Send: types that can be transferred between execution contexts.

An interrupt handler is an execution context independent of the main function, which can be seen as the "bottom" execution context. An OS thread is also an execution context. Each execution context has its own (call) stack and operates independently of other execution contexts though they can share state.

Broadening the definitions of these marker traits does not change the rules around static variables. They must still hold values that implement the Sync trait. Atomics implement Sync so they are valid to place in static variables in bare metal context.

Let's now revisit the safe patterns we described before and see where the Sync and Send bounds need to be enforced for safety.

State


# #![allow(unused_variables)]
#fn main() {
#[exception]
fn SysTick() {
    static mut X: Type = Type::new();
}
#}

Does Type need to satisfy Sync or Send? X is effectively owned by the SysTick interrupt and not shared with any other execution context so neither bound is required for this pattern.

Critical section

We can abstract the "disable all interrupts" critical section pattern into a Mutex type.

// source: examples/mutex.rs

#![no_main]
#![no_std]

extern crate panic_halt;

use core::cell::{RefCell, UnsafeCell};

use bare_metal::CriticalSection;
use cortex_m::interrupt;
use cortex_m_rt::{entry, exception};

struct Mutex<T>(UnsafeCell<T>);

// TODO does T require a Sync / Send bound?
unsafe impl<T> Sync for Mutex<T> {}

impl<T> Mutex<T> {
    const fn new(value: T) -> Mutex<T> {
        Mutex(UnsafeCell::new(value))
    }

    // NOTE: the `'cs` constraint prevents the returned reference from outliving
    // the `CriticalSection` token
    fn borrow<'cs>(&self, _cs: &'cs CriticalSection) -> &'cs T {
        unsafe { &*self.0.get() }
    }
}

static COUNTER: Mutex<RefCell<u64>> = Mutex::new(RefCell::new(0));

#[inline(never)]
#[entry]
fn main() -> ! {
    loop {
        // `interrupt::free` runs the closure in a critical section (interrupts disabled)
        interrupt::free(|cs: &CriticalSection| {
            let counter: &RefCell<u64> = COUNTER.borrow(cs);

            *counter.borrow_mut() += 1;

            // &*counter.borrow() //~ ERROR: this reference cannot outlive the closure
        });
    }
}

#[exception]
fn SysTick() {
    interrupt::free(|cs| {
        let counter = COUNTER.borrow(cs);
        *counter.borrow_mut() *= 2;
    });
}

Here we use a CriticalSection token to prevent references escaping the critical section / closure (see the lifetime constraints in Mutex.borrow).

It's important to note that a Mutex.borrow_mut method with no additional runtime checks would be unsound as it would let the end user break Rust aliasing rules:


# #![allow(unused_variables)]
#fn main() {
#[exception]
fn SysTick() {
    interrupt::free(|cs| {
        // both `counter` and `alias` refer to the same memory location
        let counter: &mut u64 = COUNTER.borrow_mut(cs);
        let alias: &mut u64 = COUNTER.borrow_mut(cs);
    });
}
#}

Changing the signature of borrow_mut to fn<'cs>(&self, &'cs mut CriticalSection) -> &'cs mut T does not help because it's possible to nest calls to interrupt::free.


# #![allow(unused_variables)]
#fn main() {
#[exception]
fn SysTick() {
    interrupt::free(|cs: &mut CriticalSection| {
        let counter: &mut u64 = COUNTER.borrow_mut(cs);

        // let alias: &mut u64 = COUNTER.borrow_mut(cs);
        //~^ ERROR: `cs` already mutably borrowed

        interrupt::free(|cs2: &mut CriticalSection| {
            // this breaks aliasing rules
            let alias: &mut u64 = COUNTER.borrow_mut(cs2);
        });
    });
}
#}

As for the bounds required on the value of type T protected by the Mutex: T must implement the Send trait because a Mutex can be used as a channel to move values from main to an interrupt handler. See below:

struct Thing {
    _state: (),
}

static CHANNEL: Mutex<RefCell<Option<Thing>>> = Mutex::new(RefCell::new(None));

#[entry]
fn main() -> ! {
    interrupt::free(|cs| {
        let channel = CHANNEL.borrow(cs);

        *channel.borrow_mut() = Some(Thing::new());
    });

    loop {
        asm::nop();
    }
}

#[exception]
fn SysTick() {
    interrupt::free(|cs| {
        let channel = CHANNEL.borrow(cs);
        let maybe_thing = channel.borrow_mut().take();
        if let Some(thing) = mabye_thing {
            // `thing` has been moved into the interrupt handler
        }
    });
}

So the Sync implementation must look like this:


# #![allow(unused_variables)]
#fn main() {
unsafe impl<T> Sync for Mutex<T> where T: Send {}
#}

This constraint applies to all types of critical sections.

Runtime initialization

For the pattern of moving values from main to an interrupt handler this is clearly a "send" operation so the moved value must implement the Send trait. We won't give an example of an abstraction for that pattern in this text but any such abstraction must enforce at compile time that values to be moved implement the Send trait.

Multi-core

So far we have discussed single core devices. Let's see how having multiple cores affects the memory safety of the abstractions and patterns we have covered.

Mutex: !Sync

The Mutex abstraction we created and that disables interrupts to create a critical section is unsound in multi-core context. The reason is that the critical section doesn't prevent other cores from making progress so if more than one core gets a reference to the data behind the Mutex all accesses become data races.

Here an example where we assume a dual-core device and a framework that lets you write bare-metal multi-core in a single source file.

// THIS PROGRAM IS UNSOUND!

// single memory location visible to both cores
static COUNTER: Mutex<Cell<u64>> = Mutex::new(Cell::new(0));

// runs on the first core
#[core(0)]
#[entry]
fn main() -> ! {
    loop {
        interrupt::free(|cs| {
            let counter = COUNTER.borrow(cs);

            counter.set(counter.get() + 1);
        });
    }
}

// runs on the second core
#[core(1)]
#[entry]
fn main() -> ! {
    loop {
        interrupt::free(|cs| {
            let counter = COUNTER.borrow(cs);

            counter.set(counter.get() * 2);
        });
    }
}

Here each core accesses the COUNTER variable in their main context in an unsynchronized manner; this is undefined behavior.

The problem with Mutex is not the critical section that uses; it's the fact that it can be stored in a static variable making accessible to all cores. Thus in multi-core context the Mutex abstraction should not implement the Sync trait.

Critical sections based on interrupt masking can be used safely on architectures / devices where it's possible to assign a single core to an interrupt and any core can mask that interrupt, provided that scoping is enforced somehow. Here's an example:


# #![allow(unused_variables)]
#fn main() {
static mut COUNTER: u64 = 0;

// runs on the first core
// priority = 2
#[core(0)]
#[exception]
fn SysTick() {
    // exclusive access to `COUNTER`
    let counter: &mut u64 = unsafe { &mut COUNTER };

    *counte += 1;
}

// initialized in the second core's `main` function using the runtime
// initialization pattern
static mut SYST: MaybeUninit<SYST> = MaybeUninit::ununitialized();

// runs on the second core
// priority = 1
#[core(1)]
#[exception]
fn SVCall() {
    // `SYST` is owned by this core / interrupt
    let syst = unsafe { &mut *SYST.as_mut_ptr() };

    // start of critical section: disable the `SysTick` interrupt
    syst.disable_interrupt();

    atomic::compiler_fence(Ordering::SeqCst);

    // `SysTick` can not preempt this block
    {
        let counter: &mut u64 = unsafe { &mut COUNTER };

        *counter += 1;
    }

    atomic::compiler_fence(Ordering::SeqCst);

    // end of critical section: re-enable the `SysTick` interrupt
    syst.enable_interrupt();
}
#}

Atomics

Atomics are safe to use in multi-core context provided that memory barrier instructions are inserted where appropriate. If you are using the correct Ordering then the compiler will insert the required barriers for you. Critical sections based on atomics, AKA spinlocks, are memory safe to use on multi-core devices though they can deadlock.

// spin = "0.5.0"
use spin::Mutex;

static COUNTER: Mutex<u64> = Mutex::new(0);

// runs on the first core
#[core(0)]
#[entry]
fn main() -> ! {
    loop {
        *COUNTER.lock() += 1;
    }
}

// runs on the second core
#[core(1)]
#[entry]
fn main() -> ! {
    loop {
        *COUNTER.lock() *= 2;
    }
}

State

The stateful interrupt handler pattern remains safe if and only if the target architecture / device supports assigning a handler to a single core and the program has been configured to not share stateful interrupts between cores -- that is cores should not execute the exact same handler when the corresponding signal arrives.

Runtime initialization

As the runtime initialization pattern is used to initialize the "state" of interrupt handlers so all the additional constraints required for multi-core memory safety of the State pattern are also required here.

A note on compiler support

This book makes use of a built-in compiler target, the thumbv7m-none-eabi, for which the Rust team distributes a rust-std component, which is a pre-compiled collection of crates like core and std.

If you want to attempt replicating the contents of this book for a different target architecture, you need to take into account the different levels of support that Rust provides for (compilation) targets.

LLVM support

As of Rust 1.28, the official Rust compiler, rustc, uses LLVM for (machine) code generation. The minimal level of support Rust provides for an architecture is having its LLVM backend enabled in rustc. You can see all the architectures that rustc supports, through LLVM, by running the following command:

$ # you need to have `cargo-binutils` installed to run this command
$ cargo objdump -- -version
LLVM (http://llvm.org/):
  LLVM version 7.0.0svn
  Optimized build.
  Default target: x86_64-unknown-linux-gnu
  Host CPU: skylake

  Registered Targets:
    aarch64    - AArch64 (little endian)
    aarch64_be - AArch64 (big endian)
    arm        - ARM
    arm64      - ARM64 (little endian)
    armeb      - ARM (big endian)
    hexagon    - Hexagon
    mips       - Mips
    mips64     - Mips64 [experimental]
    mips64el   - Mips64el [experimental]
    mipsel     - Mipsel
    msp430     - MSP430 [experimental]
    nvptx      - NVIDIA PTX 32-bit
    nvptx64    - NVIDIA PTX 64-bit
    ppc32      - PowerPC 32
    ppc64      - PowerPC 64
    ppc64le    - PowerPC 64 LE
    sparc      - Sparc
    sparcel    - Sparc LE
    sparcv9    - Sparc V9
    systemz    - SystemZ
    thumb      - Thumb
    thumbeb    - Thumb (big endian)
    wasm32     - WebAssembly 32-bit
    wasm64     - WebAssembly 64-bit
    x86        - 32-bit X86: Pentium-Pro and above
    x86-64     - 64-bit X86: EM64T and AMD64

If LLVM supports the architecture you are interested in, but rustc is built with the backend disabled (which is the case of AVR as of Rust 1.28), then you will need to modify the Rust source enabling it. The first two commits of PR rust-lang/rust#52787 give you an idea of the required changes.

On the other hand, if LLVM doesn't support the architecture, but a fork of LLVM does, you will have to replace the original version of LLVM with the fork before building rustc. The Rust build system allows this and in principle it should just require changing the llvm submodule to point to the fork.

If your target architecture is only supported by some vendor provided GCC, you have the option of using mrustc, an unofficial Rust compiler, to translate your Rust program into C code and then compile that using GCC.

Built-in target

A compilation target is more than just its architecture. Each target has a specification associated to it that describes, among other things, its architecture, its operating system and the default linker.

The Rust compiler knows about several targets. These are said to be built into the compiler and can be listed by running the following command:

$ rustc --print target-list | column
aarch64-fuchsia                 mips64el-unknown-linux-gnuabi64
aarch64-linux-android           mipsel-unknown-linux-gnu
aarch64-unknown-cloudabi        mipsel-unknown-linux-musl
aarch64-unknown-freebsd         mipsel-unknown-linux-uclibc
aarch64-unknown-linux-gnu       msp430-none-elf
aarch64-unknown-linux-musl      powerpc-unknown-linux-gnu
aarch64-unknown-openbsd         powerpc-unknown-linux-gnuspe
arm-linux-androideabi           powerpc-unknown-netbsd
arm-unknown-linux-gnueabi       powerpc64-unknown-linux-gnu
arm-unknown-linux-gnueabihf     powerpc64le-unknown-linux-gnu
arm-unknown-linux-musleabi      powerpc64le-unknown-linux-musl
arm-unknown-linux-musleabihf    s390x-unknown-linux-gnu
armebv7r-none-eabihf            sparc-unknown-linux-gnu
armv4t-unknown-linux-gnueabi    sparc64-unknown-linux-gnu
armv5te-unknown-linux-gnueabi   sparc64-unknown-netbsd
armv5te-unknown-linux-musleabi  sparcv9-sun-solaris
armv6-unknown-netbsd-eabihf     thumbv6m-none-eabi
armv7-linux-androideabi         thumbv7em-none-eabi
armv7-unknown-cloudabi-eabihf   thumbv7em-none-eabihf
armv7-unknown-linux-gnueabihf   thumbv7m-none-eabi
armv7-unknown-linux-musleabihf  wasm32-experimental-emscripten
armv7-unknown-netbsd-eabihf     wasm32-unknown-emscripten
asmjs-unknown-emscripten        wasm32-unknown-unknown
i586-pc-windows-msvc            x86_64-apple-darwin
i586-unknown-linux-gnu          x86_64-fuchsia
i586-unknown-linux-musl         x86_64-linux-android
i686-apple-darwin               x86_64-pc-windows-gnu
i686-linux-android              x86_64-pc-windows-msvc
i686-pc-windows-gnu             x86_64-rumprun-netbsd
i686-pc-windows-msvc            x86_64-sun-solaris
i686-unknown-cloudabi           x86_64-unknown-bitrig
i686-unknown-dragonfly          x86_64-unknown-cloudabi
i686-unknown-freebsd            x86_64-unknown-dragonfly
i686-unknown-haiku              x86_64-unknown-freebsd
i686-unknown-linux-gnu          x86_64-unknown-haiku
i686-unknown-linux-musl         x86_64-unknown-l4re-uclibc
i686-unknown-netbsd             x86_64-unknown-linux-gnu
i686-unknown-openbsd            x86_64-unknown-linux-gnux32
mips-unknown-linux-gnu          x86_64-unknown-linux-musl
mips-unknown-linux-musl         x86_64-unknown-netbsd
mips-unknown-linux-uclibc       x86_64-unknown-openbsd
mips64-unknown-linux-gnuabi64   x86_64-unknown-redox

You can print the specification of any of these targets using the following command:

$ rustc +nightly -Z unstable-options --print target-spec-json --target thumbv7m-none-eabi
{
  "abi-blacklist": [
    "stdcall",
    "fastcall",
    "vectorcall",
    "thiscall",
    "win64",
    "sysv64"
  ],
  "arch": "arm",
  "data-layout": "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64",
  "emit-debug-gdb-scripts": false,
  "env": "",
  "executables": true,
  "is-builtin": true,
  "linker": "arm-none-eabi-gcc",
  "linker-flavor": "gcc",
  "llvm-target": "thumbv7m-none-eabi",
  "max-atomic-width": 32,
  "os": "none",
  "panic-strategy": "abort",
  "relocation-model": "static",
  "target-c-int-width": "32",
  "target-endian": "little",
  "target-pointer-width": "32",
  "vendor": ""
}

If none of these built-in targets seems appropriate for your target system, you'll have to create a custom target by writing your own target specification file in JSON format. The recommended way is to dump the specification of a built-in target that's similar to your target system into a file and then tweak it to match the properties of your target system. To do so, use the previously shown command, rustc --print target-spec-json. As of Rust 1.28, there's no up to date documentation on what each of the fields of a target specification mean, other than the compiler source code.

Once you have a target specification file you can refer to it by its path or by its name if its in the current directory or in $RUST_TARGET_PATH.

$ rustc +nightly -Z unstable-options --print target-spec-json \
      --target thumbv7m-none-eabi \
      > foo.json

$ rustc --print cfg --target foo.json # or just --target foo
debug_assertions
target_arch="arm"
target_endian="little"
target_env=""
target_feature="mclass"
target_feature="v7"
target_has_atomic="16"
target_has_atomic="32"
target_has_atomic="8"
target_has_atomic="cas"
target_has_atomic="ptr"
target_os="none"
target_pointer_width="32"
target_vendor=""

rust-std component

For some of the built-in target the Rust team distributes rust-std components via rustup. This component is a collection of pre-compiled crates like core and std, and it's required for cross compilation.

You can find the list of targets that have a rust-std component available via rustup by running the following command:

$ rustup target list | column
aarch64-apple-ios                       mips64-unknown-linux-gnuabi64
aarch64-linux-android                   mips64el-unknown-linux-gnuabi64
aarch64-unknown-fuchsia                 mipsel-unknown-linux-gnu
aarch64-unknown-linux-gnu               mipsel-unknown-linux-musl
aarch64-unknown-linux-musl              powerpc-unknown-linux-gnu
arm-linux-androideabi                   powerpc64-unknown-linux-gnu
arm-unknown-linux-gnueabi               powerpc64le-unknown-linux-gnu
arm-unknown-linux-gnueabihf             s390x-unknown-linux-gnu
arm-unknown-linux-musleabi              sparc64-unknown-linux-gnu
arm-unknown-linux-musleabihf            sparcv9-sun-solaris
armv5te-unknown-linux-gnueabi           thumbv6m-none-eabi
armv5te-unknown-linux-musleabi          thumbv7em-none-eabi
armv7-apple-ios                         thumbv7em-none-eabihf
armv7-linux-androideabi                 thumbv7m-none-eabi
armv7-unknown-linux-gnueabihf           wasm32-unknown-emscripten
armv7-unknown-linux-musleabihf          wasm32-unknown-unknown
armv7s-apple-ios                        x86_64-apple-darwin
asmjs-unknown-emscripten                x86_64-apple-ios
i386-apple-ios                          x86_64-linux-android
i586-pc-windows-msvc                    x86_64-pc-windows-gnu
i586-unknown-linux-gnu                  x86_64-pc-windows-msvc
i586-unknown-linux-musl                 x86_64-rumprun-netbsd
i686-apple-darwin                       x86_64-sun-solaris
i686-linux-android                      x86_64-unknown-cloudabi
i686-pc-windows-gnu                     x86_64-unknown-freebsd
i686-pc-windows-msvc                    x86_64-unknown-fuchsia
i686-unknown-freebsd                    x86_64-unknown-linux-gnu (default)
i686-unknown-linux-gnu                  x86_64-unknown-linux-gnux32
i686-unknown-linux-musl                 x86_64-unknown-linux-musl
mips-unknown-linux-gnu                  x86_64-unknown-netbsd
mips-unknown-linux-musl                 x86_64-unknown-redox

If there's no rust-std component for your target or you are using a custom target, then you'll have to use a tool like Xargo to have Cargo compile the core crate on the fly. Note that Xargo requires a nightly toolchain; the long term plan is to upstream Xargo's functionality into Cargo and eventually have that functionality available on stable.