Embedded Async Programming: A New Embedded Programming Paradigm

Traditional embedded development is mainly based on bare-metal polling and RTOS task models, but both models have their respective limitations.

Bare-Metal Polling Model

Bare-metal polling is the simplest programming model in embedded systems, based on an infinite loop (commonly called the "main loop"), where various states and events are checked sequentially in the loop and corresponding processing is performed. This approach is simple and straightforward, doesn't depend on any operating system, and is suitable for simple tasks.

A typical bare-metal application often combines polling with interrupts:

// Global variables for interrupt and main loop communication
volatile uint32_t tick_counter = 0;
volatile bool button_pressed = false;

// Timer interrupt service function
void TIM2_IRQHandler(void) {
    if (TIM2->SR & TIM_SR_UIF) {  // Check update interrupt flag
        tick_counter++;           // Increment counter
        TIM2->SR &= ~TIM_SR_UIF;  // Clear interrupt flag
    }
}

// External interrupt service function
void EXTI0_IRQHandler(void) {
    if (EXTI->PR & EXTI_PR_PR0) {
        button_pressed = true;    // Set button pressed flag
        EXTI->PR = EXTI_PR_PR0;   // Clear interrupt flag
    }
}

int main(void) {
    // System initialization
    SystemInit();
    // Configure timer interrupt and callback
    ......
    uint32_t last_tick = 0;
    // Main loop 
    while (1) {
        // Check for button press events
        if (button_pressed) {
            // Handle button event
            LED_Toggle();
            button_pressed = false;  // Clear flag
        }
        // Timer-based periodic tasks
        if (tick_counter - last_tick >= 1000) {  // About every second
            last_tick = tick_counter;
            // Execute periodic tasks
            UpdateSensors();
            UpdateDisplay();
        }
        // Can add more tasks...
    }
}

Although simple and understandable, the bare-metal polling model is often difficult to handle when dealing with complex tasks, has low scalability, and is difficult to handle complex task scheduling. Communication between interrupts and main loop relies on global variables, which easily leads to race conditions and data consistency problems. Rust implementation of bare-metal loops has similar effects but can add more safety features like Atomic.

RTOS Task Model

The RTOS task model is a commonly used programming model in embedded systems that divides system tasks into multiple independent tasks, each running in its own context and scheduled through a task scheduler.

Here we use FreeRTOS as an example. FreeRTOS is an open-source real-time operating system widely used in embedded systems. It provides a task scheduler that manages multiple tasks, each running in its own context and scheduled through the task scheduler.

// FreeRTOS task definition
void vSensorTask(void *pvParameters) {
    while(1) {
        // Wait for sensor data
        if(xSemaphoreTake(sensorDataSemaphore, portMAX_DELAY) == pdTRUE) {
            readAndProcessSensorData();
        }
    }
}

void vUartTask(void *pvParameters) {
    while(1) {
        // Wait for UART data
        if(xSemaphoreTake(uartDataSemaphore, portMAX_DELAY) == pdTRUE) {
            handleUartData();
        }
    }
}

// Task creation
void main() {
    // Create semaphores
    sensorDataSemaphore = xSemaphoreCreateBinary();
    uartDataSemaphore = xSemaphoreCreateBinary();
    // Create tasks
    xTaskCreate(vSensorTask, "Sensor", STACK_SIZE, NULL, PRIORITY_NORMAL, NULL);
    xTaskCreate(vUartTask, "UART", STACK_SIZE, NULL, PRIORITY_HIGH, NULL);
    // Start scheduler
    vTaskStartScheduler();
}

This task model allows us to perform task scheduling more conveniently, and many projects and products are built based on this model. However, RTOS task switching requires saving and restoring context, which affects performance if task switching is frequent. Each task also needs separate stack space, which occupies considerable memory if there are many tasks.

Embedded Async Programming

What is Async Programming

Async programming is a concurrency model that allows tasks to yield execution rights while waiting for I/O or other operations to complete rather than blocking threads or creating new threads. The core idea is:

Non-blocking execution: Operations return immediately after starting, without waiting for completion
Callback or await mechanism: Notify or resume execution when operations complete
Efficient resource utilization: Schedule tasks through state machines rather than thread switching

Synchronous (blocking) operation:

// Synchronous read
result = read_sensor();  // Thread blocks until sensor returns data
process(result);         // Process after read completion

Asynchronous (non-blocking) operation:

// Asynchronous read (pseudo code)
read_sensor_async(callback);  // Start read, return immediately
// Can do other work
// ...
// Callback is called when data is ready
function callback(result) {
    process(result);
}

In embedded systems, this can be understood as RTOS being preemptive scheduling, while async programming is cooperative scheduling.

Scheduling Type	Characteristics
Preemptive Scheduling (RTOS)	Scheduler can interrupt task execution at any time Task switching is triggered by timer or high-priority tasks Tasks cannot choose "appropriate timing" to yield CPU
Cooperative Scheduling (Async)	Tasks actively yield execution rights (at await points) System switches to other ready tasks at await points Task switching points are explicit and predictable

However, cooperative scheduling cannot achieve absolute real-time performance, so for example, esp-idf-hal in esp32 uses FreeRTOS combined with Rust's async programming model.

Future Working Principle and async/await

Rust's async programming model is based on Future and Async, providing powerful async programming capabilities. Future is the foundation of Rust async programming, representing a computation that may not yet be complete. Here we directly cite content from the async-book:

#![allow(unused)]
fn main() {
trait SimpleFuture {
    type Output;
    fn poll(&mut self, wake: fn()) -> Poll<Self::Output>;
}

enum Poll<T> {
    Ready(T),
    Pending,
}
}

Future can be advanced by calling poll, which advances the future toward completion as much as possible. If the future completes, it returns Poll::Ready(result). If the future is not yet complete, it returns Poll::Pending and arranges for the wake() function to be called when the Future is ready for further progress. When wake() is called, the executor driving the Future will poll again, allowing the Future to make progress.

Without the wake() function, the executor would have no way of knowing whether a future is ready to make progress and would have to continuously poll all futures. But with the wake() function, the executor knows which futures are ready to be polled.

The key to understanding Future lies in its state machine model:

#![allow(unused)]
fn main() {
// Basic working mode of Future
let mut future = some_async_operation();

loop {
    match future.poll(&mut context) {
        Poll::Ready(value) => break value,   // Complete, return result
        Poll::Pending => {
            // Not complete, wait to be woken up then poll again
            wait_for_wakeup();
        }
    }
}
}

This implements the foundation of async programming, but this pattern has several problems: verbose code, complex state tracking, difficult nested callbacks. To solve these problems, Rust introduced async/await syntax sugar.

async/await Syntax Sugar

async/await is Rust's high-level syntax for handling async programming. It converts async code into state machines while maintaining synchronous code writing style:

#![allow(unused)]
fn main() {
// Concise writing with async/await
async fn read_sensor() -> Result<u16, Error> {
    let i2c = I2C::new().await?;
    let value = i2c.read_register(SENSOR_ADDR, TEMP_REG).await?;
    Ok(value)
}

// Calling async function
async fn process_data() {
    match read_sensor().await {
        Ok(data) => println!("Temperature: {}", data),
        Err(e) => println!("Read error: {:?}", e),
    }
}
}

Working Principle of async/await

When you write an async function, the Rust compiler will:

Generate state machine: Convert the function into a state machine that implements the Future trait
Identify .await points: Each .await is a point where execution might pause
Save context: Save necessary state at each pause point for resuming execution
Auto-implement poll: Generate complex state transition logic

Zero-Cost Abstractions

Rust's async/await is a "zero-cost abstraction" - the compiler translates high-level syntax into efficient state machine code:

No runtime overhead: Generated code is as efficient as hand-written state machines
No GC dependency: Doesn't need garbage collection, suitable for resource-constrained environments
Small memory footprint: Single async task requires less memory than RTOS threads
Predictable execution: Task switching points are explicit (at .await), reducing race conditions

For example, a simple async fn code:

#![allow(unused)]
fn main() {
async fn example() {
    let a = step_one().await;
    let b = step_two(a).await;
    step_three(b).await;
}
}

Will be converted by the compiler into a roughly equivalent state machine:

#![allow(unused)]
fn main() {
enum ExampleStateMachine {
    Start,
    WaitingOnStepOne(StepOneFuture),
    WaitingOnStepTwo(StepTwoFuture, StepOneOutput),
    WaitingOnStepThree(StepThreeFuture),
}

impl Future for ExampleStateMachine {
    type Output = ();
    
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        // Complex state transition logic, automatically generated by compiler
        // ...
    }
}
}

Async Runtime and Executors in Microcontrollers

To run async code, an executor is needed to manage Future polling and wakeup. There are several popular async runtimes in embedded Rust, with two typical ones being:

Embassy: Lightweight async runtime designed specifically for embedded systems, supporting various MCUs RTIC: Real-time interrupt-driven concurrency framework

These runtimes are all optimized for resource-constrained environments, providing minimal memory footprint and efficient scheduling.

We mainly introduce Embassy here, which is also a popular async runtime in Rust embedded systems, providing HAL that supports cross-platform development for various MCUs.

Embassy Features

Static memory allocation: No need for dynamic heap allocation, memory requirements determined at compile time
Lower memory overhead: Single async task requires much less RAM than RTOS tasks
Low context switching cost: Task switching doesn't require complete context save/restore
Type-safe task communication: Provides typed channels and signals for inter-task communication
Simple programming model: Async programming model is well-suited for embedded development

Embassy Programming Model

Embassy's programming model is very simple, just add the #[embassy_executor::task] attribute to functions:

// Multiple concurrent tasks under Embassy framework
#[embassy_executor::task]
async fn sensor_task(i2c: I2c<'static>) {
    let mut sensor = Bme280::new(i2c);
    loop {
        let reading = sensor.read().await.unwrap();
        // Process sensor data
        Timer::after_secs(1).await;
    }
}

#[embassy_executor::task]
async fn display_task(spi: Spi<'static>, mut display_data: Signal<DisplayData>) {
    let mut display = Display::new(spi);
    loop {
        // Wait for new display data
        let data = display_data.wait().await;
        display.update(&data).await;
    }
}

#[embassy_executor::main]
async fn main(_spawner: embassy_executor::Spawner) {
    ...
    ...
    _spawner.spawn(sensor_task(i2c)).ok();
    _spawner.spawn(display_task(spi, display_data)).ok();
}

Async and DMA

In embedded systems, DMA (Direct Memory Access) is an efficient data transfer method, especially when processing large amounts of data. DMA can bypass the CPU and transfer data directly between memory and peripherals, thus improving data transfer efficiency.

The async programming model and DMA can be said to be very compatible, because the async programming model can avoid CPU blocking while waiting for data transfer, thus utilizing DMA for data transfer, reducing CPU burden for handling I/O.

In Embassy's HAL implementation, most async drivers use DMA transfer as the default implementation, such as I2C, SPI, UART, etc.

Multi-Priority Task Scheduling

Although Embassy's async model is based on cooperative scheduling, it supports preemptive-like multi-priority scheduling. This hybrid model combines the memory efficiency of cooperative scheduling with the real-time responsiveness of preemptive scheduling.

Embassy allows creating multiple executors with different priorities. The official also provides an example:

#[embassy_executor::task]
async fn run_high() {
    loop {
        info!("        [high] tick!");
        Timer::after_ticks(27374).await;
    }
}

#[embassy_executor::task]
async fn run_med() {
    loop {
        let start = Instant::now();
        info!("    [med] Starting long computation");
        // Spin-wait to simulate a long CPU computation
        embassy_time::block_for(embassy_time::Duration::from_secs(1)); // ~1 second
        let end = Instant::now();
        let ms = end.duration_since(start).as_ticks() / 33;
        info!("    [med] done in {} ms", ms);
        Timer::after_ticks(23421).await;
    }
}

#[embassy_executor::task]
async fn run_low() {
    loop {
        let start = Instant::now();
        info!("[low] Starting long computation");
        // Spin-wait to simulate a long CPU computation
        embassy_time::block_for(embassy_time::Duration::from_secs(2)); // ~2 seconds
        let end = Instant::now();
        let ms = end.duration_since(start).as_ticks() / 33;
        info!("[low] done in {} ms", ms);
        Timer::after_ticks(32983).await;
    }
}

static EXECUTOR_HIGH: InterruptExecutor = InterruptExecutor::new();
static EXECUTOR_MED: InterruptExecutor = InterruptExecutor::new();
static EXECUTOR_LOW: StaticCell<Executor> = StaticCell::new();

#[interrupt]
unsafe fn UART4() {EXECUTOR_HIGH.on_interrupt()}

#[interrupt]
unsafe fn UART5() {EXECUTOR_MED.on_interrupt()}

#[entry]
fn main() -> ! {
    info!("Hello World!");
    let _p = embassy_stm32::init(Default::default());
    interrupt::UART4.set_priority(Priority::P6);
    let spawner = EXECUTOR_HIGH.start(interrupt::UART4);
    unwrap!(spawner.spawn(run_high()));
    // Medium-priority executor: UART5, priority level 7
    interrupt::UART5.set_priority(Priority::P7);
    let spawner = EXECUTOR_MED.start(interrupt::UART5);
    unwrap!(spawner.spawn(run_med()));
    // Low priority executor: runs in thread mode, using WFE/SEV
    let executor = EXECUTOR_LOW.init(Executor::new());
    executor.run(|spawner| {
        unwrap!(spawner.spawn(run_low()));
    });
}

//!Output result
//! ```
//!     [med] Starting long computation
//!     [med] done in 992 ms
//!         [high] tick!
//! [low] Starting long computation
//!     [med] Starting long computation
//!         [high] tick!
//!         [high] tick!
//!     [med] done in 993 ms
//!     [med] Starting long computation
//!         [high] tick!
//!         [high] tick!
//!     [med] done in 993 ms
//! [low] done in 3972 ms
//!     [med] Starting long computation
//!         [high] tick!
//!         [high] tick!
//!     [med] done in 993 ms
//! ```

From the output results, you can see high-priority tasks interrupting medium and low-priority tasks.

The priority principle in Embassy is to configure idle interrupts with different priorities, utilizing high-priority interrupts in microcontrollers to interrupt low-priority interrupts, thus achieving multi-priority task scheduling.

This multi-priority approach can improve real-time performance of critical tasks.

Inter-Task Communication

Embassy provides many safe ways for inter-task communication. You can also use Atomic types for inter-task communication.

Note that Atomic types depend on underlying hardware support. Operations like load and store are relatively common atomic operations, but on certain architectures or for more complex operations (like atomic add/subtract, compare-and-swap), hardware support varies. The portable-atomic crate provides a unified API that utilizes hardware instructions when available or provides software implementations based on critical sections when hardware doesn't support it, ensuring cross-platform availability.

For multiple tasks using the same peripheral like GPIO, you can use the Mutex mentioned above.

Summary

Embassy and Rust async programming have many more features. I also recommend checking out the Embassy official documentation and async-book. This article also provides some comparisons between Embassy and RTOS: Async Rust vs RTOS showdown!

Embedded async programming also has some limitations to consider:

Binary size: Compiler-generated state machines may lead to increased code size
Debugging complexity: Debugging async code is more complex than synchronous code
Real-time performance: Pure cooperative scheduling is difficult to guarantee hard real-time performance requirements

You need to evaluate solutions based on specific scenarios. RTIC, esp-idf-hal, and many other frameworks also provide solutions combining RTOS thread models with async models.

Embassy porting is also very convenient. Besides officially supported ones, there are now Embassy support for domestic MCUs like py32 and ch32.

Rust Embedded Tutorial for Engineers