Understanding the ARM Bare-Metal Build Process

If you are starting with embedded systems on ARM Cortex-M microcontrollers, you will often hear about bare-metal programming. This means working directly with the hardware, without an operating system managing tasks. This post explains how your C code becomes something that runs on a microcontroller, covering the compilation, linking, and flashing steps for ARM Cortex-M devices.

Why Cross Compilation Is Necessary

Most development happens on x86-based computers like laptops or desktops, but microcontrollers such as the STM32 family use the ARM architecture. To build code for these devices on your PC, you need cross compilation, which means compiling programs for a different CPU architecture than the one your development machine runs on. The most widely used toolchain for ARM Cortex-M development is arm-none-eabi-gcc, part of the GNU Arm Embedded Toolchain.

The Build Process Explained

Here is an overview of how a C program gets built for an ARM microcontroller:

The preprocessor runs first, expanding all headers and macros in a source file such as main.c and producing main.i.
The compiler takes the preprocessed file and converts it to assembly code, generating main.s.
The assembler converts the assembly code into machine code, producing an object file main.o.
The linker combines object files, startup code, and the memory layout described by a linker script to create an executable in ELF format, main.elf.
Finally, the ELF file is converted to a format suitable for flashing, such as Intel HEX or raw binary, typically with arm-none-eabi-objcopy.
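To make the first stages concrete, here is a small sketch of what the preprocessor does to a source file. The names LED_PIN, BIT, and led_mask are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Hypothetical names, used only to illustrate macro expansion. */
#define LED_PIN 5
#define BIT(n)  (1u << (n))

/*
 * After preprocessing (main.i), the return statement below reads:
 *     return (1u << (5));
 * The compiler then turns that into assembly (main.s), and the
 * assembler encodes it as machine code in the object file (main.o).
 */
uint32_t led_mask(void)
{
    return BIT(LED_PIN);
}
```

If you want to look at the intermediate files yourself, passing -save-temps to GCC keeps the .i and .s files next to the object file.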

Example Compilation Command

arm-none-eabi-gcc -S -mcpu=cortex-m4 -mthumb file.c -o file.s

This command tells the compiler to stop after generating assembly (-S) for a Cortex-M4 processor using the Thumb instruction set. Replace -S with -c and name the output file.o to get an object file instead.

Using a Makefile to Automate the Build

To simplify the build process, you can use a Makefile that compiles multiple source files and links them into one executable. Here is a basic example:

CC = arm-none-eabi-gcc
CFLAGS = -c -O0 -mcpu=cortex-m4 -mthumb -std=gnu11 -Wall
LDFLAGS = -nostdlib -T stm32_ls.ld -Wl,-Map=final.map

all: main.o led.o stm32_startup.o final.elf

main.o: main.c
	$(CC) $(CFLAGS) main.c -o main.o

led.o: led.c
	$(CC) $(CFLAGS) led.c -o led.o

stm32_startup.o: stm32_startup.c
	$(CC) $(CFLAGS) stm32_startup.c -o stm32_startup.o

final.elf: main.o led.o stm32_startup.o
	$(CC) $(LDFLAGS) -o $@ $^

clean:
	rm -f *.o *.elf *.map

.PHONY: all clean

Inspecting Object Files

After compiling, you can use the objdump tool to examine object files and their sections:

arm-none-eabi-objdump -h main.o

Common sections include:

.text which contains your compiled program instructions
.data for initialized global and static variables, which live in RAM at run time
.bss for uninitialized global and static variables, zero-initialized at startup
.rodata for read-only data such as constants
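As a rough guide to how declarations map onto these sections, consider the sketch below. The variable and function names are hypothetical, and the placement shown is what GCC typically does at default settings, so check the objdump output for your own build.

```c
#include <stdint.h>

uint32_t boot_count = 1;         /* initialized, nonzero: .data */
uint32_t tick_count;             /* no initializer: .bss (or COMMON, which
                                    the linker script folds into .bss) */
const uint32_t led_delay = 500;  /* read-only constant: .rodata */
static uint8_t rx_buffer[64];    /* static, no initializer: .bss */

/* Function bodies are emitted into .text. */
uint32_t next_tick(void)
{
    rx_buffer[0] = (uint8_t)boot_count;
    tick_count += led_delay;
    return tick_count;
}
```

Running arm-none-eabi-objdump -h on the resulting object file should show non-empty .text, .data, .bss, and .rodata sections.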

What Happens at Microcontroller Startup

On reset, a Cortex-M core does not jump directly to your main() function. It first loads the initial stack pointer from the first entry of the vector table, then jumps to the address stored in the second entry, which is the reset handler in the startup code. The startup code sets up the environment by initializing memory sections and then calls main().
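The vector table itself can be written in C as an array of addresses. On Cortex-M the first word holds the initial stack pointer and the second holds the address of the reset handler. The sketch below is a minimal version under a few assumptions: the stack top is computed from the SRAM origin and size used in this post's linker script, only the first four entries are shown, Default_Handler is a hypothetical catch-all, and the Reset_Handler body is a placeholder for the real one shown later in this post.

```c
#include <stdint.h>

/* Assumption: SRAM origin and size match the linker script in this post. */
#define SRAM_START 0x20000000u
#define SRAM_SIZE  (128u * 1024u)
#define STACK_TOP  (SRAM_START + SRAM_SIZE)  /* stack grows down from here */

/* Placeholder body; the real reset handler initializes memory first. */
void Reset_Handler(void)
{
}

/* Hypothetical catch-all for exceptions that are not handled yet. */
void Default_Handler(void)
{
    while (1) {
    }
}

/* Only apply the section attribute when building for ARM, so the
 * fragment also compiles on a development machine. */
#if defined(__arm__)
#define VECTOR_SECTION __attribute__((section(".isr_vector")))
#else
#define VECTOR_SECTION
#endif

/* uintptr_t is 32 bits wide on Cortex-M, so each entry is one word. */
const uintptr_t vector_table[] VECTOR_SECTION = {
    STACK_TOP,                    /* entry 0: initial stack pointer */
    (uintptr_t)Reset_Handler,     /* entry 1: reset */
    (uintptr_t)Default_Handler,   /* entry 2: NMI */
    (uintptr_t)Default_Handler,   /* entry 3: HardFault */
};
```

Because the linker script lists .isr_vector first inside .text, this array ends up at the very start of flash, which is where the core looks for it.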

A simplified version of a reset handler in C looks like this:

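#include <stdint.h>

/* Symbols defined by the linker script; only their addresses are meaningful. */
extern uint32_t _sdata, _edata, _la_data, _sbss, _ebss;

int main(void);

void Reset_Handler(void)
{
    /* Copy the .data load image from flash to its runtime address in SRAM. */
    uint32_t size = (uint32_t)&_edata - (uint32_t)&_sdata;
    uint8_t *dest = (uint8_t *)&_sdata;
    uint8_t *src = (uint8_t *)&_la_data;

    for (uint32_t i = 0; i < size; i++) {
        *dest++ = *src++;
    }

    /* Zero-fill .bss. */
    size = (uint32_t)&_ebss - (uint32_t)&_sbss;
    dest = (uint8_t *)&_sbss;
    for (uint32_t i = 0; i < size; i++) {
        *dest++ = 0;
    }

    main();

    /* main() should not return on bare metal; trap here if it does. */
    while (1) {
    }
}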

This code copies initialized variables from flash to RAM and clears uninitialized variables before running the main application.
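Since the copy and zero-fill loops are plain C, you can check the logic on your development machine before trusting it on hardware. The sketch below is one way to do that: copy_section, zero_section, and startup_logic_ok are hypothetical helpers, and small arrays stand in for the flash image and the SRAM regions.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy size bytes from the load image to its runtime location. */
static void copy_section(uint8_t *dest, const uint8_t *src, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        dest[i] = src[i];
    }
}

/* Fill size bytes with zero, as the startup code does for .bss. */
static void zero_section(uint8_t *dest, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        dest[i] = 0;
    }
}

/* Hypothetical self-check: returns 1 if both loops behave as expected. */
int startup_logic_ok(void)
{
    const uint8_t flash_image[4] = {0xDE, 0xAD, 0xBE, 0xEF};
    uint8_t sram_data[4] = {0};
    uint8_t sram_bss[4] = {1, 2, 3, 4};  /* stale contents before startup */

    copy_section(sram_data, flash_image, sizeof sram_data);
    zero_section(sram_bss, sizeof sram_bss);

    for (size_t i = 0; i < sizeof sram_data; i++) {
        if (sram_data[i] != flash_image[i] || sram_bss[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```

On the target, the same helpers would be called with the addresses of the linker symbols instead of local arrays.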

Understanding the Memory Map

A linker script controls where each piece of the program resides in memory. Here is a simplified layout typical of STM32 parts:

Flash Memory

Starts at address 0x08000000 and holds the interrupt vector table, code (.text), constants (.rodata), and the load image of initialized variables.

SRAM

Starts at 0x20000000 and contains initialized global variables (.data) during runtime, uninitialized globals (.bss), the heap, and the stack.

Example Linker Script

This script defines memory regions and sections:

ENTRY(Reset_Handler)

MEMORY
{
  FLASH(rx) : ORIGIN = 0x08000000, LENGTH = 1024K
  SRAM(rwx) : ORIGIN = 0x20000000, LENGTH = 128K
}

SECTIONS
{
  .text :
  {
    KEEP(*(.isr_vector))
    *(.text*)
    *(.init)
    *(.fini)
    *(.rodata*)
    . = ALIGN(4);
    _etext = .;
  } > FLASH

  _la_data = LOADADDR(.data);

  .data :
  {
    _sdata = .;
    *(.data*)
    . = ALIGN(4);
    _edata = .;
  } > SRAM AT > FLASH

  .bss :
  {
    _sbss = .;
    *(COMMON)
    *(.bss*)
    . = ALIGN(4);
    _ebss = .;
  } > SRAM
}
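The MEMORY block above implies concrete address ranges, and it can help to write them out. The sketch below mirrors the origins and lengths from the script as constants and adds two hypothetical helpers, in_flash and in_sram, that report which region an address falls in. The sizes are this script's values, so adjust them to match your device.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the MEMORY block of the linker script. */
#define FLASH_ORIGIN 0x08000000u
#define FLASH_LENGTH (1024u * 1024u)  /* 1024K: ends just below 0x08100000 */
#define SRAM_ORIGIN  0x20000000u
#define SRAM_LENGTH  (128u * 1024u)   /* 128K: ends just below 0x20020000 */

/* True if addr lies inside the FLASH region. */
bool in_flash(uint32_t addr)
{
    return addr >= FLASH_ORIGIN && (addr - FLASH_ORIGIN) < FLASH_LENGTH;
}

/* True if addr lies inside the SRAM region. */
bool in_sram(uint32_t addr)
{
    return addr >= SRAM_ORIGIN && (addr - SRAM_ORIGIN) < SRAM_LENGTH;
}
```

A quick use for this is sanity-checking the symbol addresses listed in final.map: _sdata and _sbss should fall in SRAM, while _la_data and _etext should fall in flash.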

Flashing the Program

Once the program is compiled and linked, it must be uploaded to the microcontroller. A typical workflow includes:

Starting OpenOCD with a configuration suitable for your board.
Launching GDB and connecting to OpenOCD using target remote localhost:3333.
Resetting and halting the microcontroller with monitor reset halt, then loading the program using load.

Once the image is loaded, resume the core with continue (or reset it with monitor reset) and the microcontroller begins executing your code. This is bare-metal development in its purest form, where you control all hardware and software aspects.

For the complete source code and examples related to this project, visit my GitHub repository at https://github.com/sudoXpg/stm32f4-startup.