Skip to content

FPB-Based Embedded Runtime Code Injection Tool & Implementation

License

Notifications You must be signed in to change notification settings

FASTSHIFT/FPBInject

Repository files navigation

FPBInject - Cortex-M Runtime Code Injection Tool

License: MIT Platform Platform Ask DeepWiki CI

Runtime code injection tool for ARM Cortex-M3/M4 using the Flash Patch and Breakpoint (FPB) unit.

Overview

FPBInject enables runtime function hooking and code injection on Cortex-M microcontrollers without modifying Flash memory. It leverages the FPB hardware unit to redirect function calls to custom code loaded in RAM, supporting both legacy (Cortex-M3/M4) and modern (ARMv8-M) architectures.

Key Features

  • Zero Flash Modification - Inject code at runtime without erasing/writing Flash
  • Hardware-Level Redirection - Uses Cortex-M FPB unit for zero-overhead patching
  • Dual Injection Modes - Supports REMAP (Cortex-M3/M4) and DebugMonitor (ARMv8-M)
  • Multiple Hooks - Supports up to 6 simultaneous code patches (STM32F103)
  • Transparent Hijacking - Completely transparent to calling code
  • Reversible - Easily disable patches to restore original behavior

How It Works

flowchart LR
    subgraph step1 ["1. Original Call"]
        A["caller()<br/>calls<br/>digitalWrite"]
    end
    
    subgraph step2 ["2. FPB Intercept"]
        B["FPB Unit<br/>addr match<br/>0x08008308"]
    end
    
    subgraph step3 ["3. Trampoline"]
        C["trampoline_0<br/>in Flash<br/>loads target"]
    end
    
    subgraph step4 ["4. RAM Code Execution"]
        D["inject_digitalWrite() @ 0x20000278 (RAM)<br/>• Custom hook logic executes<br/>• Can call original function or replace entirely"]
    end
    
    A --> B --> C --> D
Loading

Architecture

The injection uses a two-stage approach:

  1. FPB REMAP: Redirects original function address to a trampoline function in Flash
  2. Trampoline: Pre-placed code in Flash that reads target address from RAM and jumps to it

This design allows dynamic target changes without runtime Flash modification.

Hardware Requirements

  • MCU: STM32F103C8T6 (Blue Pill) or other Cortex-M3/M4 device
  • Debugger: ST-Link V2 (for flashing)
  • Serial: USB-to-Serial adapter or USB CDC
  • LED: PC13 (onboard Blue Pill LED)

Software Requirements

  • ARM GNU Toolchain (arm-none-eabi-gcc)
  • CMake (>= 3.16)
  • Python 3.x with pyserial
  • ST-Link Tools or OpenOCD

Quick Start

1. Clone Repository

git clone https://github.com/FASTSHIFT/FPBInject.git
cd FPBInject

2. Build

# Configure
cmake -B build -DAPP_SELECT=3 -DCMAKE_TOOLCHAIN_FILE=cmake/arm-none-eabi-gcc.cmake

# Build
cmake --build build

3. Flash

st-flash write build/FPBInject.bin 0x08000000

4. Inject Code

# Inject custom code to hook digitalWrite function
python3 Tools/fpb_loader.py -p /dev/ttyACM0 \
    --inject App/inject/inject.cpp \
    --target digitalWrite

Usage

Command Line Options

fpb_loader.py [options]

Options:
  -p, --port PORT      Serial port (e.g., /dev/ttyACM0)
  -b, --baudrate BAUD  Baud rate (default: 115200)
  --inject FILE        Source file to inject (.c or .cpp)
  --target FUNC        Target function to hook
  --func NAME          Inject function name (default: first inject_*)
  --comp N             FPB comparator index 0-5 (default: 0)
  -i, --interactive    Interactive mode
  --ping               Test connection
  --info               Show device info

Examples

# Hook digitalWrite with custom logging
python3 Tools/fpb_loader.py -p /dev/ttyACM0 \
    --inject App/inject/inject.cpp \
    --target digitalWrite

# Hook blink_led with no-args injector
python3 Tools/fpb_loader.py -p /dev/ttyACM0 \
    --inject App/inject/inject.cpp \
    --target 'blink_led()' \
    --func inject_no_args

# Use different comparator for multiple hooks
python3 Tools/fpb_loader.py -p /dev/ttyACM0 \
    --inject App/inject/inject.cpp \
    --target pinMode \
    --comp 1

Writing Injection Code

Create a source file with an inject_* function:

// App/inject/inject.cpp
#include <Arduino.h>

// Hook function - replaces digitalWrite
__attribute__((used, section(".text.inject")))
void inject_digitalWrite(uint8_t pin, uint8_t value) {
    Serial.printf("Hooked: pin=%d val=%d\n", pin, value);
    // Call original or custom implementation
    value ? digitalWrite_HIGH(pin) : digitalWrite_LOW(pin);
}

// Simple hook without arguments
__attribute__((used, section(".text.inject")))
void inject_no_args(void) {
    Serial.printf("Function called at %dms\n", (int)millis());
}

Configuration

CMake Options

Option Default Description
APP_SELECT 1 Application (1=blink, 2=test, 3=func_loader)
FL_ALLOC_MODE STATIC Memory allocation mode (STATIC/LIBC/UMM)
FPB_NO_DEBUGMON OFF Disable DebugMonitor support (reduces code size)
FPB_NO_TRAMPOLINE OFF Disable trampoline (for cores that can REMAP to RAM)
FPB_TRAMPOLINE_NO_ASM OFF Use C instead of assembly (no argument preservation)
HSE_VALUE 8000000 External oscillator frequency
STM32_DEVICE STM32F10X_MD Target device variant

Memory Allocation Modes

Mode CMake Value Description
Static STATIC Fixed-size static buffer (4KB, default)
LIBC LIBC Standard libc malloc/free (dynamic)
UMM UMM umm_malloc embedded allocator (8KB heap)

Example:

# Static allocation (default, 4KB fixed buffer)
cmake -B build -DAPP_SELECT=3 -DFL_ALLOC_MODE=STATIC \
      -DCMAKE_TOOLCHAIN_FILE=cmake/arm-none-eabi-gcc.cmake

# LIBC malloc/free (dynamic allocation)
cmake -B build -DAPP_SELECT=3 -DFL_ALLOC_MODE=LIBC \
      -DCMAKE_TOOLCHAIN_FILE=cmake/arm-none-eabi-gcc.cmake

# UMM_MALLOC (embedded allocator with 8KB heap)
cmake -B build -DAPP_SELECT=3 -DFL_ALLOC_MODE=UMM \
      -DCMAKE_TOOLCHAIN_FILE=cmake/arm-none-eabi-gcc.cmake

Dynamic Allocation Address Alignment

⚠️ Important Technical Note

When using dynamic allocation modes (LIBC or UMM), the injection code must be placed at an 8-byte aligned address. ARM Cortex-M functions require proper alignment for correct execution.

The Problem:

  • malloc() may return addresses that are only 4-byte aligned (e.g., 0x20001544)
  • GCC aligns functions to 8-byte boundaries, causing a 4-byte offset in the compiled binary
  • If code is uploaded without accounting for this offset, all address references (strings, function calls) will be incorrect

The Solution (handled automatically by fpb_loader.py):

  1. Allocate extra space: size + 8 bytes
  2. Calculate aligned address: aligned = (raw + 7) & ~7
  3. Upload code starting at the alignment offset
Example:
  malloc returns:  0x20001544 (4-byte aligned)
  aligned address: 0x20001548 (8-byte aligned)
  offset:          4 bytes
  
  Upload: data written to offset 4 in buffer
  Result: code starts at 0x20001548, addresses match

This is why static allocation (FL_ALLOC_MODE=STATIC) uses a buffer with __attribute__((aligned(4), section(".ram_code"))) - ensuring proper alignment from the start.

Trampoline Modes

Mode CMake Option Description
ASM (default) - Uses inline assembly to preserve R0-R3 registers
No ASM -DFPB_TRAMPOLINE_NO_ASM=ON Simple C function call, no argument preservation
Disabled -DFPB_NO_TRAMPOLINE=ON No trampoline, for cores that support direct RAM REMAP

Example:

# Build with C-based trampoline (no assembly)
cmake -B build -DAPP_SELECT=3 -DFPB_TRAMPOLINE_NO_ASM=ON \
      -DCMAKE_TOOLCHAIN_FILE=cmake/arm-none-eabi-gcc.cmake

# Build without trampoline (for Cortex-M4/M7 with RAM REMAP support)
cmake -B build -DAPP_SELECT=3 -DFPB_NO_TRAMPOLINE=ON \
      -DCMAKE_TOOLCHAIN_FILE=cmake/arm-none-eabi-gcc.cmake

Patch Modes

FPBInject supports three different patch modes for function redirection:

Mode Option Best For Description
Trampoline --patch-mode trampoline Cortex-M3/M4 FPB REMAP to Flash trampoline → RAM (default)
DebugMonitor --patch-mode debugmon ARMv8-M FPB breakpoint → DebugMonitor exception → PC redirect
Direct --patch-mode direct Special cases Direct FPB REMAP (limited use)

Example:

# Default trampoline mode (Cortex-M3/M4)
python3 Tools/fpb_loader.py -p /dev/ttyACM0 \
    --inject App/inject/inject.cpp \
    --target digitalWrite

# DebugMonitor mode (for ARMv8-M or as alternative)
python3 Tools/fpb_loader.py -p /dev/ttyACM0 \
    --inject App/inject/inject.cpp \
    --target digitalWrite \
    --patch-mode debugmon

DebugMonitor Mode

Why DebugMonitor Mode?

ARMv8-M Architecture Limitation: Starting with ARMv8-M (Cortex-M23/M33/M55), ARM removed the FPB REMAP functionality. The FPB unit can only generate breakpoints, not redirect code execution. This means the traditional trampoline approach doesn't work on newer cores.

DebugMonitor mode provides a software-based alternative that works on both legacy (Cortex-M3/M4) and modern (ARMv8-M) architectures.

How DebugMonitor Mode Works

flowchart TB
    subgraph step1 ["1. Function Call"]
        A["caller()<br/>calls<br/>digitalWrite"]
    end
    
    subgraph step2 ["2. FPB Breakpoint"]
        B["FPB Unit<br/>BKPT trigger<br/>@ 0x08008308"]
    end
    
    subgraph step3 ["3. DebugMonitor"]
        C["DebugMon_Handler()<br/>(exception)"]
    end
    
    subgraph step4 ["4. Stack Frame Modification"]
        D["Exception Stack Frame:<br/>[SP+0] R0 - preserved<br/>[SP+4] R1 - preserved<br/>[SP+8] R2 - preserved<br/>[SP+12] R3 - preserved<br/>[SP+16] R12 - preserved<br/>[SP+20] LR - preserved<br/>[SP+24] PC ◄── MODIFIED to inject_digitalWrite<br/>[SP+28] xPSR - preserved"]
    end
    
    subgraph step5 ["5. Exception Return"]
        E["Execution continues at<br/>inject_digitalWrite()"]
    end
    
    A --> B --> C --> D --> E
Loading

Technical Implementation

  1. FPB Configuration: Comparator is set with REPLACE=0b11 (breakpoint on both halfwords)
  2. DebugMonitor Enable: DEMCR.MON_EN is set to enable DebugMonitor exception
  3. PC Modification: When breakpoint triggers, handler modifies stacked PC to redirect execution

Key Registers

Register Address Purpose
DEMCR 0xE000EDFC Debug Exception and Monitor Control
DFSR 0xE000ED30 Debug Fault Status Register

DEMCR Configuration

DEMCR bits used:
  [24] TRCENA  - Trace enable (required for debug features)
  [16] MON_EN  - DebugMonitor exception enable

FPB Comparator Configuration (Breakpoint Mode)

FP_COMPn bits:
  [31:30] REPLACE = 0b11  - Breakpoint on both halfwords
  [28:2]  COMP           - Address to match (bits [28:2])
  [0]     ENABLE = 1     - Comparator enable

Advantages of DebugMonitor Mode

Advantage Description
ARMv8-M Compatible Works on Cortex-M23/M33/M55 where REMAP is removed
No Flash Trampolines Doesn't require pre-placed code in Flash
Full Register Preservation All registers preserved via exception frame
Dynamic Configuration Can add/remove hooks at runtime

Limitations and Considerations

Limitation Description
⚠️ Higher Latency Exception entry/exit adds ~12-24 cycles overhead
⚠️ Debugger Conflict External debugger may interfere with DebugMonitor
⚠️ Priority Constraints DebugMonitor has fixed high priority (-1)

Build Configuration

# Enable DebugMonitor support (default: enabled)
cmake -B build -DAPP_SELECT=3 \
      -DCMAKE_TOOLCHAIN_FILE=cmake/arm-none-eabi-gcc.cmake

# Disable DebugMonitor (reduces code size if not needed)
cmake -B build -DAPP_SELECT=3 -DFPB_NO_DEBUGMON=ON \
      -DCMAKE_TOOLCHAIN_FILE=cmake/arm-none-eabi-gcc.cmake

When to Use DebugMonitor Mode

Use Case Recommended Mode
Cortex-M3/M4 with Flash trampolines available Trampoline (lower overhead)
ARMv8-M (Cortex-M23/M33/M55) DebugMonitor (only option)
No Flash trampolines pre-placed DebugMonitor
Lowest latency required Trampoline
Maximum compatibility needed DebugMonitor

NuttX Implementation (fpb_debugmon_nuttx.c)

On NuttX RTOS, the DebugMonitor implementation uses NuttX's native up_debugpoint_add() API instead of directly manipulating hardware registers. This provides better integration with the OS and avoids conflicts with other debug components.

Architecture

flowchart TB
    subgraph init ["1. Initialization"]
        A1["fpb_debugmon_init()"] --> A2["irq_attach(NVIC_IRQ_DBGMONITOR,<br/>arm_dbgmonitor, NULL)<br/><i>Replace vendor's PANIC handler</i>"]
        A2 --> A3["arm_enable_dbgmonitor()<br/><i>Initialize FPB/DWT hardware</i>"]
    end
    
    subgraph redirect ["2. Set Redirect"]
        B1["fpb_debugmon_set_redirect<br/>(comp, orig_addr, redirect_addr)"] --> B2["up_debugpoint_add<br/>(DEBUGPOINT_BREAKPOINT,<br/>addr, size,<br/>debugmon_callback,<br/>&redirect_info)"]
    end
    
    subgraph trigger ["3. Breakpoint Trigger"]
        C1["CPU hits breakpoint"] --> C2["arm_dbgmonitor()"]
        C2 --> C3["debugmon_callback()"]
        C3 --> C4["regs = running_regs()<br/><i>Get saved register context</i>"]
        C4 --> C5["regs[REG_PC] = redirect_addr<br/><i>Modify stacked PC</i>"]
        C5 --> C6["Exception return<br/>→ Execution at inject function"]
    end
    
    init --> redirect --> trigger
Loading

Key NuttX APIs Used

API Purpose
up_debugpoint_add() Register breakpoint with callback
up_debugpoint_remove() Remove breakpoint
arm_enable_dbgmonitor() Initialize FPB/DWT hardware
arm_dbgmonitor() NuttX's DebugMonitor exception handler
running_regs() Get current task's saved register context
irq_attach() Register interrupt handler

Register Context Modification

In NuttX, when an exception occurs, the register context is saved to tcb->xcp.regs. The running_regs() macro provides access to this:

#define running_regs() ((FAR void **)(g_running_tasks[this_cpu()]->xcp.regs))

By modifying regs[REG_PC], we can redirect execution when the exception returns.

Vendor Platform Considerations

Some vendors register their own DebugMonitor handler that calls PANIC(). The NuttX implementation re-attaches NuttX's proper arm_dbgmonitor() handler during initialization to ensure correct callback dispatch:

/* Replace vendor's PANIC handler with NuttX's proper handler */
irq_attach(NVIC_IRQ_DBGMONITOR, arm_dbgmonitor, NULL);
up_enable_irq(NVIC_IRQ_DBGMONITOR);
arm_enable_dbgmonitor();

NuttX Kconfig Requirements

CONFIG_ARCH_HAVE_DEBUG=y      # Required for up_debugpoint_add()
CONFIG_FPBINJECT=y            # Enable FPBInject

NuttX Usage Example

# Inject code on NuttX device with compile_commands.json for includes/defines
python3 Tools/fpb_loader.py -p /dev/ttyACM0 -b 921600 \
    --inject App/inject/inject.cpp \
    --target lv_malloc \
    --patch-mode debugmon \
    --elf nuttx.elf.elf \
    --compile-commands out/xxx/compile_commands.json \
    -ni

The -ni (NuttX interactive) flag automatically:

  1. Sends fl command to enter device's interactive mode
  2. Executes injection commands
  3. Sends exit to return to normal shell

FPB Technical Details

Flash Patch and Breakpoint Unit

The FPB is a Cortex-M debug component originally designed for:

  1. Setting hardware breakpoints
  2. Patching Flash bugs without reprogramming

FPB Versions

Version Architecture REMAP Support Breakpoint Support
FPBv1 Cortex-M3/M4 (ARMv7-M) ✅ Yes ✅ Yes
FPBv2 Cortex-M23/M33/M55 (ARMv8-M) ❌ Removed ✅ Yes

⚠️ Important: ARMv8-M removed the REMAP functionality from FPB. On these cores, FPB can only generate breakpoints, requiring the DebugMonitor approach for function redirection.

STM32F103 FPB Resources (FPBv1)

Resource Count Address Range
Code Comparators 6 0x00000000 - 0x1FFFFFFF
Literal Comparators 2 0x00000000 - 0x1FFFFFFF
REMAP Table 8 entries SRAM (configurable)

Registers

Register Address Description
FP_CTRL 0xE0002000 Control register
FP_REMAP 0xE0002004 Remap table base address
FP_COMP0-5 0xE0002008-1C Code comparators
FP_COMP6-7 0xE0002020-24 Literal comparators

Project Structure

FPBInject/
├── CMakeLists.txt              # Build configuration
├── README.md                   # This file
├── LICENSE                     # MIT License
├── cmake/
│   └── arm-none-eabi-gcc.cmake # Toolchain file
├── App/
│   ├── func_loader/            # Function loader application
│   ├── inject/                 # Example injection code
│   └── ...                     # Other app modules
├── Project/
│   ├── Application/            # Main application (entry)
│   ├── ArduinoAPI/             # Arduino compatibility layer
│   └── Platform/
│       └── STM32F10x/          # Platform HAL (drivers, startup, config)
├── Source/
│   ├── fpb_inject.c/h          # FPB driver
│   ├── fpb_trampoline.c/h      # Trampoline functions
│   ├── fpb_debugmon.c/h        # DebugMonitor functions
│   └── func_loader.c/h         # Command processor
└── Tools/
    ├── fpb_loader.py           # Host injection tool
    └── setup_env.py            # Environment setup

Limitations

  1. Address Range: FPB can only patch Code region (0x00000000 - 0x1FFFFFFF)
  2. Comparator Count: Limited to 6 simultaneous hooks (STM32F103)
  3. Instruction Set: Thumb/Thumb-2 only (not ARM mode)
  4. Debugger Conflict: Some debuggers use FPB for breakpoints

Use Cases

  • Hot Patching: Fix bugs on deployed devices
  • Feature Toggle: Enable/disable features at runtime
  • A/B Testing: Switch between implementations
  • Security Research: Dynamic analysis and hooking
  • Debugging: Temporarily modify program behavior
  • Instrumentation: Add logging/tracing without recompilation

API Reference

FPB Functions

void fpb_init(void);                              // Initialize FPB unit
void fpb_set_patch(uint8_t comp, uint32_t orig, uint32_t target);
void fpb_clear_patch(uint8_t comp);               // Clear patch
fpb_state_t fpb_get_state(void);                  // Get FPB state

Trampoline Functions

void fbp_trampoline_set_target(uint32_t comp, uint32_t target);
void fbp_trampoline_clear_target(uint32_t comp);
uint32_t fbp_trampoline_get_address(uint32_t comp);

License

MIT License - See LICENSE file.

References

About

FPB-Based Embedded Runtime Code Injection Tool & Implementation

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Contributors 2

  •  
  •