Runtime code injection tool for ARM Cortex-M3/M4 using the Flash Patch and Breakpoint (FPB) unit.
FPBInject enables runtime function hooking and code injection on Cortex-M microcontrollers without modifying Flash memory. It leverages the FPB hardware unit to redirect function calls to custom code loaded in RAM, supporting both legacy (Cortex-M3/M4) and modern (ARMv8-M) architectures.
- ✅ Zero Flash Modification - Inject code at runtime without erasing/writing Flash
- ✅ Hardware-Level Redirection - Uses Cortex-M FPB unit for zero-overhead patching
- ✅ Dual Injection Modes - Supports REMAP (Cortex-M3/M4) and DebugMonitor (ARMv8-M)
- ✅ Multiple Hooks - Supports up to 6 simultaneous code patches (STM32F103)
- ✅ Transparent Hijacking - Completely transparent to calling code
- ✅ Reversible - Easily disable patches to restore original behavior
flowchart LR
subgraph step1 ["1. Original Call"]
A["caller()<br/>calls<br/>digitalWrite"]
end
subgraph step2 ["2. FPB Intercept"]
B["FPB Unit<br/>addr match<br/>0x08008308"]
end
subgraph step3 ["3. Trampoline"]
C["trampoline_0<br/>in Flash<br/>loads target"]
end
subgraph step4 ["4. RAM Code Execution"]
D["inject_digitalWrite() @ 0x20000278 (RAM)<br/>• Custom hook logic executes<br/>• Can call original function or replace entirely"]
end
A --> B --> C --> D
The injection uses a two-stage approach:
- FPB REMAP: Redirects original function address to a trampoline function in Flash
- Trampoline: Pre-placed code in Flash that reads target address from RAM and jumps to it
This design allows dynamic target changes without runtime Flash modification.
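The two-stage idea can be modelled in plain C. This is a host-side sketch, not the project's trampoline code: `g_trampoline_targets`, `trampoline_set_target`, `trampoline_dispatch`, and `demo_hook` are hypothetical names, and on the device the trampoline is reached through FPB REMAP rather than a direct call.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_COMPARATORS 6  /* comparator count quoted for STM32F103 */

typedef void (*hook_fn_t)(void);

/* Target table in RAM: one slot per comparator (hypothetical name). */
static hook_fn_t g_trampoline_targets[NUM_COMPARATORS];

/* Point a trampoline slot at new code; passing NULL clears the slot. */
void trampoline_set_target(unsigned comp, hook_fn_t target) {
    assert(comp < NUM_COMPARATORS);
    g_trampoline_targets[comp] = target;
}

/* Stand-in for the Flash-resident trampoline: load the target from RAM
 * and jump to it. Returns 1 if a hook ran, 0 if the slot was empty. */
int trampoline_dispatch(unsigned comp) {
    assert(comp < NUM_COMPARATORS);
    hook_fn_t target = g_trampoline_targets[comp];
    if (target == NULL) {
        return 0;
    }
    target();
    return 1;
}

/* Demo hook, present only to show the redirection taking effect. */
int g_demo_calls;
void demo_hook(void) { g_demo_calls++; }
```

Because only the RAM table changes when a hook is swapped, the Flash-resident part stays fixed, which is the property the design relies on.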
- MCU: STM32F103C8T6 (Blue Pill) or other Cortex-M3/M4 device
- Debugger: ST-Link V2 (for flashing)
- Serial: USB-to-Serial adapter or USB CDC
- LED: PC13 (onboard Blue Pill LED)
- ARM GNU Toolchain (arm-none-eabi-gcc)
- CMake (>= 3.16)
- Python 3.x with pyserial
- ST-Link Tools or OpenOCD
git clone https://github.com/FASTSHIFT/FPBInject.git
cd FPBInject

# Configure
cmake -B build -DAPP_SELECT=3 -DCMAKE_TOOLCHAIN_FILE=cmake/arm-none-eabi-gcc.cmake
# Build
cmake --build build

st-flash write build/FPBInject.bin 0x08000000

# Inject custom code to hook digitalWrite function
python3 Tools/fpb_loader.py -p /dev/ttyACM0 \
--inject App/inject/inject.cpp \
--target digitalWrite

fpb_loader.py [options]
Options:
-p, --port PORT Serial port (e.g., /dev/ttyACM0)
-b, --baudrate BAUD Baud rate (default: 115200)
--inject FILE Source file to inject (.c or .cpp)
--target FUNC Target function to hook
--func NAME Inject function name (default: first inject_*)
--comp N FPB comparator index 0-5 (default: 0)
-i, --interactive Interactive mode
--ping Test connection
--info Show device info

# Hook digitalWrite with custom logging
python3 Tools/fpb_loader.py -p /dev/ttyACM0 \
--inject App/inject/inject.cpp \
--target digitalWrite
# Hook blink_led with no-args injector
python3 Tools/fpb_loader.py -p /dev/ttyACM0 \
--inject App/inject/inject.cpp \
--target 'blink_led()' \
--func inject_no_args
# Use different comparator for multiple hooks
python3 Tools/fpb_loader.py -p /dev/ttyACM0 \
--inject App/inject/inject.cpp \
--target pinMode \
--comp 1

Create a source file with an inject_* function:
// App/inject/inject.cpp
#include <Arduino.h>
// Hook function - replaces digitalWrite
__attribute__((used, section(".text.inject")))
void inject_digitalWrite(uint8_t pin, uint8_t value) {
Serial.printf("Hooked: pin=%d val=%d\n", pin, value);
// Call original or custom implementation
value ? digitalWrite_HIGH(pin) : digitalWrite_LOW(pin);
}
// Simple hook without arguments
__attribute__((used, section(".text.inject")))
void inject_no_args(void) {
Serial.printf("Function called at %dms\n", (int)millis());
}

| Option | Default | Description |
|---|---|---|
| APP_SELECT | 1 | Application (1=blink, 2=test, 3=func_loader) |
| FL_ALLOC_MODE | STATIC | Memory allocation mode (STATIC/LIBC/UMM) |
| FPB_NO_DEBUGMON | OFF | Disable DebugMonitor support (reduces code size) |
| FPB_NO_TRAMPOLINE | OFF | Disable trampoline (for cores that can REMAP to RAM) |
| FPB_TRAMPOLINE_NO_ASM | OFF | Use C instead of assembly (no argument preservation) |
| HSE_VALUE | 8000000 | External oscillator frequency (Hz) |
| STM32_DEVICE | STM32F10X_MD | Target device variant |

| Mode | CMake Value | Description |
|---|---|---|
| Static | STATIC | Fixed-size static buffer (4KB, default) |
| LIBC | LIBC | Standard libc malloc/free (dynamic) |
| UMM | UMM | umm_malloc embedded allocator (8KB heap) |

Example:
# Static allocation (default, 4KB fixed buffer)
cmake -B build -DAPP_SELECT=3 -DFL_ALLOC_MODE=STATIC \
-DCMAKE_TOOLCHAIN_FILE=cmake/arm-none-eabi-gcc.cmake
# LIBC malloc/free (dynamic allocation)
cmake -B build -DAPP_SELECT=3 -DFL_ALLOC_MODE=LIBC \
-DCMAKE_TOOLCHAIN_FILE=cmake/arm-none-eabi-gcc.cmake
# UMM_MALLOC (embedded allocator with 8KB heap)
cmake -B build -DAPP_SELECT=3 -DFL_ALLOC_MODE=UMM \
-DCMAKE_TOOLCHAIN_FILE=cmake/arm-none-eabi-gcc.cmake
⚠️ Important Technical Note

When using dynamic allocation modes (LIBC or UMM), the injection code must be placed at an 8-byte aligned address. ARM Cortex-M functions require proper alignment for correct execution.

The Problem:
- malloc() may return addresses that are only 4-byte aligned (e.g., 0x20001544)
- GCC aligns functions to 8-byte boundaries, causing a 4-byte offset in the compiled binary
- If code is uploaded without accounting for this offset, all address references (strings, function calls) will be incorrect

The Solution (handled automatically by fpb_loader.py):
- Allocate extra space: size + 8 bytes
- Calculate aligned address: aligned = (raw + 7) & ~7
- Upload code starting at the alignment offset

Example:
malloc returns: 0x20001544 (4-byte aligned)
aligned address: 0x20001548 (8-byte aligned)
offset: 4 bytes
Upload: data written to offset 4 in buffer
Result: code starts at 0x20001548, addresses match

This is why static allocation (FL_ALLOC_MODE=STATIC) uses a buffer with __attribute__((aligned(4), section(".ram_code"))), ensuring proper alignment from the start.

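The alignment arithmetic described in this note can be written out as a few helpers. This is a minimal sketch of the calculation only; the function names are hypothetical and are not taken from fpb_loader.py or the firmware.

```c
#include <assert.h>
#include <stdint.h>

#define CODE_ALIGN 8u  /* required alignment of the injected code, in bytes */

/* Round an address up to the next 8-byte boundary: (raw + 7) & ~7. */
uint32_t align_up8(uint32_t raw) {
    return (raw + (CODE_ALIGN - 1u)) & ~(CODE_ALIGN - 1u);
}

/* Number of bytes to skip at the start of the allocated buffer. */
uint32_t align_offset8(uint32_t raw) {
    return align_up8(raw) - raw;
}

/* Size to request from the allocator so the aligned code still fits. */
uint32_t padded_size(uint32_t code_size) {
    assert(code_size <= UINT32_MAX - CODE_ALIGN);  /* guard against wrap-around */
    return code_size + CODE_ALIGN;
}
```

With the address from the example, `align_up8(0x20001544)` gives `0x20001548` and `align_offset8(0x20001544)` gives 4; an address that is already 8-byte aligned is returned unchanged.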
| Mode | CMake Option | Description |
|---|---|---|
| ASM (default) | - | Uses inline assembly to preserve R0-R3 registers |
| No ASM | -DFPB_TRAMPOLINE_NO_ASM=ON | Simple C function call, no argument preservation |
| Disabled | -DFPB_NO_TRAMPOLINE=ON | No trampoline, for cores that support direct RAM REMAP |

Example:
# Build with C-based trampoline (no assembly)
cmake -B build -DAPP_SELECT=3 -DFPB_TRAMPOLINE_NO_ASM=ON \
-DCMAKE_TOOLCHAIN_FILE=cmake/arm-none-eabi-gcc.cmake
# Build without trampoline (for Cortex-M4/M7 with RAM REMAP support)
cmake -B build -DAPP_SELECT=3 -DFPB_NO_TRAMPOLINE=ON \
-DCMAKE_TOOLCHAIN_FILE=cmake/arm-none-eabi-gcc.cmake

FPBInject supports three different patch modes for function redirection:

| Mode | Option | Best For | Description |
|---|---|---|---|
| Trampoline | --patch-mode trampoline | Cortex-M3/M4 | FPB REMAP to Flash trampoline → RAM (default) |
| DebugMonitor | --patch-mode debugmon | ARMv8-M | FPB breakpoint → DebugMonitor exception → PC redirect |
| Direct | --patch-mode direct | Special cases | Direct FPB REMAP (limited use) |

Example:
# Default trampoline mode (Cortex-M3/M4)
python3 Tools/fpb_loader.py -p /dev/ttyACM0 \
--inject App/inject/inject.cpp \
--target digitalWrite
# DebugMonitor mode (for ARMv8-M or as alternative)
python3 Tools/fpb_loader.py -p /dev/ttyACM0 \
--inject App/inject/inject.cpp \
--target digitalWrite \
--patch-mode debugmon

ARMv8-M Architecture Limitation: Starting with ARMv8-M (Cortex-M23/M33/M55), ARM removed the FPB REMAP functionality. The FPB unit can only generate breakpoints, not redirect code execution. This means the traditional trampoline approach doesn't work on newer cores.
DebugMonitor mode provides a software-based alternative that works on both legacy (Cortex-M3/M4) and modern (ARMv8-M) architectures.
flowchart TB
subgraph step1 ["1. Function Call"]
A["caller()<br/>calls<br/>digitalWrite"]
end
subgraph step2 ["2. FPB Breakpoint"]
B["FPB Unit<br/>BKPT trigger<br/>@ 0x08008308"]
end
subgraph step3 ["3. DebugMonitor"]
C["DebugMon_Handler()<br/>(exception)"]
end
subgraph step4 ["4. Stack Frame Modification"]
D["Exception Stack Frame:<br/>[SP+0] R0 - preserved<br/>[SP+4] R1 - preserved<br/>[SP+8] R2 - preserved<br/>[SP+12] R3 - preserved<br/>[SP+16] R12 - preserved<br/>[SP+20] LR - preserved<br/>[SP+24] PC ◄── MODIFIED to inject_digitalWrite<br/>[SP+28] xPSR - preserved"]
end
subgraph step5 ["5. Exception Return"]
E["Execution continues at<br/>inject_digitalWrite()"]
end
A --> B --> C --> D --> E
- FPB Configuration: Comparator is set with REPLACE=0b11 (breakpoint on both halfwords)
- DebugMonitor Enable: DEMCR.MON_EN is set to enable DebugMonitor exception
- PC Modification: When breakpoint triggers, handler modifies stacked PC to redirect execution
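The PC modification step can be modelled on the host with a struct that mirrors the stacked frame. This is a sketch under assumptions: `exception_frame_t`, `redirect_t`, `g_redirects`, and `debugmon_redirect` are hypothetical names, and the real handler also has to locate the frame from MSP or PSP before it can touch it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Basic exception frame pushed by hardware, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} exception_frame_t;

/* The stacked PC sits at [SP+24], as in the diagram above. */
_Static_assert(offsetof(exception_frame_t, pc) == 24, "PC must be at SP+24");

/* One redirect per comparator (hypothetical bookkeeping structure). */
typedef struct {
    uint32_t orig_addr;      /* address the comparator matches */
    uint32_t redirect_addr;  /* injected function in RAM       */
    int      enabled;
} redirect_t;

#define NUM_REDIRECTS 6
redirect_t g_redirects[NUM_REDIRECTS];

/* Rewrite the stacked PC if it matches a registered hook.
 * Returns 1 when a redirect was applied, 0 otherwise. */
int debugmon_redirect(exception_frame_t *frame) {
    assert(frame != NULL);
    for (size_t i = 0; i < NUM_REDIRECTS; i++) {
        const redirect_t *r = &g_redirects[i];
        /* Compare with the Thumb bit masked off on both sides. */
        if (r->enabled && ((r->orig_addr & ~1u) == (frame->pc & ~1u))) {
            /* The stacked PC is halfword-aligned; Thumb state lives in xPSR. */
            frame->pc = r->redirect_addr & ~1u;
            return 1;
        }
    }
    return 0;
}
```

Only the `pc` slot is written, so R0-R3 still hold the caller's arguments when the injected function starts, and the untouched LR lets it return to the original caller.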
| Register | Address | Purpose |
|---|---|---|
| DEMCR | 0xE000EDFC | Debug Exception and Monitor Control |
| DFSR | 0xE000ED30 | Debug Fault Status Register |
DEMCR bits used:
[24] TRCENA - Trace enable (required for debug features)
[16] MON_EN - DebugMonitor exception enable
FP_COMPn bits:
[31:30] REPLACE = 0b11 - Breakpoint on both halfwords
[28:2] COMP - Address to match (bits [28:2])
[0] ENABLE = 1 - Comparator enable
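The bit fields listed above combine into register values as follows. This is a minimal sketch that only computes the values; the macro and function names are hypothetical, and on hardware the results would be written to DEMCR (0xE000EDFC) and to the chosen FP_COMPn register.

```c
#include <assert.h>
#include <stdint.h>

#define DEMCR_TRCENA          (1u << 24)   /* trace enable              */
#define DEMCR_MON_EN          (1u << 16)   /* DebugMonitor enable       */

#define FP_COMP_REPLACE_BOTH  (3u << 30)   /* REPLACE = 0b11            */
#define FP_COMP_ADDR_MASK     0x1FFFFFFCu  /* COMP field, bits [28:2]   */
#define FP_COMP_ENABLE        (1u << 0)    /* comparator enable         */
#define CODE_REGION_END       0x1FFFFFFFu  /* FPB matches Code region   */

/* Return the DEMCR value with TRCENA and MON_EN set, other bits kept. */
uint32_t demcr_enable_monitor(uint32_t demcr) {
    return demcr | DEMCR_TRCENA | DEMCR_MON_EN;
}

/* Build an FP_COMPn value that breaks on both halfwords of the word
 * containing addr. The Thumb bit and bit 1 are dropped by the mask,
 * so the match covers the whole 4-byte word. */
uint32_t fp_comp_breakpoint(uint32_t addr) {
    assert(addr <= CODE_REGION_END);
    return FP_COMP_REPLACE_BOTH | (addr & FP_COMP_ADDR_MASK) | FP_COMP_ENABLE;
}
```

For the digitalWrite address used in the diagrams, `fp_comp_breakpoint(0x08008308)` evaluates to `0xC8008309`, and `demcr_enable_monitor(0)` evaluates to `0x01010000`.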
| Advantage | Description |
|---|---|
| ✅ ARMv8-M Compatible | Works on Cortex-M23/M33/M55 where REMAP is removed |
| ✅ No Flash Trampolines | Doesn't require pre-placed code in Flash |
| ✅ Full Register Preservation | All registers preserved via exception frame |
| ✅ Dynamic Configuration | Can add/remove hooks at runtime |
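| Limitation | Description |
|---|---|
| Performance overhead | Exception entry/exit adds ~12-24 cycles |
| Debugger conflict | An external debugger may interfere with DebugMonitor |
| Exception priority | DebugMonitor runs at a configurable exception priority; hooks do not fire in code running at the same or higher priority |
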
# Enable DebugMonitor support (default: enabled)
cmake -B build -DAPP_SELECT=3 \
-DCMAKE_TOOLCHAIN_FILE=cmake/arm-none-eabi-gcc.cmake
# Disable DebugMonitor (reduces code size if not needed)
cmake -B build -DAPP_SELECT=3 -DFPB_NO_DEBUGMON=ON \
-DCMAKE_TOOLCHAIN_FILE=cmake/arm-none-eabi-gcc.cmake

| Use Case | Recommended Mode |
|---|---|
| Cortex-M3/M4 with Flash trampolines available | Trampoline (lower overhead) |
| ARMv8-M (Cortex-M23/M33/M55) | DebugMonitor (only option) |
| No Flash trampolines pre-placed | DebugMonitor |
| Lowest latency required | Trampoline |
| Maximum compatibility needed | DebugMonitor |
On NuttX RTOS, the DebugMonitor implementation uses NuttX's native up_debugpoint_add() API instead of directly manipulating hardware registers. This provides better integration with the OS and avoids conflicts with other debug components.
flowchart TB
subgraph init ["1. Initialization"]
A1["fpb_debugmon_init()"] --> A2["irq_attach(NVIC_IRQ_DBGMONITOR,<br/>arm_dbgmonitor, NULL)<br/><i>Replace vendor's PANIC handler</i>"]
A2 --> A3["arm_enable_dbgmonitor()<br/><i>Initialize FPB/DWT hardware</i>"]
end
subgraph redirect ["2. Set Redirect"]
B1["fpb_debugmon_set_redirect<br/>(comp, orig_addr, redirect_addr)"] --> B2["up_debugpoint_add<br/>(DEBUGPOINT_BREAKPOINT,<br/>addr, size,<br/>debugmon_callback,<br/>&redirect_info)"]
end
subgraph trigger ["3. Breakpoint Trigger"]
C1["CPU hits breakpoint"] --> C2["arm_dbgmonitor()"]
C2 --> C3["debugmon_callback()"]
C3 --> C4["regs = running_regs()<br/><i>Get saved register context</i>"]
C4 --> C5["regs[REG_PC] = redirect_addr<br/><i>Modify stacked PC</i>"]
C5 --> C6["Exception return<br/>→ Execution at inject function"]
end
init --> redirect --> trigger
| API | Purpose |
|---|---|
| up_debugpoint_add() | Register breakpoint with callback |
| up_debugpoint_remove() | Remove breakpoint |
| arm_enable_dbgmonitor() | Initialize FPB/DWT hardware |
| arm_dbgmonitor() | NuttX's DebugMonitor exception handler |
| running_regs() | Get current task's saved register context |
| irq_attach() | Register interrupt handler |

In NuttX, when an exception occurs, the register context is saved to tcb->xcp.regs. The running_regs() macro provides access to this:
#define running_regs() ((FAR void **)(g_running_tasks[this_cpu()]->xcp.regs))

By modifying regs[REG_PC], we can redirect execution when the exception returns.
Some vendors register their own DebugMonitor handler that calls PANIC(). The NuttX implementation re-attaches NuttX's proper arm_dbgmonitor() handler during initialization to ensure correct callback dispatch:
/* Replace vendor's PANIC handler with NuttX's proper handler */
irq_attach(NVIC_IRQ_DBGMONITOR, arm_dbgmonitor, NULL);
up_enable_irq(NVIC_IRQ_DBGMONITOR);
arm_enable_dbgmonitor();

CONFIG_ARCH_HAVE_DEBUG=y # Required for up_debugpoint_add()
CONFIG_FPBINJECT=y # Enable FPBInject
# Inject code on NuttX device with compile_commands.json for includes/defines
python3 Tools/fpb_loader.py -p /dev/ttyACM0 -b 921600 \
--inject App/inject/inject.cpp \
--target lv_malloc \
--patch-mode debugmon \
--elf nuttx.elf.elf \
--compile-commands out/xxx/compile_commands.json \
-ni

The -ni (NuttX interactive) flag automatically:
- Sends the fl command to enter the device's interactive mode
- Executes injection commands
- Sends exit to return to the normal shell
The FPB is a Cortex-M debug component originally designed for:
- Setting hardware breakpoints
- Patching Flash bugs without reprogramming
| Version | Architecture | REMAP Support | Breakpoint Support |
|---|---|---|---|
| FPBv1 | Cortex-M3/M4 (ARMv7-M) | ✅ Yes | ✅ Yes |
| FPBv2 | Cortex-M23/M33/M55 (ARMv8-M) | ❌ Removed | ✅ Yes |
⚠️ Important: ARMv8-M removed the REMAP functionality from FPB. On these cores, FPB can only generate breakpoints, requiring the DebugMonitor approach for function redirection.
| Resource | Count | Address Range |
|---|---|---|
| Code Comparators | 6 | 0x00000000 - 0x1FFFFFFF |
| Literal Comparators | 2 | 0x00000000 - 0x1FFFFFFF |
| REMAP Table | 8 entries | SRAM (configurable) |
| Register | Address | Description |
|---|---|---|
| FP_CTRL | 0xE0002000 | Control register |
| FP_REMAP | 0xE0002004 | Remap table base address |
| FP_COMP0-5 | 0xE0002008-1C | Code comparators |
| FP_COMP6-7 | 0xE0002020-24 | Literal comparators |
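The register table maps onto a C struct overlay in the usual CMSIS style. This is a sketch for illustration: `fpb_regs_t`, `fpb_comp_address`, and `fpb_comp_is_literal` are hypothetical names, and the code only computes addresses so that it can run on a host without touching hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FPB_BASE 0xE0002000u

/* Layout of the FPB register block as listed in the table above. */
typedef struct {
    volatile uint32_t CTRL;     /* 0x00: FP_CTRL             */
    volatile uint32_t REMAP;    /* 0x04: FP_REMAP            */
    volatile uint32_t COMP[8];  /* 0x08-0x24: FP_COMP0-7     */
} fpb_regs_t;

_Static_assert(offsetof(fpb_regs_t, REMAP) == 0x04, "FP_REMAP offset");
_Static_assert(offsetof(fpb_regs_t, COMP)  == 0x08, "FP_COMP0 offset");

/* Absolute address of comparator register FP_COMPn (n = 0..7). */
uint32_t fpb_comp_address(unsigned n) {
    assert(n < 8);
    return FPB_BASE + (uint32_t)offsetof(fpb_regs_t, COMP) + 4u * n;
}

/* Comparators 0-5 match instruction fetches, 6-7 match literal loads. */
int fpb_comp_is_literal(unsigned n) {
    assert(n < 8);
    return n >= 6;
}
```

On the target, the same struct would typically be accessed through a pointer cast of `FPB_BASE`; the static assertions catch a layout mistake at compile time.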
FPBInject/
├── CMakeLists.txt # Build configuration
├── README.md # This file
├── LICENSE # MIT License
├── cmake/
│ └── arm-none-eabi-gcc.cmake # Toolchain file
├── App/
│ ├── func_loader/ # Function loader application
│ ├── inject/ # Example injection code
│ └── ... # Other app modules
├── Project/
│ ├── Application/ # Main application (entry)
│ ├── ArduinoAPI/ # Arduino compatibility layer
│ └── Platform/
│ └── STM32F10x/ # Platform HAL (drivers, startup, config)
├── Source/
│ ├── fpb_inject.c/h # FPB driver
│ ├── fpb_trampoline.c/h # Trampoline functions
│ ├── fpb_debugmon.c/h # DebugMonitor functions
│ └── func_loader.c/h # Command processor
└── Tools/
├── fpb_loader.py # Host injection tool
└── setup_env.py # Environment setup
- Address Range: FPB can only patch Code region (0x00000000 - 0x1FFFFFFF)
- Comparator Count: Limited to 6 simultaneous hooks (STM32F103)
- Instruction Set: Thumb/Thumb-2 only (not ARM mode)
- Debugger Conflict: Some debuggers use FPB for breakpoints
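The first two limitations lend themselves to a check before a comparator is programmed. The sketch below is an assumption about how such a check could look, not the project's validation logic; `patch_check_t` and `fpb_check_patch` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define FPB_CODE_REGION_END 0x1FFFFFFFu  /* top of the Code region        */
#define FPB_NUM_CODE_COMP   6u           /* code comparators on STM32F103 */

typedef enum {
    PATCH_OK = 0,
    PATCH_BAD_COMPARATOR,  /* index outside 0-5                        */
    PATCH_BAD_ADDRESS      /* function to hook is not in Code region   */
} patch_check_t;

/* Validate a hook request against the FPB limits listed above. */
patch_check_t fpb_check_patch(uint32_t comp, uint32_t orig_addr) {
    if (comp >= FPB_NUM_CODE_COMP) {
        return PATCH_BAD_COMPARATOR;
    }
    if (orig_addr > FPB_CODE_REGION_END) {
        return PATCH_BAD_ADDRESS;
    }
    return PATCH_OK;
}

/* Usage with the addresses that appear in the diagrams. */
void fpb_check_patch_examples(void) {
    assert(fpb_check_patch(0, 0x08008308u) == PATCH_OK);            /* Flash function */
    assert(fpb_check_patch(0, 0x20000278u) == PATCH_BAD_ADDRESS);   /* RAM address    */
    assert(fpb_check_patch(6, 0x08008308u) == PATCH_BAD_COMPARATOR);
}
```

Note that the restriction applies to the function being hooked, not to the injected code, which is expected to live in RAM.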
- Hot Patching: Fix bugs on deployed devices
- Feature Toggle: Enable/disable features at runtime
- A/B Testing: Switch between implementations
- Security Research: Dynamic analysis and hooking
- Debugging: Temporarily modify program behavior
- Instrumentation: Add logging/tracing without recompilation
void fpb_init(void); // Initialize FPB unit
void fpb_set_patch(uint8_t comp, uint32_t orig, uint32_t target);
void fpb_clear_patch(uint8_t comp); // Clear patch
fpb_state_t fpb_get_state(void); // Get FPB state

void fbp_trampoline_set_target(uint32_t comp, uint32_t target);
void fbp_trampoline_clear_target(uint32_t comp);
uint32_t fbp_trampoline_get_address(uint32_t comp);

MIT License - See LICENSE file.