# ESP32 / Embedded Development Guide
This guide covers running TinyRL on ESP32 microcontrollers for real-time reinforcement learning.
## Quick Start

```bash
# Host simulation (no hardware needed)
c++ -std=c++17 -DAG_EMBEDDED \
  -I src -I examples/stream_x/src \
  examples/stream_x_esp32/src/stream_ac_continuous.cpp -o esp32_sim
./esp32_sim
```

```bash
# PlatformIO upload
cd examples/stream_x_esp32
pio run --target upload
pio device monitor
```
## Table of Contents
- Overview
- Hardware Requirements
- Software Setup
- Project Structure
- Building and Flashing
- Optimization Strategies
- Troubleshooting
## Overview

### Why Run TinyRL on ESP32?
| Benefit | Description |
|---|---|
| Real-time Learning | Online reinforcement learning directly on microcontrollers |
| Minimal Footprint | Header-only integration with no external dependencies |
| Low Latency | Eliminates round-trips to host PC for inference |
| Power Efficient | Optimized for battery-powered applications |
| Cost Effective | Uses affordable, widely-available hardware |
### Supported Hardware
| Board | Status | Notes |
|---|---|---|
| ESP32-S3 | ✅ Recommended | Tested with PSRAM |
| ESP32-S2 | ✅ Compatible | Good performance |
| ESP32 | ⚠️ Basic | Limited memory |
| ESP32-C3 | ✅ Compatible | RISC-V core |
## Hardware Requirements

### Recommended Setup
- Board: Freenove ESP32-S3 WROOM or similar ESP32-S3 board
- Memory: 8MB+ PSRAM recommended for larger models
- Storage: 4MB+ flash for firmware and model storage
- USB: Type-C or micro-USB for programming and debugging
### Minimum Requirements
- RAM: 512KB (for minimal models)
- Flash: 2MB
- CPU: 240MHz dual-core
### Optional Hardware
- Display: OLED/LCD for real-time visualization
- Sensors: IMU, temperature, pressure sensors for environment interaction
- Actuators: Servos, motors, relays for action execution
## Software Setup

### Prerequisites

- PlatformIO IDE (recommended)
- ESP-IDF (alternative)
- Arduino IDE (basic support)
- Install the ESP32 board support package
- Add the TinyRL headers to your libraries folder
### Development Environment Setup

#### PlatformIO (Recommended)

```bash
# Create new project
pio project init --board freenove_esp32_s3_wroom

# Or use existing ESP32 example
cd examples/stream_x_esp32
```
#### ESP-IDF
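Concrete ESP-IDF setup commands aren't given above; as a rough sketch (assuming ESP-IDF v4.1+ is installed and its `export.sh` has been sourced — the project name `tinyrl_demo` and the header destination are illustrative, not part of the repository):

```bash
# Create a bare project and select the ESP32-S3 target
idf.py create-project tinyrl_demo
cd tinyrl_demo
idf.py set-target esp32s3

# Vendor the TinyRL headers into a component include directory
mkdir -p components/tinyrl/include
cp /path/to/repo/src/*.h components/tinyrl/include/
cp /path/to/repo/examples/stream_x/src/*.h components/tinyrl/include/
```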
## Project Structure

### PlatformIO Project Layout

```text
examples/stream_x_esp32/
├── platformio.ini                  # Build configuration
├── src/
│   ├── stream_ac_continuous.cpp    # Continuous StreamAC example
│   ├── stream_ac_discrete.cpp      # Discrete StreamAC example
│   ├── stream_q.cpp                # StreamQ example
│   ├── stream_q_atari.cpp          # StreamQ Atari example
│   ├── stream_q_pong_ram.cpp       # StreamQ Pong (RAM) example
│   ├── benchmark_kernels.cpp       # Kernel benchmarks
│   └── getentropy_dummy.c          # Entropy source for embedded
├── lib/                            # Optional: copied headers
│   ├── autograd/                   # TinyRL core headers
│   └── stream_x/                   # Stream-X module headers (StreamAC, StreamQ, StreamSARSA)
└── boards/                         # Board-specific configurations
    └── freenove_esp32_s3_wroom.json
```
### Header Integration Strategies

#### Strategy 1: Relative Includes (Development)

```ini
; platformio.ini
[env:freenove_esp32_s3_wroom]
platform = espressif32
board = freenove_esp32_s3_wroom
framework = arduino
build_flags =
    -DAG_EMBEDDED
    -DAG_ENABLE_SIMD=OFF
    -I../../src
    -I../../examples/stream_x/src ; Stream-X module headers (StreamAC, StreamQ, StreamSARSA)
```
- Pros: Quick iteration, keeps repo structure intact
- Cons: Requires the full repository, not standalone
#### Strategy 2: Vendor Headers (Production)

```bash
# Copy headers to project
mkdir -p lib/autograd lib/stream_x
cp ../../src/*.h lib/autograd/
cp ../../examples/stream_x/src/*.h lib/stream_x/
```
- Pros: Standalone project, production-ready
- Cons: Manual header management, larger project size
## Building and Flashing

### PlatformIO Workflow

```bash
# Navigate to project
cd examples/stream_x_esp32

# Build project
pio run

# Upload to device
pio run --target upload

# Monitor serial output
pio device monitor

# Clean build
pio run --target clean
```
### ESP-IDF Workflow

```bash
# Configure project
idf.py menuconfig

# Build project
idf.py build

# Flash to device
idf.py flash

# Monitor output
idf.py monitor
```
### Build Configuration

#### PlatformIO Configuration

```ini
; platformio.ini
[env:freenove_esp32_s3_wroom]
platform = espressif32
board = freenove_esp32_s3_wroom
framework = arduino

; Build flags for TinyRL
build_flags =
    -DAG_EMBEDDED                 ; Enable embedded mode
    -DAG_ENABLE_SIMD=OFF          ; Disable SIMD for compatibility
    -O2                           ; Optimize for size/speed
    -ffunction-sections           ; Enable function-level linking
    -fdata-sections               ; Enable data-level linking
    -I../../src                   ; Include TinyRL headers
    -I../../examples/stream_x/src ; Include Stream-X headers

; Memory configuration
board_build.partitions = huge_app.csv ; Use huge app partition
board_build.psram_type = opi          ; Enable PSRAM
```
#### ESP-IDF Configuration

```bash
# Configure memory layout
idf.py menuconfig
# Navigate to: Component config → ESP32S3-Specific → PSRAM
# Enable: Support for external, SPI PSRAM
# Set: PSRAM clock speed to 80MHz
```
### Upload and Monitoring

```bash
# Upload firmware
pio run --target upload

# Monitor with custom baud rate
pio device monitor --baud 115200

# Monitor with filters (pass --filter once per filter)
pio device monitor --filter time --filter colorize
```
## Optimization Strategies

### ESP32-S3 Optimizations (Recommended)
TinyRL includes specialized optimizations for ESP32-S3, providing 2-4x speedup for neural network operations through pure optimized scalar implementations tuned for the LX7 dual-core architecture.
#### Enabling ESP32-S3 Optimizations

The optimizations are auto-detected when building for ESP32-S3.
#### What's Optimized

| Category | Operation | Speedup | Technique |
|---|---|---|---|
| Matrix Ops | Matrix multiply | 2-3x | Tiled computation, cache-friendly access |
| Matrix Ops | Dot product | 2-3x | 8-way loop unrolling |
| Element-wise | Add/Sub/Mul | 2x | 4-8 way unrolling, ILP |
| Element-wise | Scalar multiply | 2x | Unrolled loops |
| Activations | ReLU/Sigmoid/Tanh | 1.5-2x | Unrolled loops |
| Activations | Softmax | 2x | Optimized exp + normalize |
| CNN | Convolution | 2-3x | im2col + optimized matmul |
| CNN | Max/Avg Pooling | 1.5x | Cache-optimized kernels |
| Optimizers | SGD step | 2x | Fused update kernel |
| Optimizers | RMSProp step | 1.5-2x | Optimized variance updates |
| Normalization | LayerNorm | 2x | Single-pass mean/var, fast rsqrt |
| MLP Inference | 3-layer forward | 2-3x | Fused linear+LN+ReLU, batch=1 path |
| MLP Training | 3-layer backward | 2-3x | Fused backward, batch=1 optimized |
#### Architecture Overview

```text
┌─────────────────────────────────────────────────────────┐
│                         simd.h                          │
│  ┌───────────┐  ┌───────────┐  ┌─────────────────────┐  │
│  │   AVX2    │  │   NEON    │  │      ESP32-S3       │  │
│  │ (x86-64)  │  │  (ARM64)  │  │   (esp32_dsp.h)     │  │
│  └───────────┘  └───────────┘  └──────────┬──────────┘  │
│                                           │             │
│  ┌────────────────────────────────────────▼──────────┐  │
│  │      Pure Optimized Scalar Implementations        │  │
│  │  ───────────────────────────────────────────      │  │
│  │  • 8-way loop unrolling (ILP)                     │  │
│  │  • Cache-friendly access patterns                 │  │
│  │  • IRAM placement for critical funcs              │  │
│  │  • Fast math (Newton-Raphson rsqrt)               │  │
│  │  • Single-pass statistics                         │  │
│  │  • Tiled matrix multiplication                    │  │
│  │  • Fused MLP inference (batch=1)                  │  │
│  │  • Fused MLP backward (batch=1)                   │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
```
#### Key Optimization Techniques
| Technique | Description |
|---|---|
| 8-way Loop Unrolling | Maximizes instruction-level parallelism on LX7 cores |
| Cache-Friendly Access | 32-byte cache line alignment, tiled computation |
| IRAM Placement | Critical functions placed in fast internal RAM |
| Fast rsqrt | Newton-Raphson approximation, 2x faster than 1/sqrtf() |
| Fused mean/var | Single-pass Welford's algorithm for LayerNorm |
| Batch=1 Inference | Specialized fused_mlp3_inference() for RL workloads |
| Batch=1 Training | Specialized fused_mlp3_backward_single() for streaming RL |
#### Example: Measuring Performance

```cpp
#include <Arduino.h>    // Serial
#include <esp_timer.h>  // esp_timer_get_time()

void benchmark_matmul() {
    ag::Matrix A = ag::Matrix::Random(64, 64);
    ag::Matrix B = ag::Matrix::Random(64, 64);

    int64_t start = esp_timer_get_time();
    for (int i = 0; i < 100; i++) {
        ag::Matrix C = A.matmul(B);
        (void)C;  // Keep the result so the loop is not optimized away
    }
    int64_t elapsed = esp_timer_get_time() - start;

    Serial.printf("100x matmul (64x64): %lld us\n", elapsed);
    // Expected: ~2-3x faster with ESP32-S3 optimizations enabled
}

void benchmark_forward_pass() {
    // Create a small MLP
    nn::Sequential model;
    model.add(nn::Linear(64, 32));
    model.add(nn::ReLU());
    model.add(nn::Linear(32, 4));

    ag::Tensor x(ag::Matrix::Random(1, 64), false);

    int64_t start = esp_timer_get_time();
    for (int i = 0; i < 1000; i++) {
        ag::Tensor y = model.forward(x);
    }
    int64_t elapsed = esp_timer_get_time() - start;

    Serial.printf("1000x forward (64->32->4): %lld us\n", elapsed);
}
```
### Memory Optimization

#### Reduce Model Size

```cpp
// In src/stream_ac_continuous.cpp - reduce hidden layer size
#define HIDDEN_SIZE 64  // Instead of 128 or 256
#define NUM_LAYERS  2   // Instead of 3 or 4

// Create smaller network (model provided via set_model)
ContinuousStreamAC agent(
    n_obs,          // Observation dimension
    learning_rate,  // Learning rate
    gamma,          // Discount factor
    lambda,         // Eligibility trace
    kappa_policy,   // Policy overshooting bound
    kappa_value     // Value overshooting bound
);

// Build compact actor/critic and attach:
// agent.set_model(actor_backbone, mu_head, std_head, critic);
```
#### Disable Features

```cpp
// Remove normalization for minimal RAM:
// disable normalization in your model or skip calls to normalize_observation()

// Use a smaller PRNG:
// replace the default RNG with XorShift or similar
```
#### Memory Management

```cpp
// Reuse tensor objects
ag::Tensor workspace(ag::Matrix::Zeros(1, HIDDEN_SIZE), false);

// Clear computational graphs frequently
loss.backward();
loss.clear_graph();

// Prefer automatic storage where possible
ag::Matrix small_matrix(10, 10);  // Object on the stack (element storage may still be heap-allocated)
```
### Performance Optimization

#### Compiler Optimizations

```ini
; platformio.ini
build_flags =
    -O2                 ; Optimize for size/speed
    -ffast-math         ; Fast math operations (relaxes IEEE semantics)
    -fno-exceptions     ; Disable exception handling
    -fno-rtti           ; Disable RTTI
    -ffunction-sections ; Function-level linking
    -fdata-sections     ; Data-level linking
```
#### Runtime Optimizations

```cpp
// Pass by const reference to avoid copies
void process_observation(const ag::Tensor& obs) {
    // Process observation
}

// Reuse a single workspace instead of reallocating
ag::Tensor& get_workspace() {
    static ag::Tensor workspace(ag::Matrix::Zeros(1, 64), false);
    return workspace;
}
```
#### Power Optimization

```cpp
// Reduce CPU frequency for battery operation
#include "esp_pm.h"
#include "esp_sleep.h"

// Set CPU frequency (the type is esp_pm_config_t in ESP-IDF v5+)
esp_pm_config_esp32s3_t pm_config = {
    .max_freq_mhz = 80,  // Reduce from 240MHz
    .min_freq_mhz = 10,
    .light_sleep_enable = true
};
esp_pm_configure(&pm_config);

// Enable light sleep between iterations
esp_light_sleep_start();
```
## Troubleshooting

### Common Issues

#### Build Errors

**Error:** `fatal error: 'autograd/autograd.h' file not found`

**Solution:** Check the include paths in `platformio.ini`:
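The paths should point at the repository's `src` and `examples/stream_x/src` directories, as in the configuration shown earlier (adjust the relative paths to where the repository sits next to your project):

```ini
build_flags =
    -I../../src                    ; TinyRL core headers
    -I../../examples/stream_x/src  ; Stream-X headers
```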
**Error:** `error: 'std::expected' is not a member of 'std'`

**Solution:** Ensure a sufficiently recent C++ standard is enabled (`std::expected` requires C++23; the core examples otherwise build with C++17):
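In PlatformIO this usually means unsetting the framework's default standard and raising it; the flag names below are the usual GCC ones, and whether `-std=gnu++2b` is available depends on the toolchain your platform version ships:

```ini
build_unflags = -std=gnu++11
build_flags   = -std=gnu++17  ; or -std=gnu++2b where std::expected is needed
```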
**Error:** `error: 'AG_EMBEDDED' was not declared`

**Solution:** Add the embedded flag:
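As in the Build Configuration section:

```ini
build_flags =
    -DAG_EMBEDDED  ; Enable embedded mode
```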
#### Selecting Discrete vs Continuous (PlatformIO)

Use PlatformIO environments to switch without renaming files (note that `extends` takes the full section name, including the `env:` prefix):

```ini
[env:freenove_esp32_s3_wroom]
; Continuous (default)
build_src_filter = +<src/stream_ac_continuous.cpp> -<src/stream_ac_discrete.cpp>

[env:freenove_esp32_s3_wroom_discrete]
extends = env:freenove_esp32_s3_wroom
; Discrete
build_src_filter = +<src/stream_ac_discrete.cpp> -<src/stream_ac_continuous.cpp>
```
#### Runtime Errors

**Error:** `Guru Meditation Error: Core 1 panic'ed (LoadProhibited)`

**Solution:** Check memory allocation:

```cpp
// Reduce model size
#define HIDDEN_SIZE 32  // Smaller hidden layer

// Check available memory
size_t free_heap = esp_get_free_heap_size();
Serial.printf("Free heap: %u bytes\n", (unsigned)free_heap);
```
**Error:** `Out of memory`

**Solution:** Enable PSRAM and optimize memory usage:
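For an Arduino-framework build this means the board options shown earlier plus the standard Arduino-ESP32 define (`-DBOARD_HAS_PSRAM`); `opi` applies to octal-SPI PSRAM parts such as those on ESP32-S3 boards:

```ini
board_build.psram_type = opi  ; Enable (octal) PSRAM
build_flags =
    -DBOARD_HAS_PSRAM         ; Let the allocator use external RAM
```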
**Error:** Serial output garbled

**Solution:** Check the baud rate and USB connection:

```bash
# Use correct baud rate
pio device monitor --baud 115200

# Check USB cable and port
ls /dev/ttyUSB*  # or /dev/ttyACM*
```
#### Performance Issues

**Issue:** Slow inference

**Solution:** Apply the flags from Compiler Optimizations above and shrink the model as described in Memory Optimization.

**Issue:** High memory usage

**Solution:** Profile and optimize:
```cpp
#include "esp_heap_caps.h"

// Monitor memory usage
void print_memory_info() {
    Serial.printf("Free heap: %u\n", (unsigned)esp_get_free_heap_size());
    Serial.printf("Largest free block: %u\n",
                  (unsigned)heap_caps_get_largest_free_block(MALLOC_CAP_DEFAULT));
    Serial.printf("Minimum free heap: %u\n", (unsigned)esp_get_minimum_free_heap_size());
}
```
### Debugging Tools

#### Serial Debugging

```cpp
// Add debug prints
#define DEBUG_PRINT(x)   Serial.print(x)
#define DEBUG_PRINTLN(x) Serial.println(x)

// Conditional debugging
#ifdef DEBUG_MODE
DEBUG_PRINTLN("Processing observation...");
DEBUG_PRINT("Observation shape: ");
DEBUG_PRINTLN(obs.shape()[0]);
#endif
```
#### Memory Debugging

```cpp
// Enable heap debugging
#include "esp_heap_caps.h"

// Check for memory leaks around a block of code
void check_memory() {
    size_t free_before = esp_get_free_heap_size();
    // ... your code ...
    size_t free_after = esp_get_free_heap_size();

    if (free_after < free_before) {
        Serial.printf("Memory leak detected: %u bytes\n",
                      (unsigned)(free_before - free_after));
    }
}
```
#### Performance Profiling

```cpp
// Profile execution time
#include "esp_timer.h"

int64_t start_time = esp_timer_get_time();
// ... your code ...
int64_t end_time = esp_timer_get_time();

Serial.printf("Execution time: %lld us\n", end_time - start_time);
```
## Advanced Topics

### Custom Environments

```cpp
// Create a custom environment
class CustomEnvironment {
public:
    ag::Matrix get_observation() {
        // Read sensors, process data
        return ag::Matrix::Random(1, obs_dim);
    }

    void take_action(const ag::Matrix& action) {
        // Execute action (servos, motors, etc.)
        int action_idx = static_cast<int>(action(0, 0));
        execute_action(action_idx);
    }

    double get_reward() {
        // Calculate reward based on current state
        return calculate_reward();
    }

private:
    static constexpr int LED_PIN = 2;  // Example pin, adjust for your board
    int obs_dim = 4;                   // Observation dimension
    double sensor_value = 0.0;         // Latest sensor reading

    void execute_action(int action) {
        // Hardware-specific action execution
        switch (action) {
            case 0: digitalWrite(LED_PIN, HIGH); break;
            case 1: digitalWrite(LED_PIN, LOW);  break;
        }
    }

    double calculate_reward() {
        // Reward calculation logic
        return sensor_value / 100.0;
    }
};
```
### Sensor Integration

```cpp
// IMU integration example
#include <Wire.h>
#include "MPU6050.h"

MPU6050 mpu;

void setup_sensors() {
    Wire.begin();
    mpu.initialize();

    if (!mpu.testConnection()) {
        Serial.println("MPU6050 connection failed");
    }
}

ag::Matrix read_imu_data() {
    int16_t ax, ay, az, gx, gy, gz;
    mpu.getMotion6(&ax, &ay, &az, &gx, &gy, &gz);

    // Normalize and create observation
    ag::Matrix obs(1, 6);
    obs(0, 0) = ax / 16384.0;  // Normalize accelerometer (±2g range)
    obs(0, 1) = ay / 16384.0;
    obs(0, 2) = az / 16384.0;
    obs(0, 3) = gx / 131.0;    // Normalize gyroscope (±250°/s range)
    obs(0, 4) = gy / 131.0;
    obs(0, 5) = gz / 131.0;

    return obs;
}
```
### Wireless Communication

```cpp
// WiFi communication for remote monitoring
#include "WiFi.h"

const char* ssid = "your-ssid";          // Network credentials
const char* password = "your-password";

void setup_wifi() {
    WiFi.begin(ssid, password);
    while (WiFi.status() != WL_CONNECTED) {
        delay(500);
        Serial.print(".");
    }
    Serial.println("WiFi connected");
}

void send_telemetry(const ag::Matrix& obs, double reward) {
    if (WiFi.status() == WL_CONNECTED) {
        // Send data to a server or the cloud
        String data = String(obs(0, 0)) + "," + String(reward);
        // HTTP POST or WebSocket implementation
    }
}
```
### OTA Updates

```cpp
// Over-the-air firmware updates
#include "ArduinoOTA.h"

void setup_ota() {
    ArduinoOTA.setHostname("tinyrl-esp32");
    ArduinoOTA.setPassword("admin");
    ArduinoOTA.begin();
}

void loop() {
    ArduinoOTA.handle();  // Handle OTA updates
    // ... main loop code ...
}
```
For more information, see the ESP32 Example README and Build & Configuration Guide.