TinyRL Documentation
Welcome to the TinyRL documentation! This site provides comprehensive guides and an API reference for the TinyRL reinforcement learning framework.
What is TinyRL?
TinyRL is a lightweight, header-only C++17 framework designed for real-time reinforcement learning on microcontrollers and embedded systems. It consists of two primary components:
| Component | Description |
|---|---|
| Autograd Core | Tensors, reverse-mode automatic differentiation, neural network layers, and optimizers |
| Stream-X Module | Streaming RL algorithms: StreamAC (Actor-Critic), StreamQ (Q-learning), and StreamSARSA |
The project prioritizes clarity, a minimal footprint, and suitability for real-time and embedded learning.
Quick Navigation
Getting Started
- Installation Guide – Setup and build instructions
- Quick Start Examples – Basic usage patterns
- API Reference – Complete API documentation
Core Components
- Tensor Operations – Core tensor manipulation and operations
- Automatic Differentiation – Understanding the autograd system
- Neural Networks – Building and training neural networks
- Optimizers – Available optimization algorithms
Advanced Features
- Reinforcement Learning – Stream-X module: StreamAC (Actor-Critic), StreamQ, and StreamSARSA algorithms, plus the ObGD optimizer
- ESP32 / Embedded – Embedded development guide
- Python Bindings – Using TinyRL from Python
Development
- Testing Guide – Running tests and debugging
Overview
Core Design Philosophy
TinyRL is built around these core principles:
| Principle | Description |
|---|---|
| Efficiency | Cache-friendly algorithms and zero-copy operations |
| Safety | RAII design and strong type checking |
| Flexibility | Modular architecture for easy extension |
| Performance | Optimized for both training and inference |
| Educational | Clean, readable code for learning |
Technical Highlights
```cpp
// Reshape (creates a reshaped copy)
ag::Tensor reshaped = tensor.reshape({batch_size, feature_dim, 1, 1});

// Cache-efficient matrix multiplication
auto result = a.matmul(b); // Uses a tiled algorithm

// Backward pass (free the graph after use)
loss.backward();
loss.clear_graph();

// Dynamic computational graphs
ag::draw_graph(loss, "computation_graph.dot");
```
Performance Optimizations
- Block-based matrix operations for cache efficiency
- SIMD-ready data structures for vectorized operations
- Smart memory reuse to minimize allocations
- Efficient broadcasting implementation for shape compatibility
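To make the cache-efficiency point concrete, blocked matrix multiplication can be sketched as below. This is a simplified illustration, not TinyRL's actual kernel; the `Mat` type and `TILE` parameter are inventions for the example:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Minimal row-major matrix for illustration only (not TinyRL's ag::Matrix).
struct Mat {
    std::size_t rows, cols;
    std::vector<float> data;
    Mat(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}
    float& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    float at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Tiled (blocked) multiplication: iterate over TILE x TILE sub-blocks so the
// working sets of A, B, and C stay resident in cache while they are reused.
Mat tiled_matmul(const Mat& a, const Mat& b, std::size_t TILE = 32) {
    Mat c(a.rows, b.cols);
    for (std::size_t i0 = 0; i0 < a.rows; i0 += TILE)
        for (std::size_t k0 = 0; k0 < a.cols; k0 += TILE)
            for (std::size_t j0 = 0; j0 < b.cols; j0 += TILE)
                for (std::size_t i = i0; i < std::min(i0 + TILE, a.rows); ++i)
                    for (std::size_t k = k0; k < std::min(k0 + TILE, a.cols); ++k) {
                        float aik = a.at(i, k); // reused across the inner j loop
                        for (std::size_t j = j0; j < std::min(j0 + TILE, b.cols); ++j)
                            c.at(i, j) += aik * b.at(k, j);
                    }
    return c;
}
```

The loop reordering (placing `k` before `j`) keeps `a.at(i, k)` in a register and walks `b` and `c` row-wise, which is the same idea the library's tiled algorithm exploits.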
Features
Automatic Differentiation Engine
- Dynamic computational graph construction with automatic memory management
- Reverse-mode automatic differentiation for efficient gradient computation
- Efficient memory management with shared pointers and RAII
- Graph visualization tools for debugging and understanding
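The mechanics of reverse-mode AD on a dynamically built graph can be illustrated with a minimal scalar version. This sketch is purely pedagogical and does not mirror TinyRL's internals; the `Node`/`Var` names are made up for the example:

```cpp
#include <functional>
#include <memory>
#include <set>
#include <utility>
#include <vector>

// A node records its value and, for each input, the local derivative of the
// operation that produced it. ag::Tensor generalizes this idea to matrices.
struct Node {
    float value = 0.0f;
    float grad = 0.0f;
    std::vector<std::pair<std::shared_ptr<Node>, float>> parents; // (input, local grad)
};
using Var = std::shared_ptr<Node>;

Var make_var(float v) {
    auto n = std::make_shared<Node>();
    n->value = v;
    return n;
}

Var mul(const Var& a, const Var& b) {
    auto out = make_var(a->value * b->value);
    out->parents = {{a, b->value}, {b, a->value}}; // d(ab)/da = b, d(ab)/db = a
    return out;
}

Var add(const Var& a, const Var& b) {
    auto out = make_var(a->value + b->value);
    out->parents = {{a, 1.0f}, {b, 1.0f}};
    return out;
}

// Reverse pass: visit nodes in reverse topological order so each node's
// gradient is fully accumulated before it is propagated to its inputs.
void backward(const Var& out) {
    std::vector<Var> order;
    std::set<Node*> seen;
    std::function<void(const Var&)> topo = [&](const Var& n) {
        if (!seen.insert(n.get()).second) return;
        for (auto& pr : n->parents) topo(pr.first);
        order.push_back(n);
    };
    topo(out);
    out->grad = 1.0f;
    for (auto it = order.rbegin(); it != order.rend(); ++it)
        for (auto& pr : (*it)->parents)
            pr.first->grad += pr.second * (*it)->grad;
}
```

For `z = x*y + x` with `x = 2` and `y = 3`, this yields `dz/dx = y + 1 = 4` and `dz/dy = x = 2`, exactly the chain-rule result.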
Tensor Operations
- N-dimensional array support (focus on 2D and 4D for efficiency)
- Hardware-optimized matrix multiplication with tiled algorithms
- Efficient broadcasting implementation for shape compatibility
- Comprehensive math operations suite (element-wise, reductions, etc.)
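Broadcasting compatibility follows the usual rule that, along each dimension, the sizes must either match or one of them must be 1 (and is stretched to the other). A minimal 2-D sketch of the shape rule (illustrative only, not the library's actual routine):

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Compute the broadcast result shape of two 2-D shapes, or throw if they
// are incompatible: each dimension must match, or one side must be 1.
std::array<std::size_t, 2> broadcast_shape(std::array<std::size_t, 2> a,
                                           std::array<std::size_t, 2> b) {
    std::array<std::size_t, 2> out{};
    for (int d = 0; d < 2; ++d) {
        if (a[d] == b[d] || b[d] == 1)
            out[d] = a[d];       // b stretches (or they already match)
        else if (a[d] == 1)
            out[d] = b[d];       // a stretches
        else
            throw std::invalid_argument("incompatible shapes for broadcasting");
    }
    return out;
}
```

So a `(2, 3)` tensor combined element-wise with a `(1, 3)` row vector yields a `(2, 3)` result, which is the behavior the library's broadcasting implementation provides without materializing the stretched operand.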
Neural Network Components
- Basic layers: Linear, Conv2D, LayerNorm with efficient fused operations
- Activation functions: ReLU, LeakyReLU, Tanh, Softmax, Softplus
- Sequential model container for easy layer composition
- Custom layer support for extensibility
Optimizers
- Stochastic Gradient Descent (SGD) - Core optimizer with ESP32 optimizations
- RMSProp - Adaptive learning rates with moving average
- Overshooting-bounded Gradient Descent (ObGD) - For online streaming RL
- User-defined learning rate decay (examples provided)
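As a sketch of what a user-defined decay schedule might look like (the function below is an example living in user code, not part of the TinyRL API; feed the returned rate to the optimizer through whatever setter or reconstruction the API offers):

```cpp
#include <cmath>

// Exponential decay schedule:
//   lr(step) = base_lr * decay_rate^(step / decay_steps)
// so the learning rate is multiplied by decay_rate every decay_steps steps.
float decayed_lr(float base_lr, float decay_rate, int step, int decay_steps) {
    return base_lr * std::pow(decay_rate,
                              static_cast<float>(step) / static_cast<float>(decay_steps));
}
```

With `base_lr = 0.1`, `decay_rate = 0.5`, and `decay_steps = 100`, the rate halves every 100 steps: 0.1, 0.05, 0.025, and so on.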
Reinforcement Learning
- Actor-Critic networks for policy gradient methods
- Data normalization utilities for stable training
- Reward scaling and advantage estimation
- Streaming Deep RL algorithms for real-time learning
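The advantage estimation mentioned above is, in its simplest streaming form, the one-step temporal-difference estimate. A standalone sketch of the idea (the helper is illustrative, not a TinyRL function):

```cpp
// One-step TD advantage: A(s, a) ~= r + gamma * V(s') - V(s).
// When the episode terminates, the bootstrap term gamma * V(s') is dropped,
// since there is no successor state to bootstrap from.
float td_advantage(float reward, float v_s, float v_next, float gamma, bool done) {
    float bootstrap = done ? 0.0f : gamma * v_next;
    return reward + bootstrap - v_s;
}
```

A positive advantage means the action did better than the critic's current estimate of the state's value, so the actor's probability of taking it is increased.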
Developer Tools
- Computational graph visualization for debugging
- Gradient checking utilities for validation
- Comprehensive test suite with edge case coverage
- Performance benchmarks for optimization
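Gradient checking generally means comparing an analytic gradient against a central finite difference. A standalone sketch of the technique (illustrative, not the library's own utility):

```cpp
#include <cmath>
#include <functional>

// Central-difference gradient check: the analytic derivative should agree
// with (f(x+h) - f(x-h)) / (2h) up to O(h^2) truncation error.
// Returns the absolute difference between the two estimates.
float grad_check_error(const std::function<float(float)>& f, float x,
                       float analytic_grad, float h = 1e-3f) {
    float numeric = (f(x + h) - f(x - h)) / (2.0f * h);
    return std::fabs(numeric - analytic_grad);
}
```

For example, for `f(x) = x^2` at `x = 3`, the analytic derivative `2x = 6` should match the numeric estimate to well under 1e-2, while a wrong gradient produces a visibly large error.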
Installation
C++ (Header-Only)
```bash
git clone https://github.com/mohmdelsayed/TinyRL.git
cd TinyRL

# Option 1: CMake build
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j

# Option 2: Direct inclusion (header-only)
# Just include the headers in your project
```
Python Bindings
Build from source (there is no PyPI package yet). The module name is autograd; note that it is distinct from the unrelated Python-only package of the same name.
The resulting autograd.so appears in examples/python/ and is installed into your active environment if permissions allow.
Quick Start
Tensor Operations
```cpp
#include "autograd.h"

// Create tensors with automatic differentiation
ag::Tensor x(ag::Matrix::Random(2, 3), true, "x");
ag::Tensor y(ag::Matrix::Random(3, 2), true, "y");

// Perform operations
auto z = x.matmul(y);   // Matrix multiplication
auto w = ag::relu(z);   // Activation function
auto loss = ag::sum(w); // Reduction

// Compute gradients
loss.backward();
```
Neural Networks
```cpp
#include "layers.h"
#include "optimizer.h"

// Define a simple network
nn::Sequential model;
model.add(nn::Linear(784, 128));
model.add(nn::ReLU());
model.add(nn::Linear(128, 10));

// Training utilities
ag::SGD optimizer(0.01);
optimizer.add_parameters(model.layers());

// Single training step
ag::Tensor output = model.forward(input);
ag::Tensor loss = compute_loss(output, target);
optimizer.zero_grad();
loss.backward();
optimizer.step();
```
C++ Example
```cpp
#include "autograd.h"
#include "layers.h"
#include "optimizer.h"

// Set random seed for reproducibility
ag::manual_seed(42);

// Create tensors
ag::Matrix X_data = ag::Matrix::Random(1, 3);
ag::Tensor X(X_data, false, "x");

// Define model using proper layers
auto model = nn::Linear(3, 2);

// Forward pass
ag::Tensor output = model.forward(X);
ag::Tensor loss = ag::sum(ag::pow(output, 2.0));

// Backward pass
loss.backward();
```
Python Example
```python
import autograd
import numpy as np

# Create tensors
X = autograd.Tensor(np.random.rand(1, 3), requires_grad=False)

# Define model
model = autograd.Linear(3, 2)

# Forward pass
output = model.forward(X)
loss = autograd.sum(autograd.pow(output, 2.0))

# Backward pass
loss.backward()
```
Advanced Features
1. Dynamic Computational Graphs
```cpp
// Graph visualization
ag::draw_graph(loss, "computation_graph.dot");

// Dynamic tensor shapes
ag::Tensor adaptive = model.forward(input); // Shape adapts automatically
```
2. Memory Management
```cpp
// Automatic resource cleanup
{
    ag::Tensor temp = heavy_computation();
    // Resources freed when temp goes out of scope
}

// Manual cleanup in training loops
loss.clear_graph(); // Free graph memory after backward()
```
3. Reinforcement Learning (Stream-X)
```cpp
// Build with: ./install.sh --with-stream-x
#include "stream_x/stream_ac_continuous.h"

// Create agent (model is provided via set_model)
ContinuousStreamAC agent(11, 1.0f, 0.99f, 0.8f, 2.0f, 2.0f);

nn::Sequential actor_backbone;
actor_backbone.add(nn::Linear(11, 128));
actor_backbone.add(nn::ReLU());

nn::Sequential mu_head;
mu_head.add(nn::Linear(128, 3));

nn::Sequential std_head;
std_head.add(nn::Linear(128, 3));
std_head.add(nn::Softplus());

nn::Sequential critic;
critic.add(nn::Linear(11, 128));
critic.add(nn::ReLU());
critic.add(nn::Linear(128, 1));

agent.set_model(actor_backbone, mu_head, std_head, critic);

// Training step
ag::Matrix norm_s = agent.normalize_observation(state);
ag::Tensor s(norm_s, false);
ag::Tensor action = agent.sample_action(s);

ag::Float scaled_r = agent.scale_reward(reward, done);
ag::Matrix norm_sn = agent.normalize_observation(next_state);
ag::Tensor sn(norm_sn, false);
ag::Tensor r(ag::Matrix::Constant(1, 1, scaled_r), false);

agent.update(s, action, r, sn, done);
```
Documentation Structure
| Section | Description |
|---|---|
| Tensor Operations | Core tensor manipulation |
| Autograd | Automatic differentiation |
| Neural Networks | Layers and models |
| Optimizers | SGD, RMSProp, ObGD |
| Reinforcement Learning | Stream-X algorithms |
| Build Guide | Installation and configuration |
| ESP32 Guide | Embedded development |
| Python Bindings | Using from Python |
| API Reference | Complete API documentation |
| Examples | Practical code samples |
Architecture Summary
```text
TinyRL
├── Autograd Core (installed API)
│   ├── ag::Tensor     – Differentiable tensors
│   ├── ag::Matrix     – Underlying data storage
│   ├── nn::Linear     – Dense layers
│   ├── nn::Conv2D     – Convolutional layers
│   ├── nn::Sequential – Model container
│   ├── ag::SGD        – Stochastic gradient descent
│   └── ag::RMSProp    – Adaptive learning rates
│
└── Stream-X Module (optional, -DAUTOGRAD_BUILD_STREAM_X=ON)
    ├── ContinuousStreamAC – Actor-Critic (continuous actions)
    ├── DiscreteStreamAC   – Actor-Critic (discrete actions)
    ├── StreamQ            – Online Q-learning
    ├── StreamSARSA        – On-policy SARSA
    └── ObGD               – Overshooting-bounded optimizer
```
TinyRL uses modern C++ (C++17) to create dynamic computational graphs with automatic differentiation. For questions or contributions, see CONTRIBUTING.md.