
TinyRL Documentation

Welcome to the TinyRL documentation! This site provides comprehensive guides and an API reference for the TinyRL deep reinforcement learning framework.

What is TinyRL?

TinyRL is a lightweight, header-only C++17 framework designed for real-time reinforcement learning on microcontrollers and embedded systems. It consists of two primary components:

  • Autograd Core — Tensors, reverse-mode automatic differentiation, neural network layers, and optimizers
  • Stream-X Module — Streaming RL algorithms: StreamAC (Actor-Critic), StreamQ (Q-learning), and StreamSARSA

The overall project goals are clarity, a minimal footprint, and suitability for real-time learning on embedded hardware.

Quick Navigation

  • Getting Started
  • Core Components
  • Advanced Features
  • Development



Overview

Core Design Philosophy

TinyRL is built around these core principles:

  • Efficiency — Cache-friendly algorithms and zero-copy operations
  • Safety — RAII design and strong type checking
  • Flexibility — Modular architecture for easy extension
  • Performance — Optimized for both training and inference
  • Educational — Clean, readable code for learning

Technical Highlights

// Reshape (creates a reshaped copy, not a view)
ag::Tensor reshaped = tensor.reshape({batch_size, feature_dim, 1, 1});

// Cache-efficient matrix multiplication
auto result = a.matmul(b);  // Uses tiled algorithm

// Backward pass (free graph after use)
loss.backward();
loss.clear_graph();

// Dynamic computational graphs
ag::draw_graph(loss, "computation_graph.dot");

Performance Optimizations

  • Block-based matrix operations for cache efficiency
  • SIMD-ready data structures for vectorized operations
  • Smart memory reuse to minimize allocations
  • Efficient broadcasting implementation for shape compatibility
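To make the first bullet concrete, here is a minimal sketch of block-based (tiled) matrix multiplication in plain C++ — illustrative only, not TinyRL's actual implementation, whose tile size and loop order may differ. Processing `BS × BS` tiles keeps the working set of `A` and `B` resident in cache:

```cpp
#include <vector>
#include <algorithm>

// Multiply an m x k matrix A by a k x n matrix B (row-major), accumulating
// one BS x BS tile pair at a time so each tile stays cache-resident.
std::vector<double> blocked_matmul(const std::vector<double>& A,
                                   const std::vector<double>& B,
                                   int m, int k, int n, int BS = 32) {
    std::vector<double> C(m * n, 0.0);
    for (int i0 = 0; i0 < m; i0 += BS)
        for (int p0 = 0; p0 < k; p0 += BS)
            for (int j0 = 0; j0 < n; j0 += BS)
                for (int i = i0; i < std::min(i0 + BS, m); ++i)
                    for (int p = p0; p < std::min(p0 + BS, k); ++p) {
                        double a = A[i * k + p];  // reused across the j loop
                        for (int j = j0; j < std::min(j0 + BS, n); ++j)
                            C[i * n + j] += a * B[p * n + j];
                    }
    return C;
}
```

The result is numerically identical to the naive triple loop; only the traversal order changes.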

Features

🔧 Automatic Differentiation Engine

  • Dynamic computational graph construction with automatic memory management
  • Reverse-mode automatic differentiation for efficient gradient computation
  • Efficient memory management with shared pointers and RAII
  • Graph visualization tools for debugging and understanding
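The idea behind reverse-mode differentiation can be shown with a toy tape over scalars — a teaching sketch, far simpler than TinyRL's tensor-valued graph. Each node records its parents and the local partial derivatives; one backward sweep over the tape propagates adjoints from the output to the leaves:

```cpp
#include <vector>

// Minimal tape-based reverse-mode AD over scalars.
struct Tape {
    struct Node { int p1, p2; double d1, d2; };  // parents + local partials
    std::vector<Node> nodes;
    std::vector<double> vals;

    int leaf(double v) {
        nodes.push_back({-1, -1, 0.0, 0.0});
        vals.push_back(v);
        return static_cast<int>(vals.size()) - 1;
    }
    int add(int a, int b) {            // d(a+b)/da = 1, d(a+b)/db = 1
        nodes.push_back({a, b, 1.0, 1.0});
        vals.push_back(vals[a] + vals[b]);
        return static_cast<int>(vals.size()) - 1;
    }
    int mul(int a, int b) {            // d(a*b)/da = b, d(a*b)/db = a
        nodes.push_back({a, b, vals[b], vals[a]});
        vals.push_back(vals[a] * vals[b]);
        return static_cast<int>(vals.size()) - 1;
    }
    // Reverse sweep: seed the output adjoint with 1, then accumulate.
    std::vector<double> grad(int out) const {
        std::vector<double> g(vals.size(), 0.0);
        g[out] = 1.0;
        for (int i = out; i >= 0; --i) {
            if (nodes[i].p1 >= 0) g[nodes[i].p1] += nodes[i].d1 * g[i];
            if (nodes[i].p2 >= 0) g[nodes[i].p2] += nodes[i].d2 * g[i];
        }
        return g;
    }
};
```

For z = x·y + x with x = 2, y = 3, the sweep yields dz/dx = y + 1 = 4 and dz/dy = x = 2 — the same quantities `loss.backward()` computes over tensors.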

📊 Tensor Operations

  • N-dimensional array support (focus on 2D and 4D for efficiency)
  • Hardware-optimized matrix multiplication with tiled algorithms
  • Efficient broadcasting implementation for shape compatibility
  • Comprehensive math operations suite (element-wise, reductions, etc.)
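Broadcasting lets shapes like (1, n) combine with (m, n) by virtually repeating the smaller operand. A plain-C++ sketch of the row-vector case (illustrative; TinyRL's general rule covers more shape combinations):

```cpp
#include <vector>
#include <cstddef>

// Add a length-n bias row to every row of an m x n row-major matrix,
// mimicking the broadcast (1, n) + (m, n) -> (m, n).
std::vector<double> broadcast_add(const std::vector<double>& mat,
                                  const std::vector<double>& row,
                                  std::size_t m, std::size_t n) {
    std::vector<double> out(mat);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < n; ++j)
            out[i * n + j] += row[j];
    return out;
}
```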

🧠 Neural Network Components

  • Basic layers: Linear, Conv2D, LayerNorm with efficient fused operations
  • Activation functions: ReLU, LeakyReLU, Tanh, Softmax, Softplus
  • Sequential model container for easy layer composition
  • Custom layer support for extensibility

🎯 Optimizers

  • Stochastic Gradient Descent (SGD) - Core optimizer with ESP32 optimizations
  • RMSProp - Adaptive learning rates with moving average
  • Overshooting-bounded Gradient Descent (ObGD) - For online streaming RL
  • User-defined learning rate decay (examples provided)
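One common user-defined decay schedule is exponential step decay; the sketch below is a generic example, not one of the framework's shipped helpers:

```cpp
#include <cmath>

// Exponential step decay: lr_t = lr0 * decay^(t / step_size).
// With decay = 0.5 and step_size = 100, the rate halves every 100 updates.
double decayed_lr(double lr0, double decay, int step_size, int t) {
    return lr0 * std::pow(decay, static_cast<double>(t) / step_size);
}
```

At each training step you would recompute the rate and hand it to the optimizer; the exact hook (a setter or reconstructing the optimizer) depends on the optimizer's API.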

🤖 Reinforcement Learning

  • Actor-Critic networks for policy gradient methods
  • Data normalization utilities for stable training
  • Reward scaling and advantage estimation
  • Streaming Deep RL algorithms for real-time learning
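Advantage estimation in its simplest one-step TD form can be written down directly. This sketch takes critic values as plain numbers; in the framework they would come from the critic network, and the streaming algorithms may use richer estimators (e.g. eligibility traces):

```cpp
// One-step TD advantage: A(s, a) = r + gamma * V(s') * (1 - done) - V(s).
// The bootstrap term is dropped on terminal transitions (done == true).
double td_advantage(double r, double gamma, double v_s, double v_next, bool done) {
    double bootstrap = done ? 0.0 : gamma * v_next;
    return r + bootstrap - v_s;
}
```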

🛠️ Developer Tools

  • Computational graph visualization for debugging
  • Gradient checking utilities for validation
  • Comprehensive test suite with edge case coverage
  • Performance benchmarks for optimization
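Gradient checking typically compares analytic gradients against central finite differences. A generic sketch of the technique (the framework's own utilities may differ in interface):

```cpp
#include <functional>
#include <vector>
#include <cmath>
#include <cstddef>

// Central-difference check: for each parameter, compare the analytic
// gradient against (f(x + h) - f(x - h)) / (2h), accurate to O(h^2).
bool check_gradients(const std::function<double(const std::vector<double>&)>& f,
                     std::vector<double> x,
                     const std::vector<double>& analytic,
                     double h = 1e-5, double tol = 1e-6) {
    for (std::size_t i = 0; i < x.size(); ++i) {
        double orig = x[i];
        x[i] = orig + h; double fp = f(x);
        x[i] = orig - h; double fm = f(x);
        x[i] = orig;                       // restore the perturbed entry
        double numeric = (fp - fm) / (2.0 * h);
        if (std::fabs(numeric - analytic[i]) > tol) return false;
    }
    return true;
}
```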

Installation

C++ (Header-Only)

git clone https://github.com/mohmdelsayed/TinyRL.git
cd TinyRL

# Option 1: CMake build
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j

# Option 2: Direct inclusion (header-only)
# Just include the headers in your project

Python Bindings

Build from source (there is no PyPI package yet; the module is named autograd, distinct from the unrelated Python-only package of the same name):

./install.sh --with-bindings

The resulting autograd.so appears in examples/python/ and is installed into your active environment if permissions allow.

Quick Start

Tensor Operations

#include "autograd.h"

// Create tensors with automatic differentiation
ag::Tensor x = ag::Tensor(ag::Matrix::Random(2, 3), true, "x");
ag::Tensor y = ag::Tensor(ag::Matrix::Random(3, 2), true, "y");

// Perform operations
auto z = x.matmul(y);        // Matrix multiplication
auto w = ag::relu(z);        // Activation function
auto loss = ag::sum(w);      // Reduction

// Compute gradients
loss.backward();

Neural Networks

#include "layers.h"
#include "optimizer.h"

// Define a simple network
nn::Sequential model;
model.add(nn::Linear(784, 128));
model.add(nn::ReLU());
model.add(nn::Linear(128, 10));

// Training utilities
ag::SGD optimizer(0.01);
optimizer.add_parameters(model.layers());

// Single training step
ag::Tensor output = model.forward(input);
ag::Tensor loss = compute_loss(output, target);
optimizer.zero_grad();
loss.backward();
optimizer.step();

C++ Example

#include "autograd.h"
#include "layers.h"
#include "optimizer.h"

// Set random seed for reproducibility
ag::manual_seed(42);

// Create tensors
ag::Matrix X_data = ag::Matrix::Random(1, 3);
ag::Tensor X(X_data, false, "x");

// Define model using proper layers
auto model = nn::Linear(3, 2);

// Forward pass
ag::Tensor output = model.forward(X);
ag::Tensor loss = ag::sum(ag::pow(output, 2.0));

// Backward pass
loss.backward();

Python Example

import autograd
import numpy as np

# Create tensors
X = autograd.Tensor(np.random.rand(1, 3), requires_grad=False)

# Define model
model = autograd.Linear(3, 2)

# Forward pass
output = model.forward(X)
loss = autograd.sum(autograd.pow(output, 2.0))

# Backward pass
loss.backward()

Advanced Features

1. Dynamic Computational Graphs

// Graph visualization
ag::draw_graph(loss, "computation_graph.dot");

// Dynamic tensor shapes
ag::Tensor adaptive = model.forward(input);  // Shape adapts automatically

2. Memory Management

// Automatic resource cleanup
{
    ag::Tensor temp = heavy_computation();
    // Resources freed when temp goes out of scope
}

// Manual cleanup in training loops
loss.clear_graph();  // Free graph memory after backward()

3. Reinforcement Learning (Stream-X)

// Build with: ./install.sh --with-stream-x
#include "stream_x/stream_ac_continuous.h"

// Create agent (model is provided via set_model)
ContinuousStreamAC agent(11, 1.0f, 0.99f, 0.8f, 2.0f, 2.0f);

nn::Sequential actor_backbone;
actor_backbone.add(nn::Linear(11, 128));
actor_backbone.add(nn::ReLU());

nn::Sequential mu_head;
mu_head.add(nn::Linear(128, 3));

nn::Sequential std_head;
std_head.add(nn::Linear(128, 3));
std_head.add(nn::Softplus());

nn::Sequential critic;
critic.add(nn::Linear(11, 128));
critic.add(nn::ReLU());
critic.add(nn::Linear(128, 1));

agent.set_model(actor_backbone, mu_head, std_head, critic);

// Training step
ag::Matrix norm_s = agent.normalize_observation(state);
ag::Tensor s(norm_s, false);
ag::Tensor action = agent.sample_action(s);
ag::Float scaled_r = agent.scale_reward(reward, done);
ag::Matrix norm_sn = agent.normalize_observation(next_state);
ag::Tensor sn(norm_sn, false);
ag::Tensor r(ag::Matrix::Constant(1,1,scaled_r), false);
agent.update(s, action, r, sn, done);

Documentation Structure

  • Tensor Operations — Core tensor manipulation
  • Autograd — Automatic differentiation
  • Neural Networks — Layers and models
  • Optimizers — SGD, RMSProp, ObGD
  • Reinforcement Learning — Stream-X algorithms
  • Build Guide — Installation and configuration
  • ESP32 Guide — Embedded development
  • Python Bindings — Using from Python
  • API Reference — Complete API documentation
  • Examples — Practical code samples

Architecture Summary

TinyRL
├── Autograd Core (installed API)
│   ├── ag::Tensor        — Differentiable tensors
│   ├── ag::Matrix        — Underlying data storage
│   ├── nn::Linear        — Dense layers
│   ├── nn::Conv2D        — Convolutional layers
│   ├── nn::Sequential    — Model container
│   ├── ag::SGD           — Stochastic gradient descent
│   └── ag::RMSProp       — Adaptive learning rates
│
└── Stream-X Module (optional, -DAUTOGRAD_BUILD_STREAM_X=ON)
    ├── ContinuousStreamAC  — Actor-Critic (continuous actions)
    ├── DiscreteStreamAC    — Actor-Critic (discrete actions)
    ├── StreamQ             — Online Q-learning
    ├── StreamSARSA         — On-policy SARSA
    └── ObGD                — Overshooting-bounded optimizer

TinyRL uses modern C++ (C++17) to create dynamic computational graphs with automatic differentiation. For questions or contributions, see CONTRIBUTING.md.