Real-Time Deep RL That Fits in Small Devices
Purpose-built for streaming deep reinforcement learning on resource-constrained devices like microcontrollers. Zero dependencies, real-time learning, production-ready.
// TinyRL minimal MLP example
int batch_size = 1;

// Build a small MLP: 10 inputs -> 64 hidden units (LayerNorm + ReLU) -> 1 output.
nn::Sequential model;
model.add(nn::Linear(10, 64));
model.add(nn::LayerNorm());
model.add(nn::ReLU());
model.add(nn::Linear(64, 1));

// SGD optimizer over all model parameters.
ag::SGD opt(0.01f);
opt.add_parameters(model.layers());

// One training step on random data: forward, MSE loss, backward, update.
ag::Tensor x(ag::Matrix::Random(batch_size, 10));
ag::Tensor y(ag::Matrix::Random(batch_size, 1));
auto pred = model.forward(x);
auto loss = ag::sum(ag::pow(pred - y, 2.0f)) / batch_size;
opt.zero_grad();
loss.backward();
opt.step();
Built Different
Most ML frameworks are built for batch learning and assume abundant compute and memory. TinyRL assumes neither.
Header-Only
Zero dependencies. Include and compile.
Autograd
Dynamic graphs with reverse-mode autodiff.
Streaming RL
Learn from each experience. No replay buffers.
Embedded
Train on tiny devices with real-time guarantees.
Code Examples
Autograd
// Forward + backward
ag::Tensor a(ag::Matrix::Random(4, 4));
ag::Tensor b(ag::Matrix::Random(4, 4));
auto c = ag::sum(ag::pow(a * b, 2.0f));  // scalar loss
c.backward();                            // gradients now populated on a and b
Deep Learning Layers
nn::Sequential net;
net.add(nn::Linear(4, 8));
net.add(nn::ReLU());
net.add(nn::Linear(8, 2));
ag::Tensor x(ag::Matrix::Random(16, 4));
auto y = net.forward(x);
Streaming Deep RL Loop
Env env;
StreamingAgent agent(cfg);
while (env.running()) {
    auto s = env.state();
    auto a = agent.act(s);
    auto [s_prime, r, done] = env.step(a);
    agent.update(s, a, r, s_prime, done);
}
Learn from experience in real time
Traditional deep RL agents rely on storing millions of transitions and performing costly batch updates, an approach that is impractical on resource-constrained devices. In contrast, streaming deep RL agents learn as data arrives, one transition at a time. TinyRL, built around the stream-x algorithms, enables agents on small devices to adapt in real time and learn from new data immediately. → Read the paper
🌊 One-Shot
Use each transition once. No storage needed.
🛡️ Stable
Normalization prevents divergence.
⚡ O(1)
Fixed memory and compute per step.
🔁 Extensible
Works with any TD method.
Start Building in 5 Minutes
Clone the repo, include the header, run an example. That's it. MIT-licensed, well-tested, and actively maintained.