Real-Time Deep RL That Fits in Small Devices
Purpose-built for streaming deep reinforcement learning on resource-constrained devices like microcontrollers. Zero dependencies, real-time learning, production-ready.
// TinyRL minimal MLP example
int batch_size = 1;

// Build a small MLP: 10 inputs -> 64 hidden units (LayerNorm + ReLU) -> 1 output.
nn::Sequential model;
model.add(nn::Linear(10, 64));
model.add(nn::LayerNorm());
model.add(nn::ReLU());
model.add(nn::Linear(64, 1));

// SGD optimizer over all model parameters.
ag::SGD opt(0.01f);
opt.add_parameters(model.layers());

// One training step on random data: forward, MSE loss, backward, update.
ag::Tensor x(ag::Matrix::Random(batch_size, 10));
ag::Tensor y(ag::Matrix::Random(batch_size, 1));
auto pred = model.forward(x);
auto loss = ag::sum(ag::pow(pred - y, 2.0f)) / batch_size;
opt.zero_grad();
loss.backward();
opt.step();
Built Different
Most ML frameworks are built for batch learning and assume abundant compute and memory. TinyRL assumes neither.
Header-Only
Zero dependencies. Include and compile.
Autograd
Dynamic graphs with reverse-mode autodiff.
Streaming RL
Learn from each experience. No replay buffers.
Embedded
Train on tiny devices with real-time guarantees.
Code Examples
Autograd
// Forward + backward
ag::Tensor a(ag::Matrix::Random(4, 4));
ag::Tensor b(ag::Matrix::Random(4, 4));
auto c = ag::sum(ag::pow(a * b, 2.0f));  // scalar loss
c.backward();                            // gradients now populated on a and b
Deep Learning Layers
nn::Sequential net;
net.add(nn::Linear(4, 8));
net.add(nn::ReLU());
net.add(nn::Linear(8, 2));
ag::Tensor x(ag::Matrix::Random(16, 4));
auto y = net.forward(x);
Streaming Deep RL Loop
Env env;
StreamingAgent agent(cfg);
while (env.running()) {
    auto s = env.state();
    auto a = agent.act(s);
    auto [s_prime, r, done] = env.step(a);
    agent.update(s, a, r, s_prime, done);
}
Learn from experience in real time
Traditional deep RL agents rely on storing millions of transitions and performing costly batch updates, an approach that is impractical on resource-constrained devices. In contrast, streaming deep RL agents learn as data arrives, one transition at a time. TinyRL, built around the stream-x algorithms, enables agents on small devices to adapt in real time and learn from new data immediately. → Read the paper
🌊 One-Shot
Use each transition once. No storage needed.
🛡️ Stable
Normalization prevents divergence.
⚡ O(1)
Fixed memory and compute per step.
🔁 Extensible
Works with any TD method.
Start Building in 5 Minutes
Clone the repo, include the header, run an example. That's it. MIT-licensed, well-tested, and actively maintained.