# Automatic Differentiation
TinyRL's Autograd core implements reverse-mode automatic differentiation (backpropagation) using a dynamic computational graph. This enables efficient gradient computation for training neural networks.
## Overview
| Concept | Description |
|---|---|
| Dynamic Graph | Computational graph built on-the-fly during forward pass |
| Reverse-mode AD | Efficient gradient computation via chain rule (backpropagation) |
| Gradient Tracking | Tensors with `requires_grad=true` track operations for differentiation |
| Memory Management | Smart pointers with manual cleanup via `clear_graph()` |
## How It Works
### 1. Forward Pass
During the forward pass, each operation:

- creates a new output tensor,
- records its parent tensors (the inputs), and
- stores a backward function for gradient computation.
```cpp
ag::Tensor a(ag::Matrix::Random(2, 2), true); // requires_grad=true
ag::Tensor b(ag::Matrix::Random(2, 2), true);

// Each operation builds the graph
ag::Tensor c = a + b;         // c knows its parents are a and b
ag::Tensor d = c * a;         // d knows its parents are c and a
ag::Tensor loss = ag::sum(d); // loss is the root of the graph
```
### 2. Backward Pass
Calling `backward()` on the loss:
1. Performs topological sort of the graph
2. Propagates gradients from output to inputs using the chain rule
3. Accumulates gradients in each tensor's .grad() buffer
```cpp
loss.backward(); // Computes gradients for all tensors with requires_grad=true

// Access gradients
std::cout << "da: " << a.grad() << std::endl;
std::cout << "db: " << b.grad() << std::endl;
```
## Complete Example
```cpp
#include <iostream>
#include "autograd.h"

int main() {
    // Create tensors with gradient tracking
    ag::Tensor x(ag::Matrix({{1.0f, 2.0f}, {3.0f, 4.0f}}), true, "x");
    ag::Tensor y(ag::Matrix({{2.0f, 0.0f}, {1.0f, 3.0f}}), true, "y");

    // Build computation: loss = sum((x + y) * x)
    ag::Tensor sum_xy  = x + y;             // Element-wise addition
    ag::Tensor product = sum_xy * x;        // Element-wise multiplication
    ag::Tensor loss    = ag::sum(product);  // Reduce to scalar

    // Compute gradients
    loss.backward();

    // Gradients are now available:
    // d(loss)/dx = (x + y) + x = 2x + y
    // d(loss)/dy = x
    std::cout << "x.grad(): " << x.grad() << std::endl;
    std::cout << "y.grad(): " << y.grad() << std::endl;
    return 0;
}
```
## Supported Operations
All these operations support automatic differentiation:
| Category | Operations |
|---|---|
| Arithmetic | +, -, *, / (element-wise with broadcasting) |
| Matrix | matmul, transpose |
| Activations | relu, sigmoid, tanh, softmax, softplus, leaky_relu |
| Math | exp, log, sqrt, pow |
| Reductions | sum, mean |
| Shape | reshape, flatten |
| Normalization | layernorm |
| Convolution | conv2d |
## Memory Management
```cpp
// Gradients accumulate by default - zero them before each iteration
optimizer.zero_grad(); // Or manually: x.zero_grad();

// Free the computational graph after the backward pass
loss.backward();
loss.clear_graph(); // Releases memory used by the graph

// Detach a tensor from the graph (useful for target networks in RL)
ag::Tensor target = value.detach(); // No gradient flow through target
```
## Graph Visualization
Visualize the computational graph for debugging:
```cpp
#include "draw_graph.h"

ag::Tensor loss = model.forward(input);
ag::draw_graph(loss, "computation_graph.dot");
// Convert to an image: dot -Tpng computation_graph.dot -o graph.png
```
## See Also
- Tensor Operations — Detailed tensor API
- Neural Networks — Layer-based training
- Optimizers — Parameter update algorithms
- Examples — More usage patterns