# Automatic Differentiation
TinyRL's Autograd core implements reverse-mode automatic differentiation (backpropagation) using a dynamic computational graph. This enables efficient gradient computation for training neural networks.
## Overview
| Concept | Description |
|---|---|
| Dynamic Graph | Computational graph built on-the-fly during forward pass |
| Reverse-mode AD | Efficient gradient computation via chain rule (backpropagation) |
| Gradient Tracking | Tensors with `requires_grad=true` track operations for differentiation |
| Memory Management | Smart pointers with manual cleanup via `clear_graph()` |
## How It Works
### 1. Forward Pass
During the forward pass, each operation:

- creates a new output tensor,
- records its parent tensors (the inputs), and
- stores a backward function for gradient computation.
```cpp
ag::Tensor a(ag::Matrix::Random(2, 2), true); // requires_grad=true
ag::Tensor b(ag::Matrix::Random(2, 2), true);

// Each operation builds the graph
ag::Tensor c = a + b;         // c knows its parents are a and b
ag::Tensor d = c * a;         // d knows its parents are c and a
ag::Tensor loss = ag::sum(d); // loss is the root of the graph
```
### 2. Backward Pass
Calling `backward()` on the loss:
1. Performs topological sort of the graph
2. Propagates gradients from output to inputs using the chain rule
3. Accumulates gradients in each tensor's .grad() buffer
```cpp
loss.backward(); // Computes gradients for all tensors with requires_grad=true

// Access gradients
std::cout << "da: " << a.grad() << std::endl;
std::cout << "db: " << b.grad() << std::endl;
```
## Complete Example
```cpp
#include <iostream>
#include "autograd.h"

int main() {
    // Create tensors with gradient tracking
    ag::Tensor x(ag::Matrix({{1.0f, 2.0f}, {3.0f, 4.0f}}), true, "x");
    ag::Tensor y(ag::Matrix({{2.0f, 0.0f}, {1.0f, 3.0f}}), true, "y");

    // Build computation: loss = sum((x + y) * x)
    ag::Tensor sum_xy  = x + y;             // Element-wise addition
    ag::Tensor product = sum_xy * x;        // Element-wise multiplication
    ag::Tensor loss    = ag::sum(product);  // Reduce to scalar

    // Compute gradients
    loss.backward();

    // Gradients are now available:
    // d(loss)/dx = (x + y) + x = 2x + y
    // d(loss)/dy = x
    std::cout << "x.grad(): " << x.grad() << std::endl;
    std::cout << "y.grad(): " << y.grad() << std::endl;
    return 0;
}
```
## Supported Operations
All these operations support automatic differentiation:
| Category | Operations |
|---|---|
| Arithmetic | +, -, *, / (element-wise with broadcasting) |
| Matrix | matmul, transpose |
| Activations | relu, sigmoid, tanh, softmax, softplus, leaky_relu |
| Math | exp, log, sqrt, pow |
| Reductions | sum, mean |
| Shape | reshape, flatten |
| Normalization | layernorm |
| Convolution | conv2d |
## Memory Management
```cpp
// Gradients accumulate by default - zero them before each iteration
optimizer.zero_grad(); // Or manually: x.zero_grad();

// Free the computational graph after the backward pass
loss.backward();
loss.clear_graph(); // Releases memory used by the graph

// Detach a tensor from the graph (useful for target networks in RL)
ag::Tensor target = value.detach(); // No gradient flow through target
```
## Graph Visualization
Visualize the computational graph for debugging:
```cpp
#include "draw_graph.h"

ag::Tensor loss = model.forward(input);
ag::draw_graph(loss, "computation_graph.dot");
// Convert to an image: dot -Tpng computation_graph.dot -o graph.png
```
## See Also
- Tensor Operations — Detailed tensor API
- Neural Networks — Layer-based training
- Optimizers — Parameter update algorithms
- Examples — More usage patterns