Neural Networks

The nn namespace provides neural network building blocks: layers, activations, and containers. All layers integrate seamlessly with the autograd system for automatic gradient computation.

Quick Example

#include "layers.h"
#include "optimizer.h"

// Build a simple MLP
nn::Sequential model;
model.add(nn::Linear(784, 128));  // Input: 784 features
model.add(nn::ReLU());
model.add(nn::Linear(128, 10));   // Output: 10 classes

// Setup optimizer
ag::SGD optimizer(0.01f);
optimizer.add_parameters(model.layers());

// Training step
ag::Tensor output = model.forward(input);
ag::Tensor loss = compute_loss(output, target);
optimizer.zero_grad();
loss.backward();
optimizer.step();

Layer Types

Linear (Fully Connected)

Implements: \(y = xW + b\)

// Standard initialization (LeCun uniform)
nn::Linear(int input_dim, int output_dim);

// Sparse initialization (fraction of weights set to zero)
nn::Linear(int input_dim, int output_dim, float sparsity);

Example:

nn::Linear fc(784, 128);  // 784 inputs → 128 outputs
auto y = fc.forward(x);   // x: (batch, 784) → y: (batch, 128)
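The page names LeCun uniform as the default weight initialization but does not spell it out. A common formulation draws each weight from \(U(-\sqrt{3/\text{fan\_in}}, \sqrt{3/\text{fan\_in}})\); the sketch below is standalone standard C++ and the bound formula is an assumption about the scheme, not taken from the library source:

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Sketch of LeCun-style uniform initialization: bound = sqrt(3 / fan_in).
// The exact bound used by nn::Linear is an assumption here.
std::vector<float> lecun_uniform(int fan_in, int fan_out, unsigned seed = 42) {
    const float bound = std::sqrt(3.0f / static_cast<float>(fan_in));
    std::mt19937 gen(seed);
    std::uniform_real_distribution<float> dist(-bound, bound);
    std::vector<float> w(static_cast<std::size_t>(fan_in) * fan_out);
    for (float& v : w) v = dist(gen);
    return w;
}
```

Scaling the bound by fan-in keeps the variance of pre-activations roughly constant across layers, which is why the input dimension (784 above) drives the bound.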

Conv2D (2D Convolution)

For image data with shape (N, C, H, W).

nn::Conv2D(int in_channels, int out_channels, int kernel_size, 
           int stride = 1, int padding = 0);

Example:

nn::Conv2D conv(3, 32, 3, 1, 1);  // 3→32 channels, 3×3 kernel, stride=1, pad=1
// Input: (N, 3, 32, 32) → Output: (N, 32, 32, 32)
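The output spatial size in the comment follows standard convolution arithmetic, \(H_{out} = \lfloor (H + 2p - k)/s \rfloor + 1\). A small helper for sanity-checking shapes (hypothetical, not part of the library):

```cpp
#include <cassert>

// Standard convolution output-size arithmetic:
// out = floor((in + 2*padding - kernel) / stride) + 1
int conv_out_size(int in, int kernel, int stride, int padding) {
    return (in + 2 * padding - kernel) / stride + 1;
}
```

For the example above: `conv_out_size(32, 3, 1, 1)` gives 32, matching the documented (N, 32, 32, 32) output.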

LayerNorm

Normalizes activations across all features (no learnable parameters).

nn::LayerNorm(float eps = 1e-5f);

Example:

model.add(nn::Linear(64, 64));
model.add(nn::LayerNorm());  // Stabilizes training
model.add(nn::ReLU());
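With no learnable parameters, layer normalization reduces to \(\hat{x} = (x - \mu)/\sqrt{\sigma^2 + \epsilon}\) computed per sample across the feature dimension. A plain-C++ sketch of that computation, independent of `nn::LayerNorm` (population variance and the documented `eps` default assumed):

```cpp
#include <cmath>
#include <vector>

// Normalize one sample's features to zero mean / unit variance.
// eps matches the documented default of 1e-5f.
std::vector<float> layer_norm(const std::vector<float>& x, float eps = 1e-5f) {
    float mean = 0.0f;
    for (float v : x) mean += v;
    mean /= x.size();
    float var = 0.0f;
    for (float v : x) var += (v - mean) * (v - mean);
    var /= x.size();
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        out[i] = (x[i] - mean) / std::sqrt(var + eps);
    return out;
}
```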

Flatten

Reshapes 4D tensors to 2D for transitioning from conv to linear layers.

nn::Flatten();  // (N, C, H, W) → (N, C*H*W)

Activation Functions

| Layer | Function | Description |
|---|---|---|
| `nn::ReLU` | \(\max(0, x)\) | Standard rectifier |
| `nn::LeakyReLU` | \(\max(\alpha x, x)\) | Leaky rectifier (default \(\alpha = 0.01\)) |
| `nn::Sigmoid` | \(\frac{1}{1+e^{-x}}\) | Squashes to (0, 1) |
| `nn::Tanh` | \(\tanh(x)\) | Squashes to (-1, 1) |
| `nn::Softmax` | \(\frac{e^{x_i}}{\sum_j e^{x_j}}\) | Probability distribution over all elements |
| `nn::Softplus` | \(\log(1 + e^x)\) | Smooth approximation of ReLU |
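Softmax as written can overflow for large inputs; implementations typically subtract the row maximum first, which leaves the result unchanged since \(e^{x_i - m}/\sum_j e^{x_j - m} = e^{x_i}/\sum_j e^{x_j}\). A standalone sketch of the stable form (whether `nn::Softmax` does this internally is not documented here):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Numerically stable softmax: shift by the max before exponentiating.
std::vector<float> softmax(const std::vector<float>& x) {
    float m = *std::max_element(x.begin(), x.end());
    std::vector<float> out(x.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = std::exp(x[i] - m);
        sum += out[i];
    }
    for (float& v : out) v /= sum;
    return out;
}
```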

All activations can be used as layers or free functions:

// As layer
model.add(nn::ReLU());

// As function
auto y = ag::relu(x);


Sequential Container

nn::Sequential chains layers into a single model.

nn::Sequential model;
model.add(nn::Linear(10, 64));
model.add(nn::LayerNorm());
model.add(nn::ReLU());
model.add(nn::Linear(64, 1));

// Forward pass through all layers
ag::Tensor output = model.forward(input);

// Get all trainable parameters
auto layers = model.layers();

CNN Example

nn::Sequential cnn;

// Convolutional layers
cnn.add(nn::Conv2D(1, 32, 3, 1, 1));   // (N,1,28,28) → (N,32,28,28)
cnn.add(nn::ReLU());
cnn.add(nn::Conv2D(32, 64, 3, 2, 1));  // (N,32,28,28) → (N,64,14,14)
cnn.add(nn::ReLU());

// Flatten and classify
cnn.add(nn::Flatten());                 // (N,64,14,14) → (N,12544)
cnn.add(nn::Linear(64*14*14, 10));      // (N,12544) → (N,10)

ag::Tensor logits = cnn.forward(images);
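A quick parameter count helps validate an architecture like the one above: a Conv2D layer has \(C_{out}(C_{in}k^2 + 1)\) parameters and a Linear layer has \(d_{in}d_{out} + d_{out}\), assuming each carries a bias term (this page does not state whether biases are present, so treat that as an assumption). Sketch:

```cpp
#include <cassert>

// Parameter counts, assuming every layer has a bias term.
long conv2d_params(int in_ch, int out_ch, int k) {
    return static_cast<long>(out_ch) * (static_cast<long>(in_ch) * k * k + 1);
}
long linear_params(int in_dim, int out_dim) {
    return static_cast<long>(in_dim) * out_dim + out_dim;
}
```

Under those assumptions, the final `nn::Linear(64*14*14, 10)` dominates the model at 125,450 parameters, which is typical when flattening directly into a classifier head.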

Layer Interface

All layers inherit from nn::Layer and implement:

| Method | Description |
|---|---|
| `forward(input)` | Compute the output tensor |
| `get_parameters()` | Return pointers to trainable tensors |
| `has_parameters()` | Returns `true` if the layer has weights |
| `zero_grad()` | Zero all parameter gradients |

Custom Layers

Create custom layers by inheriting from nn::Layer:

class ScaledLinear : public nn::Layer {
public:
    ScaledLinear(int in_dim, int out_dim, float scale)
        : linear(in_dim, out_dim), scale_(scale) {}

    ag::Tensor forward(const ag::Tensor& x) override {
        return linear.forward(x) * scale_;
    }

    std::vector<ag::Tensor*> get_parameters() override {
        return linear.get_parameters();
    }

    bool has_parameters() const override { return true; }

private:
    nn::Linear linear;
    float scale_;
};

See Also