Modules, December 29, 2025

CNN

Convolutional Neural Networks

A typical CNN layer consists of three stages:

Apply several convolutions in parallel to produce a set of linear activations
Apply a nonlinear activation function (such as ReLU) to each linear activation; this is called the detector stage
Apply a pooling function
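
The three stages above can be sketched as a single PyTorch block. This is a minimal illustration, not a recommended architecture: the channel counts, kernel size, padding, and input size are assumptions chosen only to make the shapes easy to follow.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
layer = nn.Sequential(
    # Stage 1: convolutions in parallel produce linear activations.
    nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1),
    # Stage 2: detector stage, an elementwise nonlinearity.
    nn.ReLU(),
    # Stage 3: pooling summarizes each 2x2 neighborhood.
    nn.MaxPool2d(kernel_size=2),
)

x = torch.randn(1, 3, 32, 32)  # (batch, channels, height, width)
y = layer(x)
print(y.shape)  # torch.Size([1, 8, 16, 16])
```

With `padding=1` the 3x3 convolution keeps the 32x32 spatial size, and the 2x2 max pooling then halves it to 16x16.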

Convolution

TODO

Pooling

A pooling function replaces the output of the network at a given location with a summary statistic of the nearby outputs.

Max pooling: Select maximum output among neighbors
Average pooling: Take average of neighbors
$L^2$ norm pooling: Take $L^2$ norm of neighbors
Weighted average: Take an average of neighbors, weighted by distance from the central pixel
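
As a rough sketch, the four summary statistics can be computed on one small window in plain Python. The weighting scheme in `weighted_average_pool` (weight = 1 / (1 + distance to the center)) is an assumption made for illustration; the notes do not specify one, and the function names are hypothetical.

```python
import math

def max_pool(window):
    # Largest value in the neighborhood.
    return max(v for row in window for v in row)

def average_pool(window):
    # Plain mean of the neighborhood.
    values = [v for row in window for v in row]
    return sum(values) / len(values)

def l2_norm_pool(window):
    # Square root of the sum of squares.
    return math.sqrt(sum(v * v for row in window for v in row))

def weighted_average_pool(window):
    # Assumption: weight = 1 / (1 + Euclidean distance to the central pixel).
    h, w = len(window), len(window[0])
    ci, cj = (h - 1) / 2, (w - 1) / 2
    total = 0.0
    weight_sum = 0.0
    for i, row in enumerate(window):
        for j, v in enumerate(row):
            weight = 1.0 / (1.0 + math.hypot(i - ci, j - cj))
            total += weight * v
            weight_sum += weight
    return total / weight_sum

window = [
    [0.0, 0.0, 0.0],
    [0.0, 3.0, 4.0],
    [0.0, 0.0, 0.0],
]
print(max_pool(window))               # 4.0
print(average_pool(window))           # 7/9, about 0.778
print(l2_norm_pool(window))           # 5.0
print(weighted_average_pool(window))  # larger than the plain average here
```

The weighted average comes out above the plain average for this window because the two nonzero values sit at or next to the center, where the assumed weights are largest.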

The goal is to make the output approximately invariant to small translations of the input.
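
A toy 1D example can show what this invariance looks like, and why it only holds approximately. The helper `max_pool_1d` and the signals are hypothetical, and the pooling windows are assumed to be non-overlapping with width 2.

```python
def max_pool_1d(signal, size=2):
    # Non-overlapping windows: the stride equals the window size.
    return [max(signal[i:i + size]) for i in range(0, len(signal) - size + 1, size)]

signal        = [0, 0, 1, 0, 0, 0]
shifted_once  = [0, 0, 0, 1, 0, 0]  # peak moved one step right
shifted_twice = [0, 0, 0, 0, 1, 0]  # peak moved two steps right

print(max_pool_1d(signal))         # [0, 1, 0]
print(max_pool_1d(shifted_once))   # [0, 1, 0]
print(max_pool_1d(shifted_twice))  # [0, 0, 1]
```

A one-step shift keeps the peak inside the same window, so the pooled output is unchanged. A two-step shift moves the peak into the next window and the output changes, so the invariance covers only small shifts.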

`torch.nn.Conv2d`

It takes in_channels, out_channels, and kernel_size (H, W) and creates a weight tensor of shape [out_ch, in_ch, ker_h, ker_w] (with the default groups=1). This way it keeps a separate 2D kernel for each input-output channel pair.
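
The weight shape can be checked directly. The sizes below are arbitrary values picked for illustration, and the layer is left at its default `groups=1`.

```python
import torch.nn as nn

# Hypothetical sizes: 3 input channels, 8 output channels, a 5x7 kernel.
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=(5, 7))

print(conv.weight.shape)  # torch.Size([8, 3, 5, 7])
print(conv.bias.shape)    # torch.Size([8])
```

There is one 5x7 kernel per (output channel, input channel) pair, 8 * 3 = 24 in total, plus one bias per output channel.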

Each output channel has its own set of kernels, so different output channels can learn to detect different features.
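
To make this concrete, here is a small sketch with two output channels whose kernels are set by hand rather than learned. The kernels and the test image are assumptions for illustration: one kernel responds to left-to-right intensity changes, the other to top-to-bottom changes.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=2, kernel_size=2, bias=False)

# Hand-picked kernels, shape [out_ch=2, in_ch=1, 2, 2].
kernels = torch.tensor([
    [[[-1.0, 1.0], [-1.0, 1.0]]],   # channel 0: left-to-right change (vertical edge)
    [[[-1.0, -1.0], [1.0, 1.0]]],   # channel 1: top-to-bottom change (horizontal edge)
])
with torch.no_grad():
    conv.weight.copy_(kernels)

# 4x4 image: dark left half, bright right half, i.e. a single vertical edge.
image = torch.tensor([[0.0, 0.0, 1.0, 1.0]] * 4).reshape(1, 1, 4, 4)
out = conv(image).detach()

print(out.shape)  # torch.Size([1, 2, 3, 3])
print(out[0, 0])  # 2.0 in the middle column, 0.0 elsewhere
print(out[0, 1])  # all zeros
```

Channel 0 fires along the vertical edge while channel 1 stays silent, since the image has no horizontal edge. In a trained network the kernels are learned, but each output channel still acts as its own feature map.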