December 29, 2025

CNN

Convolutional Neural Networks

A typical layer of a CNN consists of three stages:

  1. Perform several convolutions in parallel to produce a set of linear activations
  2. Run each linear activation through a nonlinear activation function such as ReLU (the detector stage)
  3. Apply a pooling function to modify the output further
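The three stages can be sketched with standard PyTorch modules. This is a minimal example that assumes arbitrary sizes (3 input channels, 8 output channels, 3x3 kernels, 2x2 max pooling) chosen only for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # reproducible random weights and input

# One convolutional layer split into its three stages (sizes are illustrative)
layer = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1),  # stage 1: 8 feature maps computed in parallel
    nn.ReLU(),                                                            # stage 2: detector stage
    nn.MaxPool2d(kernel_size=2),                                          # stage 3: 2x2 max pooling (stride defaults to 2)
)

x = torch.randn(1, 3, 32, 32)  # a batch of one 3-channel 32x32 image

conv, detector, pool = layer
z = conv(x)       # linear activations: [1, 8, 32, 32]
a = detector(z)   # non-negative after ReLU: [1, 8, 32, 32]
y = pool(a)       # pooled output: [1, 8, 16, 16]
print(z.shape, a.shape, y.shape)
```

Splitting the Sequential into its three modules only makes each stage's output visible; calling layer(x) directly gives the same result as y.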

Convolution

TODO

Pooling

Pooling replaces the output of the network at a given location with a summary statistic of the nearby outputs.

  • Max pooling: Select the maximum output among the neighbors
  • Average pooling: Take the average of the neighbors
  • L² norm pooling: Take the L² norm of the neighbors
  • Weighted average pooling: Take an average weighted by distance to the central pixel
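A minimal pure-Python sketch of these pooling functions, applied over non-overlapping 2x2 windows of a small grid. The helper names (max_pool, avg_pool, l2_pool, weighted_avg_pool, pool2d) are hypothetical, and the inverse-distance weights in weighted_avg_pool are an assumption; other decays, such as a Gaussian, are also used:

```python
import math

def max_pool(window):
    # Largest value in the neighborhood
    return max(v for row in window for v in row)

def avg_pool(window):
    # Plain mean of the neighborhood
    values = [v for row in window for v in row]
    return sum(values) / len(values)

def l2_pool(window):
    # Square root of the sum of squares of the neighborhood
    return math.sqrt(sum(v * v for row in window for v in row))

def weighted_avg_pool(window):
    # Assumption: weight = 1 / (1 + Euclidean distance to the central cell)
    h, w = len(window), len(window[0])
    ci, cj = (h - 1) / 2, (w - 1) / 2
    total = norm = 0.0
    for i, row in enumerate(window):
        for j, v in enumerate(row):
            weight = 1.0 / (1.0 + math.hypot(i - ci, j - cj))
            total += weight * v
            norm += weight
    return total / norm

def pool2d(grid, size, stride, fn):
    # Slide a size x size window over the grid and summarize each window with fn
    out = []
    for i in range(0, len(grid) - size + 1, stride):
        out_row = []
        for j in range(0, len(grid[0]) - size + 1, stride):
            window = [r[j:j + size] for r in grid[i:i + size]]
            out_row.append(fn(window))
        out.append(out_row)
    return out

grid = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 5, 7],
    [1, 1, 8, 2],
]
print(pool2d(grid, 2, 2, max_pool))  # [[6, 2], [2, 8]]
print(pool2d(grid, 2, 2, avg_pool))  # [[3.5, 1.0], [1.0, 5.5]]
```

On a 2x2 window every cell is equally far from the center, so weighted_avg_pool reduces to avg_pool there; the distance weighting only matters for odd-sized windows that have a true central pixel.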

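The goal is to make the representation approximately invariant to small translations of the input: if the input shifts slightly, most of the pooled outputs stay the same.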
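A small 1D illustration of this invariance, assuming stride-1 max pooling of width 3 over made-up detector outputs: shifting the input one position to the right changes every input value but only two of the six pooled outputs.

```python
def max_pool_1d(xs, width):
    # Stride-1 max pooling over a 1D sequence (no padding)
    return [max(xs[i:i + width]) for i in range(len(xs) - width + 1)]

signal = [0.1, 1.0, 0.2, 0.1, 0.0, 0.9, 0.1, 0.0]  # made-up detector outputs
shifted = [0.0] + signal[:-1]                       # same signal moved one position right

pooled = max_pool_1d(signal, 3)
pooled_shifted = max_pool_1d(shifted, 3)

changed_inputs = sum(a != b for a, b in zip(signal, shifted))
changed_outputs = sum(a != b for a, b in zip(pooled, pooled_shifted))
print(changed_inputs, changed_outputs)  # 8 2
```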

torch.nn.Conv2d

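It takes in_channels, out_channels and kernel_size (an int, or a tuple (H, W)) and creates a weight tensor of shape [out_channels, in_channels, kernel_h, kernel_w] (with the default groups=1). This way it keeps a separate kernel for each (input channel, output channel) pair, and each output channel sums the responses of its kernels over all input channels, plus a bias.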
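A quick check of the weight shape, assuming arbitrary sizes (3 input channels, 4 output channels, a 5x3 kernel). It also recomputes one output value by hand to show that an output channel sums over all input channels. Conv2d computes cross-correlation, so the kernel is applied without flipping:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

conv = nn.Conv2d(in_channels=3, out_channels=4, kernel_size=(5, 3))
print(conv.weight.shape)  # torch.Size([4, 3, 5, 3]): one 5x3 kernel per (out, in) pair
print(conv.bias.shape)    # torch.Size([4]): one bias per output channel

x = torch.randn(1, 3, 10, 10)
y = conv(x)  # stride 1, no padding: (1, 4, 10 - 5 + 1, 10 - 3 + 1) = (1, 4, 6, 8)

# Recompute output channel o at spatial position (0, 0) by hand:
# multiply each input channel's 5x3 patch by its own kernel, sum everything, add the bias
o = 0
patch = x[0, :, 0:5, 0:3]  # shape [3, 5, 3]: one patch per input channel
manual = (patch * conv.weight[o]).sum() + conv.bias[o]
print(torch.allclose(manual, y[0, o, 0, 0], atol=1e-5))  # True
```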

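Out channels allow us to track different features: each output channel has its own set of kernels, so it can learn to respond to a different pattern in the input.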
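To make this concrete, the sketch below hand-sets the two output channels of a Conv2d, one with a vertical-edge kernel and one with a horizontal-edge kernel, and applies them to an image containing a single vertical edge. The kernels are illustrative choices; a trained network learns its own.

```python
import torch
import torch.nn as nn

# One grayscale input channel, two output channels, 3x3 kernels, no bias
conv = nn.Conv2d(in_channels=1, out_channels=2, kernel_size=3, bias=False)

vertical = torch.tensor([[-1., 0., 1.],
                         [-1., 0., 1.],
                         [-1., 0., 1.]])  # responds to left-to-right intensity changes
horizontal = vertical.T                  # responds to top-to-bottom intensity changes

with torch.no_grad():
    conv.weight[0, 0] = vertical
    conv.weight[1, 0] = horizontal

# 6x6 image: left half dark (0), right half bright (1), i.e. one vertical edge
img = torch.zeros(1, 1, 6, 6)
img[..., 3:] = 1.0

with torch.no_grad():
    out = conv(img)  # shape [1, 2, 4, 4]

print(out[0, 0])  # rows of [0, 3, 3, 0]: peaks where the window straddles the edge
print(out[0, 1])  # all zeros: the image has no horizontal edge
```

Both channels see the same input; only their kernels differ, which is why they pick out different features.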