CNN — Architecture & Mathematics

Notebook excerpts

1. Architectural Taxonomy
Modern CNN architectures can be grouped into a few structural archetypes according to how they extract spatial features, how they manage depth, and how they control computational cost.
2. Data Representation: From Pixels to GPU Tensors
Raw image files (JPEG/PNG) are opaque binary blobs. They must be decoded and transformed into structured floating-point tensors before any mathematical operation can execute on hardware.
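A minimal NumPy sketch of the step after decoding, assuming some decoder (for example Pillow) has already produced an (H, W, 3) uint8 pixel array. The helper names `to_tensor` and `make_batch` are hypothetical, and the per-channel mean/std values are the commonly used ImageNet statistics, an assumption rather than something this note specifies. The target layout is the (B, C, H, W) convention used later in this note.

```python
import numpy as np

# Assumption: ImageNet per-channel statistics, common for pretrained models.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def to_tensor(img_hwc_uint8: np.ndarray) -> np.ndarray:
    """Convert a decoded (H, W, 3) uint8 image into a normalized (3, H, W) float32 array."""
    x = img_hwc_uint8.astype(np.float32) / 255.0   # scale [0, 255] -> [0, 1]
    x = (x - MEAN) / STD                           # per-channel standardization, broadcast over H and W
    return np.transpose(x, (2, 0, 1))              # HWC -> CHW (channels-first)

def make_batch(images) -> np.ndarray:
    """Stack same-sized images into one contiguous (B, C, H, W) float32 batch."""
    return np.ascontiguousarray(np.stack([to_tensor(im) for im in images], axis=0))

# Hypothetical stand-in for decoded JPEG/PNG pixels
rng = np.random.default_rng(0)
imgs = [rng.integers(0, 256, size=(32, 48, 3), dtype=np.uint8) for _ in range(4)]
batch = make_batch(imgs)
print(batch.shape, batch.dtype)  # (4, 3, 32, 48) float32
```

Channels-first is chosen here only because it matches the (B, C, H, W) shape quoted in the classifier-head section; some frameworks default to channels-last instead.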
3. Deep Dive: The Convolution Operation
The convolution layer is the feature-extraction engine of CNNs. It slides small learned filters across the spatial extent of the input and, at every position, computes a weighted sum of the local window across all input channels, plus a bias.
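A naive NumPy sketch of a multi-channel 2D convolution with stride and zero padding. Like most deep-learning frameworks it computes cross-correlation (the kernel is not flipped). The function name and the (C_out, C_in, K, K) weight layout are illustrative assumptions; real libraries use far faster algorithms, but the per-position arithmetic is the same. The output size follows H_out = floor((H + 2p - K) / s) + 1.

```python
import numpy as np

def conv2d(x, w, b, stride=1, padding=0):
    """Naive multi-channel 2D convolution (cross-correlation, as in DL frameworks).

    x: (B, C_in, H, W), w: (C_out, C_in, K, K), b: (C_out,)
    returns (B, C_out, H_out, W_out) with H_out = (H + 2*padding - K) // stride + 1
    """
    B, C_in, H, W = x.shape
    C_out, _, K, _ = w.shape
    # Zero-pad only the two spatial axes
    xp = np.pad(x, ((0, 0), (0, 0), (padding, padding), (padding, padding)))
    H_out = (H + 2 * padding - K) // stride + 1
    W_out = (W + 2 * padding - K) // stride + 1
    out = np.empty((B, C_out, H_out, W_out), dtype=x.dtype)
    for i in range(H_out):
        for j in range(W_out):
            # K x K window across all input channels at this output position
            patch = xp[:, :, i * stride:i * stride + K, j * stride:j * stride + K]  # (B, C_in, K, K)
            # Weighted sum over (C_in, K, K) for every filter, plus that filter's bias
            out[:, :, i, j] = np.tensordot(patch, w, axes=([1, 2, 3], [1, 2, 3])) + b
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 8, 8)).astype(np.float32)
w = rng.standard_normal((4, 3, 3, 3)).astype(np.float32)
b = np.zeros(4, dtype=np.float32)
y = conv2d(x, w, b, stride=1, padding=1)
print(y.shape)  # (2, 4, 8, 8)
```

With padding 1 and a 3x3 kernel the spatial size is preserved; switching to stride 2 gives (8 + 2 - 3) // 2 + 1 = 4 positions per axis.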
4. Nonlinearity: The Activation Function
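Each convolution is a linear (affine) operation, so its output tensor is passed through a nonlinear function applied element by element. Without this step, a stack of convolutions would reduce to a single linear map, however deep it is.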
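A minimal NumPy sketch using ReLU, max(0, z), as the activation. ReLU is an assumption here (it is the most common choice, but this section does not name one). The second part checks the linearity argument with small dense matrices standing in for two linear layers.

```python
import numpy as np

def relu(z):
    """Elementwise ReLU: max(0, z); the output has the same shape as the input."""
    return np.maximum(z, 0)

# Applied to a tiny feature map: negatives are zeroed, positives pass through
fmap = np.array([[-1.5, 0.0], [2.0, -0.3]])
print(relu(fmap).tolist())  # [[0.0, 0.0], [2.0, 0.0]]

rng = np.random.default_rng(0)
W1 = rng.standard_normal((5, 4))   # stand-in for a first linear layer
W2 = rng.standard_normal((3, 5))   # stand-in for a second linear layer
v = rng.standard_normal(4)

# Without a nonlinearity, two stacked linear maps equal the single map W2 @ W1
print(np.allclose(W2 @ (W1 @ v), (W2 @ W1) @ v))  # True
# With ReLU in between, the composition is no longer one fixed matrix
print(W2 @ relu(W1 @ v))
```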
5. Normalization: Batch Normalization
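After Conv + ReLU, activation distributions drift across layers and training iterations, an effect often described as internal covariate shift. BatchNorm stabilizes them by standardizing each channel and then applying a learned per-channel scale and shift. Statistics are computed per channel, pooled over the batch and both spatial axes.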
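A NumPy sketch of BatchNorm on a (B, C, H, W) tensor, with per-channel statistics pooled over axes (0, 2, 3). The function names are hypothetical; `eps=1e-5` and `momentum=0.1` are common default values, assumed here. For simplicity the running variance stores the biased batch variance; some frameworks store an unbiased estimate instead.

```python
import numpy as np

def batchnorm2d_train(x, gamma, beta, running_mean, running_var, momentum=0.1, eps=1e-5):
    """Training-mode BatchNorm over a (B, C, H, W) tensor.

    Per-channel statistics are pooled over the batch and both spatial axes (0, 2, 3).
    running_mean / running_var are updated in place for later use at inference.
    """
    mean = x.mean(axis=(0, 2, 3))   # (C,)
    var = x.var(axis=(0, 2, 3))     # (C,) biased variance, used for normalization
    x_hat = (x - mean[None, :, None, None]) / np.sqrt(var[None, :, None, None] + eps)
    # Learned per-channel scale (gamma) and shift (beta)
    y = gamma[None, :, None, None] * x_hat + beta[None, :, None, None]
    # Exponential moving averages of the batch statistics
    running_mean[:] = (1 - momentum) * running_mean + momentum * mean
    running_var[:] = (1 - momentum) * running_var + momentum * var
    return y

def batchnorm2d_eval(x, gamma, beta, running_mean, running_var, eps=1e-5):
    """Inference-mode BatchNorm: uses stored running statistics instead of batch statistics."""
    scale = gamma / np.sqrt(running_var + eps)
    shift = beta - running_mean * scale
    return x * scale[None, :, None, None] + shift[None, :, None, None]

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(8, 4, 5, 5))
C = x.shape[1]
gamma, beta = np.ones(C), np.zeros(C)
rm, rv = np.zeros(C), np.ones(C)
y = batchnorm2d_train(x, gamma, beta, rm, rv)
print(y.mean(axis=(0, 2, 3)))  # ~0 for every channel
print(y.std(axis=(0, 2, 3)))   # ~1 for every channel
```

At inference the batch may be a single image, which is why the eval path folds the running statistics into a fixed per-channel scale and shift.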
6. Spatial Downsampling & Efficiency
As features become more abstract, full spatial resolution becomes redundant and expensive. Downsampling shrinks the spatial grid, which cuts the cost of every later layer and enlarges the receptive field of each subsequent layer.
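A NumPy sketch of 2x2, stride-2 max pooling, one common downsampling choice (strided convolution is another; this section does not say which it means). It also includes a small helper for the theoretical receptive field of a stack of (kernel, stride) layers, using the standard recurrence r += (k - 1) * jump, jump *= s. Both function names are hypothetical.

```python
import numpy as np

def max_pool2d(x, k=2, stride=2):
    """Max pooling over (B, C, H, W); each channel is pooled independently."""
    B, C, H, W = x.shape
    H_out = (H - k) // stride + 1
    W_out = (W - k) // stride + 1
    out = np.empty((B, C, H_out, W_out), dtype=x.dtype)
    for i in range(H_out):
        for j in range(W_out):
            window = x[:, :, i * stride:i * stride + k, j * stride:j * stride + k]
            out[:, :, i, j] = window.max(axis=(2, 3))  # keep the strongest response
    return out

def receptive_field(layers):
    """Theoretical receptive field (in input pixels) of a stack of (kernel, stride) layers.

    Each layer widens the field by (k - 1) * jump, where jump is the input-pixel
    spacing between adjacent outputs; jump multiplies by every stride.
    """
    r, jump = 1, 1
    for k, s in layers:
        r += (k - 1) * jump
        jump *= s
    return r

x = np.arange(16, dtype=np.float32).reshape(1, 1, 4, 4)
print(max_pool2d(x)[0, 0].tolist())                       # [[5.0, 7.0], [13.0, 15.0]]
print(receptive_field([(3, 1), (3, 1), (3, 1)]))          # 7
print(receptive_field([(3, 1), (3, 1), (2, 2), (3, 1)]))  # 10
```

A 2x2, stride-2 pool keeps one position in four, so a convolution that follows it does roughly a quarter of the work, while the same three 3x3 convolutions now see a 10-pixel-wide input region instead of 7.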
7. Classifier Head: From Feature Maps to Decisions
After the final conv block, the network holds a feature tensor (B, C_final, H_final, W_final) that must collapse into a class prediction.
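A NumPy sketch of one common head: global average pooling over H_final and W_final, a fully connected layer producing logits, and a numerically stable softmax. This particular head is an assumption for illustration; flattening the whole tensor into a single vector is the older alternative. The sizes (512 channels, 7x7 grid, 10 classes) and the weight initialization are hypothetical.

```python
import numpy as np

def global_avg_pool(x):
    """(B, C, H, W) -> (B, C): average each channel's feature map over H and W."""
    return x.mean(axis=(2, 3))

def linear(z, W, b):
    """Fully connected layer: (B, C) @ (C, num_classes) + (num_classes,) -> logits."""
    return z @ W + b

def softmax(logits):
    """Row-wise softmax; subtracting each row's max keeps exp from overflowing."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
B, C_final, H_final, W_final, num_classes = 2, 512, 7, 7, 10   # illustrative sizes
feats = rng.standard_normal((B, C_final, H_final, W_final))
W = rng.standard_normal((C_final, num_classes)) * 0.01
b = np.zeros(num_classes)

probs = softmax(linear(global_avg_pool(feats), W, b))
print(probs.shape)           # (2, 10)
print(probs.argmax(axis=1))  # predicted class index per image
```

With global average pooling the classifier weight has C_final x num_classes entries regardless of H_final and W_final; flattening instead would feed 512 * 7 * 7 = 25088 features into the linear layer.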