CNN — Architecture & Mathematics

Notebook excerpts

  1. 01

    1. Architectural Taxonomy

    Modern CNNs are partitioned into structural archetypes depending on how spatial features are extracted, how depth is managed, and how computational cost is controlled.

  2. 02

    2. Data Representation: From Pixels to GPU Tensors

    Raw image files (JPEG/PNG) are opaque binary blobs. They must be decoded and transformed into structured floating-point tensors before any mathematical operation can execute on hardware.
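
The decode-then-transform step can be sketched as follows. This is a minimal illustration, not the note's own pipeline: it assumes the image has already been decoded into an H x W x C `uint8` array, and the function name `to_batch_tensor` and the mean/std values are hypothetical. The transfer to GPU memory is left out.

```python
import numpy as np

def to_batch_tensor(image_hwc_uint8, mean, std):
    """Turn one decoded H x W x C uint8 image into a (1, C, H, W) float32 array."""
    x = image_hwc_uint8.astype(np.float32) / 255.0  # scale pixel values to [0, 1]
    x = (x - mean) / std                            # per-channel normalization (broadcasts over H, W)
    x = np.transpose(x, (2, 0, 1))                  # channels-last (HWC) -> channels-first (CHW)
    return x[np.newaxis, ...]                       # prepend the batch axis

# Random stand-in for a decoded 4 x 6 RGB image (assumption: no real file is read here).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)
mean = np.array([0.5, 0.5, 0.5], dtype=np.float32)     # illustrative values only
std = np.array([0.25, 0.25, 0.25], dtype=np.float32)   # illustrative values only

batch = to_batch_tensor(image, mean, std)
print(batch.shape, batch.dtype)  # (1, 3, 4, 6) float32
```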

  3. 03

    3. Deep Dive: The Convolution Operation

    The convolution layer is the feature-extraction engine of CNNs. It slides small learned filters across the spatial extent of the input, computing local weighted sums at every position.
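
The sliding weighted sum can be written out directly. The sketch below is a deliberately naive loop version for a single image, assuming square kernels and zero padding. Like most deep-learning frameworks, it computes cross-correlation (no kernel flip). The name `conv2d` and its parameters are chosen here for illustration.

```python
import numpy as np

def conv2d(x, w, b, stride=1, padding=0):
    """x: (C_in, H, W); w: (C_out, C_in, K, K); b: (C_out,). Returns (C_out, H_out, W_out)."""
    c_out, c_in, k, _ = w.shape
    # Zero-pad only the two spatial axes.
    x = np.pad(x, ((0, 0), (padding, padding), (padding, padding)))
    h_out = (x.shape[1] - k) // stride + 1
    w_out = (x.shape[2] - k) // stride + 1
    out = np.zeros((c_out, h_out, w_out), dtype=x.dtype)
    for i in range(h_out):
        for j in range(w_out):
            top, left = i * stride, j * stride
            patch = x[:, top:top + k, left:left + k]  # local window, shape (C_in, K, K)
            # Weighted sum of the window for every output channel at once.
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2])) + b
    return out

x = np.arange(16, dtype=np.float32).reshape(1, 4, 4)
w = np.ones((1, 1, 3, 3), dtype=np.float32)   # a 3 x 3 box filter
b = np.zeros(1, dtype=np.float32)

out_valid = conv2d(x, w, b)              # no padding: 4 x 4 -> 2 x 2
out_same = conv2d(x, w, b, padding=1)    # padding 1 keeps the 4 x 4 grid
print(out_valid.shape, out_same.shape)   # (1, 2, 2) (1, 4, 4)
```

Real implementations avoid the Python loops and batch the work, but the arithmetic at each output position is the same.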

  4. 04

    4. Nonlinearity: The Activation Function

    Each convolution produces a linear output tensor, to which a nonlinear function is then applied element by element. Without this step, stacked convolutions would collapse into a single linear map.
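
As a small illustration, here is ReLU, one common choice of activation, applied to a toy feature map. The input values are made up for the example.

```python
import numpy as np

def relu(x):
    """Element-wise max(x, 0): negatives become zero, positives pass through."""
    return np.maximum(x, 0.0)

feature_map = np.array([[-1.5, 0.0], [2.0, -0.1]], dtype=np.float32)
activated = relu(feature_map)
print(activated.shape)  # (2, 2): the shape is unchanged, only the values are
```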

  5. 05

    5. Normalization: Batch Normalization

    After Conv + ReLU, activation distributions drift across layers and iterations (internal covariate shift). BatchNorm stabilizes them. Statistics are computed per channel, pooled over the batch and both spatial axes.
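
The per-channel pooling of statistics can be sketched as below. This covers only the training-mode forward pass, under the assumption of a learnable scale `gamma` and shift `beta` per channel. The running averages used at inference time are omitted, and the function name is hypothetical.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """x: (B, C, H, W); gamma, beta: (C,). Normalizes each channel over B, H and W."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)  # one mean per channel, shape (1, C, 1, 1)
    var = x.var(axis=(0, 2, 3), keepdims=True)    # biased variance per channel
    x_hat = (x - mean) / np.sqrt(var + eps)       # zero mean, unit variance per channel
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(8, 4, 5, 5))  # synthetic activations
y = batch_norm_train(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.shape)  # (8, 4, 5, 5)
```

With `gamma = 1` and `beta = 0`, each channel of `y` has mean close to 0 and variance close to 1, regardless of the input's offset and scale.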

  6. 06

    6. Spatial Downsampling & Efficiency

    As features become more abstract, full spatial resolution becomes redundant and expensive. Downsampling shrinks the grid while growing the effective receptive field.
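
One common way to shrink the grid is 2 x 2 max pooling with stride 2, sketched here with a reshape trick. The choice of max pooling is an assumption for illustration (strided convolution is another option), and the sketch requires even H and W.

```python
import numpy as np

def max_pool_2x2(x):
    """x: (B, C, H, W) with even H and W. Returns (B, C, H // 2, W // 2)."""
    b, c, h, w = x.shape
    # Split each spatial axis into (blocks, 2), then take the max inside every 2 x 2 block.
    return x.reshape(b, c, h // 2, 2, w // 2, 2).max(axis=(3, 5))

x = np.arange(16, dtype=np.float32).reshape(1, 1, 4, 4)
pooled = max_pool_2x2(x)
print(pooled.shape)  # (1, 1, 2, 2)
```

Each pooled cell summarizes a 2 x 2 block of the input, so a 3 x 3 filter applied afterward spans a 6 x 6 region of the original grid.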

  7. 07

    7. Classifier Head: From Feature Maps to Decisions

    After the final conv block, the network holds a feature tensor \((B, C_{\text{final}}, H_{\text{final}}, W_{\text{final}})\) that must collapse into a class prediction.
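
One way to perform this collapse is global average pooling followed by a linear layer and a softmax. That design is an assumption made for this sketch (flattening the tensor before the linear layer is another option). The function name, the weight shapes, and the random values are all illustrative.

```python
import numpy as np

def classifier_head(features, weight, bias):
    """features: (B, C, H, W); weight: (num_classes, C); bias: (num_classes,)."""
    pooled = features.mean(axis=(2, 3))                   # global average pooling -> (B, C)
    logits = pooled @ weight.T + bias                     # linear layer -> (B, num_classes)
    logits = logits - logits.max(axis=1, keepdims=True)   # shift for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)           # softmax: each row sums to 1

rng = np.random.default_rng(0)
features = rng.normal(size=(2, 8, 3, 3))   # B=2, C_final=8, H_final=W_final=3
weight = rng.normal(size=(5, 8))           # 5 hypothetical classes
bias = np.zeros(5)

probs = classifier_head(features, weight, bias)
predicted = probs.argmax(axis=1)           # one class index per image
print(probs.shape, predicted.shape)        # (2, 5) (2,)
```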