RNN — Architecture & Mathematics

Notebook excerpts

1. Architectural Taxonomy
Recurrent Neural Networks are partitioned into structural archetypes based on how temporal sequences are ingested and how outputs are projected.
2. Data Representation: From Text to State-Compatible Tensors
Text arrives as a sequence of discrete symbolic tokens, whereas the RNN requires a continuous, fixed-size vector at each timestep. The ingestion pipeline converts the symbols into a temporal tensor.
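One way such a pipeline might look is sketched below. It assumes whitespace tokenization, a vocabulary built from the input itself, and a randomly initialized embedding table; the helper name text_to_tensor and its parameters are hypothetical and not taken from the note.

```python
import numpy as np

def text_to_tensor(text, embed_dim=4, seed=0):
    """Map a string to a (timesteps, embed_dim) array. Illustrative only."""
    # 1. Split the string into discrete tokens (assumed: whitespace split).
    tokens = text.lower().split()
    # 2. Assign each distinct token an integer id (sorted for determinism).
    vocab = {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}
    ids = np.array([vocab[tok] for tok in tokens])
    # 3. Look up a continuous vector per id. A trained model would learn
    #    this table; here it is random, purely for shape illustration.
    rng = np.random.default_rng(seed)
    embedding = rng.normal(scale=0.1, size=(len(vocab), embed_dim))
    return tokens, vocab, ids, embedding[ids]

tokens, vocab, ids, x = text_to_tensor("the cat sat on the mat")
print(ids)      # [4 0 3 2 4 1]
print(x.shape)  # (6, 4): one fixed-size vector per timestep
```

Both occurrences of "the" map to the same row of the table, so the tensor has one axis for time and one for features, which is the shape a recurrent cell consumes step by step.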
3. Deep Dive: Forward Operations
At every timestep t, the RNN cell performs a single recurrent computation combining new input with accumulated history.
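A minimal sketch of that computation follows, assuming the common vanilla form h_t = tanh(W_x x_t + W_h h_{t-1} + b). The tanh activation and the names W_x, b, rnn_step, and rnn_forward are assumptions made for illustration; only W_h appears in the note itself.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One recurrent update: mix the new input with the previous state."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

def rnn_forward(xs, h0, W_x, W_h, b):
    """Apply the same cell (shared weights) across all timesteps."""
    h = h0
    states = []
    for x_t in xs:
        h = rnn_step(x_t, h, W_x, W_h, b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
T, input_dim, hidden_dim = 5, 3, 4
xs = rng.normal(size=(T, input_dim))
W_x = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W_h = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

hs = rnn_forward(xs, np.zeros(hidden_dim), W_x, W_h, b)
print(hs.shape)  # (5, 4): one hidden state per timestep
```

The loop reuses the same W_x, W_h, and b at every step, which is what makes the weights "shared" across time.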
4. Backpropagation Through Time (BPTT)
Training requires ∂L/∂W for all shared weight matrices. Because W_h participates at every timestep, its gradient accumulates contributions from the entire temporal chain.
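The accumulation can be illustrated with a small sketch. It assumes a tanh cell and a toy loss L = ½ Σ_t ‖h_t‖², both chosen here only to keep the example short; the function names are hypothetical. The backward loop adds one outer-product term to the W_h gradient per timestep, and a finite-difference check compares the result against a numerical estimate.

```python
import numpy as np

def forward(xs, W_x, W_h, b):
    """Return [h_0, h_1, ..., h_T] with h_0 = 0 (assumed tanh cell)."""
    h = np.zeros(W_h.shape[0])
    hs = [h]
    for x_t in xs:
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        hs.append(h)
    return hs

def loss(hs):
    """Toy loss (an assumption): half the squared norm of every state."""
    return 0.5 * sum(np.sum(h ** 2) for h in hs[1:])

def bptt_grad_W_h(hs, W_h):
    """Accumulate dL/dW_h by walking the chain backwards in time."""
    grad = np.zeros_like(W_h)
    dh_next = np.zeros(W_h.shape[0])     # gradient arriving from step t+1
    for t in range(len(hs) - 1, 0, -1):
        dh = hs[t] + dh_next             # local loss term plus future terms
        da = dh * (1.0 - hs[t] ** 2)     # back through tanh
        grad += np.outer(da, hs[t - 1])  # this timestep's contribution
        dh_next = W_h.T @ da             # pass gradient on to h_{t-1}
    return grad

def numeric_grad_W_h(xs, W_x, W_h, b, eps=1e-6):
    """Central finite differences, used only to check the analytic result."""
    grad = np.zeros_like(W_h)
    for i in range(W_h.shape[0]):
        for j in range(W_h.shape[1]):
            W_plus, W_minus = W_h.copy(), W_h.copy()
            W_plus[i, j] += eps
            W_minus[i, j] -= eps
            diff = loss(forward(xs, W_x, W_plus, b)) - loss(forward(xs, W_x, W_minus, b))
            grad[i, j] = diff / (2 * eps)
    return grad

rng = np.random.default_rng(1)
xs = rng.normal(size=(6, 3))
W_x = rng.normal(scale=0.5, size=(4, 3))
W_h = rng.normal(scale=0.5, size=(4, 4))
b = np.zeros(4)

analytic = bptt_grad_W_h(forward(xs, W_x, W_h, b), W_h)
numeric = numeric_grad_W_h(xs, W_x, W_h, b)
print(np.max(np.abs(analytic - numeric)))  # should be very small
```

The line `grad += ...` is the point of the example: the single matrix W_h collects a term from every position in the sequence, and each term has itself been propagated back through all later steps via dh_next.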
5. Gradient Pathology: Why Vanilla RNNs Fail
The fate of training hinges on the norm of the Jacobian product ∂h_t/∂h_k. Take the spectral norm ‖·‖_2 (largest singular value) and use sub-multiplicativity:
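Written out, the sub-multiplicative step might look as follows. The first line holds for any recurrent cell; the second and third assume a tanh cell of the form h_i = tanh(W_x x_i + W_h h_{i-1} + b), which this excerpt does not state explicitly.

```latex
\[
\begin{aligned}
\left\lVert \frac{\partial h_t}{\partial h_k} \right\rVert_2
  &= \left\lVert \prod_{i=k+1}^{t} \frac{\partial h_i}{\partial h_{i-1}} \right\rVert_2
  \;\le\; \prod_{i=k+1}^{t} \left\lVert \frac{\partial h_i}{\partial h_{i-1}} \right\rVert_2 \\[4pt]
% Assumption: tanh cell, so each factor is a diagonal matrix times W_h.
\frac{\partial h_i}{\partial h_{i-1}}
  &= \operatorname{diag}\!\left(1 - h_i^{2}\right) W_h,
  \qquad
  \left\lVert \operatorname{diag}\!\left(1 - h_i^{2}\right) \right\rVert_2 \le 1 \\[4pt]
\Longrightarrow\quad
\left\lVert \frac{\partial h_t}{\partial h_k} \right\rVert_2
  &\le \lVert W_h \rVert_2^{\,t-k}
\end{aligned}
\]
```

Under these assumptions, ‖W_h‖_2 < 1 forces the bound to shrink exponentially in the gap t − k, which is the vanishing-gradient case. When ‖W_h‖_2 > 1 the bound only permits growth; it does not by itself prove that the gradient explodes.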