Field Notes

Things I’m reading, deriving, building toward — kept here so I can find them again, and so you can read along.

Featured · May 18, 2026

Autoencoders & VAEs, Visualized

A working tour of every operation inside an autoencoder — encoder, bottleneck, decoder, distributions, the reparameterization trick, KL divergence, and the ELBO — built so each step can be watched and replayed.

Deep Learning
Generative
VAE
Representation Learning

Read · ~35 min read

02 · Jul 7, 2026 · 24 min
Classification & Loss Functions
One study path from a raw score to a calibrated probability, the cross-entropy loss that trains it, and the confusion-matrix / ROC metrics that judge it — the loss taught where it is used. Points-first, with the derivations one click away.
- Classification
- Loss Functions
- Cross-Entropy
Read
Fld. 02
Classification
p = softmax(z)
03 · May 27, 2026 · 40 min
The Transformer — Architecture & Mathematics
A from-scratch derivation of the Transformer with worked numeric vectors at every step — token + positional embeddings, scaled dot-product and multi-head attention (with the softmax Jacobian and the √dₖ variance argument), residual + LayerNorm, the GELU feed-forward, causal masking, and cross-attention, traced through full encoder and decoder blocks.
- Deep Learning
- Transformers
- Attention
Read
Fld. 03
Attention
softmax(QKᵀ/√dₖ)·V
04 · May 27, 2026 · 30 min
CNN — Architecture & Mathematics
Convolutional networks end to end — the convolution arithmetic with explicit padding, its backward pass as a flipped-kernel convolution, ReLU and BatchNorm, pooling with the receptive-field recurrence, depthwise-separable factorization, and the global-average-pool + softmax classifier head.
- Deep Learning
- Computer Vision
- CNN
Read
Fld. 04
Computer Vision
Y = W ∗ X + b
05 · May 27, 2026 · 18 min
RNN — Architecture & Mathematics
The vanilla recurrent cell in full — the tanh state recurrence, backpropagation through time, the single-step Jacobian diag(1−h²)·Wₕ, and a rigorous spectral-norm treatment of why gradients vanish or explode, set up as the motivation for gating.
- Deep Learning
- Sequence Models
- RNN
Read
Fld. 05
Sequence Models
hₜ = tanh(aₜ)
06 · May 27, 2026 · 20 min
LSTM — Architecture & Mathematics
Long Short-Term Memory derived from the vanishing-gradient problem — the dual state, the forget / input / output gates, the additive cell-state update, and the constant-error-carousel proof that ∂Cₜ/∂Cₜ₋₁ = diag(fₜ) keeps gradients flowing across long sequences.
- Deep Learning
- Sequence Models
- LSTM
Read
Fld. 06
Sequence Models
Cₜ = fₜ⊙Cₜ₋₁ + …
07 · May 27, 2026 · 18 min
GRU — Architecture & Mathematics
The Gated Recurrent Unit as a leaner LSTM — update and reset gates, convex state interpolation with a boundedness proof, the full ∂hₜ/∂hₜ₋₁ expansion, and a side-by-side parameter and gradient-path comparison with the LSTM.
- Deep Learning
- Sequence Models
- GRU
Read
Fld. 07
Sequence Models
hₜ = (1−zₜ)hₜ₋₁ + …
08 · May 17, 2026 · 22 min
Bagging & Boosting, Visualized
Ensemble learning, end to end — bootstrap aggregation, random forests, AdaBoost, gradient boosting, and the bias-variance tradeoff that makes "wisdom of crowds" mathematically true.
- Classical ML
- Ensembles
- Bias-Variance
Read
Fld. 08
Ensembles
𝔼[f̂] → f
09 · May 16, 2026 · 18 min
Big O Complexity, Visualized
The hidden cost of every algorithm. What it really means when we say "this runs in O(n log n)" — explained with race tracks, search games, and real numbers you can compare side by side.
- Computer Science
- Algorithms
- Complexity
Read
Fld. 09
Algorithms
O(n log n)
10 · May 14, 2026 · 30 min
LLM & RAG Evaluation, Visualized
A 39-section tour of how language models and retrieval systems are measured — perplexity, BLEU/ROUGE/METEOR, BERTScore, the LLM benchmark zoo (MMLU, HellaSwag, GSM8K, HumanEval, MT-Bench, Chatbot Arena), the IR stack (Hit@K, MRR, NDCG), and the RAG triad / RAGAS framing.
- LLM
- RAG
- Evaluation
Read
Fld. 10
Evaluation
p(answer | context)
11 · May 13, 2026 · 30 min
Activation & Loss Functions, Visualized
The two function families that drive deep learning — ReLU through SwiGLU on the activation side, MSE through cross-entropy to DPO/PPO/KTO on the loss side. 30+ functions, 7 interactive demos.
- Deep Learning
- Activations
- Losses
Read
Fld. 11
Deep Learning
ReLU → L(ŷ, y)
12 · May 12, 2026 · 12 min
Regression & Model Selection Metrics, Visualized
Sums of squares, R² / adjusted R², Mallows's Cp, AIC, BIC, and model-selection strategy: the regression half of the CS 229 metrics cheat sheet, with interactive demos.
- CS 229
- Metrics
- Regression
Read
Fld. 12
Regression
R² · AIC · BIC
13 · May 11, 2026 · 20 min
RNN, LSTM, GRU — Cells, Gates, and Data Flow
CS 229 companion 05. Interactive diagrams for the vanilla RNN cell, BPTT and the vanishing-gradient pathology, then the LSTM gates (forget / input / output, the cell-state highway) and the GRU's reset / update simplification — with a side-by-side parameter comparison.
- CS 229
- Sequence Models
- RNN/LSTM/GRU
Read
Fld. 13
Sequence Models
hₜ₋₁ → hₜ
14 · May 8, 2026 · 25 min
From Derivatives to Machine Learning — A Reference
A field-notes walk from differentiation through gradient descent to backprop, set in serif paper typography. Built as a self-contained reference for ML foundations.
- Math
- ML Foundations
- Reference
Read
Fld. 14
Reference
d/dx → grad
