From Derivatives to Machine Learning — A Reference

Notebook excerpts

  1. What a derivative is

    A derivative answers one question: how fast is something changing right now?

  2. The limit definition formula

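    The limit definition is $f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$. The whole expression says: take the slope between two nearby points, $x$ and $x + h$, then shrink the gap $h$ to nothing. What's left is the slope at a single point, which is the derivative.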
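    A quick numeric sketch of the limit, assuming $f(x) = x^2$ and the point $x = 3$ (both chosen only for illustration; the helper names `difference_quotient` and `square` are hypothetical). The difference quotient is evaluated for shrinking gaps $h$, and the slopes close in on 6.

```python
def difference_quotient(f, x, h):
    """Slope of the line through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h


def square(x):
    return x ** 2


# Shrink the gap h and watch the slope settle toward a single value.
for h in [1.0, 0.1, 0.01, 0.001]:
    print(f"h = {h:<6} slope = {difference_quotient(square, 3.0, h):.4f}")
```

    The printed slopes are 7, 6.1, 6.01 and 6.001 (up to floating-point rounding): for this function the quotient works out to exactly $6 + h$, so it approaches 6 as $h$ shrinks.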

  3. Heights, rise, and run

    "Height" in the formula means exactly what you'd think: the vertical distance from the x-axis up to the point on the curve. For $f(x) = x^2$:

  4. How the formula is the derivative

    The formula is not merely related to the derivative; it is the literal definition of it. "The derivative of $f$ at $x$" means, by definition, the value that the rise-over-run slope approaches as $h$ shrinks to zero.
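    Applying the definition symbolically to $f(x) = x^2$ shows the limit doing its job. The $h$ in the denominator cancels while $h$ is still nonzero, and only then is the gap shrunk to zero:

```latex
\begin{aligned}
f'(x) &= \lim_{h \to 0} \frac{(x+h)^2 - x^2}{h} \\
      &= \lim_{h \to 0} \frac{x^2 + 2xh + h^2 - x^2}{h} \\
      &= \lim_{h \to 0} \frac{2xh + h^2}{h} \\
      &= \lim_{h \to 0} (2x + h) \\
      &= 2x
\end{aligned}
```

    At $x = 3$ this gives $f'(3) = 6$.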

  5. Slope in disguise

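    You're always calculating a slope when you compute a derivative, but the slope isn't always called "slope." It gets renamed depending on what the two axes represent. The pattern is invariant: the derivative is the change in the vertical quantity divided by the change in the horizontal quantity. Plot position against time and the slope is called velocity; plot velocity against time and it is called acceleration.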
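    A small sketch of the renaming, assuming a hypothetical object whose position in meters after $t$ seconds is $s(t) = 4.9t^2$ (free fall from rest). It uses a central difference, a symmetric variant of the difference quotient, as the numeric slope; the helper names are illustrative. Because rise is in meters and run is in seconds, the slope comes out as a velocity. The exact answer is $s'(t) = 9.8t$.

```python
def position(t):
    """Hypothetical position in meters after t seconds (free fall from rest)."""
    return 4.9 * t ** 2


def rate_of_change(f, t, h=1e-6):
    """Central-difference estimate of f'(t): rise over run with a tiny run."""
    return (f(t + h) - f(t - h)) / (2 * h)


# Rise is in meters and run is in seconds, so the slope is a velocity in m/s.
velocity_at_2s = rate_of_change(position, 2.0)
print(f"velocity at t = 2 s: {velocity_at_2s:.3f} m/s")  # exact value: 9.8 * 2 = 19.6
```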

  6. Partial differentiation

    A regular derivative is the slope of a curve. A partial derivative is the slope of a surface, but only in one direction at a time: you vary one input and hold every other input fixed.
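    A numeric sketch of slicing a surface one direction at a time, assuming the illustrative surface $f(x, y) = x^2 y$ (helper names are hypothetical). For $\partial f/\partial x$ only $x$ is nudged while $y$ stays fixed, and vice versa. The exact answers are $\partial f/\partial x = 2xy$ and $\partial f/\partial y = x^2$, so at $(3, 2)$ they are 12 and 9.

```python
def f(x, y):
    """Illustrative surface: height above the point (x, y)."""
    return x ** 2 * y


def partial_x(f, x, y, h=1e-6):
    # Move only in the x direction; y is held fixed.
    return (f(x + h, y) - f(x - h, y)) / (2 * h)


def partial_y(f, x, y, h=1e-6):
    # Move only in the y direction; x is held fixed.
    return (f(x, y + h) - f(x, y - h)) / (2 * h)


dfdx = partial_x(f, 3.0, 2.0)  # exact: 2 * 3 * 2 = 12
dfdy = partial_y(f, 3.0, 2.0)  # exact: 3 ** 2 = 9
print(f"df/dx = {dfdx:.4f}, df/dy = {dfdy:.4f}")
```

    The two numbers differ because the surface rises at different rates depending on which direction you walk from the same point.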

  7. The ML connection: weights & loss

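    Partial derivatives are arguably the most important math idea in modern machine learning. Here's why: a model's loss (a number measuring how wrong its predictions are) depends on many parameters, such as weights, and the partial derivative of the loss with respect to one weight says how the loss changes when that weight alone is nudged. Its sign tells training which direction to move that weight to reduce the loss.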
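    A minimal sketch, assuming the simplest possible model: one input $x$, a prediction $\hat{y} = wx + b$, and squared-error loss $L = (\hat{y} - y)^2$ for a single example (the numbers and helper names are made up for illustration). The chain rule gives $\partial L/\partial w = 2(\hat{y} - y)x$ and $\partial L/\partial b = 2(\hat{y} - y)$; the code checks both against numeric nudges of one parameter at a time.

```python
def loss(w, b, x, y):
    """Squared error for one example with prediction y_hat = w * x + b."""
    y_hat = w * x + b
    return (y_hat - y) ** 2


def analytic_grads(w, b, x, y):
    # Chain rule: dL/dy_hat = 2 * (y_hat - y), dy_hat/dw = x, dy_hat/db = 1.
    error = (w * x + b) - y
    return 2 * error * x, 2 * error


def numeric_grads(w, b, x, y, h=1e-6):
    # Nudge one parameter at a time, holding the other fixed.
    dw = (loss(w + h, b, x, y) - loss(w - h, b, x, y)) / (2 * h)
    db = (loss(w, b + h, x, y) - loss(w, b - h, x, y)) / (2 * h)
    return dw, db


# Made-up example: input x = 2, target y = 7, current parameters w = 1, b = 0.
print(analytic_grads(1.0, 0.0, 2.0, 7.0))  # (-20.0, -10.0)
print(numeric_grads(1.0, 0.0, 2.0, 7.0))   # approximately the same
```

    Both partials are negative, so increasing either $w$ or $b$ would lower the loss for this example.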

  8. Why both $w$ and $b$?

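    If $\partial L / \partial w$ tells you how to change $w$ to reduce loss, why have $b$ at all? Because with $w$ alone the prediction $\hat{y} = wx$ is a line forced through the origin, and $w$ can only tilt it. The bias $b$ in $\hat{y} = wx + b$ shifts the line up or down, and $\partial L / \partial b$ tells you which way to shift it. Each parameter controls something the other cannot, which is why models carry both.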
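    A small sketch of why $b$ matters, assuming made-up data that lie exactly on $y = 2x + 3$. To find the best parameters in each case it uses the standard closed-form least-squares formulas rather than gradient descent: with $b$ fixed at 0 the best line still misses, while fitting $w$ and $b$ together recovers the data exactly.

```python
# Made-up data that lie exactly on the line y = 2x + 3.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]


def mse(w, b):
    """Mean squared error of the line y_hat = w * x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Best w when b is forced to 0: the line must pass through the origin.
w_only = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Best w and b together (ordinary least squares with an intercept).
x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)
w_fit = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
    (x - x_mean) ** 2 for x in xs
)
b_fit = y_mean - w_fit * x_mean

print(f"w only: w = {w_only}, loss = {mse(w_only, 0.0)}")  # w = 3.0, loss = 3.0
print(f"w and b: w = {w_fit}, b = {b_fit}, loss = {mse(w_fit, b_fit)}")  # w = 2.0, b = 3.0, loss = 0.0
```

    No choice of $w$ alone can close the gap, because every line $\hat{y} = wx$ passes through $(0, 0)$ while these data pass through $(0, 3)$.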

  9. Vocabulary & the bigger picture

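    Each training step has the same shape: make a prediction, measure the loss, compute the partial derivative of the loss with respect to every parameter (together these partials are called the gradient), and nudge each parameter a small step in the direction that lowers the loss. Repeat over many training examples, billions for large models, until the loss bottoms out. When you hear "the model is learning," what's literally happening is that the partial derivatives are telling each parameter which way to move, and the parameters are sliding downhill on the loss surface.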
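    A minimal gradient-descent loop for a one-input model $\hat{y} = wx + b$, assuming made-up data on $y = 2x + 3$, mean squared error as the loss, a learning rate (step size) of 0.05 and 2000 steps (both hypothetical choices). Each step computes $\partial L/\partial w$ and $\partial L/\partial b$, then moves each parameter a small step against its own partial derivative.

```python
# Made-up training data on the line y = 2x + 3.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

learning_rate = 0.05  # hypothetical step size
w, b = 0.0, 0.0       # start from an arbitrary guess
n = len(xs)

for step in range(2000):
    # Partial derivatives of the mean squared error with respect to w and b.
    errors = [w * x + b - y for x, y in zip(xs, ys)]
    dL_dw = sum(2 * e * x for e, x in zip(errors, xs)) / n
    dL_db = sum(2 * e for e in errors) / n
    # Slide downhill: move each parameter against its own partial derivative.
    w -= learning_rate * dL_dw
    b -= learning_rate * dL_db

loss = sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n
print(f"w = {w:.4f}, b = {b:.4f}, loss = {loss:.6f}")
```

    The loop settles at $w = 2$, $b = 3$. With a much larger learning rate (for this data, above roughly 0.15) the updates overshoot and diverge, which is why the step size has to be tuned.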

  10. Is $w$ derived from $x$?

    Imagine a stereo. The music signal coming in is like $x$ — whatever the radio is broadcasting, you can't change it. The knob position is like $w$ — how much you're amplifying the signal. The sound coming out is like $\hat{y} = wx$. Different songs (different $x$'s) come and go through the radio. The knob ($w$) stays where you set it until you decide to turn it. The knob isn't "derived from" the music — it's a separate thing you tune to make the output sound right across all the music that comes through.