Basic Calculus Refresh (Backpropagation)

2026-04-19

This is a quick math refresh for me to reference in the future - and might be useful to you if you're learning backpropagation:

Example 1

flowchart LR
    a["a<br/>data: 3<br/>grad: 4"] --> plus(("+"))
    b["b<br/>data: 2<br/>grad: 4"] --> plus
    plus --> c["c<br/>data: 5<br/>grad: 4"]
    c --> mult(("×"))
    d["d<br/>data: 4<br/>grad: 5"] --> mult
    mult --> L["L<br/>data: 20<br/>grad: 1"]

gradient of L

Starting at the end, $\frac{dL}{dL} = 1$ - this is because any change in L produces exactly that same change in L.

gradient of c

Now, on to c. $c * d = L$. We want to figure out how much a movement in c impacts the result in L.

When you take the derivative of L with respect to c, $\frac{dL}{dc}$, treat the other value d as a constant. Here $d = 4$, so the equation becomes $4c = L$. $\frac{dL}{dc}$ is now just 4 - or d.

Therefore, the gradient of c is the value of d: 4.
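One way to sanity-check this result is to nudge c by a tiny amount and measure how much L moves. The sketch below is only an illustration: the helper `loss` and the step size `h` are assumed names that do not come from the notes above.

```python
def loss(c, d):
    # L = c * d, the last operation in Example 1
    return c * d


c, d = 5.0, 4.0
h = 1e-6  # small nudge applied to c (assumed step size)

# Finite-difference estimate of dL/dc: (L(c + h) - L(c)) / h
grad_c_estimate = (loss(c + h, d) - loss(c, d)) / h
print(round(grad_c_estimate, 4))
```

The printed estimate should land very close to 4, the value of d, up to floating-point noise.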

gradient of a

Now, to a. $a + b = c$. Now we want to figure out how much a movement in a impacts the result in L.

First, let's figure out the local derivative $\frac{dc}{da}$. Because this is an addition, the value is 1: any movement in a produces the exact same movement in c.

Now, we want to figure out:

$$\frac{dL}{da} = \frac{dL}{dc} * \frac{dc}{da}$$

This is effectively 4 ($\frac{dL}{dc}$) * 1 ($\frac{dc}{da}$) = 4.
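The whole backward pass for Example 1 can be written out by hand in a few lines. This is a minimal sketch, not a general implementation; the variable names such as `dL_dc` and `dc_da` are just labels chosen here for the derivatives.

```python
# Forward pass for Example 1
a, b, d = 3.0, 2.0, 4.0
c = a + b  # 5.0
L = c * d  # 20.0

# Backward pass, starting from the output and walking toward the inputs
dL_dL = 1.0
dL_dc = dL_dL * d  # multiplication: the local derivative is the other operand
dL_dd = dL_dL * c
dc_da = 1.0  # addition: the local derivative is 1
dc_db = 1.0
dL_da = dL_dc * dc_da  # chain rule
dL_db = dL_dc * dc_db

print(dL_da, dL_db, dL_dc, dL_dd)  # 4.0 4.0 4.0 5.0
```

These match the `grad` values shown on each node in the flowchart.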

Example 2

a = 2
b = 3
d = 4
f = 2
c = a * b
e = c + d
L = e * f

Forward pass: c = 6, e = 10, L = 20.

flowchart LR
    a["a<br/>data: 2<br/>grad: 6"] --> mult1(("×"))
    b["b<br/>data: 3<br/>grad: 4"] --> mult1
    mult1 --> c["c<br/>data: 6<br/>grad: 2"]
    c --> plus(("+"))
    d["d<br/>data: 4<br/>grad: 2"] --> plus
    plus --> e["e<br/>data: 10<br/>grad: 2"]
    e --> mult2(("×"))
    f["f<br/>data: 2<br/>grad: 10"] --> mult2
    mult2 --> L["L<br/>data: 20<br/>grad: 1"]

gradient of L

Same as last time, the gradient is 1.

gradient of e

$\frac{dL}{de}$ = the value of f, which is 2

gradient of c

Local derivative first: $\frac{de}{dc} = 1$ because it's an addition.

$$\frac{dL}{dc} = \frac{dL}{de} * \frac{de}{dc}$$

which is 2 * 1 = 2.

gradient of a

Local derivative first: $\frac{dc}{da}$ = the value of b, which is 3.

$$\frac{dL}{da} = \frac{dL}{de} * \frac{de}{dc} * \frac{dc}{da}$$

which is 2 * 1 * 3 = 6.
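The same chain can be checked in code, including the gradients for b, d, and f that the walkthrough skips but the flowchart lists. The sketch below assumes a hypothetical `forward` helper and step size `h`, and compares the hand-derived gradient of a against a finite-difference estimate.

```python
def forward(a, b, d, f):
    # Recomputes L from the inputs, exactly as in Example 2
    c = a * b
    e = c + d
    return e * f


a, b, d, f = 2.0, 3.0, 4.0, 2.0
c = a * b  # 6.0
e = c + d  # 10.0

# Hand-derived gradients, multiplying local derivatives along the chain
dL_de = f  # L = e * f
dL_df = e
dL_dc = dL_de * 1.0  # e = c + d
dL_dd = dL_de * 1.0
dL_da = dL_dc * b  # c = a * b
dL_db = dL_dc * a

print(dL_da, dL_db, dL_dc, dL_dd, dL_de, dL_df)  # 6.0 4.0 2.0 2.0 2.0 10.0

# Numerical check: nudge a and see how far L moves
h = 1e-6  # assumed step size
numeric_dL_da = (forward(a + h, b, d, f) - forward(a, b, d, f)) / h
print(round(numeric_dL_da, 4))
```

The numerical estimate should agree with the hand-derived 6 up to floating-point noise.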

Example 3

x = 5
y = x + 2
z = x * 3
L = y * z

Forward pass: y = 7, z = 15, L = 105.

flowchart LR
    x["x<br/>data: 5<br/>grad: 36"] --> plus(("+"))
    two["2<br/>(const)"] --> plus
    plus --> y["y<br/>data: 7<br/>grad: 15"]
    x --> mult1(("×"))
    three["3<br/>(const)"] --> mult1
    mult1 --> z["z<br/>data: 15<br/>grad: 7"]
    y --> mult2(("×"))
    z --> mult2
    mult2 --> L["L<br/>data: 105<br/>grad: 1"]

gradient of L

$\frac{dL}{dL} = 1$

gradient of y

$\frac{dL}{dy} = 15$ (the value of z)

gradient of x

Because x is involved in two different paths, we need to calculate both paths and add them together:

via y:

$$\left.\frac{dL}{dx}\right|_{\text{via } y} = \frac{dL}{dy} * \frac{dy}{dx}$$

$\frac{dy}{dx}$ is 1, so 15 * 1 = 15

via z:

$$\left.\frac{dL}{dx}\right|_{\text{via } z} = \frac{dL}{dz} * \frac{dz}{dx}$$

$\frac{dL}{dz} = 7$ (the value of y)

$\frac{dz}{dx} = 3$ (remember, this is a multiplication, not addition), so 7 * 3 = 21

Therefore, $\frac{dL}{dx} = (7 * 3) + 15 = 36$
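The "add the paths together" rule is why small autograd implementations accumulate gradients with `+=` instead of overwriting them. Below is a minimal sketch of that idea. The `Value` class and its `_parents` and `_backward` attributes are hypothetical names made up for this illustration, and it only supports `+` and `*`.

```python
class Value:
    """A scalar that remembers which values produced it (illustrative only)."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # Addition: local derivative is 1 for both inputs
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # Multiplication: local derivative is the other operand
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Order the graph so each node is processed after everything it feeds
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0  # dL/dL = 1
        for node in reversed(order):
            node._backward()


x = Value(5.0)
y = x + 2
z = x * 3
L = y * z
L.backward()
print(L.data, y.grad, z.grad, x.grad)  # 105.0 15.0 7.0 36.0
```

Because both `_backward` closures add into `x.grad`, the 15 arriving via y and the 21 arriving via z sum to 36. Replacing `+=` with `=` would keep only whichever path ran last.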