Basic Calculus Refresh (Backpropagation)
This is a quick math refresher for me to reference in the future, and it might be useful to you if you're learning backpropagation.
Example 1
flowchart LR
a["a<br/>data: 3<br/>grad: 4"] --> plus(("+"))
b["b<br/>data: 2<br/>grad: 4"] --> plus
plus --> c["c<br/>data: 5<br/>grad: 4"]
c --> mult(("×"))
d["d<br/>data: 4<br/>grad: 5"] --> mult
mult --> L["L<br/>data: 20<br/>grad: 1"]
gradient of L
Starting at the end, the gradient of L is $\frac{\partial L}{\partial L} = 1$. This is because any change in L produces exactly that same change in L.
gradient of c
Now, on to c: $\frac{\partial L}{\partial c}$. We want to figure out how much a movement in c impacts the result in L.
When you take the derivative of L with respect to c, treat the other value, d, as a constant. Since $L = c \times d$, the derivative $\frac{\partial L}{\partial c}$ is now just d, or 4.
Therefore, the gradient of c is the value of d: 4.
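A quick way to sanity-check this is to nudge c by a tiny amount and see how much L moves. This is only a numerical sketch: the helper name `loss` and the step size `h` are my own choices for illustration.

```python
# Numerical check of the gradient of c in Example 1, where L = c * d.
def loss(c, d):
    return c * d

c, d = 5.0, 4.0
h = 1e-6  # small nudge (assumed step size)

# Slope of L with respect to c: how much L changes per unit change in c.
numeric_grad_c = (loss(c + h, d) - loss(c, d)) / h
print(numeric_grad_c)  # should land very close to d
```

The printed slope should match the value of d up to floating-point noise.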
gradient of a
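Now, to a: $\frac{\partial L}{\partial a}$. We want to figure out how much a movement in a impacts the result in L.
First, let's figure out the local derivative $\frac{\partial c}{\partial a}$. Because this is an addition, the value is 1. Any movement in a produces the exact same movement in c.
Now, we want to figure out the chain rule product:
$\frac{\partial L}{\partial a} = \frac{\partial L}{\partial c} \cdot \frac{\partial c}{\partial a}$
This is effectively 4 ($\frac{\partial L}{\partial c}$) * 1 ($\frac{\partial c}{\partial a}$) = 4.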
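Here is all of Example 1 written out as plain Python. It is a minimal sketch rather than a real autograd implementation, and the `grad_*` variable names are just my own labels for the "grad" values in the diagram.

```python
# Forward pass for Example 1.
a, b, d = 3.0, 2.0, 4.0
c = a + b   # 5.0
L = c * d   # 20.0

# Backward pass, starting at the end and working toward the inputs.
grad_L = 1.0            # dL/dL
grad_c = d * grad_L     # L = c * d, so the local derivative is d
grad_d = c * grad_L     # ...and for d it is c
grad_a = 1.0 * grad_c   # c = a + b, so the local derivative is 1
grad_b = 1.0 * grad_c

print(grad_a, grad_b, grad_c, grad_d)
```

Each line multiplies a local derivative by the gradient already computed for the node's output, which is all the chain rule asks for.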
Example 2
a = 2
b = 3
d = 4
f = 2
c = a * b
e = c + d
L = e * f
Forward pass: c = 6, e = 10, L = 20.
flowchart LR
a["a<br/>data: 2<br/>grad: 6"] --> mult1(("×"))
b["b<br/>data: 3<br/>grad: 4"] --> mult1
mult1 --> c["c<br/>data: 6<br/>grad: 2"]
c --> plus(("+"))
d["d<br/>data: 4<br/>grad: 2"] --> plus
plus --> e["e<br/>data: 10<br/>grad: 2"]
e --> mult2(("×"))
f["f<br/>data: 2<br/>grad: 10"] --> mult2
mult2 --> L["L<br/>data: 20<br/>grad: 1"]
gradient of L
Same as last time, the gradient is 1.
gradient of e
$\frac{\partial L}{\partial e}$ = the value of f, which is 2
gradient of c
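Local derivative first: $\frac{\partial e}{\partial c}$ = 1 because it's an addition.
Then $\frac{\partial L}{\partial c} = \frac{\partial L}{\partial e} \cdot \frac{\partial e}{\partial c}$, which is 2 * 1 = 2.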
gradient of a
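Local derivative first: $\frac{\partial c}{\partial a}$ = the value of b, which is 3.
Then $\frac{\partial L}{\partial a} = \frac{\partial L}{\partial e} \cdot \frac{\partial e}{\partial c} \cdot \frac{\partial c}{\partial a}$, which is 2 * 1 * 3 = 6.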
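The same steps for Example 2 can be sketched in Python, with a numerical check on a at the end. The `grad_*` names, the `loss` helper, and the step size `h` are assumptions of mine for illustration.

```python
# Forward pass for Example 2.
a, b, d, f = 2.0, 3.0, 4.0, 2.0
c = a * b   # 6.0
e = c + d   # 10.0
L = e * f   # 20.0

# Backward pass: local derivative times the gradient flowing in from the output.
grad_L = 1.0
grad_e = f * grad_L     # L = e * f
grad_f = e * grad_L
grad_c = 1.0 * grad_e   # e = c + d
grad_d = 1.0 * grad_e
grad_a = b * grad_c     # c = a * b
grad_b = a * grad_c

# Numerical check: nudge a and measure the slope of L.
def loss(a, b, d, f):
    return (a * b + d) * f

h = 1e-6
numeric_grad_a = (loss(a + h, b, d, f) - loss(a, b, d, f)) / h
print(grad_a, numeric_grad_a)
```

The two printed numbers should agree up to floating-point noise.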
Example 3
x = 5
y = x + 2
z = x * 3
L = y * z
Forward pass: y = 7, z = 15, L = 105.
flowchart LR
x["x<br/>data: 5<br/>grad: 36"] --> plus(("+"))
two["2<br/>(const)"] --> plus
plus --> y["y<br/>data: 7<br/>grad: 15"]
x --> mult1(("×"))
three["3<br/>(const)"] --> mult1
mult1 --> z["z<br/>data: 15<br/>grad: 7"]
y --> mult2(("×"))
z --> mult2
mult2 --> L["L<br/>data: 105<br/>grad: 1"]
gradient of L
$\frac{\partial L}{\partial L} = 1$
gradient of y
$\frac{\partial L}{\partial y} = 15$ (the value of z)
gradient of x
Because x is involved in two different paths, we need to calculate the gradient along both paths and add them together:
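via y:
$\frac{\partial y}{\partial x}$ is 1, so $\frac{\partial L}{\partial y} \cdot \frac{\partial y}{\partial x}$ = 15 * 1 = 15
via z:
$\frac{\partial L}{\partial z}$ = 7 (the value of y)
$\frac{\partial z}{\partial x}$ = 3 (remember, this is a multiplication, not addition)
Therefore, $\frac{\partial L}{\partial x}$ = (7 * 3) + 15 = 36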
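To see why the two paths are added, here is a tiny scalar autograd sketch. The `Value` class is hypothetical and written only for this note. It supports just `+` and `*`, and it accumulates gradients with `+=` so a node used twice (like x) collects a contribution from each path.

```python
class Value:
    """Hypothetical scalar node with `data` and `grad`, like the boxes in the diagrams."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # nodes this value was computed from
        self._backward = lambda: None    # pushes self.grad back to the parents

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # Local derivative of addition is 1 for both inputs.
            self.grad += 1.0 * out.grad
            other.grad += 1.0 * out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # Local derivative of a product is the other factor.
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Topological order, so a node runs only after everything that uses it.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


# Example 3
x = Value(5.0)
y = x + 2
z = x * 3
L = y * z
L.backward()
print(L.data, y.grad, z.grad, x.grad)  # 105.0 15.0 7.0 36.0
```

If `_backward` used `=` instead of `+=`, the gradient from one path would overwrite the other and x would end up with 15 or 21 instead of 36.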