Deep Learning: Backpropagation from Scratch - Mathematical Intuition

Sabyasachi Ghosh
4 min read · Jan 20, 2021


In this post we will discuss the mathematics and implementation of a simple neural network. Backpropagation is the most important part of training a neural network. During this process we calculate the difference between the predicted and actual output and use gradient descent to update the weight values, and we repeat this for many epochs until the difference between the predicted and actual output is minimized.

We will go through the following in this series:

  1. Define a simple 2-layer neural network:

a) Number of features.

b) Number of layers & activation functions.

c) Cost function.

2. Define the algorithm:

a) Initializing the neural network weights.

b) Forward propagation.

c) Calculation of the squared loss.

d) Backward propagation and calculation of the gradients.

e) Updating the weights.

Architecture of a simple neural network

Architecture of our neural network:

The picture above depicts a simple neural network with an "input layer", a "hidden layer" and an "output layer". We consider 4 features here, depicted by i1, i2, i3 and i4. The hidden layer has 3 neurons, and their outputs are depicted as "O21" (output of the 1st neuron in the 2nd layer), "O22" (output of the 2nd neuron in the 2nd layer) and "O23" (output of the 3rd neuron in the 2nd layer). The last layer is the output layer with one neuron only; we will call its output "OutO1".

"OutO1" finally turns into the predicted value after the epochs are run. When the network goes through one complete forward and one complete backward propagation, one epoch is completed.

Activation functions:

In our neural network the activation function we chose is the sigmoid function. The output of the sigmoid function lies between 0 and 1.

It is generally used in models where the output is a probability, since a probability also ranges between 0 and 1.

Sigmoid function
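The sigmoid is σ(z) = 1 / (1 + e⁻ᶻ). A minimal NumPy sketch of the function and its derivative (the derivative will be reused during backpropagation; the helper names are my own):

```python
import numpy as np

def sigmoid(z):
    # Squashes any real value into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(a):
    # Derivative of the sigmoid, written in terms of its output a = sigmoid(z)
    return a * (1.0 - a)
```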

Initializing neural network weights: Step 1

The weights in our neural network are as follows:

Input layer weights:

W¹11, W¹12, W¹13 are the weights from the 1st feature i1 to the 1st, 2nd and 3rd neurons of the second layer respectively.

W¹21, W¹22, W¹23 are the weights from the 2nd feature i2 to the 1st, 2nd and 3rd neurons of the second layer respectively.

W¹31, W¹32, W¹33 are the weights from the 3rd feature i3 to the 1st, 2nd and 3rd neurons of the second layer respectively.

W¹41, W¹42, W¹43 are the weights from the 4th feature i4 to the 1st, 2nd and 3rd neurons of the second layer respectively.

Hidden layer weights:

W²11, W²21, W²31 are the weights from the 1st, 2nd and 3rd neurons of the second layer to the output neuron respectively.

In general, the weights in the different layers can be initialized in various ways. For our use case, we will initialize them randomly from a uniform distribution.

We can use the function np.random.uniform to choose random weights for each layer. Another important point is to give the weight matrices the correct size. For example, the input layer weights W¹11, W¹12, W¹13, … should form a 4 × 3 matrix, i.e. (input layer neurons) × (hidden layer neurons). Likewise, the hidden layer weights W²11, W²21, W²31 should form a 3 × 1 matrix, i.e. (hidden layer neurons) × (output layer neuron).
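A small sketch of this initialization, assuming NumPy and using the variable names W1 and W2 for the two weight matrices:

```python
import numpy as np

n_input, n_hidden, n_output = 4, 3, 1

# Input-to-hidden weights: (input layer neurons) x (hidden layer neurons) = 4 x 3
W1 = np.random.uniform(size=(n_input, n_hidden))

# Hidden-to-output weights: (hidden layer neurons) x (output layer neuron) = 3 x 1
W2 = np.random.uniform(size=(n_hidden, n_output))
```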

Forward Propagation: Step 2

Calculations of forward propagation
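As a sketch of the forward pass in the figure, assuming a single input row vector X of shape 1 × 4, no bias terms, and the helpers defined above:

```python
# X is one training sample with features i1..i4, shape (1, 4)
Z1 = X @ W1          # net input to the hidden layer, shape (1, 3)
O2 = sigmoid(Z1)     # hidden outputs O21, O22, O23

Z2 = O2 @ W2         # net input to the output neuron, shape (1, 1)
OutO1 = sigmoid(Z2)  # predicted output
```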

Calculation of loss: Step 3

The loss used here is the squared loss.
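Assuming the usual 1/2 factor (so the derivative comes out clean) and a single target value y, the squared loss can be written as:

```python
# Squared loss between the prediction OutO1 and the target y
E = 0.5 * (y - OutO1) ** 2
```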

Backward Propagation: Step 4

For backward propagation we will focus on updating the weights after calculating the loss. In the calculations that follow we will update only the weights W²11 and W¹11, for demonstration purposes.

Calculation of weight W²11:

Back propagation: calculation of weight W²11
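By the chain rule, dE/dW²11 = dE/dOutO1 · dOutO1/dZ2 · dZ2/dW²11. A vectorised sketch that computes all three hidden-to-output gradients at once, continuing the names from the sketches above:

```python
# dE/dOutO1 for the 0.5 * (y - OutO1)**2 loss
dE_dOut = -(y - OutO1)                # shape (1, 1)

# dOutO1/dZ2: sigmoid derivative evaluated at the output
dOut_dZ2 = sigmoid_derivative(OutO1)  # shape (1, 1)

# Error signal at the output neuron; dZ2/dW2 = O2 then gives the gradient
delta2 = dE_dOut * dOut_dZ2           # shape (1, 1)
dE_dW2 = O2.T @ delta2                # shape (3, 1), one entry per hidden-to-output weight
```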

Calculation of weight W¹11:
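For an input-to-hidden weight such as W¹11 the chain runs one layer further back: the output error signal is propagated through W², then through the hidden sigmoid, and finally multiplied by the input. Continuing the same sketch:

```python
# Push the output error signal back through W2 and the hidden sigmoid
delta1 = (delta2 @ W2.T) * sigmoid_derivative(O2)  # shape (1, 3)

# dZ1/dW1 = X, so every input-to-hidden weight gradient is
dE_dW1 = X.T @ delta1                              # shape (4, 3)
```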

Updating the weights: Step 5

Weight update at the end of backpropagation
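Each weight is then moved a small step against its gradient; lr below is an assumed learning rate:

```python
lr = 0.1  # learning rate, an assumed value

# Gradient descent step, repeated once per epoch
W2 = W2 - lr * dE_dW2
W1 = W1 - lr * dE_dW1
```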

I hope you have enjoyed reading!

Thank you,
