A study of Backpropagation

Posted by Kaiyuan Chen on August 25, 2017

As usual, we start with the definition of the backpropagation algorithm from Wikipedia:

Backpropagation is a method used in artificial neural networks (ANN) to calculate the error contribution of each neuron after a batch of data (in image recognition, multiple images) is processed.

Artificial Neural Network

An ANN, in this context, is a multilayer neural network trained by supervised learning: it looks for the mapping that best takes a set of inputs to their correct outputs.

Loss Function

Like a reconstruction error or cost function, the loss function measures the difference between the network output and the expected output.

The input vector is propagated forward through the network, layer by layer, until it reaches the output layer. The output of the network is then compared to the desired output using the loss function. The resulting error value is calculated for each of the neurons in the output layer.
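
To make the forward pass concrete, here is a minimal sketch in plain Python. It assumes sigmoid activations and a squared-error loss, neither of which is fixed by the text above, and the names forward and squared_error, the 2-2-1 layer sizes, and the weights are all made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, layers):
    """Propagate the input x through the network, layer by layer.

    layers is a list of (weights, biases) pairs, where weights[j] holds
    the incoming weights of neuron j in that layer.
    """
    a = x
    for weights, biases in layers:
        a = [sigmoid(sum(w * ai for w, ai in zip(row, a)) + b)
             for row, b in zip(weights, biases)]
    return a

def squared_error(output, target):
    """Half the sum of squared differences between output and target."""
    return 0.5 * sum((o - t) ** 2 for o, t in zip(output, target))

# Hypothetical 2-2-1 network with made-up weights (assumption).
layers = [
    ([[0.1, 0.2], [0.3, 0.4]], [0.0, 0.0]),  # hidden layer
    ([[0.5, -0.5]], [0.0]),                  # output layer
]
target = [1.0]
output = forward([1.0, 0.0], layers)
loss = squared_error(output, target)
# One error value per neuron in the output layer.
output_errors = [o - t for o, t in zip(output, target)]
```

The loss is a single number for the whole network, while output_errors keeps one value per output neuron, which is what gets propagated backward later.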

Contribution of each neuron

This algorithm has two phases: propagation and weight update.


Propagation

This step computes the loss function and uses the resulting error values to compute the gradient of the loss function.

Weight Update

Given the gradient of the loss function, the algorithm updates the weights to minimize the loss.
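
A minimal sketch of such an update, assuming plain gradient descent with a fixed learning rate (the text above does not name a specific update rule). The toy loss, its gradient, and the name update_weights are invented for this example.

```python
def update_weights(weights, gradients, learning_rate=0.1):
    """Move each weight a small step against its gradient."""
    return [w - learning_rate * g for w, g in zip(weights, gradients)]

# Toy loss with a known minimum at (3, -1), standing in for a real network.
def toy_loss(w):
    return (w[0] - 3.0) ** 2 + (w[1] + 1.0) ** 2

def toy_gradient(w):
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]

weights = [0.0, 0.0]
for _ in range(100):
    weights = update_weights(weights, toy_gradient(weights))
# weights is now very close to the minimum at (3, -1).
```

In a real network the gradient comes from backpropagation rather than from a closed-form formula, but the update step itself looks the same.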

Hadamard product

It is a binary operation written a ⊙ b. It is an element-wise (entry-wise) operation that takes two matrices of the same dimensions m × n and multiplies their corresponding entries.
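
As a quick illustration, here is a small sketch of the Hadamard product on nested lists. The function name hadamard is made up for this example.

```python
def hadamard(a, b):
    """Element-wise product of two matrices with the same shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

a = [[1, 2, 3], [4, 5, 6]]
b = [[10, 20, 30], [40, 50, 60]]
product = hadamard(a, b)
# product == [[10, 40, 90], [160, 250, 360]]
```

Unlike the ordinary matrix product, the result has the same shape as the inputs, and each entry depends only on the two entries at the same position.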


Backpropagation calculates the partial derivatives of the cost function with respect to each weight and each bias. The idea is that one can reduce the cost function by moving each weight or bias in the direction opposite to the sign of the corresponding derivative. Backpropagation offers a way to compute these partial derivatives of the loss function for every neuron.
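
To see this on the smallest possible case, here is a sketch for a single sigmoid neuron with a squared-error cost (both are assumptions made for the example). It computes the partial derivatives with the chain rule, checks them against finite differences, and then takes one step against the sign of the derivatives. All names and numbers are made up.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(w, b, x, y):
    """Squared-error cost of one sigmoid neuron on one example."""
    a = sigmoid(w * x + b)
    return 0.5 * (a - y) ** 2

def cost_gradient(w, b, x, y):
    """Partial derivatives of the cost w.r.t. w and b via the chain rule."""
    a = sigmoid(w * x + b)
    delta = (a - y) * a * (1.0 - a)  # derivative w.r.t. the weighted input
    return delta * x, delta

w, b, x, y = 0.5, -0.2, 1.5, 1.0
dw, db = cost_gradient(w, b, x, y)

# Finite-difference estimates of the same derivatives, as a sanity check.
eps = 1e-6
numeric_dw = (cost(w + eps, b, x, y) - cost(w - eps, b, x, y)) / (2 * eps)
numeric_db = (cost(w, b + eps, x, y) - cost(w, b - eps, x, y)) / (2 * eps)

# Moving w and b against the sign of their derivatives lowers the cost.
step = 0.5
new_cost = cost(w - step * dw, b - step * db, x, y)
```

Here both derivatives are negative, so the step increases w and b, and new_cost comes out smaller than the original cost.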


The equations for backpropagation are presented in [1]. Inserting equations here seems to be complicated.

But I will still present pseudo code similar to Professor Andrew Ng's:

For each pair {x, y} in the training set:

1. set a(1) = x
2. perform forward propagation to compute the activations a(l) for layers l = 2, ..., L, where L is the total number of layers
3. use the difference between a(L) and y to compute the error at the output layer
4. propagate the error backward to compute the error of each earlier layer
5. accumulate the partial derivatives, then use them to update the weights
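
The steps above can be sketched in plain Python as follows. This is only a sketch under some assumptions that the pseudo code does not fix: sigmoid activations, a squared-error loss, and one averaged gradient-descent update per pass over the training set. The function train_step, the 2-2-1 network, the starting weights, and the OR-style training set are all invented for illustration. Layers are indexed from 0 here, unlike in the pseudo code.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(weights, biases, training_set, learning_rate):
    """One pass over the training set, following steps 1-5.

    weights[l][j][k] connects neuron k of layer l to neuron j of layer l + 1.
    Returns the total squared-error loss measured before the update.
    """
    grad_w = [[[0.0] * len(row) for row in layer] for layer in weights]
    grad_b = [[0.0] * len(layer) for layer in biases]
    total_loss = 0.0
    for x, y in training_set:
        # Steps 1-2: start from x and propagate forward, keeping activations.
        activations = [x]
        for w, b in zip(weights, biases):
            prev = activations[-1]
            activations.append([
                sigmoid(sum(wjk * ak for wjk, ak in zip(row, prev)) + bj)
                for row, bj in zip(w, b)])
        # Step 3: compare the final activation with y.
        out = activations[-1]
        total_loss += 0.5 * sum((o - t) ** 2 for o, t in zip(out, y))
        delta = [(o - t) * o * (1.0 - o) for o, t in zip(out, y)]
        # Step 4: walk backward, accumulating the partial derivatives.
        for l in range(len(weights) - 1, -1, -1):
            prev = activations[l]
            for j, dj in enumerate(delta):
                grad_b[l][j] += dj
                for k, ak in enumerate(prev):
                    grad_w[l][j][k] += dj * ak
            if l > 0:
                # Error one layer back: (W^T delta) times sigmoid'(z),
                # multiplied element-wise.
                delta = [
                    sum(weights[l][j][k] * delta[j] for j in range(len(delta)))
                    * prev[k] * (1.0 - prev[k])
                    for k in range(len(prev))]
    # Step 5: update the weights with the averaged accumulated derivatives.
    n = len(training_set)
    for l in range(len(weights)):
        for j in range(len(weights[l])):
            biases[l][j] -= learning_rate * grad_b[l][j] / n
            for k in range(len(weights[l][j])):
                weights[l][j][k] -= learning_rate * grad_w[l][j][k] / n
    return total_loss

# Hypothetical 2-2-1 network and OR-style training set (assumptions).
weights = [[[0.1, -0.2], [0.3, 0.4]], [[0.2, -0.1]]]
biases = [[0.0, 0.0], [0.0]]
training_set = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
                ([1.0, 0.0], [1.0]), ([1.0, 1.0], [1.0])]
losses = [train_step(weights, biases, training_set, 0.5) for _ in range(200)]
```

The list losses records the total loss before each update, so comparing its first and last entries shows whether training is making progress on this toy data.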




[1] Backpropagation algorithm, Backpropagation intuition, Andrew Ng, coursera.org