Network initialization

So far, we have seen that a neural network model is built up in a number of stages. We already know that a weight connects each pair of nodes in two adjacent layers. The values from the input nodes undergo a linear transformation with these weights, and the result passes through a nonlinear activation function to yield the values of the next layer. This is repeated for each subsequent layer and, later on, backpropagation is used to find the optimal values of the weights.
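
To make the forward pass concrete, here is a minimal sketch in NumPy, assuming a single hidden layer with a sigmoid activation; the layer sizes and the names `W1`, `b1`, `W2`, and `b2` are purely illustrative:

```python
import numpy as np

def sigmoid(z):
    # Nonlinear activation, applied element-wise
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes: 3 input features, 4 hidden units, 1 output unit
rng = np.random.default_rng(0)
x = rng.normal(size=(3,))        # values from the input nodes
W1 = rng.normal(size=(4, 3))     # weights between input and hidden layer
b1 = np.zeros(4)
W2 = rng.normal(size=(1, 4))     # weights between hidden and output layer
b2 = np.zeros(1)

# Linear transformation of the inputs, then the nonlinear activation
h = sigmoid(W1 @ x + b1)         # hidden-layer values
y = sigmoid(W2 @ h + b2)         # output value
print(h, y)
```

In practice, backpropagation would then compute the gradient of a loss with respect to `W1`, `b1`, `W2`, and `b2` and update them iteratively.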

For a long time, weights were simply initialized at random. Later, it was realized that the way we initialize the network has a massive impact on the model. Let's see how we initialize the model:

If a network is initialized with zeros, then all the hidden nodes receive a zero signal, because every input is multiplied by zero. More generally, no matter what the input values are, if all the weights are the same, all the units in the hidden layer will be the same too. This is called symmetry, and it has to be broken for the network to capture more information and become a good model. Hence, the weights should be initialized randomly, or at least with different values:
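
The following sketch (again in NumPy, with illustrative layer sizes) shows the symmetry problem directly: with zero initialization every hidden unit computes exactly the same value, and since they would also receive identical gradients, they could never become different during training. Random initialization breaks this symmetry from the start:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=(3,))            # an arbitrary input vector

# Zero initialization: every hidden unit applies the same (zero) weights,
# so all hidden values are identical -- this is the symmetry problem.
W_zero = np.zeros((4, 3))
print(sigmoid(W_zero @ x))           # e.g. [0.5 0.5 0.5 0.5]

# Random initialization with small values breaks the symmetry:
# each hidden unit starts with different weights and produces a
# different value, so the units can learn different features.
W_rand = rng.normal(scale=0.1, size=(4, 3))
print(sigmoid(W_rand @ x))           # four distinct values
```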