We need a way to minimize the total loss function, and this can be achieved by adjusting the weight. One crude method is to sweep the parameter W over a range, say -500 to 500 in steps of 0.001, and pick the point where the sum of squared errors becomes zero or minimal.
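Such a brute-force sweep can be sketched as below. The data here is a hypothetical toy set where y = 2x, chosen so that the loss is minimized at w = 2, matching the value discussed later; the variable names are illustrative, not from the original.

```python
import numpy as np

# Assumed toy data where y = 2x, so the loss is minimized at w = 2
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

# Brute-force sweep of w over [-500, 500) in steps of 0.001
ws = np.arange(-500, 500, 0.001)

# Sum-of-squares error for every candidate w at once (vectorized)
losses = ((y[None, :] - ws[:, None] * x[None, :]) ** 2).sum(axis=1)

# Pick the w with the smallest loss
best_w = ws[np.argmin(losses)]
print(best_w)  # close to 2
```

Even vectorized, this evaluates the loss a million times for a single parameter, which is why the approach does not scale.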
This brute-force approach works here only because we have a single parameter and the computation is cheap. With many parameters, however, the computational cost quickly becomes prohibitive.
Here, mathematics comes to our rescue in the form of differentiation (the maxima/minima approach) to optimize the weights. The derivative of a function at a point gives the rate at which the function is changing its value there. We therefore take the derivative of the loss function, which tells us how the total error responds to a slight adjustment in the weight. For example, if we change the weight by δW, so that W = W + δW, we can see how that change influences the loss function. Our end goal is to minimize the loss function this way.
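The effect of a small δW on the loss can be checked numerically with a finite difference. This is a minimal sketch on the same assumed toy data (y = 2x, minimum at w = 2); the starting point w = 1.5 and step size are arbitrary choices for illustration.

```python
# Sum-of-squares loss L(w) for assumed toy data where y = 2x
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]

def loss(w):
    return sum((yi - w * xi) ** 2 for xi, yi in zip(x, y))

w, dw = 1.5, 1e-4                        # start below the minimum; tiny perturbation δW
slope = (loss(w + dw) - loss(w)) / dw    # approximates dL/dw at w = 1.5
print(slope)  # negative: increasing w from 1.5 decreases the loss
```

The negative slope below w = 2 is exactly the signal exploited in the scenarios that follow.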
We know that the minimum is reached at w=2; hence, we explore the different scenarios:
- w<2 implies a positive loss and a negative derivative, meaning that an increase in the weight will decrease the loss
- w>2 implies a positive loss but a positive derivative, meaning that any further increase in the weight will increase the loss
- At w=2, the loss is 0 and the derivative is 0; the minimum is reached.
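The three scenarios above suggest an update rule: always move the weight opposite to the sign of the derivative. A minimal gradient-descent sketch, again on the assumed toy data (y = 2x) with an illustrative learning rate:

```python
# Assumed toy data where y = 2x; the loss minimum is at w = 2
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]

def grad(w):
    # Analytic derivative of L(w) = sum((y - w*x)**2):
    # dL/dw = -2 * sum(x * (y - w*x))
    return -2 * sum(xi * (yi - w * xi) for xi, yi in zip(x, y))

w, lr = 0.0, 0.01       # arbitrary start away from the minimum; small learning rate
for _ in range(200):
    w -= lr * grad(w)   # step opposite to the derivative's sign

print(round(w, 4))      # converges toward 2
```

Whether w starts below or above 2, each step shrinks the gap, and the updates stall only where the derivative is 0, i.e. at the minimum.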