Backward propagation

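In this step, we calculate the gradients of the loss function f(y, y_hat) with respect to the activations A, the weights W, and the biases b. These gradients are denoted dA, dW, and db. They are computed layer by layer, from the last layer back to the first, and the dW and db of each layer are then used to update that layer's parameters.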
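As a rough illustration of this step, here is a minimal sketch for a single layer. It assumes a sigmoid activation, a binary cross-entropy loss, and plain gradient descent; none of these are specified in the text above. The function names `forward`, `backward`, and `loss`, the toy data, and `learning_rate` are hypothetical and chosen only for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(X, W, b):
    """Activations A = sigmoid(W . x + b), one per example x in X."""
    return [sigmoid(sum(w * x for w, x in zip(W, row)) + b) for row in X]

def loss(Y, A):
    """Mean binary cross-entropy (an assumed choice for f(y, y_hat))."""
    return -sum(y * math.log(a) + (1 - y) * math.log(1 - a)
                for y, a in zip(Y, A)) / len(Y)

def backward(X, Y, A):
    """Return (dW, db) for the assumed sigmoid + cross-entropy layer."""
    m = len(X)
    # dA: gradient of the per-example loss with respect to the activation.
    dA = [-(y / a) + (1 - y) / (1 - a) for y, a in zip(Y, A)]
    # Chain rule through the sigmoid: dZ = dA * A * (1 - A).
    dZ = [da * a * (1 - a) for da, a in zip(dA, A)]
    # Average over the m examples to get the parameter gradients.
    dW = [sum(dz * row[j] for dz, row in zip(dZ, X)) / m
          for j in range(len(X[0]))]
    db = sum(dZ) / m
    return dW, db

# Toy data (hypothetical): the label equals the second feature.
X = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
Y = [1.0, 0.0, 1.0, 0.0]
W, b = [0.0, 0.0], 0.0
learning_rate = 0.5

initial_loss = loss(Y, forward(X, W, b))
for _ in range(200):
    A = forward(X, W, b)
    dW, db = backward(X, Y, A)
    # Gradient descent update of the parameters.
    W = [w - learning_rate * g for w, g in zip(W, dW)]
    b = b - learning_rate * db
final_loss = loss(Y, forward(X, W, b))

print(initial_loss, final_loss)
```

In a network with several layers, the same chain-rule step would also produce a dA for the previous layer, which is how the gradients travel from the last layer back to the first. That part is left out here to keep the sketch short.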