In this step, we calculate the gradients of the loss function f(y, y_hat) with respect to A, W, and b, denoted dA, dW, and db. The gradients are computed layer by layer, from the last layer back to the first: dA is passed back to the preceding layer, while dW and db are used to update that layer's parameters W and b.
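The backward step described above can be sketched in NumPy. This is only an illustration under assumptions the text does not state: sigmoid activations in every layer, a mean squared error loss f(y, y_hat) = sum((y_hat - y)^2) / (2m), plain gradient descent with a hypothetical learning rate `lr`, and illustrative helper names (`forward`, `backward`, `update`, `loss_fn`).

```python
import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def forward(X, params):
    """Forward pass; caches each layer's input and output activations."""
    A = X
    caches = []
    for W, b in params:
        A_prev = A
        A = sigmoid(W @ A_prev + b)
        caches.append((A_prev, A))
    return A, caches

def loss_fn(y, y_hat):
    # Assumed loss: mean squared error over the m examples (columns).
    m = y.shape[1]
    return np.sum((y_hat - y) ** 2) / (2 * m)

def backward(y, y_hat, params, caches):
    """Compute (dW, db) for every layer, walking from the last layer to the first."""
    m = y.shape[1]
    grads = [None] * len(params)
    dA = (y_hat - y) / m                     # gradient of the loss w.r.t. the output A
    for l in reversed(range(len(params))):
        W, _ = params[l]
        A_prev, A = caches[l]
        dZ = dA * A * (1.0 - A)              # chain rule through the sigmoid
        dW = dZ @ A_prev.T                   # gradient w.r.t. this layer's W
        db = dZ.sum(axis=1, keepdims=True)   # gradient w.r.t. this layer's b
        dA = W.T @ dZ                        # dA handed back to the previous layer
        grads[l] = (dW, db)
    return grads

def update(params, grads, lr=0.1):
    """Gradient descent step on every layer's W and b."""
    return [(W - lr * dW, b - lr * db) for (W, b), (dW, db) in zip(params, grads)]

# Usage on a small random network: 3 inputs, 4 hidden units, 1 output, 5 examples.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))
y = rng.uniform(size=(1, 5))
params = [
    (rng.normal(size=(4, 3)), np.zeros((4, 1))),
    (rng.normal(size=(1, 4)), np.zeros((1, 1))),
]
y_hat, caches = forward(X, params)
grads = backward(y, y_hat, params, caches)
new_params = update(params, grads)
```

Note that each layer's dA is computed from its W before any parameter changes, which is why the sketch collects all gradients first and applies the update afterwards.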