If we run the program, we get this output:
    Iteration     0 => Loss: 1333.56666666666660603369
    Iteration     1 => Loss: 152.37148173674077611395
    …
    Iteration 99999 => Loss: 6.69817817063803833122

    Weights: [[ 2.41178207  1.23368396 -0.02689984  3.12460558]]

    A few predictions:
    X[0] -> 45.8717 (label: 44)
    X[1] -> 23.2502 (label: 23)
    X[2] -> 28.5192 (label: 28)
    X[3] -> 58.2355 (label: 60)
    X[4] -> 42.8009 (label: 42)
First, look at the loss. As we expected, it’s lower than the loss we got without a bias.
The weights are interesting in their own right. The first weight is actually the bias, which we turned into a regular weight with the “column of ones” trick. The remaining weights match the three input variables—reservations, temperature, and tourist density, respectively. Tourist density has a large weight, and temperature has a tiny one. That’s a hint that pizza sales are strongly impacted by tourist density, while they don’t seem to change much with the temperature.
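To make the “column of ones” trick concrete, here is a minimal sketch of how the bias becomes a regular weight, assuming NumPy and the weights printed above. The input values and variable names are illustrative, not necessarily the code or data from this chapter:

    import numpy as np

    # Raw inputs, one row per example: reservations, temperature, tourist density.
    # (Illustrative values, not necessarily the chapter's dataset.)
    x_raw = np.array([[13.0, 26.0, 9.0],
                      [ 2.0, 14.0, 6.0]])

    # The "column of ones" trick: prepend a constant 1 to every example,
    # so the weight that multiplies it acts as the bias.
    x = np.column_stack((np.ones(x_raw.shape[0]), x_raw))

    # Weights in the same layout as the output above:
    # [bias, reservations, temperature, tourists].
    w = np.array([[2.41178207, 1.23368396, -0.02689984, 3.12460558]])

    # Each prediction is a weighted sum:
    # bias * 1 + w_res * reservations + w_temp * temperature + w_tour * tourists.
    predictions = np.matmul(x, w.T)

Because temperature’s weight is close to zero, changing the temperature barely moves the weighted sum, while tourist density’s larger weight shifts it noticeably.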
Finally, the last few lines of output show predictions and labels for the first five examples. No prediction is more than a pizza or two off the mark. It seems that Roberto was right: upgrading to multiple variables boosted our ability to forecast pizzas.
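As a sanity check, you can reproduce that prediction report with a short loop along these lines. This is only a sketch with illustrative input rows (chosen so that the arithmetic lines up with the first two predictions above), not necessarily the chapter’s own code:

    import numpy as np

    # Weights as printed above: [bias, reservations, temperature, tourists].
    w = np.array([[2.41178207, 1.23368396, -0.02689984, 3.12460558]])

    # Two illustrative examples, already prefixed with the bias column of ones,
    # and their labels (the number of pizzas actually sold).
    x = np.array([[1.0, 13.0, 26.0, 9.0],
                  [1.0,  2.0, 14.0, 6.0]])
    y = np.array([44, 23])

    # Print each prediction next to its label, in the style of the output above.
    for i in range(x.shape[0]):
        prediction = np.matmul(x[i], w.T)[0]
        print("X[%d] -> %.4f (label: %d)" % (i, prediction, y[i]))

Comparing the two columns this way is exactly the “a pizza or two off the mark” check described above.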
Congratulations! You worked through the hardest chapter in this book. Let’s wrap up what you learned.