We spent a few chapters building a neural network, and a few more investigating its finer points. In this chapter, we’ll come down to the wire and shoot for 99% accuracy on MNIST. To get there, we’ll follow an iterative process that is the ML equivalent of software development. In fact, you can call it just that: development.
Like software development, ML development is too broad an activity to fit in this chapter—or this book. It involves people with different skills, from mathematicians to engineers. Even the engineering part of the job is vaster than just “build a neural network and tune it”: real-life ML systems are often complicated pipelines composed of multiple algorithms and services. To make things harder, ML development is often an art as well as a science: it requires plenty of experience, educated guesses, and plain old luck.
As the saying goes, however, “the harder you practice, the luckier you get”—so, let’s start practicing. We’ll look at ML development in a nutshell, focusing on three activities:
We already decided to crack our problem with a neural network. We’ll start by preparing our dataset for that network. For example, we’ll rescale the input variables to make them more network-friendly.
Then we’ll move into the development cycle. At each step, we’ll improve the network’s accuracy by tuning its hyperparameters: the learning rate lr, the batch size, and so on.
At the end of the process, we’ll put the network to a final test.
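The rescaling mentioned in the first step above is straightforward in practice. Here is a minimal sketch (the tiny array of pixel values is made up for illustration, assuming MNIST-style intensities from 0 to 255): dividing by 255 squashes the inputs into the [0, 1] range, which tends to be friendlier to gradient-based training than raw intensities.

```python
import numpy as np

# Hypothetical batch of two images, four pixels each, with
# MNIST-style intensities from 0 to 255.
X = np.array([[0, 64, 128, 255],
              [32, 96, 160, 224]], dtype=np.float64)

# Rescale to the [0, 1] range.
X_scaled = X / 255.0

print(X_scaled.min(), X_scaled.max())  # all values now lie in [0, 1]
```

The same idea generalizes to other rescaling schemes, such as standardizing each input variable to zero mean and unit variance.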
Along the way, remember the testing strategy from Chapter 14, The Zen of Testing. We have three sets of examples: training, validation, and test. We’ll put the test set under a rock right now and ignore it until the final test at the end of the process. Instead, during the development cycle, we’ll use the validation set to measure the network’s performance. In fact, the validation set is sometimes called the dev set, because it’s used during development.
Now we have a plan. Let’s jump in and see how close we can get to that 99%.
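Before we jump in, here is the development cycle in miniature. At its simplest, it is a loop over candidate hyperparameters: train a network with each configuration, score it on the validation set, and keep the best. In this sketch, validation_accuracy is a made-up stand-in for actually training and evaluating the network, just to make the loop runnable:

```python
import itertools

# Hypothetical stand-in: a real version would train the network
# with these hyperparameters and return its validation accuracy.
def validation_accuracy(lr, batch_size):
    return 0.9 - abs(lr - 0.01) - abs(batch_size - 32) / 1000

learning_rates = [0.001, 0.01, 0.1]
batch_sizes = [16, 32, 64]

# Try every combination and keep the one that scores highest
# on the validation set.
best = max(itertools.product(learning_rates, batch_sizes),
           key=lambda pair: validation_accuracy(*pair))
print(best)  # the winning (lr, batch_size) pair
```

In practice, each iteration of the loop is expensive, so we will tune hyperparameters one at a time rather than exhaustively, but the principle is the same: measure on the validation set, and save the test set for the very end.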