Image Data

In the previous chapter, we prepared our Machine Learning Toolkit, setting up Keras and Docker so that we can run Jupyter Notebooks for our machine learning work.

In this chapter, we're going to look at preparing image data for machine learning and the steps involved in hooking that data into Keras. We're going to start by learning about the MNIST digits. These are images of handwritten digits, and we're effectively going to perform Optical Character Recognition (OCR) on them with machine learning. Then, we're going to talk about tensors. Tensor sounds like a math word, and it really is, but as a programmer, you've seen multidimensional arrays, so you've actually been using tensors already, and I'll show you the equivalence. Afterward, we're going to turn images into tensors. Images, as you're used to seeing them on a computer, need a special form of encoding before they can be used with machine learning.
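As a quick preview of the tensor idea, here is a minimal sketch using NumPy. The pixel values are random stand-ins rather than real MNIST data, the variable names are purely illustrative, and scaling to the range 0 to 1 is a common convention rather than a requirement; the only MNIST-specific fact assumed is that each image is 28 x 28 grayscale pixels.

```python
import numpy as np

# Hypothetical stand-in for one grayscale image: 28 x 28 pixels,
# each an integer from 0 to 255 (random values, not real MNIST data).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

# A 2D array is a rank-2 tensor: the number of axes is the rank.
print(image.ndim)   # 2
print(image.shape)  # (28, 28)

# Stacking several images adds a leading "sample" axis,
# which gives a rank-3 tensor of shape (samples, height, width).
batch = np.stack([image, image, image])
print(batch.shape)  # (3, 28, 28)

# Assumed convention: convert to floating point and scale into [0, 1].
scaled = batch.astype("float32") / 255.0
```

The point to take away is that a tensor here is nothing more exotic than a multidimensional array with a known shape and data type.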

Then, we're going to turn to categories; in this case, the digits zero through nine, which we're going to turn into category labels. Finally, we're going to recap, and I'm going to show you what is essentially a cookbook for how to think about data when you prepare it for machine learning.
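To give a feel for what turning digits into category labels can look like, here is a minimal sketch of one-hot encoding in NumPy. The label values are made up for illustration, and this is only one way to do it; Keras also provides a `to_categorical` utility for the same job.

```python
import numpy as np

# Hypothetical digit labels for five images (illustrative values only).
labels = np.array([5, 0, 4, 1, 9])
num_classes = 10  # the digits zero through nine

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing the identity matrix by label builds the encoding.
one_hot = np.eye(num_classes, dtype="float32")[labels]

print(one_hot.shape)  # (5, 10)
print(one_hot[0])     # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```

Each row contains a single 1 in the position of its digit, so the label 5 becomes a vector with a 1 at index 5 and zeros everywhere else.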