Turning categories into tensors

In the previous section, we looked at turning images into tensors for machine learning, and in this section, we will look at turning the output values, the categories, into tensors for machine learning.

We will cover output classes, what it means to make a discrete prediction, the concept of one-hot encoding; and then we'll visualize what one-hot encoding looks like as an image, and then we'll recap with a data preparation cookbook, which you should use to be able to deal with all kinds of image data for machine learning.

But for now, let's talk about output. When we're talking about digits, there's 0 through 9, so there's ten different classes, and not classes in the object-oriented sense, but classes in the label sense. Now, with these labels being from 0 to 9 as individual digits, the predictions we want to make need to be discrete. It won't do us any good to predict 1.5, there's no such digit character:

o to 9 predictions

So, for this, we're going to use a data transformation trick. This thing is called one-hot encoding, and it is where you take an array of label possibilities, in this case, the numbers 0 through 9, and turn them into a kind of bitmap, where each option is encoded as a column, and only one column is set to 1 (hence one-hot) for each given data sample:

One-hot encoding

Now, looking at both an input digit (here, 9), and the output bitmap, where you can see that the forth index has the ninth bit set, you can see that what we're doing in our data preparation here is having one image as an input and another image as an output. They just happen to be encoded as tensors (multidimensional arrays of floating point numbers):

Output bitmap

What we're going to be doing when we create a machine learning algorithm is have the computer learn or discover a function that transforms the one image (the digit nine) into another image (the bitmap with one bit set on the ninth column), and that is what we mean by machine learning. Remember that tensors are just multidimensional arrays, and that the x and y values are just pixels. We normalize these values, which means we get them from the range of zero to one so that they are useful in machine learning algorithms. The labels, or output classes, are just an array of values that we're going to map, and we're going to encode these with one-hot encoding, which again means only one is hot, or set to one.