Some data exploration

The feature matrix X has 1,288 rows (one per image) and 1,850 columns (one per pixel). As a brief exploratory step, we can plot one of the images by using this code:

# plot one of the faces
plt.imshow(X[0].reshape((h, w)), cmap=plt.cm.gray)
lfw_people.target_names[y[0]]

This will give us the following label:

'Hugo Chavez'
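The reshape((h, w)) call works because each row of X is a flattened image whose h * w pixel values are stored one after another. The following is a minimal sketch of that relationship using synthetic data rather than the real dataset. The values h = 50 and w = 37 are an assumption chosen only because 50 * 37 = 1,850 matches the column count; the real values come from the dataset loaded earlier.

```python
import numpy as np

# Assumed image dimensions: 50 * 37 = 1850 matches the column count,
# but the real h and w come from the dataset loaded earlier.
h, w = 50, 37

# Synthetic stand-in for one row of X: pixel k simply holds the value k.
row = np.arange(h * w, dtype=np.float32)

# reshape fills the image row by row (C order),
# so flat index k lands at position (k // w, k % w).
image = row.reshape((h, w))

print(image.shape)        # (50, 37)
print(image[1, 0] == w)   # True: the second image row starts at flat index w

# Flattening the image again recovers the original feature vector.
assert np.array_equal(image.ravel(), row)
```

This is why plt.imshow needs the reshaped array: the classifier sees a flat vector of 1,850 features, while the plot needs the two-dimensional pixel grid.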

The plt.imshow call on X[0] produces the following image:

Now, let's plot the same image after standardizing the pixel values with scikit-learn's StandardScaler, as follows:

plt.imshow(StandardScaler().fit_transform(X)[0].reshape((h, w)), cmap=plt.cm.gray)
lfw_people.target_names[y[0]]

This gives us the same label as before:

'Hugo Chavez'
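The label is unchanged because scaling touches only the pixel values in X, never the labels in y. As a rough sketch of what the scaler does to those values under its default settings, the following NumPy-only example standardizes each column of a small synthetic matrix. The array X_demo is a hypothetical stand-in for X, not the real face data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for X: 6 "images" of 4 pixels each (values 0-255).
X_demo = rng.uniform(0, 255, size=(6, 4))

# Column-wise standardization: each pixel position is centered and scaled
# using its own mean and standard deviation across all images.
mean = X_demo.mean(axis=0)
std = X_demo.std(axis=0)  # ddof=0, the same convention StandardScaler uses
X_scaled = (X_demo - mean) / std

print(np.allclose(X_scaled.mean(axis=0), 0.0))  # True
print(np.allclose(X_scaled.std(axis=0), 1.0))   # True
```

Because every pixel position is rescaled relative to its own average across the dataset, a standardized image shows how one face deviates from the typical face at each pixel instead of showing raw brightness.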

The plt.imshow call on the standardized data produces the following image:

Here, you can see that the image is slightly different, with darker pixels around the face. Now, let's set up the label to predict:

# the label to predict is the id of the person
target_names = lfw_people.target_names
n_classes = target_names.shape[0]

print("Total dataset size:")
print("n_samples: %d" % n_samples)
print("n_features: %d" % n_features)
print("n_classes: %d" % n_classes)

This gives us the following output:

Total dataset size:
n_samples: 1288
n_features: 1850
n_classes: 7
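The seven classes correspond to the entries of target_names, and each integer in y is an index into that array. The following is a small sketch of that mapping, along with a per-class image count. The names and labels are hypothetical stand-ins for lfw_people.target_names and y, not values from the dataset.

```python
import numpy as np

# Hypothetical stand-ins for lfw_people.target_names and y.
target_names_demo = np.array(["Person A", "Person B", "Person C"])
y_demo = np.array([2, 0, 1, 2, 2, 0])

# The number of classes is the length of the names array.
n_classes_demo = target_names_demo.shape[0]

# Integer labels index directly into the names array.
first_name = target_names_demo[y_demo[0]]
print(first_name)  # Person C

# Count how many images belong to each class.
counts = np.bincount(y_demo, minlength=n_classes_demo)
for name, count in zip(target_names_demo, counts):
    print("%s: %d" % (name, count))
```

Running the same count on the real y would show whether the classes are balanced, which is worth knowing before judging a classifier by its accuracy.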