The data matrix has 1,288 rows (one per image) and 1,850 columns (one per pixel). For a brief exploratory look, we can plot one of the images with this code:
# plot one of the faces
plt.imshow(X[0].reshape((h, w)), cmap=plt.cm.gray)
lfw_people.target_names[y[0]]
This will give us the following label:
'Hugo Chavez'
The image is as follows:

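The reshape call is what turns a flat row of pixel values back into a two-dimensional grid that imshow can draw. The following is a minimal sketch of that step on a small made-up array; the names X_demo, h_demo, and w_demo are hypothetical and stand in for X, h, and w, whose real values come from the loading step. The only constraint is that the height times the width equals the number of columns (for instance, 50 by 37 pixels would give 1,850).

```python
import numpy as np

# Hypothetical tiny example: 3 "images" of height 2 and width 4,
# stored the same way as X, with one flattened image per row.
h_demo, w_demo = 2, 4
X_demo = np.arange(3 * h_demo * w_demo).reshape((3, h_demo * w_demo))

# Each row holds h_demo * w_demo pixel values in row-major order.
print(X_demo.shape)  # (3, 8)

# Reshaping one row recovers the 2-D pixel grid that imshow expects.
image = X_demo[0].reshape((h_demo, w_demo))
print(image)
```

The first four values of the row become the top line of pixels and the next four become the second line, which is why h and w must match the dimensions the images had when they were flattened.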
Now, let's plot the same image after standardizing the data with StandardScaler, as follows:
plt.imshow(StandardScaler().fit_transform(X)[0].reshape((h, w)), cmap=plt.cm.gray)
lfw_people.target_names[y[0]]
This gives us the following output:

Here, you can see that the image is slightly different, with darker pixels around the face. Now, let's set up the label to predict:
# the label to predict is the id of the person
target_names = lfw_people.target_names
n_classes = target_names.shape[0]

print("Total dataset size:")
print("n_samples: %d" % n_samples)
print("n_features: %d" % n_features)
print("n_classes: %d" % n_classes)
This gives us the following output:
Total dataset size:
n_samples: 1288
n_features: 1850
n_classes: 7
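As an aside on the scaled image plotted earlier in this section: with its default settings, StandardScaler centers and scales each column separately, so every pixel position is adjusted using the mean and standard deviation of that position across all images. A pixel in one face therefore ends up darker or lighter depending on how it compares with the same pixel in the other faces. The following is a rough NumPy sketch of that calculation on made-up data; X_demo and the 2 by 3 image size are hypothetical, and the code imitates the scaler's default behavior rather than calling it.

```python
import numpy as np

# Hypothetical stand-in for the face data: 4 tiny "images" of 2 x 3 pixels,
# flattened to rows of 6 features (the real X has 1,850 features per row).
rng = np.random.RandomState(0)
X_demo = rng.uniform(0, 255, size=(4, 6))

# Standardize each column (pixel position) across all rows (images).
col_mean = X_demo.mean(axis=0)
col_std = X_demo.std(axis=0)  # population standard deviation (ddof=0)
X_scaled = (X_demo - col_mean) / col_std

# Each pixel position now has mean ~0 and standard deviation ~1 across images.
print(X_scaled.mean(axis=0).round(6))
print(X_scaled.std(axis=0).round(6))

# A single scaled image, reshaped back to 2 x 3 for plotting.
first_image = X_scaled[0].reshape((2, 3))
```

Because the statistics are computed per column, the scaled version of one image depends on the whole dataset, not only on that image's own pixels.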