As usual, we first add some magic to display images inline in the Jupyter notebook:
%matplotlib inline
We're using Pandas to handle our data:
import pandas
Please visit the Kaggle site and download the dataset: https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge
Load the dataset into memory:
data = pandas.read_csv("fer2013/fer2013.csv")
The dataset consists of grayscale face photos encoded as strings of pixel intensities; at 48 × 48, that gives 2304 pixels per image. Every image is labeled according to the emotion on the face.
data.head()

   emotion                                             pixels     Usage
0        0  70 80 82 72 58 58 60 63 54 58 60 48 89 115 121...  Training
1        0  151 150 147 155 148 133 111 140 170 174 182 15...  Training
2        2  231 212 156 164 174 138 161 173 182 200 106 38...  Training
3        4  24 32 36 30 32 23 19 20 30 41 21 22 32 34 21 1...  Training
4        6  4 0 0 0 0 0 0 0 0 0 0 0 3 15 23 28 48 50 58 84...  Training

How many faces of each class do we have?

data.emotion.value_counts()

3    8989
6    6198
4    6077
2    5121
0    4953
5    4002
1     547
Name: emotion, dtype: int64
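As a quick sanity check, we can confirm that every row really encodes 48 × 48 = 2304 pixel values. This is a minimal sketch, not part of the original pipeline:

# Expect a single value, 2304, if all images are 48 x 48.
data.pixels.apply(lambda p: len(p.split())).unique()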
Here 0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, and 6=Neutral.
Let's remove Disgust, as we have too few samples for it:
data = data[data.emotion != 1]
data.loc[data.emotion > 1, "emotion"] -= 1
data.emotion.value_counts()

2    8989
5    6198
3    6077
1    5121
0    4953
4    4002
Name: emotion, dtype: int64

emotion_labels = ["Angry", "Fear", "Happy", "Sad", "Surprise", "Neutral"]
num_classes = 6
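Since inline plotting is already enabled, we can also visualize the class balance. This is an optional sketch using matplotlib's bar chart; the plot is not part of the original walkthrough:

import matplotlib.pyplot as plt

counts = data.emotion.value_counts().sort_index()
plt.bar(emotion_labels, counts.values)
plt.ylabel("Number of samples")
plt.title("Class balance after removing Disgust")
plt.show()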
This is how the samples are distributed between training and test. We'll use the Training portion to train the model, and everything else will go to the test set:
data.Usage.value_counts()

Training       28273
PrivateTest     3534
PublicTest      3533
Name: Usage, dtype: int64
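One way to perform that split is to keep the Training rows for training and merge both test folds into a single test set. A minimal sketch; the variable names train and test are my own:

train = data[data.Usage == "Training"]
test = data[data.Usage != "Training"]
len(train), len(test)  # (28273, 7067)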
The size of the images and the number of channels (depth):
from math import sqrt

depth = 1
height = int(sqrt(len(data.pixels[0].split())))
width = height
height

48
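For later training, the pixel strings will need to be unpacked into a numeric tensor. Here is a hedged sketch assuming a channels-last layout; to_tensor is a hypothetical helper, not from the original code:

import numpy as np

def to_tensor(frame):
    # Parse each space-separated pixel string and stack the results
    # into an (N, height, width, depth) tensor scaled to [0, 1].
    arrays = [np.array(p.split(), dtype=np.uint8) for p in frame.pixels]
    pixels = np.stack(arrays).reshape(-1, height, width, depth)
    return pixels.astype("float32") / 255.0

X_train = to_tensor(data[data.Usage == "Training"])
X_train.shape  # (28273, 48, 48, 1)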
Let's see some faces:
import numpy as np
from PIL import Image
from IPython.display import display

for i in range(5):
    array = np.array(data.pixels[i].split(), dtype=np.uint8).reshape(48, 48)
    display(Image.fromarray(array))
    print(emotion_labels[data.emotion[i]])

(The five face images are displayed in the notebook output.)
Many faces have ambiguous expressions, so our neural network will have a hard time classifying them. For example, the first face looks surprised or sad rather than angry, and the second face doesn't look angry at all. Nevertheless, this is the dataset we have. For a real application, I would recommend collecting more samples at a higher resolution and having every photo annotated several times by different independent annotators. Then, remove all photos that were annotated ambiguously.
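To make that last recommendation concrete, here is a minimal sketch of such filtering. The photo IDs and votes below are hypothetical, and the two-thirds agreement threshold is an arbitrary choice:

from collections import Counter

# Hypothetical multi-annotator labels: photo id -> annotators' votes.
annotations = {
    "photo_001": [3, 3, 3],  # unanimous Happy: keep
    "photo_002": [0, 4, 5],  # no agreement: drop
    "photo_003": [4, 4, 0],  # 2 of 3 say Sad: keep
}

def resolve(labels, min_agreement=2 / 3):
    # Return the majority label, or None if agreement is too low.
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes / len(labels) >= min_agreement else None

clean = {photo: resolve(votes) for photo, votes in annotations.items()
         if resolve(votes) is not None}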