The regression model

The previous section developed a deep learning model for a binary classification task. This section develops a deep learning model for a regression task, that is, predicting a continuous numeric value. We use the same dataset as for the binary classification task, but with a different target column. In that task, we wanted to predict whether a customer would return to our stores in the next 14 days. In this task, we want to predict how much a customer will spend in our stores in the next 14 days. We follow a similar process: we load the dataset, split it into training and test sets, and apply a log transformation to the target and predictor columns. The code is in Chapter4/regression.R:

set.seed(42)
fileName <- "../dunnhumby/predict.csv"
dfData <- read_csv(fileName,
                    col_types = cols(
                      .default = col_double(),
                      CUST_CODE = col_character(),
                      Y_categ = col_integer())
 )
nobs <- nrow(dfData)
train <- sample(nobs, 0.9*nobs)
test <- setdiff(seq_len(nobs), train)
predictorCols <- colnames(dfData)[!(colnames(dfData) %in% c("CUST_CODE","Y_numeric","Y_categ"))]

dfData[, c("Y_numeric",predictorCols)] <- log(0.01+dfData[, c("Y_numeric",predictorCols)])
trainData <- dfData[train, c(predictorCols,"Y_numeric")]
testData <- dfData[test, c(predictorCols,"Y_numeric")]

xtrain <- model.matrix(Y_numeric~.,trainData)
xtest <- model.matrix(Y_numeric~.,testData)
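The log transformation above adds a small offset (0.01) so that zero values do not produce -Inf. As a minimal sketch of how the transformation and its inverse behave, the following uses a made-up spend vector (the values and the variable names spend, offset, and recovered are illustrative assumptions, not part of the chapter's code):

```r
# Toy spend values, including a customer who spent nothing (hypothetical data)
spend <- c(0, 5.5, 20, 130)
offset <- 0.01

# Forward transform, same form as log(0.01 + x) used on dfData
log_spend <- log(offset + spend)

# log(0) would be -Inf; the offset keeps every value finite
stopifnot(all(is.finite(log_spend)))

# exp() alone recovers spend + offset, so subtract the offset for an exact inverse
recovered <- exp(log_spend) - offset
print(all.equal(recovered, spend))
```

When two log-scale values are both passed through exp() and then subtracted, as in the error calculations later in this section, the offset cancels out, so it does not need to be removed explicitly there.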

We then perform regression analysis on the data using lm to create a benchmark before creating a deep learning model:

# lm Regression Model
regModel1 <- lm(Y_numeric ~ ., data=trainData)
pr1 <- predict(regModel1,testData)
rmse <- sqrt(mean((exp(pr1)-exp(testData$Y_numeric))^2))
print(sprintf(" Regression RMSE = %1.2f",rmse))
[1] " Regression RMSE = 29.30"
mae <- mean(abs(exp(pr1)-exp(testData$Y_numeric)))
print(sprintf(" Regression MAE = %1.2f",mae))
[1] " Regression MAE = 13.89"

We output two metrics, RMSE and MAE, for our regression task. We covered these earlier in the chapter. Mean absolute error (MAE) is the average of the absolute differences between the predicted values and the actual values. Root mean squared error (RMSE) squares the differences before averaging them, so a single large error contributes more than several small errors that add up to the same total. Now let's look at the deep learning regression code. First we load the data and define the model:

require(mxnet)
Loading required package: mxnet

# MXNet expects matrices
train_X <- data.matrix(trainData[, predictorCols])
test_X <- data.matrix(testData[, predictorCols])
train_Y <- trainData$Y_numeric

set.seed(42)
# hyper-parameters
num_hidden <- c(256,128,128,64)
drop_out <- c(0.4,0.4,0.4,0.4)
wd <- 0.00001
lr <- 0.0002
num_epochs <- 100
activ <- "tanh"

# create our model architecture
# using the hyper-parameters defined above
data <- mx.symbol.Variable("data")
fc1 <- mx.symbol.FullyConnected(data, name="fc1", num_hidden=num_hidden[1])
act1 <- mx.symbol.Activation(fc1, name="activ1", act_type=activ)
drop1 <- mx.symbol.Dropout(data=act1,p=drop_out[1])

fc2 <- mx.symbol.FullyConnected(drop1, name="fc2", num_hidden=num_hidden[2])
act2 <- mx.symbol.Activation(fc2, name="activ2", act_type=activ)
drop2 <- mx.symbol.Dropout(data=act2,p=drop_out[2])

fc3 <- mx.symbol.FullyConnected(drop2, name="fc3", num_hidden=num_hidden[3])
act3 <- mx.symbol.Activation(fc3, name="activ3", act_type=activ)
drop3 <- mx.symbol.Dropout(data=act3,p=drop_out[3])

fc4 <- mx.symbol.FullyConnected(drop3, name="fc4", num_hidden=num_hidden[4])
act4 <- mx.symbol.Activation(fc4, name="activ4", act_type=activ)
drop4 <- mx.symbol.Dropout(data=act4,p=drop_out[4])

fc5 <- mx.symbol.FullyConnected(drop4, name="fc5", num_hidden=1)
lro <- mx.symbol.LinearRegressionOutput(fc5)
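To get a feel for the size of the network defined above, the following sketch counts its trainable parameters in plain R. It assumes each fully connected layer has a weight matrix and a bias vector, and that the dropout and activation layers add no parameters. The function name count_params and the input width of 20 are hypothetical; the text does not state the number of predictor columns, so substitute length(predictorCols) for your data.

```r
# Count weights and biases in a stack of fully connected layers.
# n_inputs: number of input features (hypothetical value used below)
# layer_sizes: number of units in each fully connected layer, in order
count_params <- function(n_inputs, layer_sizes) {
  # Number of inputs feeding each layer: the data, then each previous layer
  fan_in <- c(n_inputs, head(layer_sizes, -1))
  weights <- sum(fan_in * layer_sizes)
  biases <- sum(layer_sizes)
  weights + biases
}

num_hidden <- c(256, 128, 128, 64)
# The final layer (fc5) has a single output unit for the regression target
n_params <- count_params(n_inputs = 20, layer_sizes = c(num_hidden, 1))
print(n_params)
```

Most of the parameters sit in the connections between the wider hidden layers, which is one reason dropout and weight decay are applied when training a network of this shape.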

Now we train the model; note that the first comment shows how to switch to using a GPU instead of a CPU:

# run on cpu, change to 'devices <- mx.gpu()'
# if you have a suitable GPU card
devices <- mx.cpu()
mx.set.seed(0)
tic <- proc.time()
# This actually trains the model
model <- mx.model.FeedForward.create(lro, X = train_X, y = train_Y,
 ctx = devices,num.round = num_epochs,
 learning.rate = lr, momentum = 0.9,
 eval.metric = mx.metric.rmse,
 initializer = mx.init.uniform(0.1),
 wd=wd,
 epoch.end.callback = mx.callback.log.train.metric(1))
print(proc.time() - tic)
 user system elapsed 
 13.90 1.82 10.50 

pr4 <- predict(model, test_X)[1,]
rmse <- sqrt(mean((exp(pr4)-exp(testData$Y_numeric))^2))
print(sprintf(" Deep Learning Regression RMSE = %1.2f",rmse))
[1] " Deep Learning Regression RMSE = 28.92"
mae <- mean(abs(exp(pr4)-exp(testData$Y_numeric)))
print(sprintf(" Deep Learning Regression MAE = %1.2f",mae))
[1] " Deep Learning Regression MAE = 14.33"
rm(data,fc1,act1,drop1,fc2,act2,drop2,fc3,act3,drop3,fc4,act4,drop4,fc5,lro,model)
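The same RMSE and MAE calculations are written out for both models above. One possible way to avoid the repetition is a small helper that takes log-scale predictions and actuals and returns both metrics on the original scale. This is only a sketch: the function name eval_on_original_scale and the toy values are assumptions for illustration, not part of the chapter's code.

```r
# Hypothetical helper: both arguments are on the log scale
eval_on_original_scale <- function(pred_log, actual_log) {
  # Undo the log transform before measuring the errors
  err <- exp(pred_log) - exp(actual_log)
  c(rmse = sqrt(mean(err^2)), mae = mean(abs(err)))
}

# Toy log-scale values built with the same 0.01 offset used in this section
actual_log <- log(0.01 + c(10, 0, 50))
pred_log <- log(0.01 + c(12, 1, 40))

metrics <- eval_on_original_scale(pred_log, actual_log)
print(round(metrics, 2))
```

With the chapter's objects, the call would look like eval_on_original_scale(pr1, testData$Y_numeric) for the lm model, run before the objects are removed.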

For regression metrics, lower is better, so the RMSE of the deep learning model (28.92) is an improvement on the original regression model (29.30). Interestingly, the MAE of the deep learning model (14.33) is actually worse than that of the original regression model (13.89). Since RMSE penalizes large differences between actual and predicted values more heavily than MAE does, this suggests that the deep learning model makes fewer extreme errors than the regression model, at the cost of slightly larger errors on typical cases.
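A toy example can show how one model can have a lower RMSE but a higher MAE than another. The error vectors below are invented purely for illustration: model A is very accurate on most cases but has one large miss, while model B is moderately wrong everywhere.

```r
# Metric helpers that operate directly on a vector of errors
rmse <- function(err) sqrt(mean(err^2))
mae <- function(err) mean(abs(err))

# Hypothetical errors (predicted minus actual) for two models
err_a <- c(1, 1, 1, 1, 20)  # mostly small errors, one extreme error
err_b <- c(6, 6, 6, 6, 6)   # consistently moderate errors

print(sprintf("Model A: RMSE = %1.2f, MAE = %1.2f", rmse(err_a), mae(err_a)))
print(sprintf("Model B: RMSE = %1.2f, MAE = %1.2f", rmse(err_b), mae(err_b)))
```

Model B has the lower RMSE because it avoids the single extreme error, even though its average absolute error is higher. Which model is preferable depends on whether large individual errors are more costly than a slightly worse typical prediction.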