For more information on tuning hyper-parameters, see Bengio, Y. (2012), particularly Section 3 ("Hyper-parameters"), which discusses the selection and characteristics of various hyper-parameters. Aside from manual trial and error, two common approaches for improving hyper-parameters are grid search and random search. In a grid search, a set of candidate values is specified for each hyper-parameter and all possible combinations are tried. In R, the expand.grid() function creates all possible combinations of the supplied values:
expand.grid(
  layers = c(1, 4),
  lr = c(0.01, 0.1, 0.5, 1.0),
  l1 = c(0.1, 0.5))
   layers   lr  l1
1       1 0.01 0.1
2       4 0.01 0.1
3       1 0.10 0.1
4       4 0.10 0.1
5       1 0.50 0.1
6       4 0.50 0.1
7       1 1.00 0.1
8       4 1.00 0.1
9       1 0.01 0.5
10      4 0.01 0.5
11      1 0.10 0.5
12      4 0.10 0.5
13      1 0.50 0.5
14      4 0.50 0.5
15      1 1.00 0.5
16      4 1.00 0.5
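Each row of the resulting data frame is one candidate configuration, so a grid search amounts to fitting one model per row and keeping the best. A minimal sketch of that loop is below; evaluate_model() is a hypothetical placeholder that stands in for real model fitting and validation, and its scoring formula is purely illustrative.

```r
# Build the grid of candidate hyper-parameter combinations.
grid <- expand.grid(
  layers = c(1, 4),
  lr = c(0.01, 0.1, 0.5, 1.0),
  l1 = c(0.1, 0.5))

# Placeholder scoring function for illustration only; in practice this
# would fit a model with the given hyper-parameters and return a
# validation-set score.
evaluate_model <- function(layers, lr, l1) {
  -abs(log10(lr) + 1) - 0.1 * layers - l1
}

# Evaluate every combination and keep the best-scoring row.
grid$score <- mapply(evaluate_model, grid$layers, grid$lr, grid$l1)
best <- grid[which.max(grid$score), ]
```

With the toy scoring function above, the search simply returns whichever row maximizes the score; with a real model, grid$score would hold validation performance for each combination.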
Grid searching is effective when there are only a few candidate values for a few hyper-parameters. However, the number of combinations grows exponentially with the number of hyper-parameters, so it quickly becomes infeasible. For example, with only two values for each of eight hyper-parameters, there are already 2^8 = 256 combinations, which quickly becomes computationally impractical. Moreover, if some hyper-parameters have little effect on model performance, a grid search spends many of its evaluations varying them, making it an inefficient approach.
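A random search avoids both problems by sampling a fixed number of configurations from the hyper-parameter space rather than enumerating every combination. A minimal sketch in R follows; the parameter names, ranges, and number of trials are illustrative assumptions, not fixed choices.

```r
# Sketch of random search: draw n_trials configurations at random
# instead of enumerating all combinations.
set.seed(1234)  # for reproducibility
n_trials <- 10

random_grid <- data.frame(
  layers = sample(c(1, 2, 4, 8), n_trials, replace = TRUE),
  # Learning rates are often sampled log-uniformly; here over [0.001, 1].
  lr = 10 ^ runif(n_trials, min = -3, max = 0),
  l1 = runif(n_trials, min = 0, max = 0.5))
```

Each of the n_trials rows would then be evaluated exactly as in a grid search, but the budget stays fixed no matter how many hyper-parameters are searched, and every trial uses a distinct value of each continuous hyper-parameter.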