For more information on tuning hyper-parameters, see Bengio, Y. (2012), particularly Section 3 ("Hyper-parameters"), which discusses the selection and characteristics of various hyper-parameters. Aside from manual trial and error, two common approaches for improving hyper-parameters are grid search and random search. In a grid search, a set of candidate values is specified for each hyper-parameter and all possible combinations are tried. In R, the expand.grid() function creates all possible combinations of the supplied values:
expand.grid(
  layers = c(1, 4),
  lr = c(0.01, 0.1, 0.5, 1.0),
  l1 = c(0.1, 0.5))
   layers   lr  l1
1       1 0.01 0.1
2       4 0.01 0.1
3       1 0.10 0.1
4       4 0.10 0.1
5       1 0.50 0.1
6       4 0.50 0.1
7       1 1.00 0.1
8       4 1.00 0.1
9       1 0.01 0.5
10      4 0.01 0.5
11      1 0.10 0.5
12      4 0.10 0.5
13      1 0.50 0.5
14      4 0.50 0.5
15      1 1.00 0.5
16      4 1.00 0.5
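Each row of the resulting data frame is one candidate configuration, so a grid search amounts to fitting one model per row and keeping the best. A minimal sketch of that loop is below; evaluate_model() is a hypothetical placeholder that stands in for real model fitting and validation, and its scoring formula is purely illustrative.

```r
# Build the grid of candidate hyper-parameter combinations.
grid <- expand.grid(
  layers = c(1, 4),
  lr = c(0.01, 0.1, 0.5, 1.0),
  l1 = c(0.1, 0.5))

# Placeholder scoring function for illustration only; in practice this
# would fit a model with the given hyper-parameters and return a
# validation-set score.
evaluate_model <- function(layers, lr, l1) {
  -abs(log10(lr) + 1) - 0.1 * layers - l1
}

# Evaluate every combination and keep the best-scoring row.
grid$score <- mapply(evaluate_model, grid$layers, grid$lr, grid$l1)
best <- grid[which.max(grid$score), ]
```

With the toy scoring function above, the search simply returns whichever row maximizes the score; with a real model, grid$score would hold validation performance for each combination.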
Grid searching is effective when there are only a few candidate values for a few hyper-parameters. However, the number of combinations grows exponentially with the number of hyper-parameters, so it quickly becomes infeasible. For example, with only two values for each of eight hyper-parameters, there are already 2^8 = 256 combinations, which quickly becomes computationally impractical. Moreover, if some hyper-parameters have little effect on model performance, a grid search spends many of its evaluations varying them, making it an inefficient approach.
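A random search avoids both problems by sampling a fixed number of configurations from the hyper-parameter space rather than enumerating every combination. A minimal sketch in R follows; the parameter names, ranges, and number of trials are illustrative assumptions, not fixed choices.

```r
# Sketch of random search: draw n_trials configurations at random
# instead of enumerating all combinations.
set.seed(1234)  # for reproducibility
n_trials <- 10

random_grid <- data.frame(
  layers = sample(c(1, 2, 4, 8), n_trials, replace = TRUE),
  # Learning rates are often sampled log-uniformly; here over [0.001, 1].
  lr = 10 ^ runif(n_trials, min = -3, max = 0),
  l1 = runif(n_trials, min = 0, max = 0.5))
```

Each of the n_trials rows would then be evaluated exactly as in a grid search, but the budget stays fixed no matter how many hyper-parameters are searched, and every trial uses a distinct value of each continuous hyper-parameter.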