Hyperparameter | Search space | Best assignment |
---|---|---|
Number of epochs | 50 | 50 |
Batch size | 64 | 64 |
Gradient norm | Uniform-float [5, 10] | 8.0 |
Embedding dropout | Uniform-float [0, 0.5] | 0.3 |
Number of pre-encode feedforward layers | Choice [1, 2, 3] | 3 |
Pre-encode feedforward hidden size | Uniform-integer [64, 512] | 232 |
Pre-encode feedforward activation | Choice [relu, tanh] | tanh |
Pre-encode feedforward dropout | Uniform-float [0, 0.5] | 0.0 |
Encoder hidden size | Uniform-integer [64, 512] | 93 |
Number of encoder layers | Choice [1, 2, 3] | 2 |
Integrator hidden size | Uniform-integer [64, 512] | 337 |
Number of integrator layers | Choice [1, 2, 3] | 3 |
Integrator dropout | Uniform-float [0, 0.5] | 0.1 |
Number of output layers | Choice [1, 2, 3] | 3 |
Output hidden size | Uniform-integer [64, 512] | 384 |
Output dropout | Uniform-float [0, 0.5] | 0.2 |
Output pool sizes | Uniform-integer [3, 7] | 6 |
Learning rate optimizer | Adam | Adam |
Learning rate | Loguniform-float [1e-6, 1e-1] | 0.0001 |
Learning rate scheduler | Reduce on plateau | Reduce on plateau |
Learning rate scheduler reduction factor | 0.5 | 0.5 |
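
As an illustration of how the distribution types in the table could be interpreted, the following is a minimal sketch of drawing one assignment from a subset of the search space. It assumes plain random sampling; the table does not state which search procedure was actually used. The dictionary keys, the `SEARCH_SPACE` layout, and the `sample_assignment` helper are hypothetical names introduced only for this example.

```python
import math
import random

# Hypothetical encoding of a few rows of the table above.
# Each entry is (distribution kind, parameters...); key names are illustrative.
SEARCH_SPACE = {
    "batch_size": ("fixed", 64),
    "gradient_norm": ("uniform_float", 5.0, 10.0),
    "embedding_dropout": ("uniform_float", 0.0, 0.5),
    "pre_encode_activation": ("choice", ["relu", "tanh"]),
    "encoder_hidden_size": ("uniform_int", 64, 512),
    "num_encoder_layers": ("choice", [1, 2, 3]),
    "learning_rate": ("loguniform_float", 1e-6, 1e-1),
}


def sample_assignment(space, rng):
    """Draw one value per hyperparameter (assumes plain random search)."""
    assignment = {}
    for name, spec in space.items():
        kind = spec[0]
        if kind == "fixed":
            assignment[name] = spec[1]
        elif kind == "uniform_float":
            assignment[name] = rng.uniform(spec[1], spec[2])
        elif kind == "uniform_int":
            # randint includes both endpoints.
            assignment[name] = rng.randint(spec[1], spec[2])
        elif kind == "loguniform_float":
            # Sample uniformly in log space, then map back.
            log_value = rng.uniform(math.log(spec[1]), math.log(spec[2]))
            assignment[name] = math.exp(log_value)
        elif kind == "choice":
            assignment[name] = rng.choice(spec[1])
        else:
            raise ValueError(f"unknown distribution kind: {kind}")
    return assignment


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the sketch reproducible
    for key, value in sample_assignment(SEARCH_SPACE, rng).items():
        print(f"{key}: {value}")
```

Sampling the learning rate in log space gives each order of magnitude between 1e-6 and 1e-1 equal probability, which is why the best value of 0.0001 is not unusual despite lying far from the upper bound.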