Hyperparameter | Applied Value |
---|---|
Hidden units | 64 |
Hidden layers | 2 |
Optimization algorithm | Stochastic Gradient Descent: |
 | (1) \(\hat{g} = \frac{1}{m}\nabla_\theta \displaystyle \sum_{i=1}^{m} L(x^{(i)}; \theta; y^{(i)})\) |
 | (2) \(\theta = \theta - \alpha \hat{g}\) |
 | Learning rate (\(\alpha\)): 0.01 |
 | Mini-batch size (\(m\)): 8 |
 | Loss function (\(L\)): \(\text{Mean Squared Error} = (y - \hat{y})^2\), averaged over the mini-batch in (1) |
Activation functions (g) | Hidden layers: \(\text{ReLU}(z) = \max(0, z)\) |
 | Output layer: \(\sigma(z) = \frac{1}{1 + e^{-z}}\) |
Epochs | Number of full training passes over the data set: 30 |
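The configuration above can be sketched end to end: two ReLU hidden layers of 64 units, a sigmoid output, and mini-batch SGD with \(\alpha = 0.01\), \(m = 8\), MSE loss, and 30 epochs. This is a minimal NumPy sketch, not the authors' implementation; the synthetic data, input dimensionality, and weight initialization are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary-labelled data (assumed; the paper's dataset
# and input dimensionality are not specified in the table).
n_features = 10
X = rng.normal(size=(200, n_features))
y = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)

# Two hidden layers of 64 units each, as in the table.
sizes = [n_features, 64, 64, 1]
W = [rng.normal(scale=0.1, size=(a, c)) for a, c in zip(sizes[:-1], sizes[1:])]
b = [np.zeros((1, s)) for s in sizes[1:]]

relu = lambda z: np.maximum(0.0, z)          # hidden-layer activation
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z)) # output-layer activation

def forward(x):
    """Forward pass; returns pre-activations and activations for backprop."""
    a, zs, acts = x, [], [x]
    for i in range(len(W)):
        z = a @ W[i] + b[i]
        a = sigmoid(z) if i == len(W) - 1 else relu(z)
        zs.append(z)
        acts.append(a)
    return zs, acts

alpha, m, epochs = 0.01, 8, 30  # learning rate, mini-batch size, epochs

for epoch in range(epochs):
    perm = rng.permutation(len(X))
    for start in range(0, len(X), m):
        idx = perm[start:start + m]
        xb, yb = X[idx], y[idx]
        zs, acts = forward(xb)
        y_hat = acts[-1]
        # (1) gradient estimate of the mini-batch MSE, via backprop:
        # dL/dy_hat = 2(y_hat - y)/batch, times the sigmoid derivative
        delta = (2.0 / len(xb)) * (y_hat - yb) * y_hat * (1.0 - y_hat)
        for i in reversed(range(len(W))):
            gW = acts[i].T @ delta
            gb = delta.sum(axis=0, keepdims=True)
            if i > 0:
                # propagate through the pre-update weights and ReLU derivative
                delta = (delta @ W[i].T) * (zs[i - 1] > 0)
            # (2) SGD update: theta <- theta - alpha * g_hat
            W[i] -= alpha * gW
            b[i] -= alpha * gb

zs, acts = forward(X)
mse = float(np.mean((y - acts[-1]) ** 2))
print(mse)
```

Note that the gradient is propagated through each layer's weights before that layer is updated, so the update in step (2) always uses the gradient estimate from step (1) computed at the current parameters.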