Table 3 Hyper-parameters used to train the deep neural network

From: Automatic identification of scientific publications describing digital reconstructions of neural morphology

Hyperparameter: Applied Value

Hidden units: 64

Hidden layers: 2

Optimization algorithm: Stochastic Gradient Descent

(1) \(\hat{g} = \frac{1}{m}\nabla_\theta \displaystyle \sum_{i=1}^{m} L(x^{(i)}; \theta; y^{(i)})\)

(2) \(\theta = \theta - \alpha \hat{g}\)

Learning rate (\(\alpha\)): 0.01

Mini-batch size (\(m\)): 8

Loss function (\(L\)): mean squared error, \((y - \hat{y})^2\)

Activation functions (\(g\)):

Hidden layers: \(\mathrm{ReLU}(z) = \max(0, z)\)

Output layer: \(\sigma(z) = \frac{1}{1+e^{-z}}\)

Epochs (full passes over the training data set): 30
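For concreteness, the settings above can be combined into a short training sketch. The following is a minimal illustration in PyTorch, a framework assumed here for demonstration only (Table 3 does not specify the implementation); the input width `n_features` and the placeholder data `X`, `y` are likewise hypothetical stand-ins for the paper's actual feature representation.

```python
import torch
from torch import nn

n_features = 128  # hypothetical input width; not given in Table 3

# Architecture from Table 3: 2 hidden layers of 64 units with ReLU,
# and a sigmoid output unit.
model = nn.Sequential(
    nn.Linear(n_features, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid(),
)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # alpha = 0.01
loss_fn = nn.MSELoss()  # L = (y - y_hat)^2, averaged over the mini-batch

# Placeholder data standing in for the real feature matrix and labels.
X = torch.randn(256, n_features)
y = torch.randint(0, 2, (256, 1)).float()
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(X, y), batch_size=8, shuffle=True  # m = 8
)

for epoch in range(30):  # 30 epochs
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()   # Eq. (1): mini-batch gradient estimate g_hat
        optimizer.step()  # Eq. (2): theta <- theta - alpha * g_hat
```

Because `nn.MSELoss` defaults to mean reduction, `loss.backward()` computes exactly the \(\frac{1}{m}\)-averaged gradient of Eq. (1) before the update of Eq. (2) is applied.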