Towards automatic text-based estimation of depression through symptom prediction

Brain Informatics

Table 2 Experimental results on the test set of the DAIC-WOZ data set

Model	Binary Diagnosis Eval		PHQ-8 Score Severity Eval		5-Class Severity Eval	
	\(\text{miF}_1 \pm \sigma\)	\(\text{maF}_1 \pm \sigma\)	\(\text{MAE} \pm \sigma\)	\(\text{maMAE} \pm \sigma\)	\(\text{miF}_1\text{-5c} \pm \sigma\)	\(\text{maF}_1\text{-5c} \pm \sigma\)
Binary Diagnosis	0.719 ± 0.016	0.701 ± 0.010	–	–	–	–
5-Class Severity	0.711 ± 0.026	0.683 ± 0.024	–	–	0.468 ± 0.023	0.270 ± 0.025
PHQ-8 Score Severity	0.681 ± 0.019	0.584 ± 0.024	5.03 ± 0.09	5.69 ± 0.12	0.289 ± 0.029	0.135 ± 0.014
Symptom Prediction	0.766 ± 0.023	0.739 ± 0.025	3.78 ± 0.13	4.19 ± 0.13	0.426 ± 0.014	0.270 ± 0.019
HCAN [7]	–	0.630	–	–	–	–
HAN+L [8]	–	0.700	–	–	–	–
ASP MT. DLC+DLR+EIR [25]	–	–	3.69	–	0.600	–
HCAG-T [23]	–	0.770\(\ddag\)	3.73\(\ddag\)	–	–	–
SGNN [27]	–	–	3.76	–	–	–

Top section: results of our model and the baselines. All models were run five times with different seed values, and the mean and standard deviation over these runs are reported; miF1-5c (resp. maF1-5c) stands for the 5-class micro-averaged F1-score (resp. macro-averaged F1-score). Bottom section: previously published results on the same DAIC-WOZ test set using only the text modality; these results are reported for the best model rather than averaged over several runs.
Bold values indicate the best results for each model
\(\ddag\) indicates that the results are given for the validation set only
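
As an illustration of the quantities named in the notes above, the following minimal Python sketch computes micro- and macro-averaged F1 for single-label predictions and formats a "mean ± standard deviation" summary over several runs. It is not the authors' evaluation code: the function names (`micro_f1`, `macro_f1`, `summarize`) and the toy labels are hypothetical, and the use of the population standard deviation is an assumption, since the notes do not say which estimator was used.

```python
from statistics import mean, pstdev


def _counts(y_true, y_pred, label):
    """Return (tp, fp, fn) with `label` treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    return tp, fp, fn


def _f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when the class never occurs."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    return mean([_f1(*_counts(y_true, y_pred, c)) for c in labels])


def micro_f1(y_true, y_pred, labels):
    """F1 computed from counts pooled over all classes."""
    totals = [_counts(y_true, y_pred, c) for c in labels]
    tp, fp, fn = (sum(col) for col in zip(*totals))
    return _f1(tp, fp, fn)


def summarize(scores):
    """Format per-run scores as 'mean ± std'.

    Assumption: population standard deviation (pstdev); the sample
    estimator (statistics.stdev) would give a slightly larger value.
    """
    return f"{mean(scores):.3f} ± {pstdev(scores):.3f}"


# Toy data, invented for illustration only (not taken from DAIC-WOZ).
y_true = [0, 0, 0, 1]
y_pred = [0, 0, 1, 1]
print(f"micro-F1 = {micro_f1(y_true, y_pred, labels=[0, 1]):.3f}")
print(f"macro-F1 = {macro_f1(y_true, y_pred, labels=[0, 1]):.3f}")
print(summarize([0.70, 0.72, 0.74]))
```

When every class is included in `labels`, the micro-averaged F1 of single-label predictions equals plain accuracy, whereas the macro average weights each class equally and is therefore more sensitive to performance on rare classes.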