Fig. 1

From: A robust framework to investigate the reliability and stability of explainable artificial intelligence markers of Mild Cognitive Impairment and Alzheimer’s Disease

Workflow of the proposed analysis. The clinical and neuropsychological indexes (i.e., S features) are used to train a Random Forest (RF) classifier that predicts the diagnosis of each subject at each visit under a leave-one-subject-out cross-validation strategy. To handle class imbalance, in each cross-validation round the training set is randomly under-sampled \(U = 100\) times, each time selecting a fixed number of \(N_{TRAIN}=500\) samples per diagnostic category. The SHAP algorithm is used to explain the predictions of the RF models for each sample. Statistical analyses are then performed on both the RF probability scores and the SHAP values to: (i) relate the performance of RF to the variability of the SHAP scores, (ii) analyze the variability of the SHAP scores between diagnostic categories, and (iii) examine the longitudinal variability of the SHAP scores.
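The resampling scheme in the caption can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the function names `undersample_indices` and `loso_undersampled_splits`, the fixed seed, and the toy labels (`"CN"`, `"MCI"`, `"AD"`) are hypothetical, and the RF training and SHAP steps are deliberately left out because the caption does not specify how they were implemented. In the caption's setting the parameters would be `n_per_class=500` and `n_repeats=100`.

```python
import random
from collections import defaultdict


def undersample_indices(labels, n_per_class, rng):
    """Draw n_per_class indices without replacement from every class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    selected = []
    for label in sorted(by_class):
        members = by_class[label]
        if len(members) < n_per_class:
            raise ValueError(f"class {label!r} has only {len(members)} samples")
        selected.extend(rng.sample(members, n_per_class))
    return selected


def loso_undersampled_splits(subject_ids, labels, n_per_class, n_repeats, seed=0):
    """Yield (held_out_subject, train_indices, test_indices) triples.

    Every visit of one subject forms the test set; the remaining visits are
    under-sampled n_repeats times into class-balanced training sets.
    """
    rng = random.Random(seed)  # assumption: a fixed seed for reproducibility
    for held_out in sorted(set(subject_ids)):
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        pool = [i for i, s in enumerate(subject_ids) if s != held_out]
        pool_labels = [labels[i] for i in pool]
        for _ in range(n_repeats):
            picked = undersample_indices(pool_labels, n_per_class, rng)
            # Map positions in the pool back to positions in the full data set.
            yield held_out, [pool[j] for j in picked], test_idx


if __name__ == "__main__":
    # Toy data: six hypothetical subjects with two visits each.
    subjects = ["s1", "s1", "s2", "s2", "s3", "s3",
                "s4", "s4", "s5", "s5", "s6", "s6"]
    diagnoses = ["CN", "CN", "CN", "MCI", "MCI", "MCI",
                 "MCI", "AD", "AD", "AD", "AD", "CN"]
    for held_out, train_idx, test_idx in loso_undersampled_splits(
            subjects, diagnoses, n_per_class=2, n_repeats=1):
        print(held_out, sorted(train_idx), test_idx)
```

Each yielded training set would be used to fit one model, so a full run produces one model per held-out subject and per under-sampling repeat. Holding out whole subjects rather than single visits keeps visits from the same person out of both the training and the test set at once.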