Fig. 4From: Cerebrovascular disease case identification in inpatient electronic medical record data using natural language processingPPV, NPV, sensitivity, and specificity of the four NLP models and ICD algorithm, with changing thresholds ranging between 0.05 and 0.95. The two dashed lines in each subfigure represent the 0.05 and 0.95 threshold bounds, respectively. TFIDF-CUI-RF represents algorithm “CUI + TF-IDF + RF”; WC-CUI-XGBoost represents algorithm “CUI + word count + XGBoost”; TFIDF-BOW-XGBoost represents algorithm “BOW + TF-IDF + XGBoost”; TFIDF-CUI-XGBoost represents algorithm “CUI + TF-IDF + XGBoost”; ICD represents the ICD-10-CA-codes in DAD algorithms, respectivelyBack to article page