PLoS One. 2026 Apr 10;21(4):e0345693. doi: 10.1371/journal.pone.0345693. eCollection 2026.
ABSTRACT
OBJECTIVE: To explore the utility of natural language processing (NLP) and machine learning (ML) techniques for identifying unsafe conditions that lead to cardiovascular diagnostic errors, using patient safety event (PSE) report data.
METHODS: PSE reports submitted between January 2016 and August 2021 at a multi-hospital healthcare system in the mid-Atlantic region of the United States were included in this study. To establish ground-truth labels, each PSE report was manually reviewed for clinician-reported narratives meeting current definitions of cardiovascular diagnostic error. Reports containing such narratives were labeled one, and all others zero. Four binary ML models were trained on the annotated PSE reports to identify cardiovascular diagnostic error narratives and their common features: (1) simple logistic regression, (2) elastic net, (3) XGBoost, and (4) deep neural networks.
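The binary labeling and classification step can be illustrated with a minimal sketch. It covers only the simplest of the four model families (plain logistic regression) on bag-of-words counts, written in pure Python. The toy narratives, the tokenizer, the learning rate, and all function names are assumptions made for illustration; they are not the study's data, features, or training procedure.

```python
import math
import re
from collections import Counter

# Hypothetical toy narratives (NOT study data).
# Label 1 = narrative describes a cardiovascular diagnostic error, 0 = otherwise.
REPORTS = [
    ("mri ordered for patient with pacemaker", 1),
    ("ekg finding missed chest pain workup delayed", 1),
    ("cardiac enzymes not ordered despite chest pain", 1),
    ("patient fell while walking to bathroom", 0),
    ("wrong diet tray delivered to patient", 0),
    ("medication dose given late by pharmacy", 0),
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocab(texts):
    """Map every distinct token to a column index."""
    words = sorted({w for t in texts for w in tokenize(t)})
    return {w: i for i, w in enumerate(words)}

def vectorize(text, vocab):
    """Turn a narrative into a bag-of-words count vector."""
    vec = [0.0] * len(vocab)
    for word, count in Counter(tokenize(text)).items():
        if word in vocab:
            vec[vocab[word]] = float(count)
    return vec

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=300):
    """Fit unregularized logistic regression by stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = p - yi  # gradient of the log loss with respect to the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict_proba(text, vocab, w, b):
    """Predicted probability that a narrative belongs to class 1."""
    x = vectorize(text, vocab)
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

texts = [t for t, _ in REPORTS]
labels = [label for _, label in REPORTS]
vocab = build_vocab(texts)
X = [vectorize(t, vocab) for t in texts]
weights, bias = train_logistic(X, labels)

# Tokens with the largest positive weights act as signals for class 1.
top_tokens = sorted(vocab, key=lambda tok: weights[vocab[tok]], reverse=True)[:5]
```

In a sketch like this, ranking tokens by their learned weight is one simple way to surface signal words; tree-based models such as XGBoost typically report feature importance through different measures.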
RESULTS: XGBoost outperformed the other models in identifying cardiovascular diagnostic error-related reports and achieved high performance on the testing data (AUROC = 0.914, specificity = 0.982, PPV = 0.866, accuracy = 0.929, F1 score = 0.738, and AUPRC = 0.783). Pacemaker emerged as a significant signal for cardiovascular diagnostic errors in our PSE reports. Our analysis showed that ordering an MRI for a patient with a pacemaker was a frequent theme among cardiovascular diagnostic error-related PSE reports containing the word pacemaker. Order, EKG, cardiac, and chest were also among the most important features for identifying cardiovascular diagnostic error events. These words were used to describe heart conditions, the associated care processes, and related safety incidents.
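For readers less familiar with the threshold-based metrics reported above, the following sketch shows how specificity, PPV, accuracy, and F1 score follow from a binary confusion matrix. The function names and the example counts are hypothetical and are not taken from the study. AUROC and AUPRC are omitted because they require the full set of predicted scores rather than a single confusion matrix.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute common binary classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (at least one actual positive,
    one actual negative, and one predicted positive).
    """
    sensitivity = tp / (tp + fn)   # recall: share of true error reports found
    specificity = tn / (tn + fp)   # share of non-error reports correctly rejected
    ppv = tp / (tp + fp)           # precision: share of flagged reports that are true errors
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and recall
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "accuracy": accuracy,
        "f1": f1,
    }

def recall_from_f1_and_ppv(f1, ppv):
    """Solve F1 = 2 * PPV * R / (PPV + R) for the recall R."""
    return f1 * ppv / (2 * ppv - f1)

# Hypothetical counts, for illustration only.
example = binary_metrics(tp=40, fp=10, tn=140, fn=10)

# Recall implied by the reported F1 score and PPV.
implied_recall = recall_from_f1_and_ppv(0.738, 0.866)
```

Because F1 is the harmonic mean of PPV and recall, the reported F1 score of 0.738 and PPV of 0.866 together imply a recall (sensitivity) of roughly 0.64, a figure the abstract does not state directly.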
CONCLUSIONS: Findings from our study demonstrate the feasibility of using ML and NLP techniques to identify cardiovascular diagnostic error-related reports in existing PSE data sources. However, validation in external healthcare systems is needed before broader application.
PMID:41961785 | DOI:10.1371/journal.pone.0345693