Sci Rep. 2026 Apr 19;16(1):12751. doi: 10.1038/s41598-026-48771-1.
ABSTRACT
Diagnosing heart failure is complex but crucial for patient outcomes, and it is often hindered because the information in unstructured clinical notes goes unused. We introduce a novel end-to-end pipeline for heart failure diagnosis that uses electronic health records (EHR) and German clinical notes from 846 patients. The pipeline combines abbreviation disambiguation, translation of German clinical notes into English, medical entity linking to SNOMED-CT, and subsequent classification. Classification was performed with a Support Vector Machine (SVM) and compared against a fine-tuned medBERT.de neural baseline. To address limitations of existing abbreviation disambiguation and entity linking approaches, we used zero-shot learning to reduce reliance on training data. Validation against benchmark datasets and against cardiologists' assessments demonstrates accuracy suitable for real clinical use. Abbreviation disambiguation achieved an accuracy of up to 96.1%. Entity linking performed competitively with state-of-the-art approaches on selected evaluation datasets. The SVM classifier using SNOMED-CT concepts and EHR data achieved an F1-score of 65.3%, on par with the medBERT.de neural baseline using clinical notes and EHR data. Despite challenges posed by limited German-language resources and the scarcity of reference datasets with German SNOMED-CT annotations, our pipeline shows high potential for real-world clinical use and for clinical decision support grounded in the standardized SNOMED-CT ontology.
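The abstract names an SVM over SNOMED-CT concepts plus EHR data but does not state the feature encoding or the solver. The following is a minimal, hypothetical sketch of that classification stage. It assumes a multi-hot encoding of linked concepts concatenated with numeric EHR values, and it uses a linear SVM trained with the Pegasos subgradient method, swapped in for whatever SVM implementation the authors actually used. Every name, the placeholder concept identifiers (not real SNOMED-CT codes), the `ehr_flag` field, and the toy data are illustrative assumptions.

```python
import random
from typing import Dict, List, Sequence, Set


def build_features(concepts: Set[str], ehr: Dict[str, float],
                   concept_vocab: Sequence[str], ehr_keys: Sequence[str]) -> List[float]:
    """Multi-hot vector over linked concepts, then numeric EHR values, then a constant 1.0 bias."""
    multi_hot = [1.0 if c in concepts else 0.0 for c in concept_vocab]
    numeric = [float(ehr.get(k, 0.0)) for k in ehr_keys]  # missing EHR values default to 0.0
    return multi_hot + numeric + [1.0]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(X: List[List[float]], y: List[int], lam: float = 0.01,
                     epochs: int = 200, seed: int = 0) -> List[float]:
    """Pegasos: stochastic subgradient descent on lam/2*||w||^2 + mean hinge loss.

    Labels must be in {-1, +1}. The bias is just the last feature, so it is
    regularized too (a simplification of a standard SVM).
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                      # Pegasos step size
            violated = y[i] * dot(w, X[i]) < 1.0       # hinge loss active for this sample
            w = [(1.0 - eta * lam) * wj for wj in w]   # step along the L2-regularizer gradient
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w: Sequence[float], X: List[List[float]]) -> List[int]:
    return [1 if dot(w, x) >= 0.0 else -1 for x in X]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy patients: placeholder concept IDs and one hypothetical EHR flag; +1 = heart failure.
VOCAB = ["CONCEPT_A", "CONCEPT_B"]
EHR_KEYS = ["ehr_flag"]
patients = [
    ({"CONCEPT_A"}, {"ehr_flag": 1.0}, 1),
    ({"CONCEPT_A", "CONCEPT_B"}, {}, 1),
    ({"CONCEPT_B"}, {}, -1),
    (set(), {"ehr_flag": 1.0}, -1),
]
X = [build_features(c, e, VOCAB, EHR_KEYS) for c, e, _ in patients]
y = [label for _, _, label in patients]
w = train_linear_svm(X, y, lam=0.1, epochs=500)
print("training F1:", f1_score(y, predict(w, X)))
```

A linear kernel ties each weight to a single concept or EHR field, which keeps a concept-based classifier inspectable; whether the authors used a linear or a nonlinear kernel is not stated in the abstract.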
PMID:42002595 | DOI:10.1038/s41598-026-48771-1