Multimodal Transformer-Based Electrocardiogram Analysis for Cardiovascular Comorbidity Detection: Model Development and Validation Study

Written on 2 December 2025
by Zi Yang

JMIR Form Res. 2025 Dec 2. doi: 10.2196/80815. Online ahead of print.

ABSTRACT

BACKGROUND: Cardiovascular diseases (CVDs) remain the leading global cause of mortality, yet traditional electrocardiogram (ECG) interpretation suffers from subjective variability and limited sensitivity to complex pathologies.

OBJECTIVE: To address these challenges, we propose the Cardiovascular Multimodal Prediction Network (CaMPNet), a Transformer-based multimodal architecture that integrates raw 12-lead ECG waveforms, nine structured machine-measured ECG features, and demographic data (age and sex) through cross-attention fusion.
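As a rough illustration of what cross-attention fusion means here, the sketch below lets each ECG waveform token attend over a small set of context tokens standing in for the structured features and demographics. It is a minimal single-head version under stated assumptions, not CaMPNet's actual layer: the abstract does not say which modality supplies the queries, and the names `cross_attention`, `ecg_tokens`, and `context_tokens`, the toy 2-dimensional embeddings, and the absence of learned projections are all choices made for this example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention.

    Each query vector attends over all key vectors and returns a
    weighted sum of the corresponding value vectors.
    """
    d = len(keys[0])
    fused, weights = [], []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        w = softmax(scores)
        weights.append(w)
        fused.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        )
    return fused, weights


# Hypothetical toy embeddings: two ECG waveform tokens act as queries;
# three context tokens (e.g. one measured feature, age, sex) act as
# keys and values. Which side queries which is an assumption.
ecg_tokens = [[1.0, 0.0], [0.0, 1.0]]
context_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

fused, weights = cross_attention(ecg_tokens, context_tokens, context_tokens)
```

Each row of `weights` sums to 1 and shows how strongly one waveform token draws on each context token. A trained model would add learned query, key, and value projections and typically several heads.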

METHODS: The model was trained on 384,877 records from the MIMIC-IV-ECG database and evaluated across 12 cardiovascular disease labels. To assess temporal robustness, a temporal external validation was performed on the most recent 10% of the data, which was withheld chronologically from model development.
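The chronological hold-out described above can be sketched as follows. This is a minimal illustration that assumes each record carries a timestamp; the field name `recorded_on` and the helper `temporal_split` are hypothetical, and the abstract does not state whether the split was made per record or per patient.

```python
from datetime import date, timedelta


def temporal_split(records, holdout_fraction=0.10):
    """Sort records by time and withhold the most recent fraction.

    Returns (development, holdout). Every holdout record is at least
    as recent as every development record.
    """
    ordered = sorted(records, key=lambda r: r["recorded_on"])
    n_holdout = max(1, round(len(ordered) * holdout_fraction))
    return ordered[:-n_holdout], ordered[-n_holdout:]


# Hypothetical toy records, deliberately given in reverse time order.
records = [
    {"id": i, "recorded_on": date(2020, 1, 1) + timedelta(days=30 * i)}
    for i in reversed(range(20))
]

development, holdout = temporal_split(records)
```

With 20 toy records, the two most recent land in `holdout` and the remaining 18 in `development`. Splitting by time rather than at random exposes the model to any distribution shift between older and newer recordings.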

RESULTS: On the internal test set, the model achieved a mean area under the curve (AUC) of 0.845 and an area under the precision-recall curve (AUPRC) of 0.489, outperforming the ResNet-ECG baseline (which reached an AUC of 0.848 but an F1 score of 0.152) and all single-modality variants. Subgroup analyses demonstrated consistent performance across demographics (male AUC 0.846 vs female 0.843; youngest quartile 0.884 vs oldest 0.811). In temporal external validation, CaMPNet retained moderate discriminative ability, with a mean AUC of 0.715 and AUPRC of 0.298; the decline in performance was attributed to temporal distribution shift. Despite this, major disease categories such as atrial fibrillation, heart failure, and normal rhythm maintained high AUCs (>0.84). Attention-based visualization revealed clinically interpretable patterns (e.g., ST-segment elevations in ST-segment elevation myocardial infarction), and ablation experiments verified the model's tolerance to missing structured inputs.
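For readers unfamiliar with how a mean AUC and AUPRC over several labels can be obtained, the sketch below computes both per label and then averages them. It assumes an unweighted (macro) average and uses average precision as the AUPRC estimate; the abstract does not specify either choice, and the function names and toy scores are invented for this example.

```python
def roc_auc(y_true, y_score):
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Step-wise area under the precision-recall curve.

    Assumes no tied scores and at least one positive example.
    """
    ranked = sorted(zip(y_score, y_true), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / hits


def macro_average(metric, labels_true, labels_score):
    """Unweighted mean of a per-label metric."""
    per_label = [metric(t, s) for t, s in zip(labels_true, labels_score)]
    return sum(per_label) / len(per_label)


# Hypothetical toy data: two disease labels, four records each.
labels_true = [[1, 0, 1, 0], [0, 0, 1, 1]]
labels_score = [[0.9, 0.8, 0.7, 0.1], [0.2, 0.3, 0.6, 0.9]]

mean_auc = macro_average(roc_auc, labels_true, labels_score)
mean_auprc = macro_average(average_precision, labels_true, labels_score)
```

Because AUPRC depends on label prevalence while AUC does not, a model can show a high mean AUC alongside a much lower mean AUPRC when several labels are rare.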

CONCLUSIONS: CaMPNet demonstrates robust and interpretable multimodal ECG-based diagnosis, offering a scalable framework for comorbidity screening and continual learning under real-world temporal dynamics.

PMID:41330869 | DOI:10.2196/80815