Nat Commun. 2026 Apr 27. doi: 10.1038/s41467-026-72436-2. Online ahead of print.
ABSTRACT
The electrocardiogram (ECG) is widely used in the diagnosis of cardiovascular disease (CVD). Current deep learning methods for CVD prediction from ECG often lack generalizability and interpretability, which limits their performance. Here, we developed a self-supervised Electrocardiogram Large-scale Foundation Model (ECG-LFM) by pre-training on over ten million 12-lead ECGs from multiple ECG datasets. To enhance ECG representation, ECG-LFM integrates contrastive learning with masked language modeling in a self-supervised manner, enabling the model to capture both global contextual information and fine-grained patterns within ECG signals. Fine-tuned to predict eight types of CVD, it achieved an average area under the receiver operating characteristic curve (AUROC) of 0.930 across multiple datasets, an improvement over existing methods. The important ECG-LFM-derived features (EDFs) represent known CVD biomarkers, indicating the high interpretability of ECG-LFM. Applying the EDFs in a genome-wide association study identified 24 significant single nucleotide polymorphisms (SNPs) (P-value < 5×10⁻⁸, LD r² < 0.01) associated with ECG, including 8 novel findings. The genetic causal effects of EDFs on the CVDs were evaluated by Mendelian randomization, which indicated causal relationships involving 4 EDFs and 2 CVDs. Overall, ECG-LFM provides accurate prediction for CVDs and novel genetic insights for ECG.
PMID:42045241 | DOI:10.1038/s41467-026-72436-2
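
The abstract describes a pre-training objective that combines contrastive learning with masked modeling. The following is a minimal illustrative sketch of how two such terms could be summed into one loss; it is not the authors' implementation. The function names, the InfoNCE form of the contrastive term, the mean-squared-error reconstruction term, the temperature, and the weighting factor are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math


def info_nce(anchors, positives, temperature=0.1):
    """Contrastive (InfoNCE) loss: embedding i in `anchors` should match
    embedding i in `positives` and differ from every other row.
    Assumed form; the abstract does not state which contrastive loss is used."""
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity to every candidate, scaled by the temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


def masked_reconstruction_loss(signal, reconstruction, mask):
    """Mean squared error computed only at masked positions (mask value truthy).
    A stand-in for the masked-modeling term, assumed for illustration."""
    errors = [(s - r) ** 2 for s, r, m in zip(signal, reconstruction, mask) if m]
    return sum(errors) / len(errors) if errors else 0.0


def combined_loss(anchors, positives, signal, reconstruction, mask, weight=1.0):
    """Sum of the contrastive term and the weighted masked-modeling term.
    `weight` is a hypothetical hyperparameter."""
    return info_nce(anchors, positives) + weight * masked_reconstruction_loss(
        signal, reconstruction, mask
    )
```

In this sketch the contrastive term acts on whole-recording embeddings (global context), while the masked term penalizes errors at individual hidden samples (fine-grained patterns), mirroring the two kinds of information the abstract says the model captures.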

