Large language model detects previously undiagnosed heart failure with preserved ejection fraction in patients with metabolic-associated fatty liver disease: A multicenter cohort study

Written on 31/03/2026

by Xiaodan Lu

PLOS Digit Health. 2026 Mar 31;5(3):e0001317. doi: 10.1371/journal.pdig.0001317. eCollection 2026 Mar.

ABSTRACT

Metabolic-associated fatty liver disease (MAFLD) and heart failure with preserved ejection fraction (HFpEF) share overlapping metabolic and inflammatory pathways, yet HFpEF is frequently underrecognized because of its atypical symptoms and complex etiology. We aimed to evaluate a domain-tuned large language model (MedGuide-14B) for detecting HFpEF from electronic health records (EHRs) among patients with MAFLD, and to compare outcomes between model-identified and clinically recognized cases. In this multicenter retrospective cohort study, MedGuide-14B was fine-tuned on large-scale clinical encounters and used to analyze structured EHR data (demographics, comorbidities, and laboratory tests) together with free-text clinical notes. Patients were classified as clinically diagnosed HFpEF, MedGuide-identified HFpEF (defined as probability ≥0.70 based on ESC criteria), or non-HFpEF. Model performance was benchmarked against clinical diagnoses, and blinded validation was conducted for a prospectively sampled subset of MedGuide-identified cases. Outcomes included rehospitalization and mortality during follow-up. Among 24,011 patients with MAFLD, 3,049 (12.7%) had clinically diagnosed HFpEF, and MedGuide-14B identified an additional 4,226 (17.6%) previously undiagnosed cases, of which 90.4% were confirmed on blinded validation (κ = 0.85). Against clinically diagnosed HFpEF, the model achieved an AUC of 0.94, with a sensitivity of 95.0% and a specificity of 92.3%. Rehospitalization occurred in 67.2% of clinically diagnosed HFpEF, 55.6% of MedGuide-identified HFpEF, and 38.4% of non-HFpEF patients (P < 0.001). At 48 months, cumulative all-cause mortality was 18.9%, 12.3%, and 4.6% in the three groups, respectively, and cardiovascular mortality was 10.8%, 5.9%, and 1.5% (log-rank P < 0.05).
Applied to routine EHR data, a domain-tuned large language model substantially increased the detection of HFpEF among patients with MAFLD, identifying a sizeable and previously unrecognized subgroup at intermediate yet clinically meaningful risk. Embedding such a model into EHR workflows may enable earlier evaluation and targeted testing, although prospective validation across diverse settings is warranted.
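
The three-way grouping and the reported proportions can be illustrated with a minimal Python sketch. It is not the study's code. The function and variable names are hypothetical, and giving an existing clinical diagnosis precedence over the model probability is an assumption; only the 0.70 threshold and the patient counts come from the abstract.

```python
# Illustrative sketch only; names and the precedence rule are assumptions.

def classify(clinically_diagnosed: bool, model_probability: float,
             threshold: float = 0.70) -> str:
    """Assign a patient to one of the three groups named in the abstract."""
    # Assumption: a recorded clinical diagnosis takes precedence over the model.
    if clinically_diagnosed:
        return "clinically diagnosed HFpEF"
    # The abstract defines model-identified cases as probability >= 0.70.
    if model_probability >= threshold:
        return "MedGuide-identified HFpEF"
    return "non-HFpEF"


def percent(count: int, total: int) -> float:
    """Share of the cohort, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Counts as reported in the abstract.
TOTAL_MAFLD = 24_011
CLINICALLY_DIAGNOSED = 3_049
MODEL_IDENTIFIED = 4_226

if __name__ == "__main__":
    print(classify(False, 0.82))
    print(percent(CLINICALLY_DIAGNOSED, TOTAL_MAFLD))
    print(percent(MODEL_IDENTIFIED, TOTAL_MAFLD))
```

Taken at face value, the two HFpEF groups together account for 7,275 of 24,011 patients, which leaves 16,736 in the non-HFpEF group.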

PMID:41915720 | DOI:10.1371/journal.pdig.0001317