Cardiol Ther. 2026 May 15. doi: 10.1007/s40119-026-00453-9. Online ahead of print.
ABSTRACT
INTRODUCTION: The multidisciplinary heart team (HT) remains the cornerstone of decision-making for complex cardiovascular disease. Large language models (LLMs) and other generative artificial intelligence models have recently emerged as potential decision support tools across diverse clinical settings. We sought to synthesize current evidence and quantitatively estimate concordance between LLM recommendations and HT decisions.
METHODS: We searched PubMed, Scopus, and Web of Science for primary studies published between November 2022 and February 2026 that evaluated LLM recommendations against multidisciplinary HT decisions. Studies reporting overall agreement were included in the quantitative pooling. A random-effects meta-analysis was performed to estimate the pooled proportion of agreement.
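As an illustration of the pooling step, the following is a minimal sketch of a random-effects meta-analysis of proportions. The abstract does not state which estimator or transformation was used, so the DerSimonian-Laird estimator and the logit transformation here are assumptions. The function name `pool_proportions_dl` and the example counts are hypothetical and are not taken from the included studies.

```python
import math


def pool_proportions_dl(events, totals):
    """Pool proportions with a DerSimonian-Laird random-effects model (logit scale).

    Assumption: logit transform and DL estimator; the source abstract
    does not specify either choice.
    """
    y, v = [], []
    for x, n in zip(events, totals):
        # Apply a 0.5 continuity correction only when a study has 0% or 100% agreement.
        if x == 0 or x == n:
            x, n = x + 0.5, n + 1.0
        y.append(math.log(x / (n - x)))      # logit of the study proportion
        v.append(1.0 / x + 1.0 / (n - x))    # approximate variance of the logit

    k = len(y)
    w = [1.0 / vi for vi in v]               # fixed-effect (inverse-variance) weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw

    # Cochran's Q and the DL estimate of between-study variance tau^2.
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    def inv_logit(t):
        return 1.0 / (1.0 + math.exp(-t))

    return {
        "pooled": inv_logit(mu),
        "ci_low": inv_logit(mu - 1.96 * se),
        "ci_high": inv_logit(mu + 1.96 * se),
        "tau2": tau2,
        "i2": i2,
    }


if __name__ == "__main__":
    # Hypothetical counts of concordant cases and cases reviewed per study.
    hypothetical_events = [65, 82, 77, 70]
    hypothetical_totals = [100, 100, 100, 100]
    print(pool_proportions_dl(hypothetical_events, hypothetical_totals))
```

Pooling on the logit scale keeps the back-transformed confidence interval within 0 and 1, which is one common reason to prefer it over pooling raw proportions. Other transformations and variance estimators would give somewhat different intervals.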
RESULTS: Four retrospective concordance studies were included, addressing decision-making in coronary revascularization and aortic valve intervention. LLM-HT concordance ranged from 65% to 82% for coronary revascularization and was 77% for aortic valve intervention. In random-effects meta-analysis, the pooled agreement between LLM recommendations and HT decisions was 0.73 (95% CI 0.60-0.83), with substantial heterogeneity. Discordance stemmed from LLM reliance on outdated trial evidence and limited transparency about the data used, with misclassifications observed in octogenarians with aortic stenosis. Detailed prompts generally improved the accuracy and reliability of LLM recommendations.
CONCLUSION: These preliminary findings suggest that LLMs may have potential as adjunctive decision support tools for multidisciplinary HTs. Misclassification remains possible when patient-specific factors and conflicting guidelines complicate decision-making. Further prospective evaluation across diverse LLMs is essential before clinical deployment can be recommended.
PMID:42141253 | DOI:10.1007/s40119-026-00453-9