ChatGPT in Public Cardiovascular Healthcare: Accuracy, Limitations, and Implications

Posted on 3 May 2026

by Lara Borja Mialarett

J Eval Clin Pract. 2026 Jun;32(4):e70465. doi: 10.1111/jep.70465.

ABSTRACT

INTRODUCTION: ChatGPT is a cost-effective artificial intelligence (AI) tool designed to facilitate virtual interactions with humans, and its application in healthcare is expanding. However, research on ChatGPT's effectiveness in public healthcare, particularly for cardiac patients, is still limited. This study aims to evaluate ChatGPT's potential in managing cardiovascular health for patients with acute or chronic cardiac conditions.

METHODS: We analyzed real medical records from 'The Cardiovascular Care' program, affiliated with a university outpatient clinic. ChatGPT's performance was evaluated in terms of its ability to analyze clinical cases, propose diagnoses, and recommend appropriate actions. We also assessed whether ChatGPT's accuracy and errors varied depending on disease severity, rarity, mortality risk, and urgency.

RESULTS: When compared to physicians' records, ChatGPT provided correct responses in 43% of diagnostic hypotheses, 5% of recommended supplementary exams, and 10% of laboratory tests. In diagnosis, it showed significant discernment, with accuracy varying according to factors such as severity, rarity, risk of death, and urgency. However, this discernment did not extend to recommendations for supplementary exams and laboratory tests. Interestingly, while ChatGPT's responses in these areas were often only partially accurate, they tended to be more detailed, sometimes unnecessarily so, than those provided by physicians. Diagnostic hypotheses from multiple models, including ChatGPT Health, DeepSeek, Gemini Pro, Perplexity AI, and ESC Chat, were also evaluated. Performance varied across models, with ChatGPT demonstrating the highest diagnostic accuracy among those assessed, despite still producing incorrect outputs.

CONCLUSION: Although ChatGPT demonstrates some diagnostic capability, its overall reliability remains questionable, with performance at times approaching random chance. Caution is advised when considering its use in clinical decision-making.

PMID:42070276 | DOI:10.1111/jep.70465