Reasoning-driven large language models in medicine: opportunities, challenges, and the road ahead

Written on 31/01/2026
by Xiaofei Wang

Lancet Digit Health. 2026 Jan 30:100931. doi: 10.1016/j.landig.2025.100931. Online ahead of print.

ABSTRACT

Developments in large language models (LLMs) over the past 2 years have shifted the focus from text, image, and audio generation to LLMs capable of multistep reasoning (thinking). This development is particularly important for medicine and health care, where the translation of these models has been limited by the black-box nature of previous LLMs. New reasoning-driven LLMs incorporate chain-of-thought prompting and reveal intermediate reasoning steps, offering transparency and traceability that could improve the clinical adoption and utility of LLMs. In this Viewpoint, we examine four emerging reasoning-driven LLMs, namely OpenAI's o1 and o3-mini, Google's Gemini 2.0 Flash Thinking, and DeepSeek R1. We compare their methodological approaches, benchmark their performance on medical question-answering tasks, and assess their potential for clinical integration. We highlight both opportunities and challenges associated with deploying reasoning-driven LLMs. Key future considerations include real-world validation, rigorous benchmarking with ethical safeguards, and advances in the efficiency and sustainability of reasoning-driven LLMs. Addressing these challenges will enable the fine-tuning of these LLMs for specific medical applications, enhancing their potential for clinical decision support, patient education, medical training, and evidence synthesis.
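As an illustration of the chain-of-thought prompting the abstract refers to (this sketch is not from the article itself, and the question, wording, and helper function are hypothetical), the simplest form wraps a question with an instruction asking the model to write out its intermediate reasoning steps before answering:

```python
# Minimal sketch of a chain-of-thought prompt for a medical
# multiple-choice question. No specific LLM API is assumed; any
# chat-style model would receive text shaped roughly like this.

def build_cot_prompt(question: str, options: list[str]) -> str:
    """Wrap a multiple-choice question with a chain-of-thought instruction."""
    lettered = "\n".join(f"{chr(65 + i)}. {opt}" for i, opt in enumerate(options))
    return (
        f"{question}\n{lettered}\n\n"
        "Think step by step, writing out each intermediate reasoning step, "
        "then state your final answer as a single letter."
    )

prompt = build_cot_prompt(
    "A 58-year-old presents with crushing chest pain radiating to the "
    "left arm. Which initial test is most appropriate?",
    ["12-lead ECG", "Chest CT", "Echocardiogram", "Exercise stress test"],
)
print(prompt)
```

The point of such a wrapper, as the abstract notes, is that the model's returned reasoning trace can then be inspected for transparency and traceability, rather than receiving only an opaque final answer.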

PMID:41620322 | DOI:10.1016/j.landig.2025.100931