Evaluating real-world deployment of an HL7-CDA-aligned LLM for ICD-10-CM coding

Written on 14 April 2026

by Hong-Jie Dai

NPJ Digit Med. 2026 Apr 14. doi: 10.1038/s41746-026-02541-5. Online ahead of print.

ABSTRACT

Reliable ICD-10-CM coding remains a major operational burden in hospitals, and the real-world performance of AI systems for this task is poorly understood. We developed and deployed a modular, clinically grounded pipeline that combines principled base-model selection, redundancy-aware training, and HL7-aligned section prompts to support scalable ICD-10-CM coding across heterogeneous documentation environments. Using pairwise LLM-as-judge evaluation and Plackett-Luce ranking, BioMistral was identified as a high-performing foundation model and demonstrated consistent performance across two institutions. In a 13-week human-in-the-loop randomized controlled trial involving ten certified coding specialists, AI-assisted workflows significantly reduced coding time while maintaining accuracy. Satisfaction varied by experience, certification, and generational cohort, underscoring the importance of human factors in workflow integration. Importantly, our findings clarify that successful AI adoption operates across multiple levels (documentation infrastructure, workflow uptake, and individual user acceptance), highlighting why model accuracy alone is insufficient to ensure real-world impact. These results provide real-world evidence that methodologically grounded, structurally informed LLM systems can achieve robust, equitable, and operationally meaningful performance in clinical documentation workflows.
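
The abstract does not describe how the Plackett-Luce ranking was computed. As a rough illustration only: when every judgment is a pairwise win or loss, the Plackett-Luce model reduces to the Bradley-Terry model, which can be fitted with a simple minorization-maximization (MM) iteration. The sketch below assumes that setting; the function name `fit_bradley_terry`, the placeholder model names, and the toy judgments are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def fit_bradley_terry(comparisons, n_iter=200, tol=1e-9):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Assumes every pair names two distinct items. Returns a dict that maps
    each item to a strength; strengths are normalized to sum to 1.
    """
    items = sorted({m for pair in comparisons for m in pair})
    if not items:
        return {}

    wins = defaultdict(int)         # total wins per item
    pair_counts = defaultdict(int)  # number of comparisons per unordered pair
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    # Start from uniform strengths.
    strength = {m: 1.0 / len(items) for m in items}

    for _ in range(n_iter):
        updated = {}
        for i in items:
            # MM update: wins_i / sum_j n_ij / (p_i + p_j)
            denom = 0.0
            for j in items:
                if j == i:
                    continue
                n_ij = pair_counts.get(frozenset((i, j)), 0)
                if n_ij:
                    denom += n_ij / (strength[i] + strength[j])
            updated[i] = wins[i] / denom if denom > 0 else strength[i]

        total = sum(updated.values())
        updated = {m: s / total for m, s in updated.items()}

        delta = max(abs(updated[m] - strength[m]) for m in items)
        strength = updated
        if delta < tol:
            break

    return strength


# Toy judgments (hypothetical): each tuple is (preferred model, other model).
judgments = (
    [("model_a", "model_b")] * 3 + [("model_b", "model_a")]
    + [("model_b", "model_c")] * 3 + [("model_c", "model_b")]
    + [("model_a", "model_c")] * 3 + [("model_c", "model_a")]
)
strengths = fit_bradley_terry(judgments)
ranking = sorted(strengths, key=strengths.get, reverse=True)
print(ranking)
```

Sorting candidates by fitted strength gives the ranking. A full Plackett-Luce fit would also accept rankings of more than two candidates at once, which this pairwise sketch does not cover.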

PMID:41981090 | DOI:10.1038/s41746-026-02541-5