Clinically-applicable prediction of hospital stay and patient similarity retrieval in paediatric cardiology using machine learning

Written on 13/05/2026

by Louise Rigny

Nat Commun. 2026 May 13. doi: 10.1038/s41467-026-73021-3. Online ahead of print.

ABSTRACT

Paediatric cardiology presents challenges due to the rarity and complexity of conditions like congenital heart disease. Using retrospective electronic healthcare records from 1,522 Great Ormond Street Hospital cases, we benchmark machine learning models to predict length of stay and retrieve similar patient cases. BioClinical-BERT is used for embedding-based retrieval, while a Random Forest model achieves the best length of stay prediction accuracy (0.88 ± 0.02), outperforming clinicians. The Random Forest model shows mean precision and sensitivity of 0.77 ± 0.03 and 0.76 ± 0.04 during NHS silent deployment (1,052 admissions). K-means identifies three clinically distinct subgroups. Cosine similarity retrieval reveals diagnosis-driven top matches, while complications dominate broader sets. In a 25-case intensive care unit pilot, clinician-rated utility improves from 4.23 ± 2.42 to 4.41 ± 2.16 (scale 0-10). Our models surpass clinician performance in length of stay prediction and show promise for case retrieval, supporting data-driven decision-making in paediatric cardiology and beyond.
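The abstract does not describe how the cosine similarity retrieval step is implemented. As a minimal illustrative sketch only, and not the authors' pipeline, the following shows how top-k similar cases could be ranked by cosine similarity once each case has an embedding vector. The function names, case identifiers, and the three-dimensional toy vectors are all hypothetical; in the study the embeddings come from BioClinical-BERT and would have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query, case_embeddings, k=3):
    """Return the k (case_id, similarity) pairs most similar to the query, best first."""
    scored = [
        (case_id, cosine_similarity(query, vector))
        for case_id, vector in case_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real ones would be produced by a text encoder.
toy_cases = {
    "case_a": [1.0, 0.0, 0.0],
    "case_b": [0.6, 0.8, 0.0],
    "case_c": [0.0, 1.0, 0.0],
}
query = [1.0, 0.05, 0.0]

matches = top_k_similar(query, toy_cases, k=2)
for case_id, score in matches:
    print(case_id, round(score, 3))
```

Cosine similarity ignores vector length and compares direction only, which is one common reason it is used for ranking text embeddings. Varying `k` corresponds to the distinction the abstract draws between top matches and broader retrieved sets.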

PMID:42129161 | DOI:10.1038/s41467-026-73021-3