Nat Commun. 2026 May 13. doi: 10.1038/s41467-026-73021-3. Online ahead of print.
ABSTRACT
Paediatric cardiology presents challenges due to the rarity and complexity of conditions like congenital heart disease. Using retrospective electronic healthcare records from 1,522 Great Ormond Street Hospital cases, we benchmark machine learning models to predict length of stay and retrieve similar patient cases. BioClinical-BERT is used for embedding-based retrieval, while a Random Forest model achieves the best length-of-stay prediction accuracy (0.88 ± 0.02), outperforming clinicians. The Random Forest model shows mean precision and sensitivity of 0.77 ± 0.03 and 0.76 ± 0.04, respectively, during NHS silent deployment (1,052 admissions). K-means identifies three clinically distinct subgroups. In cosine-similarity retrieval, the top matches are driven by diagnosis, while complications dominate broader retrieved sets. In a 25-case intensive care unit pilot, clinician-rated utility improves from 4.23 ± 2.42 to 4.41 ± 2.16 (scale 0-10). Our models surpass clinician performance in length-of-stay prediction and show promise for case retrieval, supporting data-driven decision-making in paediatric cardiology and beyond.
PMID:42129161 | DOI:10.1038/s41467-026-73021-3
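
The embedding-based case retrieval described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`cosine_similarity`, `retrieve_similar`), the case identifiers, and the three-dimensional toy vectors are all hypothetical, standing in for the much higher-dimensional embeddings a model such as BioClinical-BERT would produce. It only shows the general idea of ranking stored cases by cosine similarity to a query case.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def retrieve_similar(query, case_embeddings, k=3):
    """Return the k (case_id, similarity) pairs most similar to the query."""
    scored = [
        (case_id, cosine_similarity(query, vec))
        for case_id, vec in case_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real ones would come from a text encoder.
cases = {
    "case_a": [1.0, 0.0, 0.0],
    "case_b": [0.9, 0.1, 0.0],
    "case_c": [0.0, 1.0, 0.0],
}

top = retrieve_similar([1.0, 0.0, 0.0], cases, k=2)
for case_id, score in top:
    print(case_id, round(score, 3))
```

Varying `k` in such a sketch corresponds to moving from the top matches to the broader retrieved sets that the abstract contrasts.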

