Nat Commun. 2025 Dec 12. doi: 10.1038/s41467-025-66483-4. Online ahead of print.
ABSTRACT
Reliable tools for early identification of Crohn's disease (CD) remain lacking. We analyzed 2736 plasma proteins in 39,634 UK Biobank (UKB) participants and identified 44 that were associated with incident CD. CD274, CHI3L1, REG1B, ITGAV, PRSS8, ITGA11, GDF15, DEFA1_DEFA1B, and IL6 ranked highest by protein importance. A machine learning model based on these 9 proteins achieved high predictive performance for CD in a geographically distinct UKB testing cohort (n = 13,262, AUC 0.76), outperforming clinical risk models. It was externally validated in EPIC-Norfolk (n = 2944, AUC 0.73) and exhibited high discriminatory capacity for CD in the cross-sectional Southern China cohort (n = 74, AUC 0.79). In the UKB testing cohort, combining proteins with clinical data improved predictive performance (AUC 0.78) up to 16 years pre-diagnosis. In the same cohort, individuals classified as high risk by the protein model were 4.23 times more likely to develop CD. Our findings highlight proteomics-based models as a promising approach to predicting CD up to 16 years before diagnosis, offering opportunities for early screening and intervention.
PMID:41387681 | DOI:10.1038/s41467-025-66483-4

