Eur J Med Res. 2026 Jan 17. doi: 10.1186/s40001-025-03763-1. Online ahead of print.
ABSTRACT
BACKGROUND: Accurate individual risk assessment is crucial for guiding and improving the prevention of atherosclerotic cardiovascular disease (ASCVD). Existing prediction models are primarily derived from Western Caucasian and Chinese Han populations. We aimed to develop and validate an interpretable, biomarker-based machine learning (ML) model for predicting coronary artery disease (CAD) risk among multi-ethnic patients in Xinjiang, China.
METHODS: This retrospective cohort study enrolled patients who underwent coronary angiography or coronary computed tomography angiography (CCTA) at the First Affiliated Hospital of Xinjiang Medical University. The cohort was divided into training, validation, and test sets. Feature selection was performed using logistic regression and LASSO, followed by prediction model development with six machine learning algorithms (XGBoost, RF, MLP, SVM, KNN, AdaBoost). Predictive performance was evaluated using the area under the receiver operating characteristic curve (AUROC) as the primary metric to identify the optimal algorithm. The selected algorithm was further validated on both the validation and test sets. Shapley Additive Explanations (SHAP) were applied to quantify each feature's contribution to CAD risk prediction, generating individualized risk explanations. Model calibration was assessed using calibration curves and the Brier score. Clinical utility was evaluated through decision curve analysis, and the model was benchmarked against the established SCORE2 Asia Pacific risk model.
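As an illustration of the three evaluation quantities named above (AUROC for discrimination, the Brier score for calibration, and net benefit for decision curve analysis), the following is a minimal pure-Python sketch. It is not the authors' code; the function names and the binary 0/1 outcome coding are assumptions made for the example, and the formulas are the standard textbook definitions.

```python
def auroc(y_true, y_score):
    """AUROC as the probability that a randomly chosen case (label 1)
    receives a higher score than a randomly chosen non-case (label 0).
    Ties contribute 0.5. Quadratic in sample size; fine for illustration."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome.
    Lower is better; 0 is a perfect probabilistic prediction."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true, y_prob, threshold):
    """Net benefit at one threshold probability, the quantity plotted in a
    decision curve: TP/N - FP/N * threshold / (1 - threshold)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing the result with the treat-all and treat-none strategies.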
RESULTS: This study enrolled 7655 male and 4461 female participants, divided into training, validation, and test sets in a 7:1.5:1.5 ratio. XGBoost performed best in both the male and female cohorts: the male model achieved AUROCs of 0.845 (95% CI: 0.834-0.855), 0.814 (0.789-0.839), and 0.826 (0.802-0.850) in the training, validation, and test sets, respectively, while the female model attained values of 0.817 (0.802-0.832), 0.759 (0.721-0.796), and 0.786 (0.751-0.821). Male CAD risk was significantly associated with advanced age, multiple abnormal clinical indicators (including elevated creatinine, total cholesterol, and lipoprotein(a), and decreased HDL-C), hypertension, and diabetes, with higher risk observed in Kazakh and Hui ethnicities, whereas higher education and married status served as protective factors. In females, hypertension was the strongest predictor; elevated uric acid, systolic blood pressure, and fasting blood glucose, along with a history of diabetes, also increased risk, while married status and higher education similarly exhibited protective effects. The prediction model demonstrated favorable clinical utility and accuracy in both cohorts, with calibration significantly enhancing predictive performance. Compared to the SCORE2 Asia Pacific risk model, our model exhibited superior discriminatory ability (AUROC, male: 0.826 vs. 0.662; female: 0.786 vs. 0.720) and improved calibration.
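For readers who want to reproduce a 7:1.5:1.5 partition like the one reported above, the following is a minimal sketch using only the Python standard library. The abstract does not say whether the split was stratified or how it was seeded, so the plain shuffled split, the `seed` argument, and the function name `split_indices` are assumptions made for illustration only.

```python
import random


def split_indices(n, ratios=(0.70, 0.15, 0.15), seed=0):
    """Shuffle indices 0..n-1 and cut them into training, validation, and
    test subsets. Rounding leftovers go to the test subset (an assumption)."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded for reproducibility
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test


# Applied separately to each sex-specific cohort size from the abstract.
male_train, male_val, male_test = split_indices(7655)
female_train, female_val, female_test = split_indices(4461)
```

The resulting subset sizes depend on the rounding rule chosen here and need not match the counts used in the study.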
CONCLUSIONS: Machine learning models can provide personalized and highly accurate predictions of CAD risk. The interpretability of these models facilitates the identification of modifiable risk factors in individual patients, offering valuable insights to enhance primary prevention and management of cardiovascular disease in the Xinjiang region.
PMID:41545835 | DOI:10.1186/s40001-025-03763-1

