Sci Prog. 2026 Jan-Mar;109(1):368504261424391. doi: 10.1177/00368504261424391. Epub 2026 Mar 9.
ABSTRACT
Objective: This study aims to identify the most suitable machine-learning model for early heart disease risk screening in diabetic populations.
Methods: This retrospective cohort study utilized data from the China Health and Retirement Longitudinal Study, with baseline data from 2011 and follow-up data from 2020. Using features selected by Least Absolute Shrinkage and Selection Operator (LASSO) regression, we systematically constructed 16 distinct machine-learning models. Model performance was evaluated using a comprehensive set of metrics, including the area under the receiver operating characteristic curve, F1-score, sensitivity, specificity, precision, accuracy, and balanced accuracy. To interpret the decision-making process of the best-performing model, we conducted Shapley additive explanations (SHAP) analysis.
Results: After the 9-year follow-up period concluding in 2020, 157 of the 819 patients with diabetes at baseline (2011) developed heart disease. From the available features, LASSO regression selected 19 core features for model construction. Among the models developed, the K-Nearest Neighbors (KNN) model demonstrated optimal performance across key metrics, achieving the highest F1 score, balanced accuracy, and precision. The SHAP analysis identified body mass index, systolic blood pressure, and waist circumference as the three most important predictive features within the diabetic cohort. The contribution patterns of these features in the KNN model align closely with clinical expertise, achieving a strong balance between predictive power and interpretability.
Conclusion: This study developed a machine-learning model to predict heart disease risk in patients with diabetes. Although the model exhibited only modest predictive performance, it provides a valuable empirical foundation and clear direction for constructing more reliable and clinically useful prediction tools in this field.
PMID:41800822 | DOI:10.1177/00368504261424391
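
The workflow summarized in the Methods (LASSO-based feature selection, a KNN classifier, and evaluation with several metrics) could be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: it assumes scikit-learn, runs on synthetic data generated to mimic only the cohort size and event rate (819 patients, 157 events), and every hyperparameter (`alpha`, `n_neighbors`, the 70/30 split) is a placeholder not reported in the abstract. The SHAP step and the other 15 models are not covered.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.metrics import (accuracy_score, balanced_accuracy_score, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cohort (NOT CHARLS data): 819 rows, ~19% positives.
X, y = make_classification(n_samples=819, n_features=40, n_informative=8,
                           weights=[0.81, 0.19], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Scale, keep the features with non-zero LASSO coefficients, then fit KNN.
# alpha and n_neighbors are assumed values chosen for illustration.
model = make_pipeline(
    StandardScaler(),
    SelectFromModel(Lasso(alpha=0.01)),
    KNeighborsClassifier(n_neighbors=5),
)
model.fit(X_train, y_train)

# Number of features that survived the LASSO selection step.
n_selected = int(model.named_steps["selectfrommodel"].get_support().sum())

pred = model.predict(X_test)
prob = model.predict_proba(X_test)[:, 1]  # estimated probability of the positive class

# The metric set named in the abstract.
metrics = {
    "auc": roc_auc_score(y_test, prob),
    "f1": f1_score(y_test, pred, zero_division=0),
    "sensitivity": recall_score(y_test, pred, zero_division=0),
    "specificity": recall_score(y_test, pred, pos_label=0, zero_division=0),
    "precision": precision_score(y_test, pred, zero_division=0),
    "accuracy": accuracy_score(y_test, pred),
    "balanced_accuracy": balanced_accuracy_score(y_test, pred),
}

print(f"features kept by LASSO: {n_selected}")
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Placing the scaler and the selector inside the pipeline means both are fitted on the training split only, which avoids leaking test-set information into feature selection. With an event rate near 19%, balanced accuracy and F1 are generally more informative than raw accuracy, which is consistent with the metrics the abstract emphasizes.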