Development and Interpretable Machine Learning-Based Prediction of Cardiovascular Disease Risk in Chinese COPD Patients: An Analysis of the CHARLS Database

Written on 01/06/2026

by Yalian Yuan

Int J Chron Obstruct Pulmon Dis. 2026 May 25;21:590631. doi: 10.2147/COPD.S590631. eCollection 2026.

ABSTRACT

BACKGROUND: Individuals with chronic obstructive pulmonary disease (COPD) experience a significant decline in their quality of life owing to cardiovascular disease (CVD). This study aimed to develop a predictive framework for evaluating CVD risk in patients with COPD.

PATIENTS AND METHODS: Data from 1070 COPD patients participating in the 2015 China Health and Retirement Longitudinal Study (CHARLS) were analyzed. Least Absolute Shrinkage and Selection Operator (LASSO) regression and the Boruta algorithm were used for feature selection. The predictive performance of six machine learning (ML) models (Logistic Regression, Random Forest, Support Vector Machine (SVM), Gradient Boosting Machine, XGBoost, and Multi-Layer Perceptron) was then compared. The Synthetic Minority Oversampling Technique-Nominal Continuous (SMOTE-NC) was applied to the training set only, to address class imbalance. An interpretable risk assessment tool was developed using SHapley Additive exPlanations (SHAP).
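The oversampling step can be illustrated with a minimal sketch. The abstract does not say which SMOTE-NC implementation was used, so the code below is a simplified pure-Python version of the technique, not the authors' code: continuous features are interpolated between a minority-class row and one of its nearest neighbours, and nominal features take the most common value among those neighbours. The function name `smote_nc`, its parameters, and the example rows are all hypothetical.

```python
import random
from collections import Counter
from statistics import median, pstdev


def smote_nc(minority, cont_idx, cat_idx, n_new, k=5, seed=0):
    """Return n_new synthetic minority-class rows (simplified SMOTE-NC)."""
    rng = random.Random(seed)
    # A nominal mismatch costs the median standard deviation of the
    # continuous features, so both feature types share one distance scale.
    penalty = median([pstdev([row[j] for row in minority]) for j in cont_idx])

    def distance(a, b):
        cont = sum((a[j] - b[j]) ** 2 for j in cont_idx)
        nominal = sum(penalty ** 2 for j in cat_idx if a[j] != b[j])
        return (cont + nominal) ** 0.5

    synthetic = []
    for _ in range(n_new):
        # Pick a random minority row and find its k nearest minority neighbours.
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [row for pos, row in enumerate(minority) if pos != i]
        neighbours = sorted(others, key=lambda row: distance(base, row))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()
        new_row = list(base)
        # Continuous features: random point on the segment base -> mate.
        for j in cont_idx:
            new_row[j] = base[j] + gap * (mate[j] - base[j])
        # Nominal features: most frequent value among the k neighbours.
        for j in cat_idx:
            new_row[j] = Counter(row[j] for row in neighbours).most_common(1)[0][0]
        synthetic.append(tuple(new_row))
    return synthetic


# Hypothetical minority-class (CVD) training rows:
# (body weight in kg, hypertension 0/1). Not data from the study.
cvd_rows = [(52.0, 1), (60.5, 1), (71.0, 0), (66.0, 1), (58.0, 0), (75.5, 1)]
new_rows = smote_nc(cvd_rows, cont_idx=[0], cat_idx=[1], n_new=4, k=3)
print(new_rows)
```

Synthetic rows are generated from training data only, so the test set keeps its original class distribution and the reported test metrics are not inflated by artificial cases. In practice a maintained implementation such as the `SMOTENC` class in the imbalanced-learn package would likely be preferable to a hand-written one.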

RESULTS: Of the 1070 participants, 305 (28.50%) had CVD. Seven variables were used to build the six models. The SVM model outperformed the other models, with a training Area Under the Receiver Operating Characteristic curve (AUROC) of 0.819 (95% Confidence Interval (CI), 0.793-0.844), accuracy of 74.42%, sensitivity of 75.56%, precision of 74.18%, specificity of 73.26%, and F1 score of 74.86%. In the test set, the AUROC was 0.719 (95% CI, 0.670-0.760), with an accuracy of 68.63%, sensitivity of 64.20%, precision of 66.53%, specificity of 64.96%, and F1 score of 69.36%.
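For readers less familiar with these metrics, the sketch below shows how they follow from a binary confusion matrix, using the standard definitions. The function names and the example counts are hypothetical; the abstract does not report the underlying confusion matrices. The only study values used are the reported training precision and sensitivity, from which the F1 score (their harmonic mean) can be recomputed.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall: share of true cases detected
    specificity = tn / (tn + fp)          # share of non-cases correctly cleared
    precision = tp / (tp + fp)            # share of positive calls that are correct
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "specificity": specificity,
        "f1": f1,
    }


def f1_from(precision, sensitivity):
    """F1 score as the harmonic mean of precision and sensitivity."""
    return 2 * precision * sensitivity / (precision + sensitivity)


# Hypothetical counts, for illustration only (not from the study).
example = classification_metrics(tp=40, fp=20, tn=120, fn=20)
print(example)

# Reported training-set precision (74.18%) and sensitivity (75.56%).
train_f1 = f1_from(0.7418, 0.7556)
print(f"{train_f1:.4f}")  # 0.7486, matching the reported 74.86%
```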

CONCLUSION: This study identified seven key predictors (sex, body weight, hypertension, dyslipidemia, disability, self-rated health, and vision status) that are significantly associated with cardiovascular risk in Chinese patients with COPD. Among the six machine-learning algorithms evaluated, the SVM model demonstrated the most robust performance; however, its predictive capacity remains moderate, reflecting the inherent limitations of cross-sectional survey data and the reliance on self-reported diagnoses. Future prospective studies and rigorous external validation in independent cohorts are essential to refine these predictors and translate this machine-learning approach into reliable clinical decision-support systems for the personalized management of COPD patients.

PMID:42220548 | PMC:PMC13221435 | DOI:10.2147/COPD.S590631