JMIR Cardio. 2025 Dec 20. doi: 10.2196/82462. Online ahead of print.
ABSTRACT
BACKGROUND: Heart disease is a leading cause of morbidity and mortality worldwide. Although machine learning models can achieve strong predictive performance, their limited interpretability hampers clinical adoption. Logistic regression is transparent but is often perceived as less accurate than complex ensemble models.
OBJECTIVE: To develop an explainable logistic regression model (SHAP-LR) for heart disease risk prediction using routinely available clinical variables and to evaluate its performance across community survey data, public clinical datasets, and a hospital cohort, in comparison with machine learning models and the Framingham Risk Score (FRS).
METHODS: We used the 2015 Behavioral Risk Factor Surveillance System (BRFSS; 253,680 adults, 9.4% with self-reported heart disease) for model development. To benchmark machine learning methods, we trained baseline models on the full UCI Heart Disease dataset (n=920) and the Statlog Heart Disease dataset (n=270). The final SHAP-LR model itself was developed exclusively on BRFSS data. External validation of SHAP-LR was performed on the Cleveland subset of the UCI Heart Disease database (n=303), where SHAP-LR was benchmarked against FRS for discrimination and calibration.
RESULTS: In BRFSS, older age and cardiometabolic risk factors were strongly associated with heart disease. Across the UCI, Statlog, and BRFSS datasets, SHAP-LR achieved AUROCs of approximately 0.73, 0.64, and 0.80, with performance comparable to or slightly better than more complex tree-based models. In the external cohort, SHAP-LR showed overall similar discrimination to FRS. Apparent calibration, as judged by Brier scores and calibration plots, was more favorable for SHAP-LR in this high-prevalence hospital sample, but this likely reflects the use of class-weighted training in BRFSS and the mismatch between a prevalence model and a 10-year incidence risk score; these calibration differences should therefore be interpreted with caution. Subgroup analyses indicated that FRS achieved higher AUROC than SHAP-LR in some high-risk groups, including patients with diabetes or hypertension. In the BRFSS test set, the corrected SHAP-LR integer score defined three strata with observed event rates of approximately 1.1%, 4.1%, and 17.1%; mean predicted probabilities were approximately 9.3%, 26.2%, and 60.7%, indicating effective risk ranking but substantial overestimation of absolute risk in the low-risk group.
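The gap between risk ranking and absolute calibration in the BRFSS strata can be checked with simple arithmetic on the reported figures. The sketch below uses only the approximate percentages quoted above; the stratum labels and function names are illustrative assumptions, not taken from the paper.

```python
# Approximate percentages as reported for the BRFSS test set in the abstract.
observed = [1.1, 4.1, 17.1]    # observed event rate per stratum (%)
predicted = [9.3, 26.2, 60.7]  # mean predicted probability per stratum (%)
strata = ["low", "middle", "high"]  # hypothetical labels for the three strata


def overestimation_ratios(predicted, observed):
    """Ratio of mean predicted probability to observed event rate per stratum."""
    return [p / o for p, o in zip(predicted, observed)]


def is_strictly_increasing(values):
    """True if each value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


ratios = overestimation_ratios(predicted, observed)
for name, ratio in zip(strata, ratios):
    print(f"{name}: predicted/observed = {ratio:.1f}")

# Ranking is preserved when both sequences rise across the strata.
print("ranking preserved:",
      is_strictly_increasing(observed) and is_strictly_increasing(predicted))
```

On these figures the predicted-to-observed ratio is roughly 8.5, 6.4, and 3.5 across the three strata, so the relative overestimation is largest in the lowest-risk stratum while the ordering of strata is preserved.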
CONCLUSIONS: We developed and validated an explainable logistic regression model for heart disease risk prediction that balances predictive performance and transparency. By modeling age as a continuous predictor, comparing against multiple machine learning models, and using FRS as an external benchmark in a hospital cohort, SHAP-LR demonstrates a simple, interpretable framework for prevalent heart disease risk prediction in community and clinical datasets. However, FRS outperformed SHAP-LR in some high-risk strata, and raw SHAP-LR probabilities require local recalibration before being used for absolute risk estimation, particularly in low-prevalence populations. Prospective studies and additional external validations will be needed before SHAP-LR can be considered for routine individualized cardiovascular risk assessment.
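One common way to adjust probabilities from a class-weighted model to a target prevalence is a prior-shift correction on the log-odds scale. The abstract does not state which recalibration method the authors recommend, so the following is a minimal sketch under two assumptions: that class weighting made the effective training prevalence about 50%, and that only the outcome prevalence differs between the training and target populations. The function names are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def expit(z):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-z))


def prior_shift_recalibrate(p, train_prev, target_prev):
    """Shift a predicted probability from one prevalence to another.

    Posterior odds are rescaled by the ratio of target prior odds to
    training prior odds; the model's ranking of individuals is unchanged.
    """
    return expit(logit(p) - logit(train_prev) + logit(target_prev))


# Assumption: balanced class weights, so effective training prevalence is 0.5.
# The 9.4% target is the BRFSS prevalence reported in the abstract.
raw = 0.5
adjusted = prior_shift_recalibrate(raw, train_prev=0.5, target_prev=0.094)
print(f"raw {raw:.3f} -> adjusted {adjusted:.3f}")
```

A correction of this kind addresses prevalence mismatch only. Recalibration for a specific site would still need local outcome data, for example to refit the intercept and slope of the model's linear predictor.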
PMID:41422491 | DOI:10.2196/82462

