Machine Learning-Based Ensemble Predictive Model for Cardiovascular Disease Prevention

Written on 18/02/2026

by Neeraj Kumar

Int J Angiol. 2025 Jul 14;35(1):50-63. doi: 10.1055/a-2644-4444. eCollection 2026 Mar.

ABSTRACT

Cardiovascular diseases (CVDs) are a leading cause of death globally, and their incidence in India is increasing. Machine learning (ML) has emerged as a viable approach for CVD prediction; however, limited dataset size and generalizability constrain model robustness. This study aims to develop an enhanced ML prediction model for CVD detection using ensemble methods. Six datasets comprising 7,916 records with clinical parameters were considered. Based on their available features, the records were grouped into Dataset 1 (n = 3,676) and Dataset 2 (n = 4,240), each with its own feature set. Dataset 1 was analyzed with two approaches: binary classification of the target variable (0: absence of CVD, 1: presence of CVD) and multiclass classification of the target variable (based on CVD severity). Likewise, Dataset 2 was analyzed using binary classification of the target variable (10-year risk of CVD). Identical data preprocessing and exploratory data analysis steps were applied to both dataset groups. Subsequently, 18 ML algorithms were used to develop distinct models for each dataset group, and LazyPredict was used to select the top 10 performing models. A Voting Classifier then combined these models into an ensemble to enhance predictive performance. For Dataset 1, our framework achieved an accuracy of 96.5% in binary classification and 85.5% in multiclass classification; for Dataset 2, it achieved an accuracy of 81.18%. By combining ensemble modeling with an extensive dataset, our framework surpasses traditional and existing ML models in predictive stability, mitigates bias, and improves decision support in CVD detection.
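The select-then-vote idea described above (rank many candidate models, keep the best few, combine them with a voting ensemble) can be illustrated with a minimal, standard-library-only sketch. It assumes hard (majority) voting, since the abstract does not say whether hard or soft voting was used, and it ranks models by plain validation accuracy as a stand-in for LazyPredict's leaderboard; it does not reproduce the LazyPredict or scikit-learn APIs. The model names and predictions below are hypothetical toy data, not results from the study.

```python
from collections import Counter
from typing import Dict, List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label and prediction lengths differ")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def select_top_k(preds: Dict[str, List[int]], y_true: Sequence[int], k: int) -> List[str]:
    """Rank models by validation accuracy (ties broken by name) and keep the k best."""
    ranked = sorted(preds, key=lambda name: (-accuracy(y_true, preds[name]), name))
    return ranked[:k]


def hard_vote(pred_lists: List[List[int]]) -> List[int]:
    """Per-sample majority vote; works for binary or multiclass labels.

    Ties are resolved in favour of the smallest class label.
    """
    voted = []
    for sample_votes in zip(*pred_lists):
        counts = Counter(sample_votes)
        best = max(counts.values())
        voted.append(min(label for label, c in counts.items() if c == best))
    return voted


# Hypothetical validation labels and per-model predictions (toy data only).
y_val = [0, 1, 1, 0, 1, 0, 1, 1]
preds = {
    "model_a": [0, 1, 1, 0, 0, 0, 1, 1],
    "model_b": [0, 1, 0, 0, 1, 0, 1, 0],
    "model_c": [1, 1, 1, 0, 1, 1, 1, 1],
    "model_d": [1, 0, 0, 1, 0, 1, 0, 0],
}

top = select_top_k(preds, y_val, k=3)
ensemble = hard_vote([preds[name] for name in top])
print("selected:", top)
print("ensemble accuracy:", accuracy(y_val, ensemble))
```

On this toy data the three selected models make different mistakes, so the majority vote corrects each individual error; real gains depend on how diverse the base models' errors are. In scikit-learn, `VotingClassifier(estimators=..., voting="hard")` performs the analogous majority vote over fitted estimators, while `voting="soft"` averages predicted class probabilities instead.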

PMID:41705083 | PMC:PMC12909081 | DOI:10.1055/a-2644-4444