An explainable hybrid framework for early detection of cardiovascular diseases using Categorical Boosting and Bees algorithm

Posted on 13/12/2025
by Jayanta Sen

Sci Rep. 2025 Dec 13. doi: 10.1038/s41598-025-28514-4. Online ahead of print.

ABSTRACT

Cardiovascular disease (CVD) remains one of the leading causes of death worldwide, claiming millions of lives each year. Early detection of CVD enables healthcare professionals to make informed decisions about a patient's health. Machine learning (ML)-based frameworks have become widely used for disease prediction. However, traditional ML models often behave as "black boxes," producing results that lack transparency and interpretability. The objective of the present study is to develop an ML framework that detects CVD with promising accuracy and also makes its outcomes interpretable, to support targeted therapies. The study uses the Framingham, Massachusetts CVD dataset, which is publicly available from the Kaggle repository. During data pre-processing, the Random Oversampling (RO) technique is applied to address class imbalance, followed by Pearson correlation analysis to examine the correlation between attributes. Min-Max scaling is then used for data normalization. The pre-processed data is fed into a hybrid ML framework that combines the Categorical Boosting (CatBoost) and Bees algorithms to achieve optimized CVD prediction results. The proposed hybrid model yielded an accuracy of 98.04%, a precision of 97.09%, a recall of 98.96%, an F1-score of 98.02%, and a specificity of 97.16%, with a total execution time of 26.6580 s. The proposed model outperformed contemporary state-of-the-art algorithms on most evaluation metrics. Additionally, Explainable Artificial Intelligence (XAI) techniques, such as LIME and SHAP, are implemented to identify the contribution of the most significant attributes to the occurrence of CVD, offering valuable insights into the detection of the disease and enabling healthcare providers to make accurate and timely treatment decisions.
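
The pre-processing steps and evaluation metrics named in the abstract can be illustrated with a minimal standard-library Python sketch. It is not the authors' implementation: the function names, the toy rows, and the seed are assumptions made for illustration, and the CatBoost training, the Bees algorithm optimization, and the LIME/SHAP explanations are left out entirely.

```python
import math
import random


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def min_max_scale(rows):
    """Rescale every column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1-score and specificity for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "specificity": ratio(tn, tn + fp),
    }


# Toy data (hypothetical): two made-up numeric attributes and a binary CVD label.
rows = [[120.0, 1.0], [140.0, 0.0], [160.0, 1.0], [180.0, 0.0]]
labels = [0, 0, 0, 1]

balanced_rows, balanced_labels = random_oversample(rows, labels)
scaled_rows = min_max_scale(balanced_rows)
first_col, second_col = zip(*balanced_rows)
print("class counts:", {c: balanced_labels.count(c) for c in set(balanced_labels)})
print("correlation between attributes:", pearson(first_col, second_col))
print("metrics:", binary_metrics([1, 1, 0, 0], [1, 0, 0, 1]))
```

The five values returned by `binary_metrics` correspond to the five scores reported for the hybrid model. Specificity is the true-negative rate, and it complements recall, which is the true-positive rate.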

PMID:41390781 | DOI:10.1038/s41598-025-28514-4