A Comparison of Machine Learning Algorithms for Predicting Hypertension Incidence Based on Cohort Study

Posted on 10/05/2026
by Somayeh Ghiasi

Endocrinol Diabetes Metab. 2026 May;9(3):e70199. doi: 10.1002/edm2.70199.

ABSTRACT

OBJECTIVES: This study aimed to identify key hypertension (HTN) risk factors using machine learning (ML) models to enhance prediction accuracy.

METHODS: Data from the Mashhad stroke and heart atherosclerotic disorder (MASHAD) cohort, comprising 8237 subjects who were normotensive at baseline, were analysed over a 10-year follow-up, during which 2548 developed HTN. Five ML algorithms, namely K-nearest neighbours (KNN), logistic regression (LR), XGBoost (XGB), random forest (RF) and neural networks (NN), were employed to determine the best prediction model and identify the primary factors influencing HTN development.
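The comparison described in METHODS can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic (generated with roughly the cohort's 31% incidence), the hyperparameters are library defaults, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so that the example needs only one library. The feature count and split ratio are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cohort: about 31% positives (assumed, not real data).
X, y = make_classification(
    n_samples=2000, n_features=15, n_informative=6,
    weights=[0.69, 0.31], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Scale-sensitive models (KNN, LR, NN) are wrapped in a scaling pipeline.
models = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "GB": GradientBoostingClassifier(random_state=0),  # stand-in for XGBoost
    "RF": RandomForestClassifier(random_state=0),
    "NN": make_pipeline(
        StandardScaler(), MLPClassifier(max_iter=500, random_state=0)
    ),
}

# Fit each model and score it by AUC-ROC on the held-out split.
auc = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    auc[name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

for name, score in sorted(auc.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: AUC-ROC = {score:.3f}")
```

The ranking printed here reflects the synthetic data only and says nothing about the study's results.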

RESULTS: XGBoost was the best-performing classifier for predicting HTN, outperforming the other algorithms. It achieved the highest AUC-ROC (0.79), accuracy (74%), negative-class precision (86%) and positive-class recall (74%). Although its positive-class precision was only 55% and its negative-class recall was 73%, the model's overall performance was acceptable. Additionally, the ML methods consistently identified age (0.189), copper (0.146), BMI (0.086), triglycerides (0.052), HDL (0.039), glucose (0.039) and uric acid (0.030) as the most influential risk factors; the values in parentheses are SHAP feature importances from the XGBoost model.
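The per-class figures in RESULTS are linked through the confusion matrix. The sketch below derives positive-class precision, negative-class precision and accuracy from the reported recalls. It assumes the evaluation set has the same HTN incidence as the whole cohort (2548 of 8237), which the abstract does not state; the function name `derived_metrics` is hypothetical.

```python
def derived_metrics(prevalence, sensitivity, specificity):
    """Derive precision per class and accuracy from recall per class.

    prevalence  : fraction of positives in the evaluation set
    sensitivity : recall of the positive class
    specificity : recall of the negative class
    """
    # Confusion-matrix cells expressed as fractions of the evaluation set.
    tp = prevalence * sensitivity
    fn = prevalence * (1 - sensitivity)
    tn = (1 - prevalence) * specificity
    fp = (1 - prevalence) * (1 - specificity)
    return {
        "precision_pos": tp / (tp + fp),
        "precision_neg": tn / (tn + fn),
        "accuracy": tp + tn,
    }


# Assumption: evaluation-set incidence equals cohort incidence.
prevalence = 2548 / 8237
metrics = derived_metrics(prevalence, sensitivity=0.74, specificity=0.73)

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Under that assumption the derived precisions come out at about 0.55 and 0.86, matching the reported values, and the derived accuracy of about 0.73 is within rounding distance of the reported 74%.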

CONCLUSION: The XGBoost model effectively predicted HTN incidence over 10 years, and age, copper, BMI, triglycerides, HDL, glucose and uric acid were the most influential risk factors. These findings highlight the value of incorporating ML models into the prediction and prevention of hypertension.

PMID:42108403 | DOI:10.1002/edm2.70199