Machine learning-based model for predicting recanalization in isolated distal deep vein thrombosis and analysis of predictors

Written on 8 May 2026

by Yingjie Kuang

PLoS One. 2026 May 8;21(5):e0349110. doi: 10.1371/journal.pone.0349110. eCollection 2026.

ABSTRACT

BACKGROUND: Isolated distal deep vein thrombosis (IDDVT) is common, yet tools for predicting poor recanalization remain limited. We aimed to develop and compare machine learning models for predicting poor recanalization in patients with IDDVT and to identify the most informative predictors.

METHODS: A total of 1600 patients with IDDVT were retrospectively enrolled. The dataset was randomly divided into a development set (n = 1280) and an independent test set (n = 320) using stratified sampling. Six predictive models were developed and compared: logistic regression (LR), support vector machine (SVM), random forest (RF), multilayer perceptron (MLP), extreme gradient boosting (XGBoost), and a Voting Ensemble. Model training and hyperparameter tuning were performed in the development set using five-fold stratified cross-validation, and optimal classification thresholds were determined using the Youden index. Model performance was evaluated by discrimination, calibration, and classification metrics, with 95% confidence intervals estimated by bootstrap resampling (10,000 iterations). SHAP analysis was applied to interpret the final model.
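The Youden index used for threshold selection is J = sensitivity + specificity - 1, and the chosen cutoff is the one that maximizes J. The following is a minimal sketch of that rule, not the authors' code: the function name `youden_threshold`, the convention that a case is positive when its probability is at or above the cutoff, and the toy data are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def youden_threshold(
    y_true: Sequence[int], y_prob: Sequence[float]
) -> Tuple[float, float]:
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    Assumed convention: a case is predicted positive when its
    probability is >= threshold. Ties keep the lowest threshold.
    """
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    best_thr, best_j = 1.0, -1.0
    # Every distinct predicted probability is a candidate cutoff.
    for thr in sorted(set(y_prob)):
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= thr)
        tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < thr)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_thr, best_j = thr, j
    return best_thr, best_j


# Toy labels and probabilities (illustrative only, not study data).
labels = [0, 0, 0, 0, 1, 0, 1, 1]
probs = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.60, 0.80]
thr, j = youden_threshold(labels, probs)
print(thr, round(j, 3))
```

When the positive class is a minority, a cutoff chosen this way can fall well below 0.5, because the rule weights sensitivity and specificity equally regardless of prevalence.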

RESULTS: In the independent test set, all models showed acceptable to strong discrimination, with AUC values ranging from 0.808 to 0.908. XGBoost achieved the best overall performance, with an optimal threshold of 0.183, an AUC of 0.908 (95% CI, 0.855-0.952), a Brier score of 0.077 (95% CI, 0.058-0.096), an accuracy of 0.900 (95% CI, 0.866-0.931), a precision of 0.650 (95% CI, 0.529-0.767), a recall of 0.803 (95% CI, 0.686-0.906), an F1-score of 0.717 (95% CI, 0.615-0.806), and a specificity of 0.918 (95% CI, 0.884-0.950). The calibration intercept and slope of the XGBoost model were 0.149 (95% CI, -0.192 to 0.454) and 1.410 (95% CI, 1.098-1.809), respectively, indicating acceptable overall calibration. SHAP analysis identified D-dimer rate, provoking-factor-related variables, anticoagulant use, and age group as the most influential predictors.
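The 95% confidence intervals above were estimated by bootstrap resampling. The abstract does not state which bootstrap interval was used, so the sketch below assumes a simple percentile interval: resample the test cases with replacement, recompute the metric each time, and take the 2.5th and 97.5th percentiles. The helper names `accuracy` and `bootstrap_ci`, the default of 2000 resamples (the study used 10,000), and the toy data are hypothetical.

```python
import random
from typing import Callable, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def bootstrap_ci(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    metric: Callable[[Sequence[int], Sequence[int]], float],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for a metric (assumed method)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        # Resample case indices with replacement, keeping pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx],
                            [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy test set: 20 cases, 18 classified correctly (not study data).
y_true = [1] * 4 + [0] * 16
y_pred = [1, 1, 1, 0] + [0] * 15 + [1]
point = accuracy(y_true, y_pred)
ci_low, ci_high = bootstrap_ci(y_true, y_pred, accuracy)
print(point, ci_low, ci_high)
```

The same function applies to any metric that takes labels and predictions, such as recall or specificity, by passing a different `metric` callable.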

CONCLUSION: Among six candidate models, XGBoost showed the best overall performance for predicting poor recanalization in patients with IDDVT. This study establishes an interpretable machine learning-based prediction framework focused specifically on poor recanalization in IDDVT and highlights the contribution of dynamic laboratory information, particularly D-dimer rate. The model may support early risk stratification and individualized follow-up planning, but external validation is required before routine clinical implementation.

PMID:42102041 | DOI:10.1371/journal.pone.0349110