Sci Rep. 2026 May 4. doi: 10.1038/s41598-026-51603-x. Online ahead of print.
ABSTRACT
Diabetes mellitus remains one of the most widespread and burdensome chronic diseases worldwide, yet invasive assays and high costs constrain early detection. Existing machine-learning studies often reduce diagnosis to a binary task and overlook the clinically important pre-diabetic stage; in addition, many deep models act as uninterpretable "black boxes". To address these gaps, we propose ProgMDD, an interpretable progressive residual network for multiclass diabetes diagnosis using routine clinical biomarkers. In a strict, leakage-free pipeline, LASSO-based feature selection and resampling were applied exclusively to the training set, yielding a compact, robust input panel. After comparing PCA, t-SNE, and UMAP, we selected UMAP for visualization because it best balanced global and local structure when illustrating progressive class separation. ProgMDD integrates a progressive residual architecture with channel attention and multi-level regularization to enhance feature learning. In comparisons against multiple baselines, ProgMDD achieved 97.02% mean accuracy under 5-fold cross-validation and 97.59% accuracy on the original, imbalanced hold-out test set; multiple ablation studies support these results. The concordance between LASSO and SHAP feature rankings supports biological plausibility and model transparency. By uniting interpretable deep learning with low-cost clinical data, ProgMDD offers a feasible approach for early screening and risk stratification in primary care, and a methodological template that may transfer to other chronic-disease prediction tasks.
PMID:42082595 | DOI:10.1038/s41598-026-51603-x
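The leakage-free ordering described in the abstract (split first, then resample only the training portion, leaving the imbalanced hold-out set untouched) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the resampling method, so plain random oversampling is assumed here, and the toy labels, class sizes, 20% test fraction, and the helper names `stratified_split` and `oversample` are all hypothetical. LASSO-based feature selection is omitted; in such a pipeline it would likewise be fitted on the training portion only.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), keeping class proportions in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def oversample(indices, labels, seed=0):
    """Random oversampling (an assumed method): duplicate minority-class
    rows until every class matches the size of the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    target = max(len(idx) for idx in by_class.values())
    out = []
    for idx in by_class.values():
        out.extend(idx)
        out.extend(rng.choices(idx, k=target - len(idx)))
    return out


# Hypothetical toy labels: 0 = non-diabetic, 1 = pre-diabetic, 2 = diabetic.
labels = [0] * 60 + [1] * 15 + [2] * 25

# Step 1: hold out the test set BEFORE any resampling or feature selection.
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)

# Step 2: resample the training indices only; the test set is never touched.
train_balanced = oversample(train_idx, labels, seed=42)

train_counts = Counter(labels[i] for i in train_balanced)
test_counts = Counter(labels[i] for i in test_idx)
print("train (balanced):", dict(train_counts))
print("test (original imbalance):", dict(test_counts))
```

Because the split happens before oversampling, no duplicated training row can appear in the hold-out set, and the hold-out class distribution stays at its original imbalance, which is the property the abstract attributes to its test-set evaluation.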

