PLOS Digit Health. 2026 Jul 2;5(7):e0001528. doi: 10.1371/journal.pdig.0001528. eCollection 2026 Jul.
ABSTRACT
Cardiovascular Metabolic Comorbidities (CMM) share common physiological mechanisms in inflammation and immunity, oxidative stress, and insulin resistance, leading to mutual disease interactions and complex clinical manifestations. Because clinical features alone describe CMM status poorly, this paper proposes a digital marker that characterizes the differences between single and multiple diseases and systematically reveals distinct CMM subgroups from cross-sectional data. A directed acyclic network for CMM was constructed with the DirectLiNGAM algorithm, using demographic characteristics, clinical laboratory parameters, and disease status as nodes. Network features were described by in-degree, out-degree, degree centrality, betweenness centrality, and closeness centrality, which were used to rank node importance. The top seven significant clinical laboratory parameters were selected based on this ranking. Subsequently, ten machine learning algorithms (Random Forest, XGBoost, MLP, KNN, Gradient Boosting, SVC, Linear Regression, Ridge, ElasticNet, Lasso) were evaluated on how well they generated digital markers by predicting death, in order to determine the optimal algorithm. The generated digital markers were then binned to classify CMM into Low, Middle, and High groups. Finally, linear regression was used to validate the network-filtered clinical laboratory parameters. In the CMM network, the top three disease nodes by in-degree are DM, MemD, and DL, while the top five nodes by out-degree are TC, HBALC, GLU, HCT, and HGB. Regarding network centrality, the top five nodes by degree centrality are Male, TG, DM, CYC, and DL; by betweenness centrality, Male, Stroke, TG, DL, and DM; and by closeness centrality, Male, DM, Married, Stroke, and CA. Network analysis identified the top clinical laboratory parameters as GLU, HBALC, TC, UA, HCT, TG, HGB, WBC, and CYC, consistent with the statistically significant parameters (P < 0.05) in the linear regression validation. Among the machine learning algorithms, Ridge regression performed best in AUC, PR-AUC, Brier Score, and Log Loss. The digital marker generated by Ridge regression yielded average scores of 0.016 (0.008), 0.058 (0.019), and 0.296 (0.197) for the Low, Middle, and High groups, respectively. By integrating network analysis and machine learning, this paper developed a digital marker that delineates cross-sectional CMM subgroups, indicating its potential for early detection and enabling future research on stratified interventions for high-risk groups.
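The degree-based node ranking described in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the edge list below is a hypothetical toy graph that reuses a few node labels from the abstract, the function names `degree_features` and `rank_nodes` are made up for illustration, and degree centrality is assumed to follow the common convention of (in-degree + out-degree) / (n - 1). Betweenness and closeness centrality are omitted.

```python
from collections import defaultdict

# Hypothetical directed edges (source -> target). These are illustrative
# only and are NOT the edges estimated in the paper.
edges = [
    ("GLU", "DM"),
    ("HBALC", "DM"),
    ("TC", "DL"),
    ("TG", "DL"),
    ("DM", "Stroke"),
    ("DL", "Stroke"),
]


def degree_features(edges):
    """Return in-degree, out-degree, and degree centrality for every node."""
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
        nodes.update((src, dst))

    n = len(nodes)
    features = {}
    for node in nodes:
        total = in_deg[node] + out_deg[node]
        features[node] = {
            "in_degree": in_deg[node],
            "out_degree": out_deg[node],
            # Assumed convention: total degree normalized by n - 1.
            "degree_centrality": total / (n - 1) if n > 1 else 0.0,
        }
    return features


def rank_nodes(features, key):
    """Sort node names by one feature, descending; ties break alphabetically."""
    return sorted(features, key=lambda node: (-features[node][key], node))


features = degree_features(edges)
top_by_in_degree = rank_nodes(features, "in_degree")[:3]
top_by_out_degree = rank_nodes(features, "out_degree")[:3]
```

In this toy graph, nodes with high in-degree are those that many other variables point to, while nodes with high out-degree point to many others; ranking on each feature separately mirrors the way the abstract reports a top list per network measure.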
PMID:42391238 | DOI:10.1371/journal.pdig.0001528

