Stat Med. 2026 Mar;45(6-7):e70489. doi: 10.1002/sim.70489.
ABSTRACT
Missing covariates are a common challenge when applying an existing logistic regression model to new or external datasets, particularly in the context of model updating. While regression calibration and model updating methods have been developed to address such partial data availability, each has limitations in terms of bias, variance, and sensitivity to model misspecification. In this study, we propose a surrogate-calibrated updating (SCU) method that integrates calibration and updating approaches to improve the efficiency and reliability of coefficient estimation in the presence of missing covariates. The SCU method leverages surrogate covariates (variables that are routinely available in both the old and new datasets and are correlated with the missing covariates) and applies a weighted averaging scheme that combines information from fully observed and partially observed data sources. This approach mitigates bias while reducing variance, offering a practical and robust alternative to existing methods in the population updating setting. We provide a theoretical justification and derive the corresponding estimators and their variances. Simulation studies demonstrate the method's favorable performance under various scenarios, including model misspecification. The SCU method is further illustrated using data from the Framingham Heart Study, where diabetes history serves as a surrogate for partially observed glucose levels in assessing cardiovascular disease risk.
JEL Classification: C13, C18, C35.
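The abstract does not specify how the SCU weights are constructed. Purely as a generic illustration of what "weighted averaging" of two coefficient estimates can look like, the Python sketch below uses inverse-variance weighting, which assumes the two estimates are independent, unbiased for the same coefficients, and combined one coefficient at a time (ignoring covariances between coefficients). The function name `inverse_variance_average` is hypothetical, and this is not the SCU estimator itself.

```python
import numpy as np


def inverse_variance_average(est_a, var_a, est_b, var_b):
    """Hypothetical helper: combine two independent estimates of the same
    coefficients with inverse-variance weights, elementwise.

    Assumption (not from the paper): both estimates are unbiased and
    independent, and covariances between coefficients are ignored.
    Returns the combined estimate, its variance, and the weight on est_a.
    """
    est_a, var_a = np.asarray(est_a, dtype=float), np.asarray(var_a, dtype=float)
    est_b, var_b = np.asarray(est_b, dtype=float), np.asarray(var_b, dtype=float)

    precision_a = 1.0 / var_a
    precision_b = 1.0 / var_b
    # Weight on source A is its share of the total precision.
    w_a = precision_a / (precision_a + precision_b)
    combined = w_a * est_a + (1.0 - w_a) * est_b
    # Variance of the combined estimate under the independence assumption.
    combined_var = 1.0 / (precision_a + precision_b)
    return combined, combined_var, w_a


# Illustrative numbers only: coefficients from a fully observed source (A)
# and from a partially observed source (B).
combined, combined_var, w_a = inverse_variance_average(
    [0.8, -0.3], [0.04, 0.01], [0.6, -0.2], [0.01, 0.01]
)
print(combined, combined_var, w_a)
```

The more precise source receives the larger weight, and the combined variance is never larger than either input variance. This is the bias-variance trade-off idea in its simplest form; a method like SCU that must also handle bias from missing covariates would need a more elaborate weighting.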
PMID:41830128 | DOI:10.1002/sim.70489

