EpiSmokEr2: a robust epigenetic classifier for smoking status inference using Illumina EPIC methylation data

Scritto il 17/02/2026
da Tianyu Zhu

Epigenomics. 2026 Feb 17:1-11. doi: 10.1080/17501911.2026.2630841. Online ahead of print.

ABSTRACT

AIM: Tobacco smoking induces persistent DNA methylation (DNAm) changes in blood that can serve as long-term biomarkers for smoking exposure. We aimed to develop and validate a DNAm classifier of smoking status using Illumina EPIC array data.

METHODS: We built Epigenetic Smoking status Estimator2 (EpiSmokEr2), a Least Absolute Shrinkage and Selection Operator (LASSO) regression-based DNAm classifier using 511 CpGs from Illumina Infinium MethylationEPIC array (EPIC) data. The model was trained on 1343 samples from the Young Finns Study cohort and validated across six independent datasets from four cohorts and two array platforms (EPIC and EPICv2).

RESULTS: EpiSmokEr2 achieved an average sensitivity of 0.87 and specificity of 0.86 in distinguishing current from never smokers. Predicted smoking status correlated strongly with established DNAm smoking scores and GrimAge, indicating its ability to capture biologically relevant smoking effects. Simulation analysis showed EpiSmokEr2 was robust for up to 10% missing CpGs.

CONCLUSION: EpiSmokEr2 provides a reliable DNAm-based estimator of smoking status. It is available as an open-source R package on GitHub, facilitating broad use in epidemiological and clinical research.

PMID:41700335 | DOI:10.1080/17501911.2026.2630841