Development and cross-site validation of machine-learning models for diagnosis and prognosis of stable angina with and without obstructive coronary artery disease: a study protocol

Written on 8 April 2026

by Jiawen Deng

BMJ Open. 2026 Apr 8;16(4):e108799. doi: 10.1136/bmjopen-2025-108799.

ABSTRACT

INTRODUCTION: Angina with no obstructive coronary artery disease (ANOCA) affects millions and is frequently under-recognised because diagnostic pathways and risk tools predominantly target obstructive coronary artery disease (CAD). This protocol describes shared methods for two machine-learning (ML) studies: (1) differentiating ANOCA from stable angina with obstructive CAD and (2) predicting long-term mortality among patients with ANOCA and obstructive CAD.

METHODS AND ANALYSIS: We will develop and cross-site validate ML classification models using a multicentre retrospective cohort drawn from the Alberta Provincial Project for Outcome Assessment in Coronary Heart Disease registry and institutional datasets from the University of Ottawa Heart Institute and the University Health Network. Eligible participants are adults (≥18 years) undergoing initial cardiac catheterisation for chest pain/anginal equivalents since 1995, excluding prior revascularisation, major structural heart disease and predefined non-anginal indications. Outcomes are (1) ANOCA (0% to <50% stenosis) versus obstructive CAD (≥50% stenosis) and (2) 1-year, 3-year and 5-year mortality, modelled separately for ANOCA and obstructive CAD. Model development will use nested cross-validation with stratified k-fold inner-loop tuning and leave-one-site-out cross-validation for repeated external validation. Candidate predictors will be harmonised across sites, filtered for missingness and refined using expert/directed acyclic graph-guided selection plus Boruta and Least Absolute Shrinkage and Selection Operator. Preprocessing includes appropriate encoding, missing-data imputation (multivariate imputation by chained equations) and feature scaling. Algorithms will include elastic-net logistic regression, random forest, LightGBM and multilayer perceptron models; hyperparameters will be optimised via Bayesian optimisation. Performance and threshold tuning will be reported. Explainability and subgroup fairness will be assessed using SHapley Additive exPlanations. Final models will be deployed as a web-based clinical risk calculator.
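The nested validation design described above can be illustrated with a minimal sketch of the data splitting alone. It is not the authors' implementation: the site labels (`site_A`, `site_B`, `site_C`), the toy outcome labels, the choice of k = 3 and the helper names `leave_one_site_out` and `stratified_k_fold` are all assumptions made for illustration. No model fitting, imputation or hyperparameter search is shown; in a full pipeline those steps would be fitted on the inner training indices only.

```python
import random
from collections import defaultdict


def leave_one_site_out(sites):
    """Outer loop: hold out each site once as the external validation set."""
    for held_out in sorted(set(sites)):
        train = [i for i, s in enumerate(sites) if s != held_out]
        test = [i for i, s in enumerate(sites) if s == held_out]
        yield held_out, train, test


def stratified_k_fold(indices, labels, k, seed=0):
    """Inner loop: split `indices` into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    folds = [[] for _ in range(k)]
    for cls in sorted(by_class):
        members = by_class[cls]
        rng.shuffle(members)
        # Deal each class out round-robin so every fold gets a fair share.
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, val


# Hypothetical toy cohort: three sites, six patients each, binary outcome
# (e.g. 0 = ANOCA, 1 = obstructive CAD). These values are made up.
sites = ["site_A"] * 6 + ["site_B"] * 6 + ["site_C"] * 6
labels = [0, 1] * 9

plan = []
for held_out, outer_train, outer_test in leave_one_site_out(sites):
    # Tuning would happen here, using only patients from the remaining sites.
    inner_folds = list(stratified_k_fold(outer_train, labels, k=3))
    plan.append((held_out, outer_train, outer_test, inner_folds))
```

Each entry in `plan` pairs one held-out site with the inner folds built from the remaining sites, so the held-out site never influences tuning. This separation is what allows each outer iteration to serve as an external validation.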

ETHICS AND DISSEMINATION: Ethics approval has been obtained from the University of Calgary and the University Health Network (#24-5916). Analyses will use deidentified data in secure environments; only aggregate results will be reported. Findings will be disseminated via peer-reviewed publications, conferences and a web-based calculator.

PMID:41951258 | DOI:10.1136/bmjopen-2025-108799