Cureus. 2026 May 16;18(5):e108973. doi: 10.7759/cureus.108973. eCollection 2026 May.
ABSTRACT
BACKGROUND: Publicly reported celebrity deaths receive substantial media attention and may influence how individuals perceive disease burden. However, datasets constructed from publicly available sources are often selectively curated and may not reflect underlying epidemiological distributions. Understanding how dataset construction influences observed mortality patterns is therefore important.
OBJECTIVE: To present a purposively curated and openly documented dataset of publicly reported celebrity deaths as a worked methodological illustration of how dataset construction, selection, and reporting practices may shape apparent mortality distributions in media-visible data.
MATERIALS AND METHODS: A purposively curated dataset of 164 celebrity deaths was assembled from publicly available English-language sources, principally Wikipedia entries and their referenced material. Cases were selected to include variation in age, reported sex, and cause of death to enable descriptive comparison. Causes of death were classified as cardiovascular or non-cardiovascular using rules defined a priori, and cardiovascular disease (CVD) cases were further subdivided into six International Classification of Diseases (ICD)-aligned subcategories. To assess classification reproducibility, a blinded second coder independently re-classified a random sample of 30 cases. The dataset, the case-level ICD mapping dictionary, and a reproducible analysis pipeline are openly available.
RESULTS: Cardiovascular causes accounted for 84 of 164 deaths in the curated dataset (51.2%), and non-cardiovascular causes accounted for the remaining 80 deaths (48.8%). Within the CVD group, the largest subcategory was heart failure and other unspecified CVD (33 cases; 39.3%), followed by ischemic heart disease (22 cases; 26.2%), cerebrovascular disease (nine cases; 10.7%), arrhythmia and sudden cardiac death (seven cases; 8.3%), aortic and pulmonary vascular disease (seven cases; 8.3%), and cardiomyopathy and structural heart disease (six cases; 7.1%). The two coders agreed on the binary CVD/non-CVD classification for all 30 cases in the reliability sample (Cohen's κ = 1.00) and on the six-category subcategory classification for all 18 cases that both coders classified as CVD (Cohen's κ = 1.00). Cardiomyopathy and structural heart disease cases had a notably younger mean age at death (35.5 years) than the other CVD subcategories. Overall, CVD deaths occurred at older ages than non-CVD deaths and were observed predominantly among males.
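The reported proportions and the binary agreement statistic can be checked with a short calculation. The following Python sketch is illustrative only and is not the authors' analysis pipeline: the counts are taken from the RESULTS above, while the helper names (`pct`, `cohen_kappa`) and the ordering of individual cases in the reliability sample are assumptions made for the example. The 18/12 split of the 30-case sample follows from the reported perfect agreement and the 18 cases both coders classified as CVD.

```python
from collections import Counter

# Subcategory counts as reported in RESULTS.
cvd_subcategories = {
    "heart failure and other unspecified CVD": 33,
    "ischemic heart disease": 22,
    "cerebrovascular disease": 9,
    "arrhythmia and sudden cardiac death": 7,
    "aortic and pulmonary vascular disease": 7,
    "cardiomyopathy and structural heart disease": 6,
}
total_deaths = 164
cvd_total = sum(cvd_subcategories.values())
non_cvd_total = total_deaths - cvd_total


def pct(part, whole):
    """Percentage rounded to one decimal place (hypothetical helper)."""
    return round(100 * part / whole, 1)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders rating the same cases (hypothetical helper)."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    categories = freq_a.keys() | freq_b.keys()
    # Agreement expected by chance, from each coder's marginal frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when both coders use a single category")
    return (observed - expected) / (1 - expected)


# Reliability sample: 30 cases with perfect agreement, 18 of them CVD.
# The case order is an assumption; only the counts come from the abstract.
coder_1 = ["CVD"] * 18 + ["non-CVD"] * 12
coder_2 = list(coder_1)

print(f"CVD: {cvd_total}/{total_deaths} ({pct(cvd_total, total_deaths)}%)")
print(f"Non-CVD: {non_cvd_total}/{total_deaths} ({pct(non_cvd_total, total_deaths)}%)")
for name, count in cvd_subcategories.items():
    print(f"  {name}: {count} ({pct(count, cvd_total)}%)")
print(f"Binary kappa: {cohen_kappa(coder_1, coder_2):.2f}")
```

Note that κ = 1.00 here depends on both categories appearing in the sample; if every case fell into a single category, chance agreement would also equal 1 and κ would be undefined, which is why the helper raises an error in that situation.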
CONCLUSIONS: The patterns observed in this curated dataset reflect the combined effects of investigator selection, demographic non-comparability between celebrities and the general population, and reporting practices in publicly available sources. The high proportion of non-specific cause-of-death descriptions illustrates an important component of reporting bias in media-visible mortality data.
PMID:42306359 | PMC:PMC13267849 | DOI:10.7759/cureus.108973