PLoS One. 2026 Jan 9;21(1):e0340287. doi: 10.1371/journal.pone.0340287. eCollection 2026.
ABSTRACT
BACKGROUND: Near real-time electronic health record (EHR) data offer significant potential for secondary use in research, operations, and clinical care, yet challenges remain in ensuring data quality and stability. While prior studies have assessed retrospective EHR datasets, few have systematically examined the integrity of real-time data for research readiness.
METHODS: We developed an automated benchmarking pipeline to evaluate the stability and completeness of real-time EHR data from the Yale New Haven Health clinical data warehouse, transformed into the OMOP common data model. Twenty-nine weekly snapshots of the EHR collected from July to November 2024 and twenty-two daily snapshots collected from April to May 2025 were analyzed. Benchmarks focused on (1) clinical actions such as patient additions, deletions, and merges; (2) changes in demographic variables (date of birth, gender, race, ethnicity); and (3) stability of discharge information (time and status). A synthetic dataset derived from MIMIC-III was used to validate the benchmarking code prior to large-scale analyses.
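The snapshot comparisons described in the methods can be illustrated with a minimal sketch. It assumes each snapshot has been reduced to a Python dict mapping an OMOP-style person_id to its demographic columns. The helper name diff_snapshots, the SnapshotDiff class, the example rows, and the concept IDs are all illustrative assumptions, not the authors' pipeline. Merges are not detected here: a merged identifier would appear as a deletion unless a crosswalk of merged person_ids were also available.

```python
from dataclasses import dataclass, field

# Demographic columns compared across snapshots. Names follow the OMOP CDM
# person table; which columns the study actually compared is an assumption.
DEMOGRAPHIC_FIELDS = (
    "birth_datetime",
    "gender_concept_id",
    "race_concept_id",
    "ethnicity_concept_id",
)


@dataclass
class SnapshotDiff:
    added: set = field(default_factory=set)      # person_ids only in the newer snapshot
    deleted: set = field(default_factory=set)    # person_ids only in the older snapshot
    changed: dict = field(default_factory=dict)  # column -> person_ids whose value changed


def diff_snapshots(prev: dict, curr: dict) -> SnapshotDiff:
    """Compare two snapshots, each mapping person_id -> {column: value}."""
    prev_ids, curr_ids = set(prev), set(curr)
    changed = {col: set() for col in DEMOGRAPHIC_FIELDS}
    # Only persons present in both snapshots can show a demographic change.
    for pid in prev_ids & curr_ids:
        for col in DEMOGRAPHIC_FIELDS:
            if prev[pid].get(col) != curr[pid].get(col):
                changed[col].add(pid)
    return SnapshotDiff(added=curr_ids - prev_ids, deleted=prev_ids - curr_ids, changed=changed)


# Hypothetical weekly snapshots (illustrative OMOP concept IDs).
week1 = {
    1: {"birth_datetime": "1980-05-02", "gender_concept_id": 8532,
        "race_concept_id": 8527, "ethnicity_concept_id": 38003564},
    2: {"birth_datetime": "1975-11-20", "gender_concept_id": 8507,
        "race_concept_id": 8516, "ethnicity_concept_id": 38003564},
}
week2 = {
    1: {"birth_datetime": "1980-05-02", "gender_concept_id": 8532,
        "race_concept_id": 8516, "ethnicity_concept_id": 38003564},  # race corrected
    3: {"birth_datetime": "2001-01-15", "gender_concept_id": 8507,
        "race_concept_id": 8527, "ethnicity_concept_id": 38003563},  # new patient
}

diff = diff_snapshots(week1, week2)
print(sorted(diff.added), sorted(diff.deleted), sorted(diff.changed["race_concept_id"]))  # [3] [2] [1]
```

Running this over every consecutive pair of snapshots and counting the sets per field would yield per-interval tallies of additions, deletions, and demographic corrections of the kind the benchmarks report.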
RESULTS: Benchmarking revealed frequent updates due to clinical actions and demographic corrections across consecutive snapshots. Demographic changes were most frequently related to race and ethnicity, highlighting potential workflow and data entry inconsistencies. Discharge time and status values demonstrated instability for several days post-encounter, typically reaching a stable state within 4-7 days. These findings indicate that while near real-time EHR data provide valuable insights, the timing of data stabilization is critical for accurate secondary use.
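One way to quantify when a discharge field becomes stable is sketched below. It assumes daily snapshots of a single encounter's discharge value and measures the lag from the encounter end date to the first snapshot after which the value no longer changes. The function days_to_stable and the example data are hypothetical; this is not necessarily how the study defined stabilization.

```python
from datetime import date


def days_to_stable(observations, reference_day):
    """Days from reference_day to the first snapshot after which the value no longer changes.

    observations: list of (snapshot_date, value) pairs sorted by snapshot_date.
    Stability is judged only against the last snapshot available, so a later
    revision cannot be ruled out (the result is right-censored).
    """
    if not observations:
        return None
    final_value = observations[-1][1]
    stable_from = observations[-1][0]
    # Walk backwards while the value still equals the final one.
    for snapshot_date, value in reversed(observations):
        if value != final_value:
            break
        stable_from = snapshot_date
    return (stable_from - reference_day).days


# Hypothetical daily snapshots of one encounter's discharge time.
observations = [
    (date(2025, 4, 2), None),                 # not yet recorded
    (date(2025, 4, 3), "2025-04-01 14:00"),
    (date(2025, 4, 4), "2025-04-01 15:30"),   # corrected value
    (date(2025, 4, 5), "2025-04-01 15:30"),
    (date(2025, 4, 6), "2025-04-01 15:30"),
]
print(days_to_stable(observations, date(2025, 4, 1)))  # 3
```

Aggregating this per-encounter lag across many encounters (for example, as a median or an upper percentile) would give a system-level estimate of how long to wait before treating discharge fields as analysis ready.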
CONCLUSIONS: This study demonstrates the feasibility of automated benchmarking to assess the integrity of real-time EHR data and identify when such data become analysis ready. Our findings highlight key challenges for secondary use of dynamic clinical data and provide an automated framework that can be applied across health systems to support high-quality research, surveillance, and clinical trial readiness.
PMID:41511976 | DOI:10.1371/journal.pone.0340287

