
Jason Thomas

Graduated: August 21, 2021

Thesis/Dissertation Title:

Assessing the fitness for use of real and synthetic electronic health record data for observational research

Over the past decade, electronic health record (EHR) adoption has led to an explosion in the volume of EHR and log data, followed by efforts to harness the potential of these data for knowledge discovery (KD) and quality improvement (QI). In parallel, recent gains in artificial intelligence have produced powerful methods to analyze, use, and even create synthetic data. However, limitations in data utility (e.g., bias, data quality, comprehensiveness) and accessibility (e.g., privacy, interoperability, availability), as well as limited means to measure and manage tradeoffs between the two, are significant barriers to using these data effectively. Determining whether data are suitable for a specific analysis or context, known as “fitness for use,” is neither included in current frameworks for general health record data quality characterization nor evaluated by data quality assessment (DQA) tools. Use of EHR log data for QI and KD is particularly unrefined because validated standards and methods are absent. Thus, users of EHR and log data remain uninformed about the fitness for use of their data at baseline and are unable to effectively assess subsequent tradeoffs between utility and privacy when applying privacy-preserving technologies.

First, we (1) developed a framework for assessing the data utility of electronic health records, then (2) adapted open-source tools to apply this framework, which we used to assess the utility of real and synthetic EHR data for observational research related to COVID-19 and/or future influenza pandemics. Second, we evaluated whether synthetic data derived from a national COVID-19 dataset could be used for geospatial and temporal epidemic analyses. To do so, we replicated studies and computed general summary statistics on the original and synthetic data, then compared the similarity of results between the two datasets. Third, we conducted a retrospective, observational analysis, with and without privacy-preserving technology, of clinical workstation authentication behaviors from the UW Medicine health system to inform customized solutions that balance usability and security.


Drs. Adam Wilcox (Chair), Gang Luo, Matthew Thomas Trunnell, Larry Kessler