The Use of Natural Language Processing and Machine Learning for the Early Diagnosis of Lung and Ovarian Cancer
Cancer is a serious diagnosis, and diagnostic delay is correlated with reduced survival following treatment. For many cancers, providers can rely only on symptoms and signs to diagnose patients. These details are recorded primarily in free-text clinical notes. Natural language processing (NLP) can be used to extract symptoms and signs from these notes for population-level diagnostic screening. This creates an opportunity for machine learning to alert providers earlier in the diagnostic process using existing but easily overlooked information.
Thus, the focus of this thesis was to determine opportunities for reducing diagnostic delay in ovarian and lung cancer. A symptom extraction model trained on a primarily COVID-19 population was adapted to lung and ovarian cancer populations. The model then extracted symptoms/signs from a retrospective case-control study (ovarian) developed as part of this work, as well as from a leveraged study (lung). Symptom frequencies for ovarian cancer were then explored across different routes to diagnosis. Finally, this thesis developed experiments using machine learning models to predict lung and ovarian cancer prior to diagnosis. This work showed that early prediction using symptoms was possible only in the lung cohort. Nevertheless, both cohorts had significantly more “next step” recommendations in cases than in controls, even 6 months prior to diagnosis.
Committee: Meliha Yetisgen (chair), Matthew Thompson
Maybe They Had a Bad Day: How Marginalized Patients React To Bias in Healthcare and Struggle to Speak Out
Marginalized people, including Black, Indigenous, People of Color (BIPOC) and Lesbian, Gay, Bisexual, and Transgender and gender diverse people (LGBTQ+), are subject to implicit bias during healthcare interactions, which negatively impacts patient-provider communication and the quality of care. In this qualitative study, we collected 25 personal stories of unfair treatment from patients from marginalized populations. We report on the patients’ reactions to implicit bias that affected their health care experience.
Committee: Drs. Wanda Pratt (Chair), Andrea Hartzler
Everyone’s Variant Annotation Tools: EVAT
Genetic variant information is currently distributed across many different databases, and retrieving data from those resources costs individuals a great deal of time.
In this thesis, I develop EVAT (Everyone’s Variant Annotation Tool), a tool that aims to help individuals retrieve annotation information about their genetic variants. People with or without programming skills can choose different methods to get their genetic variant annotation information. For individuals who have programming skills, EVAT offers Python APIs that connect to the backend directly to help them retrieve annotation information. The backend consists of four file interpreters that translate file formats, a module that sends and receives information from MyVariant.info, a module that converts the JSON result from MyVariant.info to a pandas DataFrame, and three functions that support different queries. For individuals who don’t have programming skills, EVAT offers a graphical user interface, which is the front end of the tool. This user interface allows users to upload files, read the annotation, and run queries with the mouse, so they don’t need coding skills to do genetic annotation.
EVAT can be used either as a backend only (for users with programming skills) or with a graphical user interface to easily query and retrieve annotation information. This annotation information could help people understand the effect of genetic variants or do further research about them.
Committee: Wanda Pratt, Ari Pollack
Greg Aeschliman, MD
A Wireframe Representation of a Prototype Clinical Decision Support Tool For the Management of Cardiometabolic Disorder and Diabetes Type 2
Diabetes Mellitus Type 2 is a complex disorder with complex management pathways. This research developed a wireframe representation of a prototype Clinical Decision Support Tool that enables the comprehensive, efficient and efficacious management of patients with Cardiometabolic Disorder and Diabetes Mellitus Type 2. The research took place in the Eastside Health Network, an Accountable Care Organization, and employed user-centered, iterative design principles to create both the user interface and the backend decision support logic. The design process took place in the context of a cross-functional team of physicians, pharmacists, diabetic nurse educators, care managers and administrators.
Last Known Position: MD, Primary Care Physician, Evergreen Healthcare
Committee: Peter Tarczy-Hornoch, Michael Leu
An Evaluation of the Insidious Consequences of Clinical Computing Infrastructure Failures at a Large Academic Medical Center
Committee: Mark Whipple, Thomas Payne
The Use of Inter-Provider Variation in Measuring Healthcare Performance
To monitor and improve healthcare in the US, providers are required to report healthcare measures as part of regulatory and compensatory systems. However, there are growing concerns that the collection and reporting of these measures may be counter-productive to provider efforts to improve care. Variations in care are known to adversely affect quality, but studies on the relationship of variation within measures and performance of those measures are lacking. We aimed to test if inter-provider variation of a healthcare measure was associated with performance of that measure and thereby establish a model for identifying measures that might be more likely associated with opportunities to improve care. We found that between 14% and 23% of the performance of our chosen measure was associated with variation between providers. This finding suggests that inter-provider variance of a measure can be used to help identify measures where opportunities for improvement of clinical processes exist.
Last Known Position: UW BHI PhD Student; Assistant Director, National Network of Libraries of Medicine Evaluation Office, University of Washington
Committee: Drs. Adam Wilcox (Chair), Anne Turner
Krystal V. Slattery
Assessing the feasibility of predictive modeling for HFE-Hereditary Hemochromatosis diagnosis using electronic health records
Secondary use of electronic health records gives researchers the opportunity to test hypotheses and gain new insights into complex disease phenotypes. Hereditary hemochromatosis is an inherited autosomal recessive disorder that causes excessive absorption of iron. Early diagnosis and disease management are critical, as iron accumulation in tissue leads to organ failure and eventually death. Diagnosis of hereditary hemochromatosis requires evidence of iron overload and a positive genetic test result. At the University of Washington there are no standard clinical guidelines for hemochromatosis genetic testing, and only ~7.5% of patients tested have a confirmed diagnosis. We aimed to identify potential variables for additional screening criteria and to inform clinical guidelines for hemochromatosis genetic testing. We found that applying established recommendations for genetic testing of hemochromatosis from the American Association for the Study of Liver Diseases (AASLD) and the European Association for the Study of the Liver (EASL) to patients screened by their physician for testing would have reduced the number of tested patients from 873 to 345 while maintaining 92% of positive diagnoses. Additionally, 3010 patients in the UW Medicine system meet the established minimum requirements for genetic testing but have not been tested. Using five rules developed from our gold standard test set, we identified a subset of 265 patients who may benefit from genetic testing for hemochromatosis. These findings suggest that additional clinical guidelines could be developed to both reduce unnecessary testing and identify additional candidates for testing.
Committee: Drs. Peter Tarczy-Hornoch (Chair), Gang Luo, Deborah Nickerson
U-Net for Cerebral Cortical MR Image segmentation
Cerebral cortex segmentation from three-dimensional structural Magnetic Resonance (MR) brain images plays an important role in measuring loss of cortical tissue in disorders such as Alzheimer's disease (AD). U-Net, a deep convolutional neural network architecture, has been a widely used approach for biomedical image segmentation in recent years.
In this thesis, I implemented 2D and 3D U-Nets on MR images from 20 patients with labeled cerebral tissues and regions. A two-stage pipeline was designed for this task. In stage one, a U-Net generates a mask of grey matter to filter out other tissues in brain MR images. In stage two, a similar U-Net architecture is used to label cerebral cortex sub-regions from images that contain only grey matter.
Neither the 2D nor the 3D U-Net works for labeling gyri/sulci, and both achieve only approximately 55% Dice overlap for labeling cortex regions. In contrast, the cerebral cortex segmentation package in FreeSurfer achieves over 90% Dice overlap for labeling gyri/sulci by using a graphical-based probabilistic estimation method with prior information.
I believe that the main reason for the poor performance of the 2D/3D U-Nets is the loss of global position information of pixels/voxels caused by cutting the original MR images into small parts. The U-Net architecture is weak at handling high-resolution 3D images with imbalanced classes. In future work, researchers could create hybrid methods that combine deep neural network architectures with prior information to label cerebral cortical sub-regions.
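For readers unfamiliar with the Dice overlap reported above, the following is a minimal sketch of how the metric can be computed for one label. It is not the thesis code: the function name `dice_overlap`, the flattened toy label maps, and the convention for a label missing from both maps are all assumptions made for illustration.

```python
def dice_overlap(pred, truth, label):
    """Dice coefficient for one label over two equally sized label sequences.

    Dice = 2 * |P ∩ T| / (|P| + |T|), where P and T are the sets of
    positions assigned `label` in the prediction and the reference.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    pred_count = sum(1 for p in pred if p == label)
    truth_count = sum(1 for t in truth if t == label)
    both = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
    if pred_count + truth_count == 0:
        # Assumed convention: a label absent from both maps counts as agreement.
        return 1.0
    return 2.0 * both / (pred_count + truth_count)


# Toy flattened label maps (made up): 0 = background, 1 and 2 = two regions.
predicted = [0, 1, 1, 2, 2, 2, 0, 1]
reference = [0, 1, 2, 2, 2, 0, 0, 1]

for region in (1, 2):
    print(region, round(dice_overlap(predicted, reference, region), 3))
```

A multi-region score such as the roughly 55% figure would typically be an average of per-label values like these, although the abstract does not say how the values were aggregated.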
Last Known Position: Deep learning and computer vision engineer at YITUTech
Committee: Drs. John Gennari (Chair), Linda Shapiro
Kevin Sha Li
Understanding the practical utility of using the analytic potential of patient data in Identifying the High Cost patients
It is widely known that a minority of patients account for the majority of healthcare costs. Current research aims to predict or identify these patients through predictive modeling, in the hope that targeted resources can prevent high costs from being incurred, which can help both the patient and the hospital. Yet what is the actual utility of these models? Most models apply only to the particular institution in which they were conceived or are hindered by limitations of size. In this study, I went through patients’ clinical notes to better understand how practical such predictive models are. First, I reviewed the literature to better understand which variables most predictive models use as their base. I then compared these variables to what was available in the patient’s profile, and revised what was necessary to predict high cost given what was accessible in the database. With access to the UWMC/Harborview database, I went through clinical notes to evaluate each patient’s possible predictability. These determinations were later verified by a physician for accuracy. The process was then replicated at Northwest (NW) Hospital, a relatively smaller hospital with a focus on inpatient/outpatient care, which provides a contrasting site for comparison. Afterward, each patient was categorized by the nature of their high cost. The importance of this work lies in how predictive models should be considered moving forward. Assuming that modeling will always provide the solution for predicting high-cost patients is misguided. Instead, understanding the underlying dynamics behind each patient's high cost is a better target. The conclusions of this study can help structure models to predict high-cost patients more effectively.
Last Known Position: Data Research Scientist, UWMC
Committee: Drs. Adam Wilcox (Chair), Thomas Payne
Evaluating Different Approaches to Simplifying Data Access for Clinical Users
Researchers have difficulty accessing health care data for many reasons. Although some technologies like i2b2 have been developed and evaluated to overcome these difficulties, limitations and challenges remain. In addition, there are limited comparisons among query tools, so users do not have an understanding of which tool works best in which situation. Studies that evaluate and compare such technologies to both guide users and improve tools are needed.
To evaluate and compare two self-service query tools, LEAF and i2b2, and one common data model, OMOP, I selected representative query questions commonly asked by researchers, drawn from externally defined query categories, quality measurement, observational EHR research studies, and representative queries made by users to the analytics team in our organization. Most of the query questions included four main concepts: diagnosis, patient age, length of stay, and measurement period. I used the three approaches to answer all query questions. I then analyzed the results to determine which approach best increases data access, using the two main determinants in the Technology Acceptance Model (TAM): perceived usefulness (PU) and perceived ease of use (PEOU). LEAF, developed by the University of Washington, was the best performer among the three due to its flexibility, perceived ease of use, and perceived usefulness. Researchers can easily explore its customized features without needing a programming background. The development of these technologies would reduce the challenges of data access in health care.
Last Known Position: Content Analyst, Microsoft
Committee: Drs. Adam Wilcox (Chair), John Gennari
A Mixed-Methods Approach to Exploring Engagement in MoodTech: An Online CBT Intervention for Older Adults with Depression
In recent years, online cognitive behavioral therapy (CBT) interventions have played an increasing role in treating late-life depression. Previous studies have reported that online CBT interventions can be effective in treating depression and promoting behavioral changes among older participants, but inadequate engagement can weaken their effectiveness. Most previous investigations of engagement in online CBT have collected data from self-reported symptom scales or questionnaires and used statistical approaches to establish associations between engagement and various predictors like demographics, personality, disease symptoms or intervention design factors. In this study, we considered engagement as an important aspect of user experience and employed multiple methods on log data and qualitative data to understand the engagement of individual users and of users in group environments. We analyzed data from MoodTech, a pilot study of an online CBT intervention for older adults with depression, characterizing participants’ engagement and exploring motivations and barriers that may cause differences in observed patterns. There were three aims. First, we identified patterns of engagement through visual analysis of log data. Second, we conducted a network analysis of the participants who had access to the social interaction features and compared the three kinds of peer interactions (comments, likes and nudges) as representatives of group engagement. Third, we performed a qualitative analysis of the textual data, including messages, posts, comments and thought records from the participants, to identify the application of CBT principles and explore how participants engaged with the intervention. From the learning and practice experiences of the older participants, we identified several themes that affected engagement and attitudes toward the intervention.
Future intervention designs may take these findings into consideration and adjust lesson contents for different patterns of participants, as well as improve the usability to meet the needs of older adults.
Committee: Drs. Annie Chen (Chair), John Gennari, Kathryn Tomasino
Identification of Physicians and EHR User Types
In recent years, the adoption and meaningful use of Electronic Health Records (EHRs) have been promoted by several governmental acts and incentive programs, with the promise of many benefits. One of the most important objectives of EHR implementation was to enable clinical providers to work more efficiently. However, previous studies have shown widespread dissatisfaction and low work efficiency among physicians due to rapid changes in technology and increasing requirements of EHR use. Based on a well-established model of EHR capabilities and physician EHR users, this research applied two strategies to identify different EHR user types among a large population of physicians from the EpicCare Ambulatory Provider Efficiency Profile (PEP) dataset. The trend of physician work efficiency among the three EHR user types identified – basic users, strivers, and arrivers – was consistent with our hypothesis that basic users have the highest work efficiency; strivers have the lowest efficiency; and arrivers have medium work efficiency but probably gain other benefits from EHR use. This identification of physician EHR user types is useful for helping healthcare providers move to more efficient stages of EHR use and deliver better care for patients in the future.
Last Known Position: Data Analyst at Acumen, LLC
Committee: Drs. Adam Wilcox (Chair), Gang Luo
Understanding Context of Use and Perceptions of Usability of Cosegregation Analysis Tool AnalyzeMyVariant
Calculating the genetic risk for a disease of allelic variants of unknown significance can be a complicated task. AnalyzeMyVariant is a tool designed for genetics experts that uses pedigree data from families with genetic variants of unknown significance, to calculate likelihood ratios that a variant fits pathogenic or benign patterns. In this study, we performed a two-part evaluation of the tool to understand the context within which genetics experts might use this tool, and assess their initial usability perceptions. First, we surveyed existing literature to develop an instrument to assess perceptions of usability based on constructs of usability, quality, and safety. The instrument consisted of scaled as well as open-ended questions assessing users’ perceptions relating to each of the constructs of interest, with regard to their experience with AnalyzeMyVariant. We used the instrument to collect qualitative and quantitative Likert-type data from 57 genetic experts who were recruited via email invitations. The second part of our evaluation was comprised of follow-up, semi-structured interviews with 6 genetics experts to identify work contexts in which users might use the tool, and further delve into issues faced in using the tool. These interviews were inductively coded and major themes identified using the constant comparative method. Based on these findings, we provide recommendations for future improvement of the tool. This work has importance in the consideration of the varying needs of genetics professionals and how they use cosegregation analysis in their work, and the difference between requirements for research-focused and clinically-focused work. The results could also inform the future development of other tools developed for experts, particularly with regard to the attention that must be paid to experts’ context of use, background knowledge, and the intended applicability of results.
Last Known Position: User Researcher at Toolbox Medical Innovations
Committee: Drs. Annie Chen (Chair), Brian Shirts
mPower Voice Activity Monitoring and Classification for Parkinson’s Diagnosis
Background: Parkinson’s disease patients’ voice data collected via the mPower application can be classified into three groups: immediately after taking medication (at their best condition), immediately before taking medication (at their worst condition), and somewhere in between medication doses (neither best nor worst condition).
Objectives: Our goal for this investigation is to validate voice as an accurate classifier of medication status in patients with Parkinson’s Disease.
Methods: After data pre-processing, logistic regression, support vector machines (SVM), decision trees, Gaussian Naïve Bayes and a Multi-layer Perceptron (MLP) were applied for model training.
Results: Accuracy is relatively low, 0.51 on average, when only the best and worst conditions are considered, and it increases to 0.76 for SVM if the condition between best and worst is also included. If we consider only the data for a single patient, the performance of the model can increase to 0.81.
Conclusions: The results show that there is a connection between voice and Parkinson’s Disease conditions. However, the difference between conditions might be larger than the difference between individuals.
Last Known Position: Applied NLP Scientist, Sciome LLC
Committee: Drs. Meliha Yetisgen (Chair), Adam Wilcox
Ming-Tse Tsai, MD
Predicting Medical Patients’ Length of Stay in Emergency Department (ED) at presentation
ED overcrowding is a significant issue in modern medicine across many countries; it not only threatens patient safety but also contributes to provider burnout. Among several proposed indicators, ED length of stay (LOS) is the most commonly used measure. Being able to predict ED LOS at a patient’s presentation provides valuable information to all stakeholders in the ED, including patients, providers, and managers. In this study, a predictive model was built and the strongest predictors were identified via a machine learning method, leveraging real-world data collected at a medical center in Taiwan. The results inform future modeling and shed light on the path toward tackling this complex multifactorial phenomenon.
Last Known Position: VP of Medical Information Officer at Kura Care
Committee: Drs. Thomas Payne (Chair), Neil Abernethy, Steven H. Mitchell
A Systems Biology Approach to Characterizing Gene Fusion Pathways in Cancer
Gene fusions have long been known to drive cancer. Initial discovery of gene fusions was opportunistic, and functional assessment was done individually and experimentally. There is no comprehensive systems biology approach to understanding the impact of gene fusions on the signaling networks within tumor cells. An integrative computational approach was taken to achieve a better understanding of gene fusions and their complex influence on pathways and interaction networks in the context of lung cancer. Using well-studied fusions and publicly available gene expression data, the effect of fusion events on the expression pattern of gene networks revealed unique differences among tumors with gene fusions, tumors without gene fusions, and normal samples. This approach identifies gene expression signatures associated with specific fusions, and provides a model for integrating experimental and pathway data to better understand the biology of fusion genes and their roles in oncogenesis.
Last Known Position: Bioinformatician at NanoString Technologies, Inc.
Committee: Drs. Neil Abernethy (Chair), Ali Shojaie, Erin Piazza
Qualitative Assessment of Hot Debriefs for Code Teams at Seattle Children’s Hospital
Seattle Children’s Hospital recently implemented ‘hot debriefs’ for code teams that respond to cardiac or respiratory resuscitation code events. Hot debriefs are meetings immediately after the code event where the code team members are able to discuss the details of the event that just transpired. These discussions generally revolve around aspects of the code event that went well as well as those that could be improved upon. Before the implementation of these hot debriefs, no such formal meetings with the entire code team were required. This meant that if any particular code team member did want to discuss a code event, participation was minimal and the meeting would often occur at a much later time such as the following day. Hot debriefs were implemented with the intent of increasing information review and improving the quality of future code events. I assessed the status of these hot debriefs using well-established qualitative research methods and semi-structured interviews with clinicians who participated in them to understand their thoughts and feelings on the new process. I interviewed ten participants (including nurses, respiratory therapists, physicians, etc.) and qualitatively analyzed their responses. Four key themes emerged: the effectiveness of hot debriefs, process formalization, openness of communication, and dissemination of information. For the first theme, the participants unanimously approved of the hot debriefs as a process for increasing information review and improving the quality of code events. However, there were concerns revolving around the other three themes with mixed opinions. This study shows that in order to effectively implement a process such as hot debriefing, one should consider the needs and opinions of the participants themselves.
Last Known Position: User Operations Specialist at Stripe
Committee: Drs. John Gennari (Chair), Joan Roberts
Predicting Cancer Outcome with Multispectral Tumor Tissue Images
Tumor tissue slides have been used by clinicians to assess cancer patients’ condition and indicate prognosis. Several studies have suggested that the distribution of important immunological biomarkers on tumor tissue slides might help predict survival outcome. These studies rely upon non-parametric Kaplan-Meier survival analysis with the log-rank test to extract statistical insights, an approach that has several disadvantages, such as prediction ambiguity and the inability to directly model continuous variables.
In this study, we engineered 676 features encoding cellular distribution information from multi-spectral tumor tissue images from 118 HPV-negative oral squamous cell cancer patients. We leveraged statistical methods and predictive models to explore the predictive power of these features. Eighteen features were identified as potential survival predictors through the Kolmogorov-Smirnov test. Our best model, a random forest, achieved a 58.54% prediction accuracy on an independent validation dataset. Although the model does not suggest strong predictive power for the selected features, evaluation on large-scale training data is still needed to further tune model parameters and generate more concrete results.
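As an illustration of the Kolmogorov-Smirnov screening step mentioned above, here is a minimal sketch of the two-sample KS statistic. The abstract does not say which groups were compared, so comparing a feature's values between two outcome groups is an assumption, and the function name `ks_statistic` and the sample values are made up. A full test would also need a p-value, which this sketch omits.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for v in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, v) / len(a)
        cdf_b = bisect_right(b, v) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical values of one image feature in two outcome groups (made up).
longer_survival = [0.10, 0.20, 0.30, 0.40]
shorter_survival = [0.35, 0.50, 0.60, 0.70]

d = ks_statistic(longer_survival, shorter_survival)
print(d)
```

A feature whose statistic is large relative to the sample sizes would be a candidate predictor; in practice a library routine that also reports significance would be the more usual choice.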
Last Known Position: Full Stack Data Engineer, Salesforce AI
Committee: Drs. Peter Myler (Chair), Ilya Shmulevich
A Bayesian Network Model of Head and Neck Squamous Cell Carcinoma Incorporating Gene Expression Profiles
Radiation therapy is a treatment for metastatic Head and Neck Squamous Cell Carcinoma, which allows precision targeting of certain groups of lymph nodes. A Bayesian network predictive model was developed aiming to help achieve such precision using information on the primary site and size of the tumor, representing the current decision-making process in clinical settings. The patient’s genetic profile was added to examine its predictability of metastasis through the improvement in prediction accuracies. The model was trained with publicly available data extracted from the Cancer Genome Atlas (TCGA) and validated against the TCGA dataset as well as clinical data reported to the University of Washington Tumor Board. Results show that genetic profile data improves model accuracy and such improvement may affect clinical decision making especially for patients with more advanced metastasis. A prototype for decision support application was built based on the results to demonstrate the clinical significance of the model. However, more data is needed to show significance of the proposed effects, as well as to improve the accuracy of the overall model.
Last Known Position: Bioinformatics Scientist at Veracyte, Inc.
Committee: Drs. Fredric Wolf (Chair), Mark Phillips, Mark Whipple
Leaf2Tableau: From Real-Time Clinical Data to Clinical Knowledge Discovery
Leaf-to-Tableau, a self-service and real-time clinical data visualization pipeline, was designed and developed to handle data visualization requests for queries developed in Leaf, a clinical data explorer developed by University of Washington Medicine Information Technology Services. It can extract and visualize any Leaf dataset in a portable format that researchers can easily explore without needing a highly technical or statistical background, providing a quick visual summary of the target population. This completes a CDW self-service model in which a researcher constructs a query to identify a specific patient cohort in Leaf and subsequently develops custom visualizations for exploration or publication, as well as receiving data files for analysis in return.
Last Known Position: Corporate Management Trainee, Fosun Pharma
Committee: Drs. Sean Mooney (Chair), John Gennari, Adam Wilcox, Mark Wurfel
Michael G. Semanik
Clinical Phenotyping in the Prediction of Acute Kidney Injury
Acute kidney injury (AKI) is an increasingly prevalent problem amongst pediatric inpatients, and is associated with high morbidity and mortality. Unfortunately, current methods of diagnosing AKI rely on “late markers” of injury, making early identification and prevention of AKI difficult. This work describes the development of an “at risk for AKI” clinical phenotype from structured electronic health record data, and its ensuing application in a predictive model. The model performs reasonably well in predicting AKI, with an F1 score of 0.67 and AUC of 0.75. Unstructured data is then added to the model via the inclusion of n-grams derived from ICU clinician notes, which improves performance (the F1 score increases to 0.76 and AUC increases to 0.77). Thus, it is possible to use clinical phenotyping to predict the onset of AKI twenty-four hours before current markers are elevated. This approach may lead to better treatments and preventative strategies for pediatric AKI.
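To make the n-gram step above concrete, the following is a minimal sketch of turning free-text notes into n-gram counts that could serve as model features. It is not the thesis pipeline: the tokenization rule, the function names `extract_ngrams` and `ngram_counts`, and the two example notes are all assumptions made for illustration.

```python
import re
from collections import Counter


def extract_ngrams(text, n):
    """Return the word n-grams of `text` after lowercasing and simple tokenization."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(notes, n_values=(1, 2)):
    """Count unigrams and bigrams (by default) across a collection of notes."""
    counts = Counter()
    for note in notes:
        for n in n_values:
            counts.update(extract_ngrams(note, n))
    return counts


# Made-up example notes, not real clinical text.
notes = [
    "Urine output decreased overnight.",
    "Decreased urine output; creatinine rising.",
]

counts = ngram_counts(notes)
print(counts["urine output"], counts["decreased"])
```

Counts like these would normally be restricted to a fixed vocabulary and combined with the structured variables before model training.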
Last Known Position: Assistant Professor, School of Medicine and Public Health, University of Wisconsin
Committee: Meliha Yetisgen (Chair), David R. Crosslin, Sangeeta R. Hingorani
Open-source Computerized Patient-reported Outcomes: Case Studies Illustrating Fifteen Years of Evolution
Over a fifteen year period, Patient Reported Outcomes ("PRO") applications to support over forty clinical and research projects have driven the evolution of an open-source computerized PRO system ("cPRO", http://cprohealth.org). The projects varied widely in PRO content, clinical domain, and workflows. Detailed case studies of six major implementations of the cPRO system offer a framework to understand the socio-technical challenges and opportunities in collecting computerized PROs and incorporating PROs into clinical care, patient-centered tools, and research.
Last Known Position: Technical Program Manager, University of Washington
Committee: William B. Lober (Chair), Donna L. Berry, Heidi Crane
Last Known Position: Software Engineer, Google
Committee: Peter J. Myler (Chair), Neil F Abernethy, William S Noble
Last Known Position: PhD student, Biomedical and Health Informatics, University of Washington
Committee: Wanda Pratt (Chair), Peter Tarczy-Hornoch, Emily E. Devine
Last Known Position: Family Medicine Physician, University of Washington
Committee: Peter Tarczy-Hornoch (Chair), Emily E. Devine, Brian H Shirts
Last Known Position: Senior Consultant at NYSTEC
Committee: Thomas H. Payne (Chair), Meliha Yetisgen-Yildiz, Robert D Harrington
Committee: Harold Goldberg, Diane P. Martin, James D. Ralston, Sebastien Haneuse, Thomas D Koepsell, Jan H. Spyridakis (GSR)
Vital Registration Systems in sub-Saharan Africa: The Redesign of Vital Registration Systems for Health Improvement through Appropriate Information and Communication Technologies
Last Known Position: Technical Informatics Advisor at IntraHealth International
Tsung-Chien (Jonathan) Lu
Cross-Correlation Networks to Identify and Visualize Disease Transmission Patterns
Influenza-like illness (ILI) is a major threat to public health around the world. To inform influenza response by enhancing and supporting disease surveillance, a syndromic surveillance system collects case counts that are aggregated from multiple sources and jurisdictions. Although each jurisdiction has its own planned uses of the data, most systems focus on early outbreak detection for regional-level response, and the algorithms they use often do not point to a route of transmission. In this work, we seek to develop approaches to aid comparison of data among jurisdictions to improve detection of geographic patterns in disease spread. Using cross-correlation to assess the pairwise similarity between regional case counts, we introduce a cross-correlation network based on ILI activity to reveal potential spatio-temporal patterns in disease transmission. The resulting networks were plotted on a map with the R statistical package. To evaluate the feasibility and utility of this approach, we validate these networks against population-level variables influencing the spread of infectious disease, including flight passenger volume, census worker flow, and geographic distance. In our analysis, the spatio-temporal transmission of ILI correlated more closely with state-to-state census worker flows and distance between states than with flight passenger flows. We demonstrate how this visualization motif might enhance existing tools used for syndromic surveillance. Finally, we discuss limitations of the approach, broader implications for disease surveillance and informatics, and future directions for this research.
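As an illustration of the cross-correlation network idea described above, the following is a minimal sketch in Python rather than the R code used in the thesis. The region names, weekly counts, lag window, and correlation threshold are all hypothetical, and the rule "the region whose curve peaks first is the leader" is an assumption made for the example, not the author's stated method.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def best_lag(x, y, max_lag):
    """Return (lag, r) maximizing the correlation of x[t] with y[t + lag]."""
    best = (0, -2.0)  # any real correlation exceeds -2.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        r = pearson(a, b)
        if r > best[1]:
            best = (lag, r)
    return best

def build_network(series, max_lag=3, threshold=0.8):
    """Directed edges (leader, follower, lag, r) for strongly correlated region pairs."""
    edges = []
    names = sorted(series)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            lag, r = best_lag(series[a], series[b], max_lag)
            if r >= threshold:
                # Positive lag means b trails a, so a is treated as the leader.
                leader, follower = (a, b) if lag >= 0 else (b, a)
                edges.append((leader, follower, abs(lag), round(r, 3)))
    return edges

# Hypothetical weekly ILI counts for three regions.
counts = {
    "A": [1, 2, 4, 8, 16, 8, 4, 2, 1, 1],
    "B": [1, 1, 2, 4, 8, 16, 8, 4, 2, 1],  # same curve as A, one week later
    "C": [5, 5, 5, 5, 5, 5, 5, 5, 5, 5],   # flat series, no signal
}
edges = build_network(counts, max_lag=2)
```

In this toy input, region B repeats region A's curve one week later, so the only edge runs from A to B with a lag of one week. Edges like this could then be drawn on a map or compared against worker-flow or distance data.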
Last Known Position: PhD student, Biomedical and Health Informatics, University of Washington
Committee: Anne Turner (Chair), Neil Abernethy
Last Known Position: Assistant Professor, Pediatric Rheumatology, University Of Utah School of Medicine and Primary Children’s Medical Center
Applying a Service-Oriented Architecture Model to TB Diagnosis using MODS Pattern Recognition Algorithm
Last Known Position: Professor, Universidad Peruana Cayetano Heredia
Committee: Linda G. Shapiro (Chair), Sherrilynne Fuller, Ira J. Kalet, William B. Lober, Joann G Elmore (GSR)
Last Known Position: Senior Computing Specialist, UW Medical Genetics and Alzheimer’s Disease Research Center
Evaluation of the Performance of an Electronic Incidence Reporting System: Determining Ease of Use, User Feedback and Root Cause Analysis
Last Known Position: Consultant, Huron Healthcare
Last Known Position: NAV FINSTRAT LLP, Partner
Last Known Position: Informatics Research Associate, Oregon Health & Science University
Last Known Position: Biomedical Informatics Consultant, Institute of Translational Health Sciences (ITHS), University of Washington
Last Known Position: Integration Architect, Cerner Corporation
Last Known Position: Senior Software Engineer
Last Known Position: Sr. Director of Product Management, Ebates
An Array of Choices: Implementing a Storage Solution for Microarray Data at Seattle Biomedical Research Institute
Last Known Position: Research Scientist III - Bioinformatics at University of Washington
Last Known Position: Committee Member at Washington State Health Technology Assessment Program
Syed Zia Ul Huq
Last Known Position: MD, Oncology/Hematology, Mercy Hospital MO
Expression Array Annotation Using the BioMediator Biological Data Integration System and the BioConductor Analytic Platform
Last Known Position: Assistant Professor at Tulane University
A Computer Assisted Coding Tool of the International Classification of Diseases - Version 10 for Mortality Data
Last Known Position: Affiliate Assistant Professor, Global Health, University of Washington; Head of Statistics, Informatics, Data Management and Systems (UIDES) Unit, Partners in Health Peru
Last Known Position: Senior Principal Scientist and Informatics Lead for Protein Homeostasis TCoE at Celgene
The Central Dogma of Molecular Biology: Towards an Ontological Characterization of Cellular Signaling
Last Known Position: Sage BioNetworks
Understanding the differences in cognitively defined subgroups in Alzheimer's disease: A data science approach
Abstract: My work connects two types of data in Alzheimer’s Disease (AD): structural MRI data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and cognition data in the form of AD subgroups. The subgroups (AD-Executive, AD-Language, AD-Memory and AD-Visuospatial), defined by Crane et al. (2017), are based on cognitive test scores from the time of AD diagnosis, and each subgroup is characterized by marked impairment in the specified cognitive domain relative to the other domains. My dissertation’s focus is on data science and mathematical methods to understand how volumes of 70 brain regions of interest (ROIs) might differ across pairs of AD subgroups in cross-sectional data, specifically data from the time of AD diagnosis (Aim 1), and in longitudinal data (Aim 2). My work demonstrates a careful assessment and implementation of methods to make the best use of the available data, which are currently small in sample size, imbalanced across AD subgroups, and noisy.
In Aim 1, I used random forest models for identifying the most important brain ROIs for distinguishing between pairs of AD subgroups. Prior to building classification models, I addressed specific challenges in cross-sectional data: potential noise due to non-ROI variables and imbalanced AD subgroup sizes. A challenge in using classification models in the domain of AD subgroups is that there is no gold standard for knowing how separable the AD subgroups are based on ROI volumes. The work presented here may be the first to establish a benchmark for classification accuracies for distinguishing between pairs of AD subgroups based on ROI volumes, although these models are not intended to be used for prediction in a clinical setting but rather to understand which brain regions are most important to distinguish the AD subgroups. In Aim 2, I used linear mixed effects (LME) modeling on longitudinal data to determine which of the 70 ROIs’ volume trajectories differ the most across pairs of AD subgroups in terms of longitudinal volume and rate of change of volume with respect to time. First, I laid out criteria for using data from specific MRI scans in an effort to reduce noise in data, instead of using the default longitudinal dataset. Given the small sample size of the AD subgroups and irregular data, I implemented LME modeling for each ROI on the original dataset consisting of all time points and also on a series of subsets of data that were obtained by restricting each AD subgroup’s data to time points with a specific minimum number of subjects available. Additionally, in Aim 2 work, I also simulated simplistic synthetic longitudinal data for two hypothetical groups, with tweakable parameters for sample size and group differences, which can serve as a test bed for future analysis methods for understanding AD subgroup differences. 
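The synthetic test bed mentioned in Aim 2 can be pictured with a small sketch. The following Python example is an assumption-laden illustration, not the thesis code: the function names, parameter values, and the use of per-subject least-squares slopes (as a simple stand-in for linear mixed effects modeling) are all hypothetical.

```python
import random

def simulate_group(n_subjects, baseline, slope, n_visits=5, noise_sd=0.05, seed=0):
    """Simulate (subject, visit, volume) rows for one hypothetical group."""
    rng = random.Random(seed)
    rows = []
    for subject in range(n_subjects):
        # Subject-specific offset, loosely mimicking a random intercept.
        offset = rng.gauss(0, noise_sd)
        for visit in range(n_visits):
            volume = baseline + offset + slope * visit + rng.gauss(0, noise_sd)
            rows.append((subject, visit, volume))
    return rows

def mean_slope(rows):
    """Average per-subject least-squares slope of volume against visit number."""
    by_subject = {}
    for subject, visit, volume in rows:
        by_subject.setdefault(subject, []).append((visit, volume))
    slopes = []
    for points in by_subject.values():
        n = len(points)
        mt = sum(t for t, _ in points) / n
        mv = sum(v for _, v in points) / n
        num = sum((t - mt) * (v - mv) for t, v in points)
        den = sum((t - mt) ** 2 for t, _ in points)
        slopes.append(num / den)
    return sum(slopes) / len(slopes)

# Two hypothetical groups that differ only in rate of volume change.
group_a = simulate_group(30, baseline=1.0, slope=-0.02, seed=1)
group_b = simulate_group(30, baseline=1.0, slope=-0.05, seed=2)
slope_diff = mean_slope(group_a) - mean_slope(group_b)
```

Varying `n_subjects` and the two `slope` values shows how small a sample or group difference an analysis method can tolerate before the recovered difference is lost in noise.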
An important finding of my work is that there was some overlap in the top ROIs that were determined to be important based on cross-sectional and longitudinal data analyses, for distinguishing between pairs of AD subgroups. Results from my Ph.D. work have potential implications for decisions about which brain regions may be relevant for future neuropathological studies in studying AD subgroups.
Committee: John Gennari (chair), Ellen Wijsman, Paul Crane, David Crosslin, Shuai Huang, Ali Shojaie
Committee: Peter Myler (chair), Christine Disteche, Sreeram Kannan, William Noble, Shawn Sullivan
Automated Assessment of Social Cognition in People with a Schizophrenia Spectrum Disorder
Social cognitive deficits are core features of schizophrenia spectrum disorders (SSD), amongst other conditions. These deficits limit overall functioning, arguably the most important outcome when treating the mentally ill. However, as noted by former National Institute of Mental Health director Thomas Insel, “one cannot treat what they cannot measure”, and these deficits are difficult to measure in consistent and scalable ways. In this work, I leverage neural language representations (word embeddings and a deep neural network) to derive four novel measures of social cognition from transcribed responses to two video cues - one designed to evoke emotions, and the other representing intentions. The resulting measures are evaluated for their ability to distinguish patients with SSD from neurotypical controls, their relationships to validated measures of social cognition and SSD symptomatology, and their ability to detect the effects of an experimental therapeutic agent intended to enhance social cognitive abilities. The resulting automated measures of social cognition can mediate new approaches to the diagnosis, monitoring, and treatment of people with SSD, and other conditions involving social cognitive deficits.
Committee: Trevor Cohen (chair), Donna Berry, Christopher Althoff, Ellen Bradley, Benjamin Buck
ChungKei Wilson Lau
Improving Design and Usability of Interactive Vulnerability Mapping for Global Health Preparedness
Global health preparedness, the ability of organizations and governments to anticipate risks and respond to disease outbreaks, presents both an imperative and a challenging opportunity for public health informatics interventions. Addressing risks of vector-borne and zoonotic disease (VBZD) outbreaks is especially complex as it involves the careful integration of human, animal, entomological, environmental, and infrastructure data. Presentation and understanding of those risks require usable tools and technology. Spatial Systems for Decision Support (SSDS) are a type of visualization tool that enables public health practitioners to make critical decisions informed by timely access to pertinent, analyzed data. In my dissertation research, I introduce a new type of SSDS, interactive vulnerability mapping tools, which can help decision makers in global health preparedness identify spatial areas that are at risk for VBZD outbreaks and have a lower capacity to contain spread. Decision makers include epidemiologists, public health planners, vector control specialists, and directors, who might use this information to allocate vaccine resources or plan intervention activities in high-risk regions. Unfortunately, SSDS tools are not routinely developed using a human-centered design (HCD) approach, and there is a lack of deliberate consideration of sociotechnical factors. In my doctoral research, I have applied principles of HCD and information visualization to design and evaluate the usability of interactive mapping tools for dengue vulnerability in Peru (Aim 1) and Rift Valley fever vulnerability in Kenya (Aim 2). To situate my Aim 1-2 findings in the literature, I conducted a scoping review of SSDS for VBZD preparedness (Aim 3) that describes data, users, technology, and use cases in published SSDS studies as well as gaps in the existing literature.
This work contributes: 1) usable SSDS tools designed for public health decision makers in Peru and Kenya, 2) empirical data on the design, data visualization preferences, usability, and acceptance of SSDS for disease vulnerability in global health settings, and 3) a reproducible search of the literature on SSDS for VBZD that maps the current state of the literature, characterizes health informatics factors, and identifies opportunities for future research.
Committee: Andrea Hartzler (Chair), Uba Backonja, Nancy Puttkammer, Peter Rabinowitz, Christopher Adolph
Assessing the fitness for use of real and synthetic electronic health record data for observational research
Over the past decade, electronic health record (EHR) adoption has led to an explosion in the volume of EHR and log data, followed by efforts to effectively harness the potential of these data for knowledge discovery (KD) and quality improvement (QI). In parallel, recent gains in artificial intelligence have produced powerful methods to analyze, use, and even create synthetic data. However, limitations in data utility (e.g. bias, data quality, comprehensiveness) and accessibility (e.g. privacy, interoperability, availability), as well as limited means to measure and manage tradeoffs between the two, are significant barriers to using these data effectively. Determining whether data are suitable to be used in a specific analysis or context, known as “fitness for use”, is not included in current frameworks for general health record data quality characterization nor evaluated by data quality assessment (DQA) tools. EHR log data use is particularly unrefined for QI and KD due to an absence of validated standards and methods. Thus, users of EHR and log data remain uninformed as to the fitness for use of their data at baseline and are unable to effectively assess subsequent tradeoffs between utility and privacy when applying privacy-preserving technologies.
First, we developed a framework for data utility assessment of electronic health records, then adapted open-source tools to make use of this framework, which we applied to assess the utility of real and synthetic EHR data for observational research related to COVID-19 and/or future influenza pandemics. Second, we evaluated whether synthetic data derived from a national COVID-19 data set could be used for geospatial and temporal epidemic analyses. To do so, we replicated studies and computed general summary statistics on original and synthetic data, then compared the similarity of results between the two datasets. Third, we conducted a retrospective, observational analysis - with and without privacy-preserving technology - of clinical workstation authentication behaviors from the UW Medicine health system to inform customized solutions that balance usability and security.
Committee: Drs. Adam Wilcox (Chair), Gang Luo, Matthew Thomas Trunnell, Larry Kessler
Genetic Association to Adverse Drug Events in the eMERGE Pharmacogenomics Cohort
Adverse drug events (ADEs) are a serious problem causing over 100,000 hospitalizations in the U.S. annually. One key component in the response to a drug is our genetic variation. Identifying and using genetic information to avoid ADEs is an already proven method that needs further expansion. The eMERGE PGx project collected electronic medical records (EMR) along with targeted DNA variant data in order to create a useful dataset for pharmacogenetic studies. In this research, an automated approach to identify potential ADEs in the eMERGE PGx cohort is presented. Data from the EMR is examined through the lens of a database of known adverse drug events: the Drug Evidence Base. Diagnosis codes that were known to be adverse events and appeared in a participant’s medical record following a medication order were labeled as potential ADEs. These potential ADEs were used as phenotypes for genetic association tests at the single variant, gene, and gene-set level. The results were two single variants, 10 genes, and one gene set having a significant association with one or more adverse drug events. These results add to the body of knowledge that continues to grow around variation in drug response.
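The labeling rule described above (a known adverse-event diagnosis code that appears after a medication order) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the drug names, diagnosis codes, record layout, and the `label_potential_ades` helper are hypothetical placeholders and do not reflect the actual Drug Evidence Base or eMERGE data formats.

```python
from datetime import date

# Hypothetical lookup of drug -> diagnosis codes known to be adverse events.
KNOWN_ADES = {
    "drug_a": {"DX_1"},
    "drug_b": {"DX_2"},
}

def label_potential_ades(med_orders, diagnoses, known_ades):
    """Flag (patient, drug, code) when a known ADE code follows a medication order.

    med_orders: list of (patient, drug, order_date)
    diagnoses:  list of (patient, code, diagnosis_date)
    """
    hits = []
    for patient, drug, ordered in med_orders:
        codes = known_ades.get(drug, set())
        for dx_patient, code, dx_date in diagnoses:
            if dx_patient == patient and code in codes and dx_date > ordered:
                hits.append((patient, drug, code))
    return hits

orders = [
    ("p1", "drug_a", date(2020, 1, 10)),
    ("p2", "drug_b", date(2020, 3, 1)),
]
diagnoses = [
    ("p1", "DX_1", date(2020, 2, 1)),   # after the drug_a order: flagged
    ("p1", "DX_1", date(2019, 12, 1)),  # before the order: ignored
    ("p2", "DX_1", date(2020, 4, 1)),   # DX_1 is not listed for drug_b: ignored
]
potential = label_potential_ades(orders, diagnoses, KNOWN_ADES)
```

Each flagged tuple is only a potential ADE. The temporal ordering suggests, but does not establish, that the drug caused the diagnosis.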
Last Known Position: Microsoft
Committee: David Crosslin (Chair), John Gennari, Gail Jarvik, Ali Shojaie
The digital transformation of healthcare over the past two decades has led to the proliferation of electronic health record (EHR) databases. These databases present an unprecedented opportunity for biomedical knowledge discovery. Data may be used for several purposes, including epidemiology, operational or clinical quality improvement studies, pragmatic trials and clinical trial recruitment, comparative effectiveness research, predictive modeling, clinical decision support, pharmacovigilance, and genome-wide association studies. In every case, one of the first steps involved is identifying the appropriate cohort of patients matching a set of inclusion and exclusion criteria, using only data available in the EHR. This process, known as EHR-driven phenotyping, is a resource-intensive task that involves many stakeholders, such as clinical experts, informaticists, and database analysts. It is therefore a critical rate-limiting factor that prevents massive scaling of knowledge discovery, and ultimately inhibits our ability to achieve the promise of national imperatives such as the Learning Healthcare System and All of Us. This research will attempt to improve the state of the art of EHR-driven phenotyping in three specific ways. First, we will analyze the variability of a set of existing, clinically validated, phenotype definitions in order to understand the requirements for a formal representation that supports automation. Second, we will assess the suitability of popular and emerging standards for formally representing cohort criteria, and evaluate whether this representation facilitates cross-platform cohort identification. Finally, we will develop and evaluate a fully standards-based system that can be used to create phenotype definitions and execute them against existing EHR data platforms, and evaluate the performance of this system in the context of the extant EHR-driven phenotyping ecosystem.
Last Known Position: Principal Software Engineer, Commure, Inc.
Committee: Drs. Adam Wilcox (Chair), John Gennari, Bill Lober, Brian Shirts
Acute Care Sepsis Prediction: Analyzing the Predictive Influence of Social and Behavioral Determinants
The Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009 established guidelines to help improve patient safety and efficacy by laying the framework for electronic healthcare record (EHR) adoption in the United States through financial incentives. Through the HITECH Act, basic EHR adoption skyrocketed domestically and large databases of clinical information were created. Currently, many institutions have large quantities of under-analyzed data that are ripe for biomedical exploration and discovery. Within the hospital setting, sepsis is a leading cause of mortality, affecting more than 1.7 million adults annually. It is also present in about 30 to 50 percent of hospitalizations that end in death. Despite its high occurrence and prevalence, detection and diagnosis of sepsis remain a challenge due to its non-specific early-onset symptoms. However, because it can quickly progress to a life-threatening stage, it is important to detect sepsis earlier to improve outcomes. With the recently increased adoption of EHRs, many institutions now collect large amounts of patient data and have created their own customized sepsis detection and mortality tools using various modeling or machine learning (ML) techniques. Additionally, those who experience more socioeconomic challenges are more susceptible to chronic illnesses, including sepsis. However, structured coding of social or behavioral features is often underutilized and unreliable. First, in order to understand the current environment of predictive analytics solutions for sepsis, we systematically identified various studies that utilize different models or ML techniques and analyzed their approach and results. Second, we developed a framework that utilizes natural language processing text classification from clinical notes to extract social and behavioral determinants of health (SBDH).
Third, we assessed classification methods that utilize currently established sepsis definitions or clinical scores to establish a baseline and integrated the SBDH data extracted from clinical notes in Aim Two, and determined if SBDH features can help enhance predictive performance for sepsis detection in the acute care setting.
Committee: Drs. Adam Wilcox (Chair), David Carlbom, Anne Turner, Basia Belza
Ensuring Patient Privacy and Accuracy of Analytical Methods to Support Evidence-Based Healthcare
Over the past two decades, healthcare providers substantially increased their use of electronic health record (EHR) systems. While the early rollouts of these systems have been fraught with complications and the quality of data from these systems is questionable at times, these EHR systems continue to improve. EHRs are primed to become the core of the data-driven healthcare system, with the potential to serve as a platform for population health analytics and predictive model development. However, EHRs represent a high risk for exposing patient records and business practices to nefarious actors. Creating infrastructure to deliver predictive methods to clinical records while protecting patient privacy is key to building a reliable healthcare analytics platform. In this dissertation, I focus on three areas with four aims for building a safe and private data analytics platform on the electronic health record. The aims are to: (1) evaluate the University of Washington EHR as a generalizable public health repository, (2) pilot a Model-to-Data framework as a method to deliver predictive analytic methods to clinical records, (3) scale the Model-to-Data pipeline to host a community challenge, delivering outside models to electronic health records, and (4) develop a patient portal to enable the return of clinically actionable research results.
Last Known Position: Research Scientist, Sage Bionetworks
Committee: Sean Mooney (Chair), Brian Shirts, David Crosslin, Justin Guinney
Supporting Hospitalized Patients through AI Technology
Hospitalized patients of the 21st century are encouraged to actively engage in their care, manage their safety, make medical decisions, and monitor the quality of their treatments. However, engaged hospitalized patients face a dilemma. The complexity of their care makes their engagement more important yet harder to achieve. Patients with complex health problems are cognitively and physically impaired because of pain, stress, and medications. At the same time, the information related to their health situation is more abundant and more complex. Thus, hospitalized patients face an engagement gap that grows deeper with the complexity of their health problems. Artificial intelligence (AI) agents, technologies that automate information processing, could be a promising solution. Yet, we know little about how AI agents could support patients in hospital settings. In this thesis, I first define technological opportunities, especially AI applications, to support patient and information needs in hospital settings. I propose a new user-centered research method, “Muse cards”. The method aims to inspire patients and their family caregivers to disrupt their hospital technologies with new designs that would accommodate their evolving roles in hospital settings. Second, I focus on the patient-clinician conversation, a core source of information in hospital settings. I report the factors that define the importance of verbally communicated information for patients, from the patients’ perspective and from the clinicians’ perspective. Third, I report the results of testing NURI, an AI agent to support hospitalized patients in understanding medical conversations with their clinicians. I report the perception of its usefulness and acceptance from the perspectives of patients, caregivers, and clinicians. My work contributes to human-computer interaction research a new toolkit to help users disrupt their attachment to existing technologies with new innovative ideas.
Moreover, I provide design guidelines to implement AI agents in the hospital settings to support patients and their family caregivers. Furthermore, my work contributes to clinical speech processing research by providing an annotation framework to capture important information for patients’ use from the patient perspective and from the clinicians’ perspective.
Predictive Approaches for Acute Adverse Events in Electronic Health Records
Medical errors have been cited as the third leading cause of death in the United States in 2013. Failure to rescue (FTR) is a subtype of medical error and refers to the loss of an opportunity to save a patient’s life after the development of one or more preventable and treatable complications. Focusing on detecting early signs of deterioration may therefore provide opportunities to prevent and/or treat an illness in a timely manner, which may in turn reduce the number of FTR cases. When implementing a data-driven model to predict the risk of potential FTR onsets in a supervised setting, gold standard information for the target FTR onset is often not directly retrievable from electronic health records (EHR), so clinical observations must be manually annotated with corresponding labels. This manual annotation acts as a bottleneck to scalability and to the full utilization of the clinical observations available in EHRs for model training. In this dissertation, I propose a machine learning framework that can be used to derive a risk prediction model using proxy events of the disease of interest, the administration of relevant clinical interventions, as a noisy label via a distant supervision approach. Moreover, this study evaluated the effects of considering the temporal progression of FTR risk estimates calculated using myopic evidence. Lastly, a case study is presented to demonstrate that the proposed prediction models can be deployed to quantify the adverse effects of clinical interventions with regard to the target disease of interest. This dissertation demonstrates 1) the feasibility of using proxy events of the target disease as a label for supervised model training, 2) the performance improvement when temporal progression is considered in the risk prediction model design, and 3) the applicability of the proposed risk prediction model to quantify the adverse effects of clinical interventions regarding the target disease.
Suggestions are also provided on how the proposed model could be further improved by integrating experts’ knowledge with the proposed framework.
Enhancing Secondary-use of Electronic Health Records for Geospatial-temporal Population Health Research
For almost three decades, the United States Department of Health and Human Services, the Centers for Disease Control and Prevention, and the World Health Organization have recognized the role of social and environmental determinants of health in understanding the health of populations. Community and population health is a function of each individual’s health and wellness, determined in large part by socioeconomic status, environmental factors, and access to healthcare services. In times of disaster, spatiotemporally relevant information escalates in importance as health systems strive to address emergent concerns, pre-existing needs, and population migration while experiencing disruption in available resources and infrastructure. With their adoption by hospitals and health systems, Electronic Health Records (EHRs) contain a richness and diversity of information about patients that could inform where and how to prepare for population-scale patient needs in future disaster scenarios; however, the ability to apply spatiotemporal reasoning with EHRs has remained an underrepresented capacity. Informatics innovations would need to account for the operational, technical, and ethical constraints felt by those who study the health of populations. In this dissertation, I focus on three areas for building capacities to use geospatial-temporal information to address population health needs. The aims are to: 1) assess information needs and priority use-cases for population health research in hydrologic disaster preparedness, 2) design spatiotemporal use-case workflows to survey trends and anomalies for regional areas using gridded hydrometeorological data products, a surrogate for structured multivariate datasets, and 3) develop an approach for spatiotemporal inferential statistics of EHR patient diagnosis information. This work incorporates flexible design and secondary use of data for population health research and geographic inferences in preparation for future disasters.
Last Known Position: Research Data Scientist, Biomedical and Health Informatics; Harborview Injury Prevention and Research Center, UW Medicine IT Services
Assessing the utility of Digital Health Technology to Improve Our Capacity to Assess and Intervene in Depression
When it comes to mental health, no country is considered developed. In the last decade, the burden of mental health disorders (MHD) has risen in all countries due to disparities in timely diagnosis and access to evidence-based treatments. Additionally, scientists are still conducting research to understand the underlying mechanisms behind MHD. Part of the problem is that measures of symptom severity are all based on self-reports by patients and clinician observation, often resulting in an imprecise measurement of MHD. Those that are more objective (e.g., MRI) are costly and not widely available, nor are they ecologically valid measures of behavior. Additionally, in-clinic assessments tend to be episodic and often miss capturing the lived experience of disease over time, including the potential impact of social and environmental factors that are suspected to be linked to neurodevelopmental and psychological processes. To improve long-term outcomes in MHD, there is a critical need to develop new ways to objectively assess specific underlying constructs of behavior patterns linked with neuropsychiatric conditions. The pervasive network of smartphones offers researchers a unique opportunity to study mental health at a population scale and at a fraction of the cost of traditional clinical research. The high-frequency daily usage of smartphones also provides new ways to capture the individualized momentary experience of living with mental health issues based on “real-world data” (RWD) in an objective, momentary and nonreactive way.
The principal findings of this dissertation research show the feasibility of utilizing smartphones to reach, enroll and engage a diverse and nationally representative population as well as the potential of using RWD in predicting mental health outcomes. The RWD collected from more than 2000 participants showed notable inter-/intra-person heterogeneity highlighting the challenges of developing a robust cohort level machine learning model to predict depression. However, personalized N-of-1 models show the promise of “precision digital psychiatry” by assessing an individual’s drifts from their own average “digital behavior” as a more reliable predictor of a person’s daily mood. Of note, participant enrollment and retention in large-scale digital health research studies remains a significant challenge. Cross study analysis using data from >100,000 participants showed significant underlying biases in technology access and utilization based on participants’ demographics that could impact the generalizability of the statistical inference drawn. In addition, the results from a survey-based study on a large and diverse sample show growing concerns among the general public about the security and privacy of their digital data which if left unaddressed can negatively influence people’s decision to participate and share data in digital health research.
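The idea of measuring an individual’s drift from their own average “digital behavior” can be illustrated with a short sketch. The following Python example is an assumption, not the models used in the dissertation: it uses a simple z-score against the person’s own history, and the feature (daily hours of phone use), the values, and the `personal_drift` helper are hypothetical.

```python
from statistics import mean, stdev

def personal_drift(history, today):
    """Z-score of today's value relative to the same person's own history."""
    sd = stdev(history)
    if sd == 0:
        # A perfectly flat history gives no scale for judging drift.
        return 0.0
    return (today - mean(history)) / sd

# Hypothetical daily phone-use hours for one participant.
usual_days = [8, 10, 12, 10, 10]
drift = personal_drift(usual_days, today=4)
```

A large negative or positive drift marks a day that is unusual for this participant specifically, which is the kind of within-person signal an N-of-1 model could relate to daily mood.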
These findings are contemporary and extend the ongoing efforts to objectively evaluate the potential fit of technology in psychiatry in engaging the general population to monitor their mental health in the real world outside the clinic. However, while the technology shows promise to move psychiatric research from subjective to objective measures, episodic to continuous monitoring, provider-based to ubiquitous, and reactive to proactive care, accomplishing these goals does come with measurable challenges. Further research is needed to develop robust and validated digital biomarkers of behavioral health. This includes conducting large-scale behavioral phenotyping studies (N > 100,000) that are powered to detect the association between RWD and behavioral anomalies, integrating RWD across similar studies, improving equitable utilization of technology across a diverse and representative population, and addressing people’s concerns about data security and privacy.
Last Known Position: Assistant Professor, Dept of Psychiatry, Univ of Toronto; Group Head, Digital Health & AI
Developing and Evaluating a Prototype Communicable Disease Web-based Clinical Reporting Tool
Reporting reportable diseases within a set time frame is considered a cornerstone of any public health surveillance system. The purpose of surveillance is to empower decision makers to act by providing timely and accurate data. Conducting surveillance requires a cycle of collecting and reporting individual cases by solo healthcare providers or healthcare facilities to the local/public health department. Healthcare providers are familiar with the requirements to report reportable diseases, but compliance is a challenge.
Novel influenza has been a reportable disease since the 2007 legislation. Pandemic influenza is caused by novel influenza that is introduced into a population in which some people have low immunity to it, which increases the mortality rate. In the past 120 years, there have been six well-known international outbreaks of novel influenza. The deadliest novel influenza epidemic happened in 1918. That year the Spanish Influenza (H1N1) infected about 500 million people and caused the death of an estimated 20 to 50 million. Other novel infections similarly need to be reported and tracked. Two examples in the last five years are Middle East respiratory syndrome coronavirus (MERS-CoV) and Zika virus.
I developed a Web-based reporting tool prototype to help healthcare providers report communicable diseases that are required to be tracked, such as novel influenza cases, to authorities based on the state’s official case report form. The overarching goal was to develop and evaluate this prototype. My aims were to: 1) understand the problems within the reportable disease reporting process from healthcare providers to healthcare authorities, 2) develop and test a prototype Web-based reporting tool to help improve the reporting process, and 3) evaluate the prototype Web-based reporting tool.
The result of Aim 1 was the identification of gaps between states’ reporting guidelines and states’ case report forms, both within individual states and across states. The identified gaps helped to generate a collection of all the data fields used in states’ novel influenza reporting guidelines and case report forms. The identified data fields were ranked by how frequently they were used across all the participating states. The ranked data fields help healthcare providers and policymakers gain insight into the data fields required by other states when developing future guidelines and case report forms.
The result of Aim 2 was a tool that maps the required data from a database simulating EHRs with different granularities of data to one or more states’ official case report forms. The tool does this through query mapping and pre-population of as much data into a given state’s case report form as the granularity of a given EHR’s data permits. This feature helps reduce manual data entry and increase the accuracy and completeness of the data submitted to authorities. The tool converts the submitted case report form into Clinical Document Architecture (CDA) format, a standard recommended by HL7.
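The pre-population step described above can be pictured with a small sketch. This is only an illustration under assumptions, not the tool's actual implementation: the function, the form field names, the EHR keys, and the sample record are all hypothetical.

```python
def prefill_case_report(form_fields, ehr_record, field_map):
    """Pre-populate a case report form from one EHR record.

    form_fields: ordered field names on a state's form (hypothetical names).
    ehr_record:  dict of values available in the EHR for one patient.
    field_map:   dict mapping a form field to the EHR key that supplies it.
    Returns (filled, missing): values found, and fields left for manual entry.
    """
    filled, missing = {}, []
    for field in form_fields:
        ehr_key = field_map.get(field)
        value = ehr_record.get(ehr_key) if ehr_key is not None else None
        if value is None:
            missing.append(field)
        else:
            filled[field] = value
    return filled, missing


# Hypothetical form, mapping, and a coarse-grained EHR record.
form = ["patient_dob", "symptom_onset_date", "influenza_subtype"]
mapping = {
    "patient_dob": "birth_date",
    "symptom_onset_date": "onset",
    "influenza_subtype": "lab_subtype",
}
record = {"birth_date": "1980-02-14", "onset": "2016-01-03"}

filled, missing = prefill_case_report(form, record, mapping)
print(filled)
print(missing)
```

An EHR with coarser data simply yields a longer list of fields for the provider to complete by hand.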
For Aim 3, a combination of usability evaluation methods was implemented to evaluate the Web-based reporting tool from Aim 2. The main objectives of these methods were to measure the usability of the tool, that is, the quality of a user’s experience when interacting with it, and to measure the user’s overall satisfaction. The key finding from Aim 3 was that the Web-based reporting tool is acceptable to potential users. The evaluation study generated qualitative and quantitative results, as well as a list of usability problems for future development and consideration.
Last Known Position: Assistant Professor under the College of Public Health and Health Informatics at King Saud bin Abdulaziz University for Health Sciences (KSAU-HS)
Committee: Peter Tarczy-Hornoch, Lingtak-Neander Chan, Anne Turner, Ian Painter
Design, Development, and Evaluation of a Patient-Centered Health Dialog System to Support Inguinal Hernia Surgery Patient Information-Seeking
Surgery patients engage in health information-seeking activities to better understand their health conditions. An example of this activity is patients collecting data outside of the hospital to track their surgery recovery. Patients can also seek health information from resources such as clinicians, patient education materials, multimedia, friends or family members, and websites to answer their questions. However, surgery patients could encounter barriers when trying to make sense of their collected data or engaging in health information-seeking. For example, clinicians have limited availability to help make sense of the collected data or answer patient questions. Additionally, surgery patients may have low health literacy levels or have difficulties recalling their discharge teaching.
Last Known Position: Design Researcher at Microsoft
Committee: William Lober, Uba Backonja, Lingtak-Neander Chan, Heather Evans, Sean Munson
A New Perspective On Minimally Invasive Procedures: Exploring the Utility of a Novel Virtual Reality Endovascular Navigation System
Digital information is playing a larger role in the treatment of disease. Invasive procedures, such as open-heart surgery, have evolved into minimally invasive procedures that benefit from reduced trauma, scarring and recovery times. However, unlike their predecessors, minimally invasive procedures do not provide direct line of sight and, as a result, require alternative means to depict the operative field. Modern medical images are digital representations of the operative field that are used to guide minimally invasive procedures, including endovascular procedures that occur in the bloodstream. Because blood impedes light, light-based cameras, such as endoscopes, are extremely limited in their utility, requiring endovascular proceduralists to rely on non-light-based imaging. However, non-light-based imaging can be difficult to understand due to the lack of visual depth cues in its display. In this dissertation, I explored a novel method of displaying endovascular imaging through the design, development and evaluation of a head-mounted display catheter guidance system. Using my system, proceduralists performing a visually complex and potentially dangerous endovascular maneuver known as the transseptal puncture performed with greater accuracy and, subjectively, a better understanding of the operative field. It is my hope that the knowledge and artifacts generated during my work influence the implementation of improved medical practices.
Last Known Position: Vice President of Engineering, Pluto VR
Committee: John Gennari, Thomas Furness, James Brinkley, Stephen Seslar
Data Mining the Electronic Medical Record with intelligent agents to inform Decision Support Systems
An intelligent agent framework is used on an ICU EMR to create prediction models for disease onset. Eleven models are created to inspect five diseases: acute respiratory distress syndrome (ARDS); severe acute hypoxemic respiratory failure (SAHRF); acute kidney injury (AKI); sepsis; and disseminated intravascular coagulation (DIC).
Four of the models (ARDS, AKI Stage 1, AKI Stage 2, and sepsis) are competitive or superior to the best comparable peer-reviewed models. The other seven are novel, including: SAHRF (AUC=0.952); DIC from ARDS positive patients (AUC=0.722); ARDS from DIC positive patients (AUC=0.675); AKI Stage 3 (AUC=0.983); the progression from AKI Stage 1 to Stage 2 (AUC=0.930); the progression from AKI Stage 2 to Stage 3 (AUC=0.951); and DIC (AUC=0.838).
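For readers unfamiliar with the AUC figures quoted above, the sketch below shows one standard way to compute it: the fraction of (positive, negative) patient pairs in which the positive patient receives the higher score, with ties counted as half. It is a generic illustration with made-up scores, not the evaluation code used in this work.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison.

    scores_pos: model scores for patients who developed the condition.
    scores_neg: model scores for patients who did not.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0      # positive correctly ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical risk scores.
print(auc([0.9, 0.3], [0.5, 0.1]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two groups.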
In derivative work, a correlation between pre-DIC patients and metabolic acidosis is shown, a meta-analysis on misclassified patients is given, and a disease pathway that demonstrates how ARDS and DIC can interact in a positive feedback loop is presented. DIC is shown to be implicated in 78% of all in-hospital mortality of ARDS patients.
Committee: Linda Shapiro, John Kramlich, Meliha Yetisgen, Adam Wilcox
Tsung-Chien (Jonathan) Lu
Using Smart Watches to Facilitate High Quality Cardiopulmonary Resuscitation for Patients with Cardiac Arrest
Cardiopulmonary resuscitation (CPR) quality affects survival after cardiac arrest. Past studies have shown that both healthcare professionals and laypersons often perform CPR at inadequate rates and depths, and that CPR quality can be improved with adequate feedback. This dissertation sought to develop a wearable application (app) with real-time feedback by using a commercially available smartwatch to facilitate the delivery of high-quality CPR. First, I conducted a systematic review of healthcare applications of smartwatches. The review found that most of the identified smartwatch studies focused on applications involving health monitoring for the elderly, and that there is potential for smartwatch use in clinical settings. The second step was to develop a smartwatch app with real-time audiovisual feedback on CPR quality. Using sensor data collected from the built-in accelerometer of the smartwatch, two novel algorithms capable of estimating chest compression rate and depth were developed and validated. User-centered design was adopted during development of the prototype’s smartwatch interface, and a usability test was conducted for the final app. Finally, to evaluate whether a smartwatch app with real-time audiovisual feedback could improve CPR quality, 80 Emergency Department (ED) professionals were recruited and randomly allocated to either an intervention group wearing a smartwatch with the preinstalled app or a control group. All participants were asked to perform two minutes of CPR on a manikin at a 30:2 compression-ventilation ratio. The results show that chest compressions tend to be too fast and too shallow without feedback and that CPR quality can be improved with feedback from a smartwatch. This work is an example of applying modern information technology to improve the quality of healthcare. Although it is a simulation study performed on a manikin, it has great potential to be used in clinical settings.
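The abstract does not describe the two algorithms themselves, so the following is only a generic sketch of how a compression rate could in principle be estimated from accelerometer samples: remove the constant offset, find upward zero crossings, and convert the spacing between them to cycles per minute. The function name, sampling rate, and the clean synthetic signal are assumptions; real wrist data would be noisy and would need filtering first.

```python
import math


def compression_rate_per_min(accel, fs_hz):
    """Estimate cycles per minute from upward zero crossings.

    accel: acceleration samples along one axis (any consistent unit).
    fs_hz: sampling frequency in Hz.
    """
    offset = sum(accel) / len(accel)          # removes the constant (gravity) part
    centered = [a - offset for a in accel]
    # Indices where the signal goes from negative to non-negative.
    crossings = [i for i in range(1, len(centered))
                 if centered[i - 1] < 0 <= centered[i]]
    if len(crossings) < 2:
        return 0.0
    seconds = (crossings[-1] - crossings[0]) / fs_hz
    return (len(crossings) - 1) / seconds * 60.0


# Synthetic 10-second signal: 2 compressions per second plus gravity.
fs = 50
signal = [9.81 + 3.0 * math.sin(2 * math.pi * 2.0 * k / fs + 0.3)
          for k in range(10 * fs)]
print(round(compression_rate_per_min(signal, fs)))
```

Estimating depth is a harder problem, since it involves integrating acceleration twice and controlling drift, and is not attempted here.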
Last Known Position: MD, Taiwan
Committee: Anne Turner, Cynthia Dougherty, George Demiris, Hendrika Meischke
Patient-Peer Support to Improve Quality and Safety in the Hospital
Patient safety is a critical and persistent problem impacting health care systems around the world. Despite major financial and technological investments to improve this problem, medical errors remain a leading cause of death in the United States. As experts in the care they receive, patients offer unique insights about the source of these problems and have key roles in their prevention. However, most interventions have not included patients as equal partners in safeguarding their own care.
Peer support is one type of intervention that recognizes the valuable insights patients could provide for each other to improve the quality and safety of their care. In many other health care settings, digital peer interventions have been implemented, and have demonstrated benefits such as increased knowledge, empowerment, and self-efficacy—many factors that also influence patient involvement in safety. Yet, we know little about how peer support might translate into the context of patient safety, particularly in a hospital setting.
In this thesis, I investigate how peer support technologies can improve the quality and safety of a patient’s hospital stay. I first examine what opportunities exist for peer support in the hospital and articulate design recommendations for technologies to enable this support. I then describe my design, implementation, and deployment of a fully-functioning patient-peer support technology for the hospital setting. Finally, I show how patients used this technology and how it impacted their hospitalization. My findings reveal that peer support can be a powerful tool that equips patients with the support they need to navigate their hospital stay, and can help patients take proactive steps toward improving the quality and safety of their care.
Last Known Position: Associate Director of Outcomes Research at Merck Pharmaceuticals
Committee: Wanda Pratt, Thomas Gallagher, Ari Pollack, Andrea Hartzler
The Untold Story of Predicting Readmissions for Heart Failure Patients
The availability and accessibility of Electronic Health Record (EHR) data create an opportunity for researchers to revolutionize healthcare. The recognition of the importance of secondary use of EHR data has led to the development of research-ready integrated data repositories (IDRs) from EHR data. Analyzing this data can help researchers connect the dots and can lead to critical clinical findings through predictive analytics methods. Unfortunately, poor data quality is a problem that affects the accuracy of such findings. An example of a data quality problem is poor information about the specifics of admission, discharge, and readmission.
Heart failure (HF) is one of the most common cardiovascular diseases. In the United States, 5.7 million people have heart failure, with 870,000 new cases annually, and the disease is the leading cause of hospital readmission.
Predicting readmission for heart failure patients has been well studied. The readmission periods that researchers have studied range from 30 days to one year. However, readmissions occurring in fewer than 30 days have received less research attention. In my research, I shed light on an overlooked yet important group of readmissions: very early readmissions. Currently, little is known about what causes heart failure patients to come back so quickly. In the long term, my career goal is to predict very early readmissions before discharge and improve discharge decision making. It is a step toward personalized healthcare that will eventually improve patient care.
The broad goal of my dissertation is to leverage the availability and accessibility of electronic health data to characterize day 1-30 readmissions and, more specifically, very early readmissions. My approach had four major steps: 1) reviewing the literature to understand the field and how early readmissions have been defined, 2) using retrospective EHR data from UW Medicine to build an accurate visit table for heart failure patients, 3) using the visit table to build a prediction model to characterize day 1-30 readmissions, and 4) improving on the model by applying different machine learning algorithms and imputation techniques for missing data.
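To make the role of the visit table concrete, the sketch below derives the gap in days between each discharge and the next admission for one patient and labels it. It is a simplified illustration, not the dissertation's pipeline. In particular, the abstract does not state the cutoff for "very early" readmission, so the 7-day default is a hypothetical placeholder, and the dates are invented.

```python
from datetime import date


def days_to_readmission(visits):
    """Gaps in days between each discharge and the following admission.

    visits: list of (admit_date, discharge_date) for ONE patient,
            sorted by admission date.
    """
    return [(next_admit - discharge).days
            for (_, discharge), (next_admit, _) in zip(visits, visits[1:])]


def label_gap(gap_days, very_early_cutoff=7):
    """Label a gap; very_early_cutoff is an assumed, hypothetical threshold."""
    if gap_days < 1:
        return "not a readmission"       # e.g. same-day transfer
    if gap_days <= very_early_cutoff:
        return "very early"
    if gap_days <= 30:
        return "day 1-30"
    return "later"


# Invented visit table rows for a single patient.
visits = [
    (date(2020, 1, 1), date(2020, 1, 5)),
    (date(2020, 1, 8), date(2020, 1, 10)),
    (date(2020, 3, 1), date(2020, 3, 3)),
]
gaps = days_to_readmission(visits)
print([(g, label_gap(g)) for g in gaps])
```

Errors in admission or discharge dates propagate directly into these gaps, which is why an accurate visit table comes before any modeling step.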
Last Known Position: Assistant Professor, Al-Riyadh Governorate, Saudi Arabia
Committee: John Gennari, Shuai Huang, Peter Tarczy-Hornoch, Adam Wilcox, Annie Chen, Todd Dardas
Ontology-Driven Pathway Data Integration
Biological pathways are useful tools for understanding human physiology and disease pathogenesis. Pathway analysis can be used to detect genes and functions associated with complex disease phenotypes. When performing pathway analysis, researchers take advantage of multiple pathway datasets, combining pathways from different pathway databases. Pathways from different databases do not easily inter-operate, and the resulting combined pathway dataset can suffer from redundancy or reduced interpretability.
The Pathway Ontology (PW) is an ontology of pathway terms that can be used to organize pathway data and eliminate redundancy. I generated clusters of semantically similar pathways by mapping pathways from seven databases to classes in the PW. I then produced a typology of differences between pathways by summarizing the differences in content and knowledge representation between databases. Using the typology, I optimized an entity and graph-based network alignment algorithm for aligning pathways between databases. The algorithm was applied to clusters of semantically similar pathways to generate normalized pathways for each PW class. These normalized pathways were used to produce normalized gene sets for gene set enrichment analysis (GSEA). I evaluated these normalized gene sets against baseline gene sets in GSEA using four public gene expression datasets.
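The grouping step can be illustrated with a deliberately simplified sketch: pathways from different databases that map to the same PW class are collected, and their gene lists are merged into one set per class. This plain set union stands in for the entity and graph-based network alignment described above and is not the author's algorithm. The database labels, class name, and gene lists are illustrative inputs only.

```python
from collections import defaultdict


def merge_gene_sets_by_class(pathways):
    """Merge gene lists of pathways that share a PW class.

    pathways: iterable of (database, pw_class, genes) tuples.
    Returns a dict mapping each PW class to the union of its genes.
    """
    merged = defaultdict(set)
    for _database, pw_class, genes in pathways:
        merged[pw_class] |= set(genes)
    return dict(merged)


# Illustrative inputs: two databases describing the same PW class.
pathways = [
    ("db_a", "glycolysis pathway", ["HK1", "PFKM", "PKM"]),
    ("db_b", "glycolysis pathway", ["HK1", "GAPDH", "PKM"]),
    ("db_a", "citric acid cycle pathway", ["CS", "IDH1"]),
]
gene_sets = merge_gene_sets_by_class(pathways)
print({k: sorted(v) for k, v in gene_sets.items()})
```

Collapsing near-duplicate pathways into one set per class is what reduces repeated hits in a downstream enrichment analysis.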
Results suggest that normalized pathways can help to reduce redundancy in enrichment outputs. The normalized pathways also retain the hierarchical structure of the PW, which can be used to visualize enrichment results and provide hints for interpretation. Ontology-based organization of biological pathways can play a vital role in improving data quality and interoperability, and the resulting normalized pathways may have broad applications in genomic analysis.
Last Known Position: Assistant Professor, UW Information School
Committee: John Gennari, Ali Shojaie, Neil Abernethy, Paul Crane
Bicluster-Based Identification of Gene Sets Through Multivariate Meta-Analysis (MVMA)
Omics technologies have transformed biology and medicine by generating massive amounts of high-resolution data. Much of the data have been made publicly available but have not been fully explored or utilized. The current study aims to mine public gene expression data to discover gene sets that may correspond to biological pathways. The challenges with using public data include data heterogeneity, high dimensionality, and small sample sizes. The overall research questions are: (1) which data mining method is best suited for finding gene sets; and (2) how best to utilize multiple datasets in order to increase statistical strength. Aim 1 is to determine the optimal method for constructing bicluster stacks. Aim 2 is to determine the suitability of meta-analysis techniques to pool biclusters and assess performance. Aim 3 is to assess the potential utility of the gene sets identified in Aim 2 using pathway analyses.
In Aim 1, we demonstrate the technique of biclustering in gene set identification, based on a number of key advantages of biclustering over the traditional clustering methods. In addition, we show that synthesis of summary statistics (biclusters in this case) is a better approach for utilizing multiple datasets compared to simply aggregating the source datasets together. For Aim 2, we adapt the framework of multivariate meta-analysis (MVMA), and a previously published two-step procedure to tackle the issue of high dimensionality with an improvement that involves a sparse estimate for the between-study covariance matrix using the graphical lasso algorithm. The improvement leads to a significant increase in the performance of MVMA in classifying real genes from background genes. In Aim 3, the gene sets found to be significant according to MVMA are further investigated by knowledge-based pathway analyses. The results suggest that the overall effect sizes are a predictor of biological relevance of the gene sets, which is the most significant finding of the study.
Last Known Position: Senior Fellow, UW Department of Anesthesia & Pain Management
Committee: Peter Tarczy-Hornoch, Brian Browning, Roger Bumgarner, Shuai Huang
The Problem of Time: Addressing challenges in spatio-temporal data integration
Across scientific disciplines, an ever-growing proportion of data can be effectively described in spatial terms. As researchers have become comfortable with techniques for dealing with spatial data, the next progression is to not only model the data itself, but also the complexities of the dynamic environment it represents. This has led to the rise of spatio-temporal modeling and the development of robust statistical methods for effectively modeling and understanding interactions between complex and dynamic systems. Unfortunately, many of these techniques are an extension to existing spatial analysis methods and struggle to account for the data complexity introduced by the added temporal dimension; this has limited many researchers to developing statistical and visual models that assume either a static state of the world, or one modeled by a set of specific temporal snapshots.
This challenge is especially acute in the world of public health, where researchers attempting to visualize historical spatial data often find themselves forced to ignore shifting geographic features because both the tooling and the existing data sources are insufficient. Consider, as an example, a model of vaccine coverage for the administrative regions of Sudan over the past 30 years. In the wake of civil war, Sudan was partitioned into two countries, with South Sudan emerging as an independent nation in 2011. This has an immediate impact on both the visual accuracy and the quantitative usefulness of any data generated from aggregate spatial statistics. Or consider epidemiological case reports that are issued from local medical facilities: how does one account for the fact that their locations may change, or that new facilities may spring up or close down as time progresses? These are real-world problems that existing GIS platforms struggle to account for.
While there have been prior attempts to develop data models and applications for managing spatio-temporal data, the growing depth and complexity of scientific research has left room for improved systems that can take advantage of the highly interconnected datasets and spatial objects that are common in this type of research. To that end, we have developed the Trestle data model and application, which leverage graph-based techniques for efficiently storing and querying complex spatio-temporal data. This system provides a simple interface that allows users to perform query operations over time-varying spatial data and return logically valid information based on specific spatial and temporal constraints. This system is applicable to a number of GIS-related projects, specifically those attempting to visualize historical public health indicators such as vaccination rates, or to develop complex spatio-temporal models, such as malaria risk maps.
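The core idea of returning only the spatial objects that are valid at a given moment can be sketched in a few lines. This is not Trestle's data model or API: the class, field names, and the flat list standing in for a graph store are hypothetical, and geometry is omitted entirely. The example reuses the Sudan case, with South Sudan's independence on 9 July 2011 as the boundary date.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Region:
    name: str
    valid_from: date
    valid_to: Optional[date]   # None means the region is still valid


def regions_valid_at(regions: List[Region], when: date) -> List[str]:
    """Names of regions whose validity interval [valid_from, valid_to) contains `when`."""
    return [r.name for r in regions
            if r.valid_from <= when and (r.valid_to is None or when < r.valid_to)]


split = date(2011, 7, 9)
regions = [
    Region("Sudan (pre-2011 boundary)", date(1956, 1, 1), split),
    Region("Sudan (post-2011 boundary)", split, None),
    Region("South Sudan", split, None),
]

print(regions_valid_at(regions, date(2005, 1, 1)))
print(regions_valid_at(regions, date(2015, 1, 1)))
```

Aggregating an indicator such as vaccine coverage against whichever set of regions was valid in each year avoids mixing statistics across incompatible boundaries.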
Last Known Position: Digital Service Expert at United States Digital Service
Secondary Usage of Electronic Health Record Data for Patient-Specific Modeling
Translational research has become an important bridge that moves findings from basic science research to patients' bedside and to the clinical community. Unfortunately, this notion of translational research seems to be unidirectional in that basic research is translated into clinical research and practice, but basic science research does not seem to benefit as much from clinical medicine.
In my dissertation, I leverage the availability of retrospective EHR data and use them with biosimulation models to translate data from clinical medicine to benefit biosimulation modeling. Biosimulation models are mathematical representations of biological systems, and they can help with mechanistic understanding of physiology and predict the dynamics of a biological system. Using clinical data with biosimulation models has the potential to benefit both the biosimulation modelers, as well as clinicians.
The abundance of retrospective clinical data available for research is a promising alternative to the traditional method of validating models by conducting resource-intensive prospective studies. These models can then be made patient-specific to simulate the physiology of individuals. When used in the clinical setting, these patient-specific models have the potential to be used by clinicians to better understand the underlying pathophysiology of the patient.
In my research, I first conduct a scoping review of models in the literature to quantify model reproducibility and discover the abysmal status of model source code availability in publications. Then, using a published hemodynamics model, I demonstrate the use of a retrospective clinical dataset from right heart catheterizations to optimize and validate the model without needing to conduct burdensome prospective studies, and explore potential clinical applications of patient-specific modeling. Finally, I describe an ontological approach to extend the data-model connection to be systematic and scalable. I demonstrate this approach by connecting cardiology data and lab results data with a hemodynamics model and several nephrology models, respectively.
Last Known Position: Product Manager, Format Health
No Wrong Door: Designing Health Information Technology to Support Interprofessional Collaboration Around Child Development Work
Child development refers to children gaining the skills they need to succeed in life, consisting of abilities in different overlapping domains such as speech, motor, social, and cognition. Developmental disabilities are chronic delays in gaining such skills, and if they are not addressed in a timely manner, a child can experience negative outcomes throughout their life. Responsibilities for identifying and treating developmental delays and disabilities are spread across many stakeholders in the community, including not only parents but an interprofessional collection of service providers such as pediatricians, early educators, childcare providers, providers of home visiting services, and community groups. Regardless of who is involved in a child’s care, there must be ‘no wrong door’ into the ecosystem of development support services. Unfortunately, these stakeholders operate in silos, leading to a fractured system of services that parents struggle to navigate. This often leads to delays in the receipt of necessary services and uncoordinated care. Various researchers and policy leaders such as the American Academy of Pediatrics have suggested that health information technology (HIT) could be an important tool to help stakeholders collaborate in a child’s care management. Current biomedical informatics literature, however, provides little practical guidance on how to design HIT systems to support such interprofessional collaboration.
Last Known Position: Postdoctoral Research Associate at UNC-Chapel Hill
Committee: Anne Turner, Julie Kientz, Wendy Stone, Debra Lochner Doyle
Examining the Feasibility of Internet of Things Technologies to Support Aging-in-Place
The older adult population is one of the fastest growing demographic groups in the United States. Older adults face challenges such as chronic health conditions, reduced mobility, and cognitive decline. Technological solutions may be valuable resources to assist older adults in maintaining their quality of life. One such solution involves Internet of Things (IoT) connected smart home devices. IoT smart home technologies have a unique opportunity to support healthy aging of the older adult population by identifying potential patterns in health and detecting anomalous activities. Such technologies could support detection of trends over time (for example, a decrease in overall activity level, an increase in sedentary behavior or a reduced number of visitors) that call for intervention. This could assist older adults to maintain independence by connecting them with family members, support systems or other caregivers, and ultimately support quality of life. Despite the promise of these technologies to improve health outcomes and quality of life in older adults, there still remains a challenge in understanding older adults’ specific perceptions and concerns. This dissertation explored the feasibility of using IoT smart home devices with older adults and sought to understand their acceptance of these tools within their homes. The specific aims of this project are to: 1) Assess the feasibility of IoT smart home devices in older adults’ residential settings; 2) Examine older adults’ acceptance of an IoT smart sensor system and how this perception may change over time and after exposure to such a system; 3) Develop design recommendations for a future IoT smart home system to better assist older adults’ aging-in-place and maximize their user experience.
Last Known Position: Assistant Professor, Health Information Management, University of Pittsburgh; Postdoctoral Fellow, UC Davis Medical Center
On Biological Network Visualization: Understanding Challenges, Measuring the Status Quo, and Estimating Saliency of Visual Attributes
Biomedical research increasingly relies on the analysis and visualization of a wide range of collected data. However, for certain research questions, such as those based on the interconnectedness of biological elements, the sheer quantity, complexity, and variety of data may result in rather large and dense networks, rendering them visually uninterpretable. Since networks are important models in biomedicine, and since visualization is a valuable form of analysis, it stands to reason that the biomedical community may benefit from improvements to network visualization.
My dissertation focuses on the following three studies. First, I cover a semi-structured interview study aimed at uncovering the challenges researchers face while analyzing and visualizing biological networks. Second, I describe a systematic review aimed at characterizing visual attributes and assessing the ability to complete selected graph tasks in figures containing node-link diagrams obtained from peer-reviewed bioinformatics literature. Furthermore, I explain the Information Triad, a small conceptual framework I developed to reason about network visualization research questions, followed by a description of visual encoding exploration software I implemented based on the framework. Finally, I detail the design and execution of a task-centered perception study, in which the saliency of several visual attributes was estimated as functions for the task of visually scanning a network.
Through these studies, I contributed to the understanding of network-related visualization challenges encountered by researchers, showed that graph figures in bioinformatics literature may be designed for varying purposes, developed a conceptual framework for reasoning about network visualization, built visual encoding software that supports systematic and reproducible explorations of the visual encoding set space, and finally, obtained an estimate of how numerous visual encodings are related to one’s ability to visually scan a network.
Last Known Position: Data & Applied Scientist, Microsoft
Committee: John Gennari (Co-Chair), Neil Abernethy (Co-Chair), Jeffrey M Heer, Abraham David Flaxman (GSR)
Last Known Position: Analyst, Palantir Technologies
Committee: Drs. Wanda Pratt (Chair), Thomas Payne, Barry Aaronson, Sean Munson (GSR)
Creating a Smartphone Application for Image-Assisted Dietary Assessment among Older Adults with Type 2 Diabetes
In the United States, the older population aged 65 or over numbered 44.7 million in 2013 and is anticipated to reach approximately 74 million people by 2030. More than one in four people in the United States aged 65 years and older have diabetes. For diabetes care, medical nutrition therapy (MNT) is recommended as a clinically effective intervention. For personalized MNT, it is essential for dietitians to assess the nutritional status of patients with a variety of dietary data (i.e., meal patterns, food choices, and overall dietary balance). However, it is difficult to obtain accurate information because traditional dietary assessment methods (e.g., 24-hour dietary recall (24HR), food records) are based on self-reported data. In particular, those methods might be inappropriate for older adults because they have special considerations with diminished functional statuses (i.e., diminished vision and memory loss). To address this problem, researchers developed and validated dietary assessment methods using the images of food items to improve the accuracy of the self-reporting used in traditional methods. Nevertheless, little is known about the usability and feasibility of image-assisted dietary assessment methods for diabetic older adults and their satisfaction with the methods. To my knowledge, no studies have evaluated the image-assisted dietary assessment methods with both health providers (i.e., dietitians) and patients (i.e., diabetic older adults), though both are essential stakeholders in the dietary assessment process. Further, little is known about the usability and feasibility of smartphone applications for image-assisted dietary assessment, though a smartphone is a device that can perform the multiple tasks (i.e., capturing, viewing, and transmitting images) required for image-assisted dietary assessment. Filling these gaps may reduce the error of self-reporting by diabetic older adults and result in more accurate dietary assessment.
The goal of this research is to improve the accuracy of traditional dietary assessment methods among older adults with type 2 diabetes. To achieve the goal, I created Food Record App for Dietary Assessment (FRADA), a smartphone application for capturing, viewing, and transmitting the images of food and beverages and evaluated the usability and feasibility of FRADA and the satisfaction of diabetic older adults with the application. Further, I evaluated the satisfaction of dietitians with the image-assisted 24HR session. The findings of this research support the evidence that image-assisted dietary assessment using FRADA could be potentially used to improve the accuracy of dietary assessment by reducing the error of self-reporting. Also, this study reveals design opportunities to facilitate communications between older adults and dietitians for better dietary assessment. To my knowledge, this is the first attempt to evaluate a smartphone application with both older adults and dietitians through a lab-based and deployment study based on 24HR.
The aims of this study are:
Aim 1: To create a smartphone application for the image-assisted dietary assessment and determine the usability of the application for diabetic older adults.
Aim 2: To determine the feasibility of the smartphone application with diabetic older adults for the image-assisted dietary assessment.
Aim 3: To determine the satisfaction of diabetic older adults with the smartphone application for the image-assisted dietary assessment and determine the satisfaction of dietitians with the image-assisted 24HR session.
Last Known Position: Assistant Professor, Department of Computer Science and Engineering, University of Seoul (South Korea)
Committee: Drs. Peter Tarczy-Hornoch (Chair), George Demiris, Lingtak-Neander Chan, Mark Zachry
Using personal health records to promote patient activation in the homebound older adult population
Patient activation, or an individual’s willingness and ability to take actions to maintain their health and wellness, is a primary component of the patient-centered health system. Activated patients are more likely to report positive experiences with their medical providers, have better health outcomes, and spend less on healthcare services. Homebound older adults face more barriers to patient activation than their non-homebound peers. Because people who are homebound are unable to leave their homes without significant assistance, regularly accessing clinic-based medical services is difficult. In addition, as a population, homebound older adults have more chronic diseases, physical and cognitive impairments, and challenges with activities of daily living than non-homebound older adults.
The number of older adults who are homebound is on the rise, and they are a growing proportion of the older adult patient population. Therefore, more research is needed to understand how consumer health information tools can be used with this population to support activation and improve health outcomes. This dissertation explores the usability, feasibility, and preliminary effectiveness of personal health records with the homebound older adult patient population. In a series of studies, I outline the benefits of using personal health records with this population, assess how current personal health records meet the needs of homebound older adult users, and describe considerations for health systems and researchers who are interested in exploring personal health records for the homebound older adult population.
This work furthers our understanding of the application of personal health records in homebound older adult patient populations. In addition, I provide design recommendations on how future systems can better meet needs of homebound older adult users. Finally, I offer suggestions to help future researchers maximize the effectiveness of homebound older adult personal health record evaluations.
Committee: Drs. George Demiris (Chair), Hilaire Thompson, Anne Turner, Gary Hsieh (GSR)
Designing and Evaluating a Patient-Driven Application for Patients with Primary Brain Tumors
Primary brain tumors are a complex and challenging disease. These tumors are rare and difficult to treat, and result in a significant burden on patients and their families. These patients will experience a wide range of neurological symptoms, as well as deficits and declines in cognitive and functional abilities as they progress through the disease and treatment process. For these patients, prognosis is often poor, as recurrence is common and complete cure for most malignant brain tumors is typically not possible.
From the time of diagnosis through treatment and follow-up, patients with primary brain tumors and their caregivers face many challenges and uncertainties as they navigate the healthcare environment and take on new roles and responsibilities in the care process. Despite a recent increase in the use of personal technologies to support health-related activities, there are very few tools and technologies currently available to support the unique needs of this patient population. There has been little research conducted to study the role of technology in health and daily life for these individuals, and to explore the potential for future design and development to reflect the needs and abilities of this small and challenging patient population. These gaps represent an opportunity for research and design, leveraging the insights and experiences of current patients and caregivers in informing the design of tools and technologies to support future patients and caregivers.
In this dissertation, I investigated the experiences, challenges, and needs of patients with primary brain tumors and their caregivers, and worked towards designing and developing tools and technologies to address needs surrounding tracking, understanding, managing, and communicating symptom, side effect, and other health information. I engaged patients, caregivers, and clinicians in semi-structured interviews to build an in-depth understanding of the current situation, and worked alongside patients and caregivers as partners in designing a prototype of a brain tumor specific smartphone and tablet application. I then evaluated the resulting high-fidelity prototype with patients, caregivers, and clinicians to explore functionality and usability, and to further understand how this tool could be implemented and used to support these and future users throughout treatment and follow-up.
Last Known Position: User Researcher, Microsoft
Committee: Drs. John Gennari, Wanda Pratt, Mark Phillips, Sean A Munson (GSR)
Approaches and Strategy for Cancer Research and Surveillance Data: Integration, Information Pipeline, Data Models, and Informatics Opportunities
The advancement of cancer research, patient care and public health currently relies on acquisition of data from a variety of sources, information-processing activities, and timely access to data that is of acceptable quality for investigators, clinicians and health officials. With cancer patients living longer and undergoing multiple rounds of treatment, as well as the rise of molecular data that characterize individual patient tumors, there are challenges across all aspects of cancer data collection, integration and delivery. Although there have been advances in deployment of electronic medical records (EMRs) and use of data from EMRs and related systems to support cancer research and patient care, most data needs are still met through costly project-specific manual abstraction and project-specific databases.
This dissertation builds on my previous work on the Caisis cancer research database at Memorial Sloan-Kettering Cancer Center, and my assessment of trends in information technology (IT) and informatics through site visits and interviews at 60 cancer centers. My hypothesis for this dissertation was that new tools and methods from biomedical informatics could improve the availability of data for cancer research if they were applied thoughtfully and strategically. Within the context of experimenting with the application of selected informatics tools and methods in a cancer center, my overarching research question was: how can we improve access to clinical and related data about cancer patients for research?
Last Known Position: Chief of the Surveillance Informatics Branch, Surveillance Research Program (SRP) at the National Cancer Institute, Division of Cancer Control and Population Science
Committee: Peter Tarczy-Hornoch (Chair), Meliha Yetisgen-Yildiz, Sean D Mooney, Stephanie Malia Fullerton (GSR)
Chia-Ju (Cheryl) Lee
A Knowledge-based System for Intelligent Support in Pharmacogenomics Evidence Assessment: Ontology-driven Evidence Representation, Retrieval, Classification and Interpretation
Abstract: Pharmacogenomics is the study of how genetic variants affect a person’s response to a drug. With great advances to date, pharmacogenomics holds promise as one of the approaches to precision medicine. Yet, the use of pharmacogenomics in routine clinical care is minimal, partly due to the misperception that there is insufficient evidence to determine the value of pharmacogenomics and the lack of efficient and effective use of already existing evidence. Enormous efforts have been directed to develop pharmacogenomics knowledge bases; however, none of them fulfills the functionality of providing effective and efficient evidence assessment that supports decisions on adoption of pharmacogenomics in clinical care.
In this context, my overall hypothesis was that a knowledge-based system with three critical features (incorporating clinically relevant evidence, providing an evidence-based approach, and using a semantically computable formalism) could facilitate effective and efficient evidence assessment to support decisions on adoption of pharmacogenomics in clinical care. My overarching research question has been: How can we exploit state-of-the-art knowledge representation and reasoning in developing a knowledge-based system with the intended features and applications as specified above?
The first aim of this research was to develop a conceptual model to address the information needs and heterogeneity problem for the domain of pharmacogenomics evidence assessment. Faceted analysis and fine-grained characterization of clinically relevant evidence acquired from empirical pharmacogenomics studies were deployed to identify 3 information entities, 9 information components, 30 concepts, 49 relations and approximately 250 terms as building blocks of the conceptual model. These building blocks were then organized into a model, which features a layered and modular structure so that heterogeneous information content of pharmacogenomics evidence could be expressed to reflect its intended meaning. The developed conceptual model was validated against a general ontology of clinical research (OCRe) to show its strength in modeling pharmacogenomics publications, studies and evidence in an extensible and easy-to-understand way.
The second aim of this research was to exploit OWL 2 DL to build a knowledge-based system that enables formal representation and automatic retrieval of pharmacogenomics evidence for systematic review with meta-analysis. The conceptual model developed in Aim 1 was encoded into an OWL 2 DL ontology using Protégé. The constructed ontology provides approximately 400 formalized vocabularies, which were used in turn to formally represent 73 individual publications, 82 individual studies and 445 individual pieces of evidence, and thereafter formed a knowledge base. After a series of subsumption checks and instance checks using the HermiT reasoner, the implemented knowledge-based system was verified as consistent and correct.
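As a rough illustration of what subsumption checking and instance checking mean, the following Python sketch works over a small asserted class hierarchy. It is not the dissertation's ontology and it does not use HermiT or OWL 2 DL semantics: the class names, individuals, and helper functions are all hypothetical, and the check is a simple transitive closure over asserted subclass links rather than full description logic reasoning.

```python
from typing import Dict, Set

# Hypothetical asserted subclass axioms: class -> direct superclasses.
SUBCLASS_OF: Dict[str, Set[str]] = {
    "PharmacogenomicsStudy": {"ClinicalStudy"},
    "ClinicalStudy": {"Study"},
    "Study": set(),
    "Evidence": set(),
}

# Hypothetical individuals with their asserted types.
TYPE_OF: Dict[str, Set[str]] = {
    "study_001": {"PharmacogenomicsStudy"},
    "evidence_001": {"Evidence"},
}


def ancestors(cls: str) -> Set[str]:
    """Return every superclass reachable from cls, excluding cls itself."""
    seen: Set[str] = set()
    stack = list(SUBCLASS_OF.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(SUBCLASS_OF.get(parent, ()))
    return seen


def is_subsumed_by(sub: str, sup: str) -> bool:
    """Subsumption check: is every member of sub also a member of sup?"""
    return sub == sup or sup in ancestors(sub)


def is_instance_of(individual: str, cls: str) -> bool:
    """Instance check: does any asserted type of the individual fall under cls?"""
    return any(is_subsumed_by(t, cls) for t in TYPE_OF.get(individual, ()))


if __name__ == "__main__":
    print(is_subsumed_by("PharmacogenomicsStudy", "Study"))
    print(is_instance_of("study_001", "Study"))
    print(is_instance_of("evidence_001", "Study"))
```

A description logic reasoner such as HermiT additionally infers subsumptions from class definitions and restrictions, which this sketch makes no attempt to do.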
The third aim of this research was to use the implemented knowledge-based system to provide four applications in pharmacogenomics evidence assessment. The first application focused on the ontology-driven evidence retrieval for meta-analysis. A total of 33 meta-analyses selected from 9 existing systematic reviews were used as test cases. The results showed that the ontology-based approach achieved a 100% precision of evidence retrieval in a very short time, ranged from 9 to 23 seconds. The second application addressed the evidence assessment of the clinical validity of CYP2C19 loss-of-function variants in predicting efficacy of clopidogrel therapy. The third application addressed the evidence assessment of the comparative effectiveness of genotype-guided versus non-genotype-guided warfarin therapy. These two applications focused on ontology-driven evidence classification to provide useful information to assist in the planning, execution, and reporting of a multitude of meta-analyses. The fourth application focused on ontology-driven interpretation of a multitude of synthesized evidence that was enabled by formal representation of synthesized evidence and typology of clinical significance in the context of assessing clinical validity and clinical utility of pharmacogenomics.
In conclusion, the major contributions of this research include: deriving an extensible conceptual model that expresses heterogeneous information content, constructing an ontology that exploits the advanced features of OWL 2 DL, and implementing a knowledge-based system that supports ontology-driven evidence retrieval, classification and interpretation. Future research would focus on (1) enhancing the system’s applicability in pharmacogenomics evidence assessment by representing evidence of other sub-domains of pharmacogenomics such as cancer drugs, and (2) expanding the system’s capability beyond pharmacogenomics evidence assessment by representing individuals’ genomic profiles and providing evidence-based interpretation based on their individual genomic profiles. With the enhanced applicability, the pharmacogenomics knowledge-based system might improve pharmacogenomics evidence assessment as well as evidence-based interpretation of pharmacogenomics at the point of care, and ultimately increase the adoption of pharmacogenomics in routine clinical care.
Last Known Position: Postdoctoral Fellow, National Health Research Institutes
Committee: Peter Tarczy-Hornoch (Chair), James F. Brinkley, Emily E. Devine, John Horn (GSR)
Information extraction from clinical and radiology notes for liver cancer staging
Medical practice involves an astonishing amount of variation across individual clinicians, departments, and institutions. In addition, with the exponential pace of new discoveries in pockets of biomedical literature, medical professionals, often understaffed and overworked, have little time and few resources to analyze or incorporate the latest research into clinical practice. The accelerated adoption of electronic medical records (EMRs) brings about great opportunities to mitigate these issues. In computable form, large volumes of medical information can now be stored and queried, so that optimization of treatments based on patient characteristics, institutional resources, and patient preferences can be data driven. Thus, instead of relying on the skillsets of patients' support network and medical teams, patient outcomes can at least have some statistical guarantees.
In this dissertation, we focus specifically on the task of hepatocellular carcinoma (HCC) liver cancer staging using natural language processing (NLP) techniques. Staging, or categorizing cancer patients by extent of disease, is important for normalizing patient characteristics. Normalized stages can then be used to facilitate evidence-based research to optimize for treatments and outcomes. NLP is necessary because, as with other clinical tasks, a majority of staging information is trapped in free text clinical data.
This thesis proposes an approach to liver cancer stage phenotype classification using a mixture of rule-based and machine learning techniques for text extraction. Included in this approach is a careful, layered design for annotation and classification. Each constituent part of our system was characterized by detailed quantitative and qualitative analysis regarding several medical conditions.
Last Known Position: Senior Applied Scientist at Microsoft Health; Postdoctoral Scholar, Stanford University
Committee: Meliha Yetisgen (Chair), Fei Xia, Lucy Vanderwende, Sharon Kwan, Gina-Anne Levow (GSR)
Examining the Feasibility and Acceptability of a Fall Detection Device
Falls are a very complex challenge for older adults and our health care system. They are especially dangerous when the fallen individual is unable to get up from a fall independently. This “long lie” has been shown to be almost as damaging as the fall itself and has the ability to affect not only the fallen individual’s physical health but also their mental health. Current technologies designed to detect these falls are often inappropriately designed for the older adult population and are improperly used, if at all.
This dissertation includes three studies that cover various aspects of older adults’ use of fall detection technology. The first study is a systematic review which assesses the current state of design and implementation of fall detection devices. The second study seeks to more clearly understand older adults’ perceptions of fall detection technology using focus groups. The third study is a feasibility study investigating the usability of a wearable fall detection device that employs innovative GPS and automatic detection technologies. I will go over the results of these studies and identify challenges associated with these devices and provide design recommendations for improving these devices.
Last Known Position: User Experience Researcher, Microsoft
Committee: George Demiris (Chair), Hilaire Thompson, Elizabeth A. Phelan, Dori Rosenberg (GSR)
Designing Wellness Tools for and with Older Adults
Over the past few decades, the use of new technologies, such as computing and internet technology, has expanded rapidly. The emergence of these new technologies has created opportunities for health related uses. With the growing older adult population, there has been increased interest in using tools to support aging, health, and wellness of the older adult population. While technologies have been used with older adults for purposes such as symptom management and cognitive training, many technologies are not designed with older adults in mind. While there have been some studies that look at the usability of a single component, there have been few studies looking at a technology platform that integrates several features together. Designing specifically for older adults is important since this population has its own unique health and information needs.
In my talk, I will present my work in exploring the wants and needs of older adults for integrated health and wellness tools. I will discuss the three phases of my dissertation work including the results of focus groups seeking to understand the attitudes and preferences towards a multifunctional wellness tool, the usability issues of a popular, commercially available wellness tool, and the reactions and feedback of older adults to scenarios and storyboards showing design ideas generated after the first two phases. Results from these studies help to better understand older adults’ perceptions, attitudes and issues with potential wellness tools and inform the design of new effective and efficient systems for older adults.
Last Known Position: User Researcher, ACTIVE Network
Committee: George Demiris (Chair), Hilaire Thompson, Anne Turner, Julie Kientz (GSR)
Bayesian Networks from Ontological Formalisms in Radiation Oncology
Bayesian networks (BNs) are compact, powerful representations of probabilistic knowledge well suited to applications of reasoning under uncertainty in medical domains. Traditional development of BN topology requires that modeling experts establish relevant dependency links between domain concepts by searching and translating published literature, querying domain experts, or applying machine learning algorithms on data. For initial network development, these methods are time-intensive, and this cost hinders the growth of BN applications in medical decision making. In addition, they result in networks with inconsistent and incompatible topologies, and these characteristics make it difficult for researchers to update old BNs with new knowledge, to merge BNs that share concepts, or to explore the space of possible BN models in any simple intuitive way.
My research alleviates the challenges surrounding BN modeling by leveraging a hub and spoke system for BN construction. I implement the hub and spoke system by developing 1) an ontology of knowledge in radiation oncology (the hub) which includes dependency semantics similar to BN relations and 2) a software tool that operates on ontological semantics using deductive reasoning to create BN topologies. I demonstrate that network topologies built using my software are terminologically consistent and topologically compatible by updating a BN model for prostate cancer prediction with new knowledge, exploring the space of other dependent concepts surrounding prostate cancer radiotherapy, and merging the updated BN with a different prostate cancer BN containing cross terms with the original model. I also produce a BN to aid in error detection in radiation oncology, showing the extent to which Bayes nets are clinically impactful. Moreover, I show that the methodology developed in this research is applicable to medical domains outside radiation oncology by extracting a BN from a description logic version of the Disease Ontology.
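To make the idea of deriving BN topologies from ontological dependency semantics more concrete, here is a minimal Python sketch. It is not the software described in this abstract: the triples, the `influences` predicate, and the function names are hypothetical, and the "reasoning" is reduced to filtering dependency relations, merging parent sets by shared concept name, and checking that the result is still a directed acyclic graph (a requirement for any Bayesian network).

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical ontology triples; only "influences" is treated as a dependency.
TRIPLES = [
    ("TumorStage", "influences", "TreatmentPlan"),
    ("TreatmentPlan", "influences", "RadiationDose"),
    ("TumorStage", "is_a", "ClinicalFinding"),  # taxonomic, not a BN edge
]


def bn_parents(
    triples: Iterable[Triple],
    dependency_predicates: FrozenSet[str] = frozenset({"influences"}),
) -> Dict[str, Set[str]]:
    """Map each node to its set of parents, using dependency triples only."""
    parents: Dict[str, Set[str]] = {}
    for cause, predicate, effect in triples:
        if predicate not in dependency_predicates:
            continue
        parents.setdefault(cause, set())
        parents.setdefault(effect, set()).add(cause)
    return parents


def merge(a: Dict[str, Set[str]], b: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Merge two topologies; nodes with the same name are the same concept."""
    merged = {node: set(ps) for node, ps in a.items()}
    for node, ps in b.items():
        merged.setdefault(node, set()).update(ps)
    return merged


def is_dag(parents: Dict[str, Set[str]]) -> bool:
    """Return True if the parent map contains no directed cycle."""
    try:
        TopologicalSorter(parents).prepare()
    except CycleError:
        return False
    return True


if __name__ == "__main__":
    base = bn_parents(TRIPLES)
    update = bn_parents([("RadiationDose", "influences", "Toxicity")])
    merged = merge(base, update)
    print(merged, is_dag(merged))
```

Because both topologies draw node names from one shared vocabulary, merging reduces to a union of parent sets, which is one way to read the "terminologically consistent and topologically compatible" property described above.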
By translating medical domain literature into ontological formalisms and developing a software tool to operate on those formalisms, I establish a novel, feasible, and useful methodology that advances and improves the creation of clinically viable Bayesian network models. In sum, my research represents a foundational component of a larger framework of automation and innovation that contributes to further application of BNs in medical decision support roles.
Last Known Position: Assistant Professor, Department of Radiation Oncology, University of Washington
Committee: John Gennari (Chair), Jason Doctor, Mark Phillips, Wolf Kohn (GSR)
Appropriating Artifacts: Understanding and Designing for Patients with a Chronic Illness
From taking medications at the right time to emotionally dealing with their symptoms, patients who have a chronic illness must manage many facets of their illness. Today, patients often utilize different types of general-purpose technologies (e.g., Facebook) to manage their chronic illness. However, many of these technologies were designed with a general user in mind—a user who does not necessarily have the same needs as one who has a chronic illness.
In this dissertation, I discuss how people from three distinct populations–health vloggers with a chronic illness, older adults who have diabetes, and children with a chronic illness–reconfigure the “everyday things” that surround them. In other words, I unpack how artifacts, relationships, roles, and technologies—the things of our daily lives—are deftly reconfigured to support chronic illness management. Drawing from these discussions, I will detail how researchers and other interested parties can design technologies that leverage this appropriation of everyday things for patients’ chronic illness management. Lastly, I expand on how we can further improve current design methodologies by designing for reappropriation when designing for and with patients who have a chronic illness. By supporting appropriation in existing general technologies in addition to newly designed technologies, we can build upon and embrace the world that those with chronic illnesses have already reconfigured.
Last Known Position: Product Design Researcher, Trunk Club
Committee: Wanda Pratt (Chair), Gillian Hayes, Julie Kientz, Sean Munson (GSR)
Using Technology to Engage People with Dementia in Recreational Activities
Dementia is estimated to currently affect almost 15% of US adults over the age of 70. As the population ages, the prevalence of dementia will increase proportionally. The increase in the number of people with dementia will create a corresponding increase in health services required. Structured activities are extremely important in this population, leading to greater well-being and greater positive affect during activities and long term effects such as delayed progression of cognitive impairments. Despite the importance of activities in dementia care, many people with dementia living outside of the community are lacking opportunities for sustained social interactions and stimulating activities. There is a clear unmet need for stimulating activities that do not place an additional financial or time burden on staff or families. Technology is a promising venue to engage people with dementia in activities. For example, technology can be used to deliver rich multimedia and standardized interventions, utilize digital archives increasing their accessibility to many, engage people in remote care or contact with loved ones, and monitor and log changes in use of the system.
In my dissertation, I examine the ways technology can support older adults with dementia in engaging in activities in a memory care unit. I discuss existing technologies that support this population in engaging in activities, a six-month field deployment of an existing technology, and recommendations that have been validated with experts in the field of gerontology and human-computer interaction. My dissertation furthers our understanding of how to design engaging technologies for older adults with dementia to promote meaningful participation in recreational and leisure activities.
Last Known Position: Assistant Professor, University of Maryland
Committee: George Demiris (Chair), Rebecca Logsdon, Wanda Pratt, Hilaire Thompson, Nancy Hooyman
Enhancing Health Information Gathering Experience in Online Health Communities
Online health communities can offer a range of diverse personal health expertise and experiences, yet gathering relevant health information is a significant challenge for members and researchers as each party faces different obstacles.
In my dissertation, I examine the challenges of gathering health information from online health communities in two parts according to the respective stakeholders. I first address the challenge that patient members face during their time of interaction with the online community to gather information. Within the context of computer-mediated communication in online health communities, I focus on issues associated with topic drift (i.e., topic changes) and sustainment of active participation (i.e., posting messages to participate in the communities). I also address the challenge of processing and making sense of a large amount of collective knowledge shared in online health communities. Within the context of patient-generated text in online cancer communities, I focus on the challenges of automatically understanding patient-generated text using existing natural language processing (NLP) tools.
Many members of online communities are willing to go the extra mile to help others in similar situations. Yet, many challenges hinder the experience of gathering health information from these communities. Though these efforts leave a digital trace that is embedded with diverse personal health expertise and experiences, we still lack the capability to automatically utilize this invaluable information. By expanding on existing knowledge on topic drift, sustainment of active participation, and processing patient-generated text, we can maximize the benefits of online health communities and improve patient members’ experience of gathering health information.
Last Known Position: Postdoctoral Research Fellow, Biomedical Informatics, University of Utah
Committee: Wanda Pratt (Chair), Andrea Hartzler, Jina Huh, David McDonald, Meliha Yetisgen, Gary Hsieh (GSR)
Patient-Centered Development and Evaluation of a Mobile Wound Tracking Tool
Surgical site infections (SSI) are a common, costly and serious problem following surgery, affecting at least 500,000 people per year. Most infections now occur after hospital discharge, placing the burden of recognizing problems and seeking care on patients who are ill-prepared for that responsibility, resulting in reduced quality of life and preventable readmission. Yet, few efforts have been made to systematically engage patients in early identification of SSIs at home to reduce their impact.
I will describe a novel approach to addressing this problem: a patient-centered mobile health (mHealth) application that enables patients to serially track wound symptoms and photos, and securely communicate with their providers. To this end, I first present a needs assessment among surgical patients and clinicians. I then describe an iterative process of engagement with these stakeholders resulting in design considerations generalizable to post-acute care mHealth (of which wound tracking is a part). Finally, I assess the clinical value of serial wound data and photos.
My work enhances understanding of the challenges facing patients who develop post-discharge SSI, and begins to map the unexplored design space of post-acute care mHealth, especially around areas of patient-clinician conflict. In addition, I propose a new method to aid in design of patient-centered health IT and demonstrate the value of serial wound data and photos beyond existing data sources. In addition to these contributions to research, I am making an applied contribution to the development of mPOWEr, a wound-tracking tool that seeks to improve clinical outcomes and patients’ experience on the way to those outcomes.
Last Known Position: Medical Student, University of Washington
Committee: William Lober (Co-Chair), Wanda Pratt (Co-Chair), Heather Evans, Julie Kientz (GSR)
Visual Analytics Methods for Analyzing Molecular Dynamics Simulations of Mutant Proteins
The structural dynamics of proteins are integral to protein function; if these structural dynamics are altered by mutation, the function of the protein can be altered as well, potentially resulting in disease. Experimental structure-determination with x-ray crystallography and Nuclear Magnetic Resonance (NMR) can be useful in determining mutant protein structures, but detailed, high-resolution dynamics data can be difficult to ascertain. Molecular Dynamics (MD) simulation is a high temporal- and spatial-resolution in silico method for dynamic protein structure determination. Unfortunately, the data generated by MD simulations can be too large for standard analysis tools. Here I describe a novel visual-analytics tool called DIVE that was specifically created to handle large, structured datasets like those generated by MD simulations. Using DIVE, I analyzed MD simulation-data of disease-associated mutations to the α-Tocopherol Transfer Protein (α-TTP) and to the p53 tumor suppressor protein. In addition to mutant structural-analysis and characterization, I also used DIVE to develop an algorithm for identifying regions of mutant proteins that are amenable to ‘rescue’, or ligand-mediated stabilization that can suppress the destabilizing effect of mutations. The results of these investigations highlight the utility of big-data, visual-analytics approaches to exploring MD simulation data.
Last Known Position: Senior Software Engineer, Tableau
Committee: Valerie D. Daggett (Chair), Peter J. Myler, James F. Brinkley, David Beck (GSR)
Visual Analytics Methods for Analyzing Molecular Dynamics Simulations of Mutant Proteins
Ontologies have become increasingly important for both representation of biomedical knowledge and for using that knowledge to facilitate data integration. However, ontologies are generally not presented in ways that are easy for users to comprehend, which limits their use. In this work I address this problem within the context of two spatially-oriented ontologies: the Foundational Model of Anatomy (FMA) and the Ontology of Craniofacial Development and Malformation (OCDM). I describe an approach to communicating these ontologies that involves (1) identifying content patterns within an ontology, (2) creating a simplified tutorial to explain basic concepts within the ontology, (3) involving potential users in the design of an ontology browser interface, and (4) creating graphics to support the process of building and communicating the ontology. This approach should be applicable to any spatially-oriented ontology, and should result in visualizations that will enhance understanding of ontologies.
Last Known Position: Assistant Professor, University of Kentucky, Division of Biomedical Informatics
Committee: James F. Brinkley (Chair), Daniel L. Cook, Wanda Pratt, David K. Farkas (GSR)
Feature Engineering for 3D Medical Image Applications
Feature engineering, including input representation, feature design, evaluation, and optimization, is essential to success in machine learning. For unstructured data like images and texts, feature engineering can often become the bottleneck in learning-related tasks. Selecting the most effective and descriptive features can improve performance, proficiency, and precision in quantification applications, or enhance a good classifier in classification. Features are domain-specific; therefore, deciding what features to use and optimizing the design so that features can express input explicitly, automatically, fully, yet intuitively often requires substantial knowledge of the applications and the nature of the input. This thesis introduces a new set of feature engineering algorithms for medical research of 3D CT skull images in understanding craniosynostosis disorder. Three related tasks, 1) classification, 2) severity assessment and class ranking, and 3) pre-post surgery change, are used to demonstrate the effectiveness of the features and the algorithms that produce them.
Craniosynostosis, a disorder in which one or more fibrous joints of the skull fuse prematurely, causes skull deformity and is associated with increased intracranial pressure and developmental delays. In order to perform medical research studies that relate phenotypic abnormalities to outcomes such as cognitive ability or results of surgery, biomedical researchers need an automated methodology for quantifying the degree of abnormality of the disorder. While several papers have attempted this quantification through statistical models, the methods have not been intuitive to biomedical researchers and clinicians who want to use them. The goal of this work was to develop a general set of features upon which new quantification measures could be developed and tested. The features reported in this study were developed as basic shape measures, both single-valued and vector-valued, that are extracted from a projection-based plane of the 3D skull. This technique allows us to process images that would otherwise be eliminated in previous systems due to poor resolution, noise or imperfections on their original older CT scans.
We test our new features on classification tasks and also compare their performance to previous research. In spite of their simplicity, the classification accuracy of our new features is significantly higher than previous results on head CT scan data from the same research studies.
We propose a set of features derived from CT scans of the skull that can be used to quantify the degree of abnormality of the disorder. A thorough set of experiments is used to evaluate the features as compared to two human craniofacial experts in a ranking evaluation.
We study pre-post surgery change based on selected features we use in quantifying the severity of deformity of the disorder. Using the same selected features, we also compare and contrast post-surgery craniosynostosis skulls to the unaffected class.
Committee: Linda G. Shapiro (Chair), Michael L. Cunningham, Su-In Lee, Helen Sherk (GSR)
Design and Evaluation of Health Visualizations for Older Adults
The older adult population is one of the fastest growing demographic groups in the United States. Associated with this aging population are changes in health and wellness. Smart home technologies can be a valuable resource to support older adults in maintaining independence while encouraging engagement in care. To present data collected from home-based monitoring, including telehealth, smart homes, and other informatics tools, in a meaningful manner, I describe work in the development of health visualizations for older adults. Though a body of work has shown that older adults find utility in technology to support their health and wellness, there has been limited research examining how this would translate to data visualizations. I start by looking at potential differences in how older adults process graphical information compared to the general population through a set of psychophysics experiments. I then apply a user-centered design approach to iterate on health visualizations from early mockups to fully interactive prototypes. I describe different approaches for evaluating visualizations with older adults, and report on the findings of the evaluations. This work highlights key issues for how older adults use health visualizations. Based on these evaluations, I also provide a set of guidelines for designing health visualizations for older adults.
Last Known Position: User Experience Researcher, Amazon Web Services
Committee: George Demiris (Chair), Hilaire Thompson, David W. McDonald, Cecilia Aragon (GSR)
B. Nolan Nichols
Reproducibility in Human Cognitive Neuroimaging: A Community-Driven Data Sharing Framework for Provenance Information Integration and Interoperability
Access to primary data and the provenance of derived data are increasingly recognized as essential aspects of reproducibility in biomedical research. While productive data sharing has become the norm in some biomedical communities, human brain imaging has lagged in open data and descriptions of provenance. The overarching goal of my dissertation was to identify barriers to neuroimaging data sharing and to develop a fundamentally new, granular data exchange standard that incorporates provenance as a primitive to document cognitive neuroimaging workflow.
For my dissertation research, I led the development of the Neuroimaging Data Model (NIDM), an extension to the W3C PROV standard for the domain of human brain imaging. NIDM provides a language to communicate provenance by representing primary data, computational workflow, and derived data as bundles of linked Agents, Activities, and Entities. Similar to the way a sentence conveys a standalone thought, a bundle contains provenance statements that parsimoniously express the way a given piece of data was produced. To demonstrate a system that implements NIDM, I developed a modern, semantic Web application platform that provides neuroimaging workflow as a service and captures provenance statements as NIDM bundles. The course of this work necessitated interaction with an international community, which adopted and extended central elements of this work into prevailing brain imaging software. My dissertation contributes neuroinformatics standards to advance the current state of computational infrastructure available to the cognitive neuroimaging community.
Last Known Position: Bioinformatics Software Engineer, Genentech
Committee: James F. Brinkley (Chair), Nicholas R. Anderson, Thomas Grabowski, Susan E. Coldwell (GSR)
Data-driven Methods and Models for Predicting Protein Structure using Dynamic Fragments and Rotamers
Proteins play critical roles in cellular processes. A protein’s conformation directly relates to its biological function and, consequently, determination of such structure can provide great insight into a protein’s function. Using a computational technique called molecular dynamics (MD), we are able to simulate and observe protein dynamics at a much higher temporal and spatial resolution than allowed by experimental methods. Dynameomics is a research endeavor that uses MD to produce thousands of protein simulations, resulting in hundreds of terabytes of data. Using novel visual analytics techniques, we have mined the Dynameomics data warehouse for data on protein backbone segments and side-chain behavior, called fragments and rotamers, respectively. Knowledge derived from these dynamic fragments and rotamers was used to improve the quality of protein loop structure predictions. We have created novel data models to store, analyze and compare fragments and side-chain rotamers, then developed methods to predict loop structures with information inferred from these data models. Protein loop regions predicted from these fragments and rotamers produce biologically relevant structures that improve upon current protein loop prediction methods. In conjunction with the fragment and rotamer research, we produced a novel visual analytics framework called DIVE, a Data Intensive Visualization Engine. This software has been instrumental in advancing our bioinformatics research, but it is a general-purpose framework applicable to a wide range of big data problems.
Last Known Position: Senior Data Scientist, PNNL
Committee: Valerie D. Daggett (Chair), James F. Brinkley, Ira J. Kalet, Walter James Pfaendtner (GSR)
What Difference Does a Form Make: Redesign and Evaluation of a Form for Documenting In-Hospital Cardiac Arrest
The real-time documentation of medications and procedures is an essential part of managing patient care during in-hospital "code blue" cardiac arrest emergencies. Care providers have voiced dissatisfaction with the existing code blue documentation form. To address this problem, a mixed-methods needs assessment was used to describe the problems of usability and completeness. Based on the results, the documentation form was redesigned and then assessed through an evaluation study.
Last Known Position: Data Scientist, Microsoft
Committee: Drs. Fred Wolf, Lynne Robins, David Chou, Brian Ross, David Farkas (GSR)
Temporal Data Mining in Electronic Medical Records from Patients with Acute Coronary Syndrome
Every 25 seconds someone in the US has a cardiac event, and one person per minute will die from it. ST-elevated myocardial infarction (STEMI), non-ST-elevated myocardial infarction, and unstable angina are caused by ischemia and referred to as acute coronary syndrome (ACS). STEMI is the most severe and accounts for a quarter of ACS cases. There is substantial research in STEMI treatment that focuses on a single event and the risks/benefits thereof. The interaction between events during an encounter is especially important in STEMI, where the timing of treatments is crucial for positive patient outcomes. However, there is a dearth of research into the relationship between events.
To explore the temporal relationships, I created a sequential pattern mining algorithm (SPM) and a temporal association rule mining algorithm (TARM) to mine the Acute Coronary Syndrome Patient Database (ACSPD). The ACSPD is a very large, 9-year EMR database derived from 128 health care institutions across the US. The SPM is well-suited to extract patterns from noisy data. The TARM is designed to discover rules comprised of 3 temporally ordered events, i.e. clinical practice patterns (CPP).
Using the SPM in the ACSPD, I discovered 39 order sets. Not all order sets are present for the 9-year span, and overall order set use drops precipitously in 2004. I postulate that this denotes a shift in medical practice. The cause is unknown, but in late 2004, the American Heart Association (AHA) published new STEMI treatment guidelines. I condensed the ACSPD sequences using the order sets, then applied the TARM. Using support, confidence, lift, likelihood, and Zhang’s metric, I found substantial variation, rarity and weak antecedent-consequent pairing in the CPPs. To explore the interaction between clinical decisions and patient outcomes, I compared the CPPs with AHA STEMI performance measures for compliance and analyzed the risk of bleeding and mortality. CPP compliance with performance measures decreases mortality and bleeding risk, but there is evidence of complex interactions between measures that augment or mask the effect. The contributions of this work are 1) exploring CPPs and their effect on patient outcomes and 2) the novel combination of sequential and temporal association rule mining in EMR data.
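As a rough illustration of three of the rule-quality measures named above (support, confidence, and lift), the following minimal Python sketch scores a single temporally ordered rule over a handful of event sequences. It is not the dissertation's TARM: it handles two-event rules rather than three-event patterns, and the function names, event labels, and toy encounters are all hypothetical.

```python
def occurs_in_order(seq, antecedent, consequent):
    """Return True if `antecedent` appears somewhere before `consequent` in `seq`."""
    for i, event in enumerate(seq):
        if event == antecedent:
            # Checking after the first occurrence is enough: any later
            # occurrence of the antecedent sees a subset of this tail.
            return consequent in seq[i + 1:]
    return False


def rule_metrics(sequences, antecedent, consequent):
    """Support, confidence, and lift for the ordered rule antecedent -> consequent."""
    n = len(sequences)
    n_antecedent = sum(antecedent in s for s in sequences)
    n_consequent = sum(consequent in s for s in sequences)
    n_rule = sum(occurs_in_order(s, antecedent, consequent) for s in sequences)

    support = n_rule / n
    confidence = n_rule / n_antecedent if n_antecedent else 0.0
    # Lift compares the rule's confidence with the consequent's base rate.
    lift = confidence / (n_consequent / n) if n_consequent else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


# Hypothetical toy encounters (illustrative labels, not ACSPD data).
encounters = [
    ["ecg", "aspirin", "pci"],
    ["ecg", "pci"],
    ["aspirin", "ecg"],
    ["ecg", "aspirin"],
]
metrics = rule_metrics(encounters, "ecg", "pci")
```

A lift near 1 means the antecedent tells us little about whether the consequent follows, which is one way to read "weak antecedent-consequent pairing".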
Last Known Position: Merck Pharmaceuticals
Committee: Drs John H. Gennari (Co-Chair), Meliha Yetisgen-Yildiz (Co-Chair), Eric J. Horvitz, Tyler Harris McCormick (GSR)
Rupa A. Patel
Designing for Use and Acceptance of Tracking Tools in Healthcare
Patients with cancer experience many unanticipated symptoms and struggle to communicate them to clinicians during treatment. They contend with a variety of symptoms at home—issues stemming from cancer progression, treatment regimens, and co-morbidities. Although many patients rely on clinic visits to get help with managing these symptoms, clinicians often underestimate the intensity of patients' symptoms or miss them altogether. A proliferation of mobile and sensor-based tools, which enable self-tracking, leads us to consider how to approach their design to support cancer symptom management.
However, tracking tools are not widely used and accepted in cancer care. To further study use of tracking tools, I analyzed the use of two different types of manual tracking tools: (1) ESRA-C2, an electronic Patient-Reported Outcome (ePRO) tool deployed to 372 people with cancer; and (2) HealthWeaver, a personal informatics tool deployed as a technology probe to 10 women with breast cancer. Also, I analyzed the “in-the-wild” self-tracking practices of the 10 women before they used HealthWeaver, as well as 15 other women with breast cancer. Results showed that patients who voluntarily used the ePRO tool the most frequently had relatively low symptom distress. In addition, although patients’ tracking behaviors “in the wild” were fragmented and sporadic, these behaviors with a personal informatics tool were more consistent. Participants also used tracked data to see patterns among symptoms, feel psychosocial comfort, and improve symptom communication with clinicians. Given these considerations, I describe a new conceptual model that has implications for patients, clinicians, and tool developers. If patients and clinicians accept and integrate tracking tools into cancer symptom management away from the clinic, we can move closer to continuous healing relationships that are the cornerstone of effective care.
Last Known Position: Senior UX Researcher, GoDaddy
Committee: Wanda Pratt (Chair), Thomas H. Payne, Paul Gorman, Donna L. Berry, Emily E. Devine (GSR)
Ontology Based Data Integration of Open Source Electronic Medical Record and Data Capture Systems
In low-resource settings, the prioritization of clinical care funding is often determined by immediate health priorities. As a result, investment directed towards the development of standards for clinical data representation and exchange is rare and, accordingly, data management systems are often redundant. Open-source systems such as OpenMRS and OpenClinica provide an opportunity to leverage available systems to improve standards and increase interoperability. Nevertheless, continuity of care and data sharing between these systems remains a challenge, particularly in populations with changing health needs and inconsistent access to health resources.
The overarching goal of this project is to enable sharing of data across low-cost systems like OpenMRS and OpenClinica using ontologies. The project consists of three aims: 1) describing clinical research and visit data related to the treatment and care of HIV/AIDS patients, 2) developing a prototype data integration system between electronic medical record and electronic data capture systems, and 3) evaluating the utility of the prototype system using simulated and real-world data. In the first aim, I developed a patient identifier ontology and an HIV/AIDS treatment and care ontology to represent the types of data and information created and used by clinicians. This was achieved by gathering data forms used in HIV/AIDS clinics in low-resource settings. From these forms, the patient identifier and HIV/AIDS variables were extracted and used to create the ontologies. In aim 2, the ontologies from aim 1, along with simulated data, were used to develop a prototype data integration system that improves the ability of developers to implement integration systems that meet the needs of users, based on previously created use cases. In the third aim, I evaluated whether the matching algorithm used in the prototype can correctly identify matching patients, and whether the prototype is generalizable to clinical care and research data collected in a real-world setting.
This work contributes two ontologies to the medical and public health fields that are useful in providing standardization of data elements. Additionally, I provide a prototype data integration system that is useful in facilitating access to previously siloed data and helps reduce the burden of integrating future systems.
Last Known Position: HIT Coordinator, University of Louisiana at Lafayette and the Louisiana Department of Health and Hospitals
Committee: James F. Brinkley (Chair), Neil Abernethy, Judd L. Walson, Mark P. Haselkorn (GSR)
A Graph-Theoretic Approach to Model Genomic Data and Identify Biological Modules Associated with Cancer Outcomes
Studies of the genetic basis of complex diseases present statistical and methodological challenges to discover reliable and high-confidence genes that reveal biological phenomena underlying the etiology of the disease or gene signatures prognostic of disease outcomes. This thesis examines the capacity of graph-theoretical methods to integrate and analyze genomic information and thus facilitate using prior knowledge to create a more discrete and functionally-relevant feature space. To assess the statistical and computational value of graph-based algorithms in genomic studies of cancer onset and progression I apply an instance of a random walk graph algorithm in a weighted interaction network. I merge high-throughput co-expression and curated interaction data to search for biological modules associated with key cancer processes and evaluate significant modules by their predictive value and functional relevance. This approach identifies interactions among genes involved in proliferation, apoptosis, angiogenesis, immune evasion, metastasis, and energy metabolism pathways that generate hypotheses for further cancer biology studies. Results from this analysis show that graph-based approaches are a powerful tool to integrate and analyze complex molecular relationships and to reveal coordinated activity of significant genomic features where previous statistical and analytical methods focusing on individual effects are limited.
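To make the idea of a random walk on a weighted interaction network concrete, here is a minimal pure-Python sketch. The abstract does not say which random walk variant was used; this sketch assumes random walk with restart, and the function name, the restart probability, and the toy four-gene network are all hypothetical.

```python
def random_walk_with_restart(weights, seeds, restart=0.3, tol=1e-12, max_iter=10_000):
    """Score nodes by proximity to `seeds` on a weighted graph.

    `weights` is a square adjacency matrix (list of lists) where
    weights[i][j] is the edge weight from node j to node i.
    """
    n = len(weights)
    # Column sums turn edge weights into transition probabilities.
    col_sums = [sum(weights[i][j] for i in range(n)) for j in range(n)]

    # The walker starts (and restarts) uniformly over the seed nodes.
    start = [0.0] * n
    for s in seeds:
        start[s] = 1.0 / len(seeds)

    scores = start[:]
    for _ in range(max_iter):
        updated = []
        for i in range(n):
            inflow = sum(
                weights[i][j] / col_sums[j] * scores[j]
                for j in range(n)
                if col_sums[j] > 0
            )
            updated.append((1 - restart) * inflow + restart * start[i])
        # Stop once the score vector no longer changes (L1 distance).
        if sum(abs(a - b) for a, b in zip(updated, scores)) < tol:
            return updated
        scores = updated
    return scores


# Hypothetical 4-gene chain 0 - 1 - 2 - 3 with a stronger 0-1 interaction.
network = [
    [0, 2, 0, 0],
    [2, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(network, seeds=[0])
```

In this toy chain the scores fall off with distance from the seed gene. Ranking nodes by score and grouping high-scoring neighbors is one plausible way to nominate candidate modules.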
Last Known Position: Data Scientist Lead, TensorIoT Inc
Committee: Neil F. Abernethy (Chair), John H. Gennari, Ira J. Kalet, Ali Shojaie, Barbara E. Endicott (GSR)
Mabel Raza Garcia
A Proof of Concept System for Automated Cervical Cancer Screening in Peru
Cervical cancer is the second most frequent cancer in women around the world and affects half a million women per year. The World Health Organization (WHO) estimates that 275,000 women die every year, and 80% to 85% of these deaths occur in low-resource countries in Africa and South America. In Peru, cervical cancer has the highest incidence and the second highest mortality rate of cancers among women. Currently, screening techniques such as the Papanicolaou (Pap) test, in which some cells from the cervix are examined under a microscope to detect potentially pre-cancerous and cancerous cells, and the Visual Inspection with Acetic Acid (VIA), in which the surface layer of the cervix is examined through visual inspection after washing it with 3% to 5% acetic acid (vinegar) for one minute, are part of the national health policy in Peru. The Pap test is mainly used in urban areas in Peru. However, there are some challenges related to spreading the Pap test throughout the whole country: lack of quality and standardization of the readings of Pap smears, shortage of trained personnel, uneven processing of samples resulting in diagnosis and treatment delays, and lack of even basic laboratory infrastructure, all of which greatly impact the sustainability of this procedure in remote and/or poor settings.
Extensive research has shown that computational solutions are a viable and suitable aid for overcoming these barriers. However, the majority of these solutions are commercial products that are not affordable for developing countries, such as Peru. In this context, developing a strategy, algorithm and open source computational implementation that recognizes normal vs. abnormal Pap smears can ultimately provide a cost-effective alternative for developing countries. The dissertation-specific objectives are to: 1) determine the characteristics of normal vs. abnormal Pap smears through expert consultation and relevant literature, 2) collect data sets and run preliminary experiments to compare two possible approaches, and 3) assess the accuracy, sensitivity and specificity of the proposed cervical cancer screening approach for classifying normal vs. abnormal Pap smears compared to experts’ review.
Last Known Position: Professor, Universidad Peruana Cayetano Heredia
Committee: Linda G. Shapiro (Chair), Sherrilynne Fuller, Ira J. Kalet, William B. Lober, Joann G. Elmore (GSR)
Secondary Use of Clinical Data: Barriers, Facilitators and a Proposed Solution
The increasing adoption of electronic medical records is producing a massive accumulation of routinely collected electronic clinical data (ECD). This data can be used not only for direct patient care but for secondary purposes such as clinical research, quality improvement and public health. However, using clinical data collected for one purpose does not render it usable for secondary purposes. This dissertation seeks (1) to explore whether ECD is fit for use for research purposes, (2) to identify the barriers and facilitators to secondary use faced by clinical researchers, and (3) to propose a solution to help address one of the barriers identified. To do this, this dissertation is composed of three different but interrelated studies. The first one consists of a Delphi process to develop a tool to systematically assess the fitness for use of ECD for research and its subsequent application on a set of clinical data requests. The second study is a qualitative inquiry into the barriers and facilitators to secondary use of clinical data experienced by researchers at the University of Washington, Group Health Research Institute and the Veterans Affairs' Northwest Center for Outcomes Research in Older Adults. The third study describes the development of a system to query clinical relational databases based on temporal abstractions and patterns, which should enable researchers to identify high-level concepts from clinical databases. The results of this dissertation should allow us to improve the reutilization of ECD for research purposes.
Last Known Position: Associate Professor, University of Melbourne; Affiliate Assistant Professor UW Department of Biomedical Informatics and Medical Education
Committee: Peter Tarczy-Hornoch (Chair), Ira Kalet, Branko Kopjar, James Tufano, Susan Heckbert (GSR)
The Synthetic Biology Open Language: A Data Exchange Standard for Biological Engineering
Synthetic biology is the emerging research and engineering field concerned with the design and construction of new biological functions and systems. Synthetic biologists are engineering organisms to solve outstanding problems in medicine, bio-energy, environmental health, and nutrition. Their goal is to improve the biological engineering process by applying standardization, decoupling, and abstraction. To more efficiently engineer gene circuits synthetic biologists need software tools that support standardized data exchange.
For my dissertation research I led the development and deployment of the Synthetic Biology Open Language (SBOL). In this dissertation, I present the SBOL community, the specification, and demonstrations of its use. The SBOL community is supported by stakeholders from the synthetic biology software community. The SBOL Core specifies the vocabulary, data model, and format to define the standard. I describe SBOL Core as a common representation for synthetic biology designs capable of describing theoretical DNA component designs, annotated DNA sequences, and collections of components. To aid the exchange of synthetic biological designs among software tools, I explain the software libraries which support the implementation of SBOL. Then, I illustrate the recognition of its value and acceptance by the stakeholders through the deployment of the technology at collaborating sites. Finally, I show how the choice of Semantic Web technology to facilitate the information exchange between software can also be used for information retrieval to improve the selection of DNA components in new designs. Through this work I contribute to the development of informatics standards and a computational infrastructure to enable a rapid biological engineering process for biotechnology.
Last Known Position: Scientist, Arzeda Corporation
Committee: John Gennari (Chair), Daniel Cook, Herbert Sauro, Georg Seelig (GSR)
Walter H. Curioso
Evaluation of a Computer-Based System using Cell Phones for HIV positive people in Peru
HIV is one of the biggest infectious killers worldwide. To prevent disease progression and avoid development of resistant strains to HIV, people living with HIV must adhere to complicated antiretroviral therapy (ART). Yet, in Peru, where ART has recently been introduced, adherence to HIV treatment has not yet been addressed properly, and no systematic approaches to evaluate or promote adherence to ART exist. For people living with HIV, innovative approaches using information technologies, such as mobile phones, are needed to increase adherence to ART. In my thesis, I proposed the following specific aims: (1) To conduct formative research to assess culturally-specific behavioral messages to be included in the computer-based system; (2) To develop and test an interactive computer-based system using cell phones both to enhance adherence to ART and to deliver HIV transmission risk reduction messages; and (3) To evaluate the impact of the system on ART adherence. To achieve these aims, I conducted a randomized controlled trial of a 12-month intervention, comparing (1) standard-of-care with (2) standard-of-care plus my mobile phone-based system among patients receiving ART at Via Libre, a non-governmental organization established to help people with HIV, and Hospital Nacional Cayetano Heredia, a governmental hospital; both in Lima, Peru. This novel trial adds important evidence to the field of mHealth—the provision of health-related services via mobile communications. The trial is potentially scalable as a prevention strategy by the Ministry of Health, and the results could be applied in other settings, not only for ART, but also to encourage patients to follow long-term treatment plans for other chronic diseases. Furthermore, because the intervention is automated using available information and communication technology, it can be scaled up widely without requiring proportionate and expensive staff resources.
Last Known Position: Affiliate Assistant Professor, UW Department of Biomedical Informatics and Medical Education; Vice Minister of Policies and Social Evaluation, Ministry of Development and Social Inclusion (MIDIS)
Committee: Wanda Pratt (Chair), George Demiris, James D. Ralston, Ann E. Kurth (GSR)
Development and Evaluation of a Web-based Electronic Medical Record System Without Borders
Despite implementation of electronic medical record (EMR) systems in the United States and other countries, EMRs often lack global access, standardization, an efficient interface, and effective knowledge-based tools at the point of care. Consequently, the information needs of patients, practitioners, administrators, researchers, and policymakers often go unmet, leaving providers especially dissatisfied. To address this multifaceted problem, a novel EMR design referred to as “Electronic Medical Records Without Borders” (EMR WB) was created to ensure that the most vital pieces of patient clinical records are available to make health care decisions. A web-based, standardized, family medicine, clinical history model was developed and evaluated as an EMR clinical core that integrates state-of-the-art terminology, peer-reviewed, evidence-based protocols with real-time access to diagnostic decision support systems and the biomedical literature, using a unified, navigable, intuitive computer interface. This project aimed to facilitate structured clinical documentation, usability, global access, and decision-making processes to better address not only local, clinical, and psychosocial primary care problems in targeted underserved global communities, but also to mitigate transnational migration health issues based on information exchange among primary care settings. A survey of post-exposure EMR WB use indicated a measurable, positive effect was made on provider satisfaction compared with a previously used, paper-based record system. Data were analyzed using descriptive statistics.
Last Known Position: Co-founder and Executive Director, ConstaTec Solutions, LLC
Committee: Fredric M. Wolf (Co-Chair), Meliha Yetisgen-Yildiz (Co-Chair), Barry Aaronson, Craig S. Scott, Kelly Alison Edwards (GSR)
Towards an Extensible Atlas-Based 3D Visualization Framework for Biomedicine: Biolucida
Today’s biomedical research endeavors often entail exposure to a daunting amount of expressive data which must be effectively comprehended and communicated. It has been well established that visualization is a powerful conduit for analyzing, representing, and communicating such information. Since these data are often associated with structures of anatomy that can be represented as computer-generated 3D models, the biomedical atlas can be leveraged as an effective visualization motif. This style of presentation not only conveys spatial relationships among the data, but also provides a natural representation which is easily understood by a wide audience. Many systems have been produced which can be considered computer-based biomedical atlases, some of which have been used to visualize data and to illustrate spatially complex concepts. However, the development of these systems has been costly due to the fact that many, while similar in features and design, have been built from the ground up as single-purpose applications. A next-generation visualization system, Biolucida, has been developed as a generalizable framework which is designed to meet the visualization needs of the biomedical community through its comprehensive feature set and its extensible architecture.
Last Known Position: Principal, Plexseer Group
Committee: James F. Brinkley (Chair), Linda G. Shapiro, Suzanne J. Weghorst, Timothy L. Nyerges (GSR)
Information Needs and the Characteristics of Population Data Sources: An Immunization Information System Case Study
Data and information are vital to the daily work of public health practitioners and the data they use come from a variety of sources. Examples of these data sources are vital statistics databases, surveillance data, morbidity data, and Immunization Information Systems (IISs). These IISs are of particular interest because of their near ubiquity in the United States, their importance for public health practice, and their most basic function of providing cross-organizational access to immunization-related clinical data for both public and private health care providers. As the infrastructure to connect electronic health record (EHR) systems and public health systems expands, public health practitioners will have the opportunity to access an unprecedented volume of patient level clinical information. The flood of information and data will have the greatest public health impact if understood and organized within the framework of public health practitioners' data and information needs. This work uses qualitative methods to identify and understand the information needs of public health practitioners related to immunization work and the data and information source characteristics that are important in meeting those needs. This study also uses quantitative methods to describe two important data source characteristics in Washington’s IIS: timeliness and data element completeness. Results point to three main types of information needs of public health practitioners: individual level, population level and context-specific information (vaccine-specific information in this case). These results further the understanding of information work in public health across local and state public health organizations. These results also provide solid evidence related to the effect of different methods of data transfer on data quality.
In addition, synthesis of the qualitative and quantitative components provides evidence to support a set of recommendations presented to state level stakeholders in Washington. This research will help inform the development of technical and non-technical infrastructure to support data sharing between healthcare providers, health information exchanges, and public health organizations.
Last Known Position: Clinical Assistant Professor, Biobehavioral Nursing and Health Systems, University of Washington; Research Coordinator, Northwest Center for Public Health Practice
Committee: William B. Lober (Chair), Neil F. Abernethy, Rita A. Altamore, Diane P. Martin, Debra Revere, William E. Welton (GSR)
Hao (Maya) Li
A Model Driven Laboratory Information Management System
Biomedical research scientists need more robust tools than spreadsheets to manage their data. However, no suitable laboratory information management systems (LIMS) are readily available; they are either too costly to build or too complex to adapt. This thesis presents the architecture, design, implementation, and a prototype of a model driven LIMS, called Seedpod. Scientists, with the help of biomedical informaticists, develop a knowledge model of their data and data management needs in Protégé. Seedpod then automatically produces a relational database from the model, and dynamically generates a web-based graphical user interface. Seedpod can be used for multiple scientific research domains since only its knowledge model contains domain-specific content. It decreases development time and cost, thereby allowing scientists to focus on producing and collecting data.
Last Known Position: Principal Software Engineer, Helix
Committee: James F. Brinkley (Chair), Ira J. Kalet, Linda G. Shapiro, Dan Suciu (GSR)
A Clinical Decision Support Model for Incorporating Pharmacogenomics Knowledge into Electronic Health Records for Drug Therapy Individualization: A Microcosm of Personalized Medicine
Personalized medicine, where treatment may be tailored to individual characteristics, has the potential to improve patient outcomes. As a microcosm of personalized medicine, findings from pharmacogenomics (PGx) studies have the potential to be applied to individualize drug therapy such that efficacy is improved and the occurrence of adverse drug events is reduced. In this context, the overarching research question this research project aimed to address was: what needs to be done to incorporate PGx knowledge into an electronic health record (EHR) in a useful way that facilitates drug therapy individualization? Clinical decision support (CDS) embedded in the EHR was investigated as a model for providing access to PGx knowledge to support accurately using and interpreting patient genetic data to individualize drug therapy. The aims of this research were: (1) characterizing PGx knowledge resources; (2) determining capabilities of current CDS systems; (3) developing a prototype implementation of a model for PGx CDS; and (4) evaluating the utility of the PGx CDS model implementation. Findings from this work enhance our understanding of how PGx knowledge should be made accessible via CDS in the EHR given characteristics of PGx knowledge, technical capabilities of current clinical systems and characteristics of clinicians. More generally, the results of this study contribute a model that is directly applicable to the incorporation of genetic and molecular data into EHRs and its usability by healthcare providers.
Last Known Position: Assistant Professor, Johns Hopkins University
Committee: Peter Tarczy-Hornoch (Chair), Emily E. Devine, David Fenstermacher, Ira J. Kalet, Kenneth E. Thummel, Kelly Fryer-Edwards (GSR)
Notifiable Conditions Information Systems in Local Public Health Practice: Applied Informatics Research
Notifiable conditions reporting is an essential component of public health surveillance. Through this process local public health jurisdictions (LHJs) collect information about health events of interest and share this information with state-level public health departments. Many LHJs make use of electronic information systems to manage, process, and analyze the notifiable conditions data. In the midst of state and national-level efforts to standardize notifiable conditions reporting processes, there has been a nationwide push for LHJs to adopt new notifiable conditions information systems that are capable of online reporting. Although they offer faster reporting to state public health departments and compliance with new standardization efforts, these systems may not be designed to accommodate the specific work practices that are unique to each local public health jurisdiction. The implementation of a new information system in an LHJ may disrupt the work that is required to properly address the health issues that are unique to the region. This could have serious effects on local public health practice.
This study aims to improve the development and evaluation of notifiable conditions information systems that support the work of local public health jurisdictions through three main efforts. 1) To describe the use of information systems in local public health practice, communicable disease information management activities were observed at a large municipal public health agency. Participant observation and task analysis were used to describe the work of local public health practitioners. 2) An online survey was developed and distributed to local public health practitioners in Washington State. Employees were asked about their work practices and interactions with information management systems. Descriptive statistics were used to compare the usage of information systems across LHJs of differing size. 3) An evaluation strategy for local public health agencies was developed to assess the usefulness of information systems within their working environment. A guidebook describing the strategy was written and shared with local public health practitioners.
The findings from this study provide new knowledge which can be used to inform the design and evaluation of notifiable conditions, communicable disease, and outbreak management software.
Last Known Position: Director, Public Health Informatics Program, RTI International; Adjunct Professor, Emory University
Committee: Neil F. Abernethy (Chair), George Demiris, Mark W. Oberle, Lisa A. Jackson (GSR)
Mining Mountains of Data: Organizing All Atom Molecular Dynamics Protein Simulation Data into SQL
A significant portion of my research has involved organizing all atom molecular dynamics protein simulation data into a form that is both manageable and conducive to analysis. These data consist of multi-gigabyte collections of four-dimensional atomic coordinates (x, y, z and time) and secondary analyses, as well as classification data used to select and organize the proteins for simulation. The initial database design was released in 2007 and published in 2008 as the Dynameomics Data Warehouse, and has been in continuous development to accommodate an ever-increasing number and length of simulations. The Consensus Domain Dictionary (CDD), released in 2010, defines a rank-ordered set of globular proteins that sample the most frequently occurring protein folds found in the Protein Data Bank. Andrew's defense presented the CDD database, the dimensional model at the core of the data warehouse, and a novel method for optimizing queries involving spatial data stored in relational tables.
Last Known Position: Principal Informatics Scientist, Cognitive Medical Systems, Inc.
Committee: Valerie D. Daggett (Chair), James F. Brinkley, Ira J. Kalet, Peter J. Myler, Thomas R. Quinn (GSR)
A Study of Low-Income Health Care Consumers: Motivations for Using Electronic Personal Health Record Systems
Health care consumers have different motivations and needs for managing their detailed medical history as well as health information to support their healthcare-related decisions. Electronic Personal Health Record systems are a form of tool that helps health care consumers collect, manage and use their health information. Despite the fact that many types of PHR systems have become available to various groups of consumers, the motivations to utilize PHRs and the barriers to widespread adoption have proven difficult to measure. In this research, I explore and define the factors that motivate individuals’ decisions on whether to adopt a PHR system.
I chose a grounded-theory-based qualitative methodology to identify and explore these factors in a setting where a PHR had been available for one and a half to three years to a group of low-income individuals. Demographics of this group included elderly and disabled individuals, many of whom had multiple co-morbidities that result in complex health information management needs.
The end results of this work are two frameworks created from the health care consumer or patient-driven perspective. (1) The Levels of Interest in Health Information Management Framework (LIHIMF) can be used to categorize potential adopters to help create personas and tailored approaches to designing and implementing PHR systems. This framework describes three types of potential PHR adopters by their willingness to manage their health information or use a PHR. (2) The Health Information Management Motivational Factors Framework (HIMMFF) is a comprehensive framework of issues that contribute to PHR adoption. Factors that motivate or discourage adoption as described by both PHR users and non-users are grouped into seven categories. These frameworks can be used by the PHR and health information management research community to better understand and further study PHR adoption.
This work contributes an approach to understanding patient information management needs from the patient-driven perspective. Furthermore, it advances our understanding of how information systems impact health information management in underserved populations.
Committee: Michael Eisenberg (Co-Chair), Wanda Pratt (Co-Chair), Nicholas R. Anderson, Donna L. Berry, Brenda Zierler (GSR)
Computational Methods for the Analysis of Molecular Dynamics Simulations
Proteins are macromolecules that are involved in virtually every biological process and structure. The three-dimensional structure of these molecules is extremely important as a window into how they work but is extremely difficult to predict, as direct observation of their motion and the folding pathway is possible only through very limited experimental techniques. Nonetheless, observing protein structure alone has proven insufficient for understanding how proteins fold or behave natively. Molecular dynamics (MD) is a computational technique by which protein dynamics can be examined at resolutions well beyond the capabilities of experiment. The decrease in the cost of computer resources has led biologists to turn to MD more frequently in recent years, yet MD simulations produce data in quantity and complexity well beyond the capabilities of conventional biological analysis techniques. We have curated a database of protein native-state and thermal unfolding simulations, which is the largest database of unfolding simulations to date. We examine this database using two existing and three novel analysis methods and demonstrate the utility of each for high-throughput analysis. Finally, we demonstrate that these methods can be used to generate and support novel hypotheses concerning protein motion.
Last Known Position: Postdoctoral Fellow, University of Pennsylvania
Committee: Valerie D. Daggett (Chair), James F. Brinkley III, Peter J. Myler, Walter L. Ruzzo (GSR)
A Cognitive Work Analysis of Physician Order Entry in Pediatric Inpatient Medicine Teams
Computer Physician Order Entry (CPOE) systems have been shown to save time, streamline processes and reduce medication prescribing errors and adverse drug events. However, CPOE remains a poorly adopted technology in most United States hospitals. Clinical work is known to be interruptive, multitasking, collaborative and distributed yet current CPOE systems emphasize linear, normative and solitary work. To study this work-technology disconnect, I performed a qualitative field study that included document collection, observations and interviews of pediatric inpatient physicians working in teams. I identified emerging physician work themes through inductive analysis. I systematically characterized the larger contexts in which ordering occurs by deductively analyzing these data using Cognitive Work Analysis (CWA) - a holistic systems analysis framework that characterizes work by identifying constraints on work at multiple levels from the work environment to the worker. Through these combined results, I identified and will present design implications for future CPOE systems that can support flexibility, cooperation and adaptation to unanticipated work situations before they become sources of medical errors.
Last Known Position: COO, SOSV partner, Orbit Startups
Committee: John H. Gennari (Chair), Mark A. DelBeccaro, Thomas H. Payne, Raya Fidel (GSR)
Modular, Semantics-based Composition of Biosimulation Models
Biosimulation models are valuable, versatile tools used for hypothesis generation and testing, codification of biological theory, education, and patient-specific modeling. Driven by recent advances in computational power and the accumulation of systems-level experimental data, modelers today are creating models with an unprecedented level of complexity. These researchers need tools that manage this complexity and scale across biological levels of organization and physical domain. Historically, many industries have addressed the issue of complexity by adopting a modular product design. In order to apply this approach to the field of biosimulation, existing models must be cast as interoperable components. However, modelers today use a variety of simulation languages so that interoperability is the exception rather than the rule.
For my dissertation research I have worked on the challenges of modularity and interoperability within biosimulation. I helped develop a modular, multi-scale, multi-domain modeling approach called SemSim that provides broad model interoperability. The SemSim approach includes a declarative model description format that can capture the computational and semantic information in existing legacy models, thereby converting them into interoperable, reusable components. Because they interoperate at the semantic level, SemSim models offer opportunities to automate common composition and decomposition tasks beyond currently available methods. For my dissertation project I created and tested a software tool called SemGen that helps automate the modular composition and decomposition of SemSim models. With this tool, users can 1) convert legacy models into the SemSim format and annotate them with semantic data, 2) automatically decompose SemSim models into interoperable sub-models, 3) semi-automatically merge SemSim models into larger systems, and 4) encode SemSim models in an executable simulation format. As a proof-of-concept demonstration of modular modeling, I used SemGen to perform a set of model composition and decomposition tasks using models of hemodynamics, neural signaling, molecular diffusion, and chemical pathway kinetics. This demonstration establishes SemGen’s capabilities for automating the modular composition and decomposition of biosimulation models across physical scales and physical domains. Thus, SemGen has the potential to advance the entire field of biosimulation by spurring the development of complex models for biological research, drug target identification, and patient-specific modeling.
Last Known Position: Computational Biologist, Center for Infectious Disease Research
Committee: John H. Gennari (Chair), James F. Brinkley III, Daniel L. Cook, Herbert M. Sauro (GSR)
Characterizing Information Needs for Public Health Continuity of Operations: A Scenario-Based Design Approach
Public health field nurses play a critical role in the community during disasters and emergencies. Continuity of operations planning (COOP) is a recognized part of any emergency management strategy, and technology should support the elements of public health COOP through support of routine work activities. However, the work of public health field nurses is characterized by multiple, disparate digital and paper-based information systems that require duplicate data entry, reduce efficiencies in the performance of daily work and create issues during emergencies.
This research project characterized the information needs of public health nurses and nurse supervisors through three specific aims. The first aim consisted of an information needs assessment through a systematic literature review for technology support of public health continuity of operations planning and semi-structured interviews with public health practitioners in two local health jurisdictions. The second aim used scenario-based design and persona creation to develop a conceptual design of an integrated information system that supports the work of public health nurses and nurse supervisors. The third aim used focus groups with public health nurses and other public health staff to validate the information system design in both local health jurisdictions.
Focus group participants validated the conceptual information system design in the following thematic areas: the need for a dynamic, flexible system; support for client service and documentation; workload tracking; staff management; one-time data entry; real-time documentation; communication and data exchange between divisions; integrated scheduling; and communication with external providers. Focus group participants corrected perceived errors in design and made additional design recommendations.
The results of this research highlight the importance of involving public health practitioners in the design process for technology that supports their information needs and work activities and can support them during emergencies. In addition, this research shows it is possible to validate and reuse design concepts across local health jurisdictions that have different organizational structures. Reusable design knowledge is an important goal for public health informatics efforts to increase efficiencies through support of standard work practices and reduce the costs of information system projects.
Last Known Position: Associate Professor, University of Missouri, Sinclair School of Nursing and MU Institute for Data Science and Informatics
Committee: George Demiris (Chair), John Hartman, Mark Oberle, Anne Turner, William Welton (GSR)
A Rule-Based Strategy for Accurately Describing Gene Content Similarities and Differences Across Multiple Genomes
A fundamental task in genome research is that of comparing gene content between multiple genomes. In infectious disease research such comparisons are critical for determining the underlying parasite genetic factors that are responsible for disease transmission, pathogenicity and clinical outcome. Although numerous technologies exist for comparing gene sequences and grouping similar genes, the genomics field lacks structured methods for describing the complicated evolutionary dynamics that give rise to the differences between the compared species. In this dissertation I put forth novel technologies for accurately and precisely describing differences in gene content across multiple genomes.
First, I introduce a light-weight knowledge representation specification that allows us to aggregate gene annotation and sequence comparison data from heterogeneous sources. Next, I describe a new ontology for describing pairwise homology relationships between genes, as well as a rule-based system for applying those terms to sequence comparison results. I then detail a novel method for grouping genes based on the nature of their homology relationships. Finally, I present a technique for querying the gene groups in order to uncover interesting evolutionary trends across the compared genomes. These methods represent a significant advance in the clarity and detail with which large scale comparative genomics can be described; furthermore, the novel techniques that I present in this work are amenable to integration with existing sequence comparison and clustering technologies.
Last Known Position: Senior Manager - Special Programs, Intellectual Ventures
Committee: Peter J. Myler (Chair), Roger E. Bumgarner, John H. Gennari, Walter L. Ruzzo (GSR)
Sharing by Design: Understanding and Supporting Personal Health Information Sharing and Collaboration within Social Networks
Friends, family, and community provide important support and help to patients who face an illness. Unfortunately, keeping a social network informed about a patient’s health status and needs takes effort, making it difficult for people who are sick and exhausted from illness. Members of a patient’s social network are often eager to help, but can be unsure of what to do; they must balance their desire to help with trying not to bother a sick friend. In this dissertation, I describe research on how people share health information within their existing social networks and present technology to create informed, helpful networks. I used a mixed methods approach of interviews and an online questionnaire to provide a detailed analysis of what health information people share, who they share with, mode of transmission, and why people share personal health information.
My research culminates in the design of new technology that enables patients to create an informed network and catalyzes helping activities within that network. I used participatory design methods with breast cancer patients and survivors to ensure that the design is based on a firm understanding of users’ goals, priorities, constraints, and current sharing practices. Together, we designed a technology that allows a patient to keep their social network up to date, solicit help from their network, field offers of help, and collaborate through discussions. The design is motivated by the insight that a more informed social network is better able to provide needed help and support. Advocating that patient-centered technology should allow users to share personal health information with others comes with the responsibility to contribute to the effort to create usable privacy interfaces. I present a method for evaluating the transparency of privacy controls and use this method to identify a transparent icon that can be embedded within interfaces to show how information is being shared.
Embracing the complex picture of how patients manage and share personal health information with others will ultimately improve the technology available to support patients. I contribute a better understanding of current sharing practices and technology to enable patients to create informed, helpful social networks.
Committee: Wanda M. Pratt (Chair), George Demiris, Beverly Harrison, James S. Fogarty (GSR)
Andrea Hartzler (Civan)
Understanding and Facilitating Patient Expertise Sharing
A fundamental part of becoming an empowered patient is learning to engage in the day-to-day management of personal health. Yet learning to manage personal health can take substantial time and effort when patients do so through trial and error on their own. Although health informatics support has the potential to help patients overcome this challenge by facilitating patient expertise sharing, we lack the knowledge necessary to meet this potential. Prior work provides little clarity about the nature of patients' personal health expertise and has not explored the practices patients use to leverage this experiential knowledge offered by other patients in similar situations. This dissertation contributes foundational knowledge about what patient expertise is and how patients share this valuable resource. Within the context of breast cancer, I (1) describe the characteristics of patient expertise through a comparative content analysis that demonstrates how this unique form of knowledge significantly differs from the expertise obtained from health professionals in topic, form, and style, (2) describe practices patients use to share their expertise in their everyday lives during cancer treatment through a naturalistic field study, and (3) employ a user-centered approach, informed by specific design recommendations I propose for enhancing health-related social software, to design a patient expertise locator to facilitate patient expertise sharing. This work provides substantial guidance on new ways to think about the design of supportive tools for patients. Patients need help from peers and this work provides the understanding and guidance necessary to empower patients by facilitating patient expertise sharing.
Last Known Position: Associate Professor, Biomedical Informatics and Medical Education, University of Washington
Committee: Wanda M. Pratt (Chair), John H. Gennari, William P. Jones, David W. McDonald, Huong Q. Nguyen (GSR)
Automated Learning of Protein Involvement in Pathogenesis Using Integrated Queries
Methods of weakening and attenuating pathogens' abilities to infect and propagate in a host, and thus allowing the natural immune system to more easily decimate invaders, have gained attention as alternatives to broad-spectrum targeting approaches. The following work describes a technique for identifying proteins involved in virulence by relying on latent information computationally gathered across biological repositories. A lightweight method for data integration is introduced, which links information regarding a protein via a path-based query graph and supports both exploratory and logical queries; data gathered in this way are characterized with experiments on retrieving high-quality annotation data. A system and method of weighting is then applied to query graphs that can serve as input to various statistical classification methods for discrimination, and the combined usage of data integration and learning methods is leveraged against the problem of generalized and specific virulence function prediction. This approach improves coverage of functional data over a protein, outperforms other recent approaches to identification of virulence factors, is robust to different weighting schemes of varying complexity and is found to generalize well to traditional function prediction.
Last Known Position: Senior Data Science Manager, Microsoft
Committee: Peter J. Myler (Chair), Ira J. Kalet, William S. Noble, Peter Tarczy-Hornoch, Evan E. Eichler (GSR)
Automated Analysis and Surveillance for STI / HIV Online Behavior
An automated algorithm has been developed to extract health related information from free text. This interactive expert system is one of the first to automatically extract characteristics related to STI/HIV prevention from online sex-seeking discourse. In addition to the design of a novel fully supervised expert system, this research is focused on the contextual evaluation of a new type of information for STI/HIV prevention activities.
Last Known Position: Associate Professor, Virginia Commonwealth University
Committee: Fredric M. Wolf (Chair), Neil F. Abernethy, Mark W. Oberle, Anne M. Turner, Jacob O. Wobbrock (GSR)
Terry Hsin-Yi Shen
Determining the Feasibility and Value of Federated Data Integration Combining Logical and Probabilistic Inference for SNP Annotation
Most common and complex diseases are influenced at some level by variation in the genome. The future work of statistical geneticists, molecular biologists, and physician-scientists with interests in genetics or genomics must thus take genetics into consideration. Research done in public health genetics, specifically in the area of single nucleotide polymorphisms (SNPs), is the first step to understanding human genetic variation. Functional uncertainty, volume of information, and cost-effectiveness make the prioritization of SNPs an important research question. SNP Integration Tool (SNPit) is a data integration system that looks at all the possible predictors of functional SNPs and provides the user with integrated information and decision-making capability. Determining the feasibility and value of SNPit with rules and probabilistic inference thus represents challenges from both the biological and biomedical informatics standpoints concerning how to represent, integrate, and conduct inference over disparate biological data sources.
The main objective of this dissertation is to determine the feasibility and value of creating a federated integration system with combinations of logical, probabilistic, and logical combined with probabilistic inference for functional SNP annotation. Through iterative design, four versions of the SNPit system were created that consolidate information on a variety of functional annotation predictors and include combinations of logical and probabilistic inference. Furthermore, this dissertation evaluates the feasibility of federated data integration and assesses its accuracy for SNP annotation, characterizing the suitability of adding logical and probabilistic inference to the federated data integration for both point and regional SNP annotation. This study also explores the feasibility of combining both logical and probabilistic inference for point and regional SNP annotation. This dissertation contributes to general knowledge in informatics as well as SNP annotation by describing the design, implementation, and evaluation of combinations of logical, probabilistic, and both logical and probabilistic inference applied to the domain of functional SNP annotation.
Last Known Position: Faculty Research Associate, University of Maryland School of Nursing
Committee: Peter Tarczy-Hornoch (Chair), Melissa A. Austin, James F. Brinkley III, Chris Carlson, Kelly Fryer-Edwards (GSR)
Information and Communication Technologies in Patient-Centered Healthcare Redesign: Qualitative Studies of Provider Experience
Promoting widespread availability and provider adoption of electronic medical records is a core component of current efforts to reform healthcare in the United States. Initiatives to redesign healthcare to achieve quality improvement, patient access, economic sustainability, and other reforms often seek to leverage the potential of electronic medical records and other information and communication technologies. However, the evidence pertaining to the effectiveness of these technologies in supporting and promoting these objectives is limited, and their adoption among healthcare providers remains low, particularly in primary care and other ambulatory care settings. Given both the questionable sustainability of primary care and its central role in current healthcare reform initiatives, there is a critical need to inform these endeavors with empirically derived knowledge of how information and communication technologies affect healthcare providers and their efforts to redesign care to better meet the needs of their patients and communities. This dissertation explores provider perspectives on the roles, importance, and effects (both positive and negative) of healthcare information and communication technologies in the context of patient-centered healthcare redesign. Three qualitative observational studies were conducted at Group Health Cooperative, a large integrated healthcare delivery system serving patients throughout the Pacific Northwest. These studies were informed by Donabedian’s framework for evaluating healthcare quality, Rogers’ Diffusion of Innovations Theory, and the Tavistock Institute’s Sociotechnical Systems Theory.
Findings revealed provider and organizational perspectives on their experiences with implementing and using a commercial clinical information system (EpicCare Ambulatory EMR) with an integrated patient Web portal, patient-provider email, internal clinical messaging, an internally-developed online health risk assessment application, and other information and communication technologies. Participants expressed sharply contrasting perspectives on the same technologies viewed as components of two unique practice redesign initiatives – an organization-wide redesign of operations to implement Patient-Centered Access, and a single clinic redesign to implement the Patient-Centered Medical Home model. These findings suggested that contextual factors such as the care redesign methods and the care models used to guide care redesign are key determinants of the effects associated with the implementation and use of these technologies. This dissertation contributes to the literature on sociotechnical approaches to technology-enabled healthcare redesign and evaluation by describing how instances of these different care redesign models incorporated the various technologies, and by evaluating providers’ perspectives on their roles, importance, and effects.
Last Known Position: Director of Clinical Decision Support at Virginia Mason Medical Center
Committee: Peter Tarczy-Hornoch (Chair), Bryant T. Karras, James D. Ralston, Robert J. Reid, Karen E. Fisher (GSR)
An Evidential Knowledge Representation for Drug-mechanisms and its Application to Drug Safety
A major challenge to designers of informatics tools that help alert clinicians to potential drug-drug interactions (DDIs) is how to best assist clinicians when they must infer the potential risk of an adverse event between medication combinations that have not been studied together in a clinical trial. The central thesis of this dissertation is that DDI prediction using drug mechanism knowledge can help drug-interaction knowledge bases expand their coverage beyond what has been tested in clinical trials while avoiding prediction errors that occur when individual drug differences are not recognized. This dissertation describes a knowledge representation system, called the Drug Interaction Knowledge Base (DIKB), that uses a novel approach to linking and assessing evidence support for drug-mechanism assertions.
The DIKB is the first knowledge-representation system we are aware of to use a computable model of evidence and a Truth Maintenance System to manage assertions in its knowledge base. The novel approach to evidence management implemented in the DIKB enables its prediction accuracy and coverage to be optimized to a particular body of evidence, a feature that is very desirable for clinical decision support. The DIKB is also novel for its computable representation of the conjectures behind a specific application of evidence. These evidence-use assumptions enable the system to flag when a conjecture has become invalid and alert knowledge-base maintainers to the need to reassess their original interpretation of what assertions a piece of evidence supports. They are also used as evidence is input into the system to help identify a pattern, called a circular line of evidence support, that is indicative of fallacious reasoning by evidence-base curators. The DIKB has been shown capable of accurately predicting clinically relevant DDIs using only pharmacokinetic drug-mechanism knowledge, and development of the system has helped to identify and evaluate potential informatics solutions to the challenges of representing, synthesizing, and maintaining drug-mechanism knowledge.
Last Known Position: Associate Professor, Biomedical Informatics, University of Pittsburgh
Committee: Drs. Ira J. Kalet (Chair), Carol J. Collins, Thomas K. Hazlet, John Horn, Stuart A. Sutton (GSR)
Eunjung Sally Lee
Supporting Multi-institutional, Interdisciplinary Biomedical Collaboration (MIBC): A Biomedical Informatics Approach
The modern biomedical research community is facing ever more challenging research questions. Out of necessity, biomedical research has become increasingly interdisciplinary and large-scale in nature. Yet large-scale interdisciplinary biomedical collaborations are not easily established or maintained. Many funding agencies identify biomedical informatics as an important foundation to support biomedical collaboration to alleviate some of the challenges large-scale interdisciplinary collaborations face. However, biomedical informatics has yet to understand in detail how large-scale interdisciplinary biomedical collaborations operate and deal with day-to-day challenges associated with collaboration.
This research used a contextual field study to describe in depth the characteristics of large-scale interdisciplinary biomedical collaboration and to identify barriers, existing facilitators, and needs associated with various collaborative processes. The study results were synthesized to develop a context-specific informatics framework to support large-scale interdisciplinary biomedical collaboration that extends prior research on collaboration in other fields. In the future, the framework can be used as a guide for the design and evaluation of collaborative infrastructure.
Last Known Position: Population Health Analyst, University of Washington
Committee: Peter Tarczy-Hornoch (Chair), David W. McDonald, Fredric M. Wolf, Barbara B. McGrath (GSR)
Modeling Uncertainty in Data Integration for Improving Protein Function Assignment
In this work we describe the development and evaluation of the BioMiner system for protein functional annotation. BioMiner is the implementation of a novel uncertainty model for annotation and is based on the Uncertainty in Information Integration (UII) system, a general-purpose data integration system with extended functionality to handle uncertainty in data. The informatics contributions of our work are as follows: 1) we develop and implement a first-in-class uncertainty model for annotation and illustrate the validity of the model, 2) we show that the uncertainty model is reliable by evaluating its robustness through a principled methodology, and 3) we demonstrate that the uncertainty model performs better than existing, commonly used approaches through a rigorous performance evaluation.
The application of BioMiner also contributes to the expansion of domain knowledge by accurately identifying functions for proteins of unknown function, a problem of utmost importance to biology.
Last Known Position: Associate Director of Data Science, Auransa Inc.
Committee: Peter Tarczy-Hornoch (Chair), Eugene E. Kolker, Dan Suciu, John M. Miyamoto (GSR)
Evaluating Experimental Information Management in Biomedical Research: A Case-Study Approach
Data intensive biomedical research is increasingly integrative; knowledge gained from a spectrum of disciplines and tools is generated, collected and applied to aid in the analysis and description of biological phenomena. Though there are many evolving approaches and much effort to bring a synthesis of disciplines to biological research, we have little understanding of how researchers are coping with the longitudinal experimental data management challenges involved with day-to-day experimental work.
This research uses multiple theoretical frameworks and data collection methods to identify issues involved in the use of information-rich research tools and techniques by academic laboratory research groups. Experience from a broad evaluation of common information management issues affecting the local biomedical research community was used to inform a case study protocol for the study of information technology in laboratory settings. This protocol was then used to design a focused case study of microarray gene expression analysis (MGEA) information use and workflow in academic laboratories. MGEA is a methodology that, due to its large generated raw data sets, expensive measurement equipment, and complex analysis procedures, requires collaboration with specialists in biostatistics and bioinformatics to aid researchers in effective inquiry. As such, the academic use of MGEA methodologies is a representative example of the information management challenges that must be addressed to provide integrative biomedical research support. This case study approach was then used to evaluate the utility and transferability of the protocol to other laboratory information management issues.
This work seeks to explore two fundamental issues: The first is the development of methods to capture the full complexity and cost of planning, collaboration and analysis needed to complete data-rich academic biomedical experiments. The second is to use these methods to explore the use of a representative technology, and to assess the degree to which an exploratory case-study approach can serve to inform bioinformatics design, implementation and support.
Last Known Position: Robert D. Cardiff Professor of Informatics, Director of Informatics Research at UC Davis
Committee: Peter Tarczy-Hornoch (Chair), Roger E. Bumgarner, Christopher Dubay, Karen Fisher, Janelle S. Taylor (GSR)
Errors in the Clinical Laboratory: A Novel Approach to Autoverification
Clinical laboratories provide a critical service to the health care and well-being of the world's population. Estimates suggest that the clinical laboratory influences some 70 percent of health-care decisions, but requires only about 4 percent of health-care expenditures. Of an estimated 7 billion laboratory tests per year in the United States, about 1% of the results, or 70 million annually, are erroneous, and an estimated 6% of those errors cause harm to the patient. Laboratory errors harm millions of patients each year, and laboratory experts spend countless hours reviewing billions of laboratory results each year in the search for these rare errors. Autoverification systems, automated programs used to check laboratory results for errors, can save laboratories countless hours and be more accurate than laboratory experts, but the current generation of rule-based systems is not appropriate for the clinical laboratory domain due to its inherent uncertainty. This research demonstrates that a novel approach using a synthetic error generation system to create training datasets for a conditional Gaussian Bayesian network produces an autoverification system superior to ones trained using standard methods and superior to laboratory experts. Unlike standard approaches that require an expensive and time-consuming expert annotation process to create training datasets, the synthetic error generation method uses results that were reviewed normally.
By creating synthetic datasets, the synthetic error generation process creates customized training datasets, which maximize the Bayesian network's performance in detecting errors. In this dissertation, we review the clinical laboratory process and the many sources of errors in clinical laboratory results, Bayesian networks, and the class imbalance problem. Next, we elucidate the performance characteristics of the synthetic error generation process, which is followed by a comparison between our novel method and standard approaches to the class imbalance problem. Finally, we compare the results of a synthetic error autoverification system against laboratory experts in the identification of errors.
Last Known Position: Senior Research Scientist at The George Washington University
Committee: Peter Tarczy-Hornoch (Chair), Michael L. Astion, Jason N. Doctor, Kenneth M. Rice (GSR)
Hen-Tzy (Jill) Lin
A Shape-based Image Retrieval System for Assisting Intervention Planning
Craniosynostosis is a serious and common pediatric disease caused by the early fusion of the sutures of the skull. Premature suture fusion results in severe malformation in calvarial shapes. A single surgery, i.e., cranioplasty, is required to release the fused suture and reshape the deformed calvaria in order to prevent further deformation in skull shapes and impairment in neuropsychological development. Even though no concrete evidence suggests whether or not surgical complications and neurobehavioral developments are directly affected by different calvarial shapes, radiologists and surgeons often use cases of similar shapes that were previously resolved as guidelines to prepare for pre-surgical planning and post-surgical evaluation. With the increasing amount of imaging data, a systematic and quantitative approach is required to help physicians capture information embedded in images and define image similarities between cases. We have designed and implemented a shape-based image retrieval system that will objectively and quantitatively retrieve cases of similar shapes that were previously treated or established to help physicians in the decision-making process of the reconstruction of the skull.
Currently, most imaging studies in patients with craniosynostosis emphasize the description of qualitative features and relegate quantitative assessments to the measurement of a ratio or an angle between anthropometric landmarks. In order to objectively detect inter- and intra-class differences between shapes in the image retrieval system, we have developed a novel shape measurement called the symbolic shape descriptor (SSD) to refine and establish quantitative definitions of skull phenotype. Our experiments show that the SSD has classification performance that is better than or comparable to other shape descriptors, uses less space, and is much faster than competitors. We have also conducted a regression analysis to determine the correlation between skull shapes and neuropsychological development in children with isolated sagittal synostosis. The result of this study is incorporated in the retrieval system for prediction of mental and psychomotor scores in order to help psychologists decide whether to initiate intervention on affected children.
Last Known Position: Director, Jazz Pharmaceuticals
Committee: Linda G. Shapiro (Chair), Efthimis N. Efthimiadis, James F. Brinkley III, Ira J. Kalet, Raymond W. Sze
Ontology Recapitulates Phylogeny: Design, Implementation and Potential for Usage of a Comparative Anatomy Information System
Building on our previous design work in the development of the Structural Difference Method (SDM) for symbolically modeling anatomical similarities and differences across species, we describe the design and implementation of the associated comparative anatomy information system (CAIS) knowledge base and query interface, and provide scenarios from the literature for its use by research scientists. Our work includes several relevant informatics contributions. The first one is the application of the structural difference method (SDM), a formalism for symbolically representing anatomical similarities and differences across species. We also present the design of the structure of a mapping between the anatomical models of two different species, and its application to information about specific structures in humans, mice, and rats. The design of the internal syntax and semantics of the query language underlies the development of a working system that allows users to submit queries about the similarities and differences between mouse, rat, and human anatomy; delivers result sets that describe those similarities and differences in symbolic terms; and serves as a prototype for the extension of the knowledge base to any number of species. We also contributed to the expansion of the domain knowledge by identifying medically-relevant structural questions for humans, mice, and rats. Finally, we carried out a preliminary validation of the application and its content by means of user questionnaires, software testing, and other feedback.
Last Known Position: Research Scientist, University of Washington
Committee: Linda G. Shapiro (Chair), Ira J. Kalet, Billie J. Swalla, John H. Gennari, Willie J. Swanson (GSR)
Supporting Collaborative Clinical Trial Protocol Writing through Annotation Design
Clinical trial protocols are important documents that guide clinical research. Modern protocol development requires collective expertise from a group of loosely coupled protocol writers, who work across distances and time zones. Email has been the primary communication tool for these protocol writers. Unfortunately, it inadequately supports collaborative writing tasks. Without appropriate groupware technology, these protocol writers often compromise work efficiency and the degree of collaboration to complete their tasks. This situation is exhibited at the Southwest Oncology Group (SWOG), one of the Cooperative Group Programs under the direction of NCI. While it is clear that its current work practices do not support optimal collaboration, it is unclear how to improve the collaboration and communication in such group work because the complexities of collaborative protocol development have rarely been studied. This research utilizes and extends Computer-supported Cooperative Work (CSCW) theories to identify the problems in protocol development and to design groupware technology for supporting this group work.
This dissertation consists of four parts: (1) qualitative fieldwork of the collaborative protocol writing process at SWOG; (2) a design of an annotation model that facilitates in-context communication around evolving documents during the iterative reviewing and revising process; (3) a design and an implementation of a protocol collaborative authoring tool (PCAT) that embodies the annotation model from #2 to address group work problems identified in #1; and (4) a validation of the usability of the annotation model and the PCAT prototype. In addition, this dissertation implements a grounded design process and contributes a socio-technical design of groupware technology in a healthcare setting to the literature of socio-technical approaches for system design.
Last Known Position: Professor, Biomedical Informatics, Columbia University
Committee: John H. Gennari (Chair), Jonathon Grudin, Ira J. Kalet, David W. McDonald, David K. Farkas (GSR)
Kevin Lybarger, PhD
A. Fischer Lees, MD
Andrew Berry, PhD
Sarah Stansfield, PhD
Last Known Position: Postdoctoral Research Fellow in Mathematical Modeling, Vaccine and Infectious Disease Division, Fred Hutch
Houda Benlhabib, PhD
Noah Hammarlund, PhD
Last Known Position: Postdoctoral Scholar, UW School of Pharmacy
Leia Harper, PhD
Carolyn Paisie, PhD
RNAseq and Ribosome Profiling Generate New Insights into Leishmania Differentiation
Leishmania donovani, an intracellular parasitic trypanosomatid, causes kala-azar, a fatal form of visceral leishmaniasis in humans. Infection occurs through a cycle whereby parasites living in the midguts of female sandflies (promastigote stage) are transferred to the host via a bite from an infected female sandfly, are phagocytosed by human macrophages, and are then transferred to phagolysosomes of human macrophages (amastigote stage). Previous studies have demonstrated that L. donovani differentiation is regulated by changes in gene expression. Thus, we have performed high-throughput RNA sequencing (RNA-seq) to elucidate changes in transcript abundance for all cellular mRNAs during L. donovani differentiation from promastigotes into amastigotes. Analyses revealed gene expression changes that may affect posttranscriptional and translational processes during differentiation.
Last Known Position: Bioinformatics Analyst in Computational Science, The Jackson Laboratory for Genomic Medicine
Committee: Drs. Peter Myler (Chair), David Crosslin
Wayne Liang, MD
User-Centered Design of a Collaborative Genetic Variant Interpretation Tool
Precision genomic medicine relies upon accurate variant knowledge. However, laboratories continue to arrive at discordant interpretations for the same genomic test. Gaps, inconsistencies, and siloing of variant knowledge may contribute to inter-rater discordance in variant interpretation. Our overall goal is to develop a novel, openly available computerized tool supporting role-based collaboration, knowledge sharing, and consensus-making in variant interpretation. In Aim 1, we use literature review and informal expert input to characterize a typical variant interpretation workflow, propose a collaborative workflow, and develop an initial design for a computerized tool supporting collaborative variant interpretation. In Aim 2, we use user-centered design methodology to further characterize the typical workflow, define project requirements and user needs, and finalize the design of a tool supporting collaborative variant interpretation.
Last Known Position: Assistant Professor in Pediatrics, University of Alabama at Birmingham
Committee: Drs. Peter Tarczy-Hornoch (Chair), Annie Chen, David Crosslin, Leslie Kean
Lisa Taylor-Swanson, PhD
Last Known Position: Assistant Professor, College of Nursing, University of Utah
Last Known Position: Assistant Professor in Nursing, UW Tacoma
Last Known Position: Asst Professor in Human-Centered Computing, IUPUI
Last Known Position: Sr User Researcher, Microsoft Health, AI and Research; Affiliate Assistant Professor, UW Department of Biomedical Informatics and Medical Education
Last Known Position: Research Assistant Professor, University of North Carolina at Chapel Hill
Leslie (Dean) Poppe
Last Known Position: Clinical Psychologist, The Everett Clinic
Last Known Position: Executive Director Professional Development at Care New England
Last Known Position: Assistant Professor, Drexel University's College of Computing and Informatics
Last Known Position: Research Scientist, University of Washington
Last Known Position: Assistant Professor of Information, School of Information; Assistant Professor of Health Behavior and Health Education, School of Public Health, University of Michigan
Last Known Position: Co-Founder and Executive Director, ConstaTec Solutions, LLC
Last Known Position: NuGEN Technologies
Last Known Position: Resident Physician, The University of New Mexico Health Sciences Center
Last Known Position: Entrepreneur
Last Known Position: Assistant Professor, UW Bothell
Last Known Position: Hospitalist, Associate Medical Director for Clinical Informatics, Virginia Mason Medical Center
Last Known Position: Assistant Professor in Pediatric Rheumatology at the University of Utah School of Medicine
Last Known Position: Physician at Northwest Acute Care Specialists
Last Known Position: Research Scientist, University of Washington
Last Known Position: Professor at University of Hawaii
Last Known Position: Attending ARNP at Fairfax Behavioral Health
Last Known Position: Project ECHO Administrator at University of Washington
Last Known Position: Sr Director, Computational Biology at Fulcrum Therapeutics
Last Known Position: Chief Data Engineer for Public Health Innovation at MITRE
Last Known Position: Professor, Biobehavioral Nursing and Health Systems; Joint Professor, BIME; Joint Professor, Global Health; Adjunct Professor, Health Services
Last Known Position: Associate Professor and Director, UH Data Analysis and Intelligent Systems Lab (UH-DAIS), Department of Computer Science, University of Houston
Last Known Position: Lecturer, Department of Biomedical Informatics and Medical Education, University of Washington
Last Known Position: Informatics Health Scientist, CDC
Taryn Hall, PhD
Last Known Position: Principal Research Scientist, UnitedHealth Group
Claire Jungyoun Han, PhD, MSN, RN, CCRN
Last Known Position: Postdoctoral Fellow, BCPT Cancer Fellowship (Biobehavioral Cancer Prevention and Control Training Program), Department of Health Services, University of Washington
Mike Hairfield, DDS
Shih-Yin Lin, PhD
Last Known Position: Senior Research Scientist/Project Director, New York University, College of Nursing
Erik Van Eaton
Last Known Position: Chief Clinical Officer at TransformativeMed, and Trauma Surgeon & Associate Professor at Harborview Medical Center
Clinical Informatics Fellows
Jeehoon Jang, MD
Zachary Liao, MD, MPH
Michelle Stoffel, MD, PhD
Ethan Tseng, MD, MBA
Arpit Patel, MD
Last Known Position: Regional CMIO, Dignity Health
Nikita Pozdeyev, MD, PhD
Last Known Position: Head of Translational Informatics Services, CPM, University of Colorado
Reza Sadeghian, MD, MBA, MSc
Last Known Position: CMIO, Arrowhead Medical Center
Tokunbo (Toks) Akande, MD, MPH, FAAP
Last Known Position: Medical Director of Informatics, Bemidji Region, Sanford Health
Xinran (Leo) Liu, MD
Last Known Position: Director of Clinical Informatics, St. Mary's Hospital; Assistant Clinical Professor; Clinical Advisor at Google; Associate Program Director, UCSF
Craig Monsen, MD