
News and Events


 

Chair’s Message

We are moving toward our vision with a number of activities across our various programs. We have updated our strategic plan in response to the 10-year academic program review that we recently completed. For our research-oriented MS and PhD programs, we have recently added a specialization in Data Science. We are completing a curriculum revision for our online applied clinical informatics MS, which will be effective Fall 2020. The work of our fellows in the clinical informatics fellowship program has received plaudits from clinical administrators and faculty, and we are currently recruiting a new faculty member in our department to assist with this program (view position description). We are also recruiting a faculty member in medical education to start Summer 2020 (view position description). This is the beginning of a new cycle of admissions to our graduate programs, and we look forward to another productive year and new growth in our department.

Cordially,

Peter Tarczy-Hornoch, MD
Chair and Professor, Department of Biomedical Informatics and Medical Education


The University of Washington Welcomes New Chief Research Information Officer 

The University of Washington is delighted to announce that Dr. Shawn Murphy, MD, PhD, will be joining the university as a core faculty member in Biomedical Informatics and Medical Education (BIME) with a joint appointment in Neurology. In addition to being a core BIME faculty member and attending in an outpatient neurology clinic, he will serve as UW Medicine Chief Research Information Officer in IT Services, informatics lead of the Institute for Translational Health Sciences (ITHS) Data Science Core, and Director of the Institute for Medical Data Sciences (IMDS).

“I am honored and excited to join the Data Science workforce at the University of Washington.  I have always admired the closeness UW students have to the top teachers in the tech industry.  I am hoping to enable Clinical and Informatics Research to use this and take Data Science and AI to a new level at UW,” said Dr. Murphy.

Dr. Murphy is currently the Chief Research Information Officer at Mass General Brigham and a Professor of Neurology. Over the past 30 years, he has been an integral leader and innovator in supporting research data and technology there. He is the creator of the Research Patient Data Registry (RPDR) and co-founded i2b2 (used at more than 300 sites globally), enabling scalable, audit-ready clinical data access for the research community nationally and internationally. Dr. Murphy has also been a leading scientific collaborator and principal investigator on numerous multi-institutional NIH grants, including RECOVER, eMERGE, ACT, AoU, PCORI, SHRINE, and other major award programs that helped elevate MGB into a leader in translational informatics. His work has shaped standards for privacy-preserving data sharing, cohort discovery tools, and clinical data harmonization, influencing research data operations across Mass General Brigham hospitals and enabling thousands of studies per year.

“We are delighted to have Dr. Murphy join UW, bringing decades of expertise in informatics and data science research and operational work around instrumenting the health care enterprise for discovery in the course of clinical care,” said Dr. Peter Tarczy-Hornoch, Chair of BIME and Interim Director for IMDS.

“ITHS welcomes Dr. Murphy to this crucial role. He is exactly the right person to take on the new informatics challenges of clinical and translational research,” said Dr. John Amory, Principal Investigator of ITHS and Associate Dean of Translational Sciences.

Dr. Murphy will be bringing this rich and deep expertise to his UW roles as BIME faculty, Neurology faculty, UW Medicine CRIO, ITHS informatics lead and Institute for Medical Data Science Director.

“I am excited to welcome Dr. Murphy to UW Medicine and confident that his decades of leadership in research informatics at Mass General Brigham will significantly advance our research, data science, and innovation efforts,” said Eric Neil, UW Medicine Chief Information Officer.

Dr. Murphy’s appointment is effective February 1, 2026.

 

Biomedical Informatics and Medical Education Newsletter

 

December 8, 2025 – December 12, 2025

UPCOMING LECTURES AND SEMINARS
BIME 590 – On break until January 8th!

PAPERS, PUBLICATIONS & PRESENTATIONS

  • Annie T. Chen is giving a presentation as part of a panel entitled “Researching Health Information Behaviors: Landscape, AI’s Role and Its Impact” at the virtual annual satellite meeting of the Association for Information Science & Technology (ASIS&T 2025), December 11, 2025.
  • Annie T. Chen is giving a presentation as part of a panel entitled “Co-Creation in Context: Participatory Approaches to Digital Humanities and Cultural Heritage Work” at the virtual annual satellite meeting of the Association for Information Science & Technology (ASIS&T 2025), December 12, 2025.
  • Choi, B., Wang, L. C., Kang, R. A., Ho, I. Y., Johnny, S., Chaliparambil, R., & Chen, A. T. (accepted). What should I do?: Information and support needs relating to substance use on Reddit. Substance Use & Misuse.
  • Meghan Kiefer, Edith Wang, Kellie Engle, Maya Sardesai, Heather McPhillips, Matthew Cunningham. Improving USMLE Step 1 Outcomes in Academically Vulnerable Students Through a Targeted, Competency-Based Curriculum. Accepted by BMC Medical Education.

 

UPCOMING EXAMS
Final Exam

Title: Comprehensive assessment and quantification of incoherent speech using natural language processing
Student: George (Weizhe) Xu
Date/Time: Friday, December 12, 2025, 10:00 AM – 12:00 PM PT
In-person location: 850 Republican Street, Building C, SLU C259
Zoom: https://washington.zoom.us/my/cohenta


Abstract: Coherence is a linguistic feature defined as the orderly and interconnected flow of ideas. Disrupted coherence is a linguistic anomaly commonly observed in a group of psychiatric disorders known as schizophrenia spectrum disorders (SSD), in which disorganized thoughts manifest as incoherent speech. While early detection of symptoms can potentially lead to better outcomes, manual assessment of symptom severity is time-consuming and requires specialized expertise. Automated coherence assessment methods are therefore desirable for symptom evaluation.

However, gaps remain in prior research in this area: 1) most prior work focuses on estimating local coherence (coherent transitions between adjacent semantic units) by computing cosine similarity between vector representations of sequential semantic units, while global coherence (sustaining a theme or topic throughout a narrative) has received much less attention; 2) the impact of automatic speech recognition (ASR) errors has received little attention, as prior work has mainly relied on manual transcripts; and 3) there has been limited exploration of using language model perplexity to assess coherence, especially given recent advances in large language models (LLMs).

This work bridges these gaps through the following contributions. 1) Two new global coherence assessment methods were developed based on centroids of embeddings (vector representations of semantic space); we found that the global coherence methods align better with human judgment than local coherence methods. 2) A time-series feature extraction pipeline replaces the aggregation step in coherence assessment pipelines; with this method, the coherence evaluation process is resistant to ASR errors in the text input. 3) Two sentence-level perplexity-based coherence methods were developed, and we showed that combining perplexity features with traditional coherence scores (proximity features, because they are based on cosine similarity) results in better prediction models than using proximity or perplexity features alone. 4) These innovations and classical approaches were combined into the Comprehensive Coherence Calculator (CCC), a software package that can perform comprehensive coherence analysis with a myriad of configurations. With these contributions, a fully automated coherence assessment pipeline can be established, offering patients easy monitoring at home, clinicians the information needed to provide better care, and researchers an objective quantitative basis for the study of semantic coherence.
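The distinction between local and global coherence described above can be sketched numerically: local coherence averages cosine similarity between adjacent embeddings, while a centroid-based global measure compares each unit to the overall theme. A minimal illustrative sketch, in which seeded random vectors stand in for real sentence embeddings (this is not code from the dissertation):

```python
import numpy as np

def cosine(u, v):
    # cosine similarity between two vectors
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def local_coherence(embeddings):
    # mean similarity between adjacent semantic units (sentence-to-sentence flow)
    return float(np.mean([cosine(embeddings[i], embeddings[i + 1])
                          for i in range(len(embeddings) - 1)]))

def global_coherence(embeddings):
    # mean similarity of each unit to the centroid of all units (theme maintenance)
    centroid = np.mean(embeddings, axis=0)
    return float(np.mean([cosine(e, centroid) for e in embeddings]))

rng = np.random.default_rng(0)
on_topic = rng.normal(0, 0.1, size=(10, 50)) + np.ones(50)  # clustered around one theme
drifting = rng.normal(0, 1.0, size=(10, 50))                # no shared theme
assert global_coherence(on_topic) > global_coherence(drifting)
```

In practice the embeddings would come from a sentence encoder; the centroid-based score rewards narratives that stay near a single topic even when adjacent sentences vary.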

ANNOUNCEMENTS
Please join us in congratulating Namu Park who successfully passed his PhD Defense!
Title: Leveraging Large Language Models for Clinical Information Extraction in Radiology Reports

Abstract: Medical imaging plays a central role in diagnosing, monitoring, and managing a wide spectrum of diseases, including cancer, cardiovascular disorders, neurological conditions, and musculoskeletal abnormalities. Radiologists interpret complex imaging data and summarize their findings in narrative reports, which remain largely unstructured. The rapid expansion of imaging utilization has led to an overwhelming volume of such reports, posing significant challenges for clinical decision support. Their unstructured format limits automated analysis, secondary use, and integration into downstream clinical workflows. This dissertation addresses two major barriers to the effective use of radiology reports in data-driven clinical systems: (1) the absence of publicly available, large-scale annotated corpora of radiology reports with detailed clinical findings suitable for training supervised models, and (2) the limited application of machine learning approaches, particularly large language models (LLMs), to real-world clinical tasks at scale. To overcome these challenges, the research is organized around three core aims: (1) developing a corpus of radiology reports annotated with detailed clinical findings and designing an advanced information extraction framework optimized for radiologic text; (2) evaluating the performance of diverse machine learning approaches, with emphasis on LLMs, for the practical task of identifying follow-up imaging recommendations; and (3) constructing a large-scale repository of incidental findings (incidentalomas) derived from the model outputs and proposing an NLP-based framework for automated incidentaloma detection to enhance clinical decision-making. 
Collectively, this work contributes a high-quality annotated dataset for radiologic text analysis and demonstrates the feasibility and utility of LLM-based approaches for transforming unstructured radiology reports into structured clinical intelligence, advancing the integration of medical imaging data into precision healthcare.

_________________________
Please join us in congratulating Yile Chen who successfully passed her PhD Defense!

Title: Advancing Variant Interpretation: A Gene-Specific Framework for Prioritization, Prior Estimation, and Calibration to Enhance Evidence Strength and Clinical Significance Classification

Abstract: This dissertation develops gene-specific informatics frameworks to improve functional assay prioritization, pathogenicity prior estimation, and the calibration of Variant Effect Predictors (VEPs)—all aimed at reducing the high burden of Variants of Uncertain Significance (VUS) in genomic medicine. By integrating statistical modeling, positive–unlabeled learning, domain-aware clustering, and dynamic calibration strategies, the work advances the ACMG/AMP Bayesian framework toward more accurate, context-aware variant interpretation.

First, a gene prioritization framework was created to identify genes where functional assays would have the greatest clinical impact. Using Multiple Score Optimization, the method simultaneously considers VUS “movability,” potential correction of misclassified variants, and improvements achievable through computational predictors. This approach highlights high-value genes such as TSC2, providing a principled strategy for directing experimental resources.

Second, the dissertation replaces the assumption of a universal disease prior with gene-specific pathogenicity priors estimated through a refined positive–unlabeled (PU) learning method (DistCurve). These priors vary substantially across genes (roughly 1–30%). For genes lacking sufficient labeled variants, a complementary domain-based clustering method enables prior estimation at the protein-domain level, extending applicability to thousands of genes.

Third, building on these priors, a gene-aware VEP calibration framework was developed to convert raw predictor scores into calibrated PP3/BP4 evidence strengths. Because no single method performs optimally across all genes, a dynamic decision-tree workflow was designed to automatically select the best calibration strategy based on a gene’s data characteristics. This gene-specific approach outperforms global calibration methods and reduces miscalibrated evidence assignments. Additionally, a mixed-predictor strategy—choosing the best VEP per gene—further improves variant classification accuracy.

Together, these contributions establish a context-aware decision-support ecosystem that better directs functional assay investment, provides robust statistical foundations for Bayesian interpretation, and improves the reliability of computational evidence. The resulting framework enhances the accuracy, consistency, and clinical actionability of genomic variant classification.
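The ACMG/AMP Bayesian framework that this work builds on combines a prior probability of pathogenicity with an evidence likelihood ratio (odds of pathogenicity) to yield a posterior. A minimal sketch of that update, showing why gene-specific priors matter; the numeric values are illustrative, not from the dissertation:

```python
def posterior_pathogenicity(prior, odds_path):
    """Bayesian update: convert a prior probability to odds, multiply by the
    evidence likelihood ratio (odds of pathogenicity), convert back."""
    prior_odds = prior / (1.0 - prior)
    post_odds = prior_odds * odds_path
    return post_odds / (1.0 + post_odds)

# The same evidence strength yields very different posteriors under
# different gene-specific priors (illustrative priors of 30% vs. 1%).
strong = posterior_pathogenicity(0.30, 18.7)  # high-prior gene
weak = posterior_pathogenicity(0.01, 18.7)    # low-prior gene
assert strong > weak
```

This is why replacing a universal prior with gene-specific priors (as in the dissertation's second aim) changes which variants cross classification thresholds.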

_________________________
Please join us in congratulating Ojas Ramwala who successfully passed his PhD Defense!
Title: Improving the Clinical Translation of Deep Learning Algorithms for Mammography-based Breast Cancer Screening
 

Abstract: Despite the reduction in breast cancer mortality, breast cancer remains the second leading cause of cancer-related deaths among US women. Multiple randomized controlled trials have demonstrated that screening mammography decreases breast cancer mortality by aiding the early detection of breast cancers. However, the interpretive accuracy of breast cancer screening is limited by human visual perception of mammographic abnormalities. Moreover, manual interpretation of mammograms is time-consuming and prone to inaccuracies. The variability in mammography interpretations among radiologists leads to multiple recalls, significantly increasing costs and patient stress. This has motivated the use of artificial intelligence (AI) techniques to automate the interpretation of screening mammograms. Mammography-based breast cancer screening has been a pioneering example in the development of deep learning algorithms for biomedical image processing and interpretation. However, their adoption in radiology workflows has been limited due to a lack of computational tools to validate their generalizability, methods to interpret their ‘black box’ inference mechanisms, and techniques to apply robust algorithms for challenging clinical outcomes.

Given the encouraging performance of deep learning algorithms in detecting useful imaging features and relationships not seen by breast imaging radiologists, validating their performance in target populations before integrating them into clinical workflows becomes crucial. Concerns associated with model generalizability and biases necessitate rigorous external validation of AI algorithms. We developed strategic guidelines and a computational framework, ClinValAI, for establishing robust cloud-based infrastructures to perform clinical validation of AI algorithms in medical imaging. Our framework supports health institutions in understanding the generalizability and investigating the latent biases of deep learning algorithms, thereby equipping them with an evidence-based approach to adopt the most suitable AI tool to optimize radiology workflows and improve patient outcomes.

In addition to rigorous external validation, providing clinically meaningful explainability is imperative to earning radiologists’ trust in the predictions of robust and generalizable ‘black-box’ deep learning algorithms for mammography interpretation. We advance the paradigm of developing explainability techniques faithful to the model architecture of the AI algorithm. Specifically, we explain the inference mechanism of Mirai, the state-of-the-art deep learning algorithm for mammography-based breast cancer risk prediction. By providing quantitative and qualitative explainability, we highlight the range of strategies Mirai uses to estimate breast cancer risk. Our technique and results have the potential to optimize the clinical utility of AI-based risk assessment tools and can broaden our understanding of breast cancer risk factors.

The application of deep learning techniques for mammography-based breast cancer risk assessment has primarily focused on overall risk prediction. However, the risk of advanced breast cancer is a more accurate predictor of breast cancer mortality than overall risk, since overall risk can be confounded by indolent tumors that would not affect survival if left undiagnosed. To efficiently use the limited advanced breast cancer risk data, we applied recent advances in self-supervised pretraining to develop a digital mammography foundation model and adapted it to predict the future risk of advanced breast cancers. Our work may assist breast imaging radiologists in identifying women at higher risk of advanced breast cancer, to potentially implement timely interventions that can improve breast cancer early detection and survival outcomes.

Overall, we address the key factors enabling the clinical translation of mammography-based deep learning algorithms for breast cancer screening.

_________________________
Please join us in congratulating Wenyu Zeng in passing her General Exam!
Title: Understanding Glioblastoma Heterogeneity through Spatial Transcriptomics

Abstract: Glioblastoma (GBM) is the most aggressive primary brain tumor in adults, characterized by profound cellular and spatial heterogeneity that drives therapeutic resistance and recurrence. Despite major advances in genomic and single-cell technologies, the spatial organization of tumor and microenvironmental cells (how malignant, immune, and vascular populations interact across distinct tumor regions) remains poorly understood. Addressing this gap is essential for uncovering the molecular programs and signaling interactions that define aggressive GBM subclones and for identifying new therapeutic targets. This project will develop and apply a computational framework for large-scale spatial transcriptomic analysis of human GBM using NanoString’s CosMx Spatial Molecular Imager (SMI) platform. The dataset comprises 12 CosMx slides representing multiple anatomical regions from several GBM patients, totaling over 550,000 cells and 6,175 genes. Analyses will be performed using a mixture of Python and R packages, which enables efficient processing of large spatial datasets while maintaining reproducibility and scalability.


December 1, 2025 – December 5, 2025

UPCOMING LECTURES AND SEMINARS
BIME 590 – On break until January 8th!

 

PAPERS, PUBLICATIONS & PRESENTATIONS

  • Tianmai M Zhang, Sydney P Sharp, John D Scott, Douglas Taren, Jane C Samaniego, Elizabeth R Unger, Jeanne Bertolli, Jin-Mann S Lin, Christian B Ramers, Job G Godino. Characterization of Post-Viral Infection Behaviors among Long COVID Patients: Prospective, Observational, Longitudinal Cohort Analyses of Fitbit Data and Patient Reported Outcomes. Accepted by JMIR Formative Research.

ANNOUNCEMENTS
Please join us in congratulating Sihang Zeng in passing his General Exam!
Title: Towards Trustworthy Modeling of Patient Trajectory with Longitudinal Electronic Health Records

Abstract: Patient trajectory modeling, which predicts future clinical events using data from longitudinal electronic health records (EHRs), is expected to be of value for personalizing disease management. Yet the adoption of powerful deep learning models is often hindered by their “black-box” nature, creating a barrier to clinical trust. This challenge is compounded when modeling complex temporal dependencies between lab test results, treatments, and clinical events in the EHR. This dissertation proposal develops novel interpretable frameworks for patient trajectory modeling, motivated by the hypothesis that models that account for the full dynamics of a patient’s history will produce more reliable predictions than simpler models, and that these predictions can also be made transparent or interpretable. This work is structured around three aims that collectively affirm this hypothesis while innovating in terms of both deep learning methods and interpretable learning tools. Aim 1: To develop an interpretable deep learning model for predicting survival in metastatic prostate cancer from pre-metastasis serial PSA values and treatments. Aim 2: To advance from discrete-time modeling to a more precise continuous-time framework by developing a model that learns continuous latent trajectories and uses a divide-and-conquer interpretation to explain how clinical changes drive outcomes. Aim 3: To create a more generalizable framework by developing a multi-agent system that leverages large language models (LLMs) to reason over long and noisy EHR data for lung cancer risk prediction. Through these complementary aims, this research seeks to contribute to the development of more trustworthy AI tools that can support personalized clinical decision-making.

 

Please join us in congratulating Xinyang Ren who successfully passed her PhD Defense!
Title: Harnessing Language Models for Automated Detection of Depression Severity and Suicide Risk

Abstract: Depression is one of the most common mental disorders globally, and can carry an increased risk of adverse outcomes, including suicide. Suicide is one of the leading causes of death worldwide, and many more individuals attempt it or experience suicidal thoughts. Compounding these severe public health problems is a longstanding shortage of mental health professionals. There are too many patients for available professionals to monitor effectively, presenting opportunities for the use of technology to expand their capacity. Natural language processing (NLP) methods have been widely applied to psychologically related text analysis tasks to draw relationships between text and the thoughts and feelings of the person who generated it, as indicators of their mental status. In this work, I investigated how language models can be harnessed to automatically detect depression symptom severity and suicide risk. Several challenges and limitations remain in this field. There is limited research involving clinical populations that utilizes contextual embeddings from state-of-the-art language models to detect linguistic indicators of depression and suicide risk. Moreover, certain patient-generated data sources that can reveal mental status, notably text-based therapy, Google search logs, and YouTube activities, remain underexplored. Existing research has primarily concentrated on electronic health record (EHR) data and social media posts, which are subject to certain limitations. Furthermore, despite the rapid development of large language models, their clinical application remains challenging due to high computational costs and ethical concerns. To fill these gaps, I have developed a series of research studies. Specifically, I have analyzed the use of contextual embeddings of first-person singular pronouns as predictors of depression symptom severity.
To explore the use of individualized web searches for suicide risk assessment, I have evaluated the effectiveness of anomaly detection methods in identifying search pattern changes that precede a suicide attempt using personal Google search data. The proposed framework for semantic feature construction provides a computationally efficient, tractable approach that can be applied to web search logs at scale. The methods were further applied to study participants’ YouTube activity data, which were combined with Google search logs to enhance anomaly detection performance. This work demonstrates the potential of effectively using language models for automatic prediction of depression symptom severity and detection of suicide risk using real-world datasets.

 

Please join us in congratulating Bhargav Vemuri who successfully passed his PhD Defense!
Title: Deep clustering to identify subgroups of multivariate trajectories in longitudinal biomedical datasets

Abstract: Unsupervised patient subgrouping in longitudinal biomedical datasets enables the discovery of distinct temporal phenotypes that capture heterogeneity in disease progression, treatment response dynamics, developmental trajectories, and more. Multivariate time series (MVTS) deep clustering methods are well-suited to this task because they (1) jointly model multiple longitudinal variables and (2) integrate missing data imputation, representation learning, and clustering into a unified framework. Recent state-of-the-art MVTS deep clustering approaches include Variational Deep Embedding with Recurrence (VaDER; de Jong et al., 2019) and Clustering Representation Learning on Incomplete time-series data (CRLI; Ma et al., 2021). In this work, we apply CRLI in two real-world longitudinal biomedical contexts and evaluate its performance against VaDER using 20 synthetic MVTS datasets of our own design.

In Aim 1, we explored CRLI’s capacity to detect multivariate trajectories in the electronic health record (EHR). Temporal EHR data is marred by irregular measurement intervals, high missingness, and multiple biases (selection, measurement, time-related). We assessed how well CRLI handles these hurdles in the context of identifying GLP-1 medication (semaglutide, dulaglutide, etc.) treatment response subgroups in the NIH All of Us Research Study.

In Aim 2, we applied CRLI to another real-world data source, the Adolescent Brain Cognitive Development (ABCD) Study, a longitudinal observational cohort with a prespecified assessment protocol, including a consistent follow-up schedule and a high retention rate (98.9%). This dataset allowed us to explore physical health trajectories (pubertal hormones, anthropometrics) as we did in Aim 1, but also mental health trajectories, as measured by 8 Child Behavior Checklist (CBCL) syndrome scales. We calculated cluster associations with mental health outcomes to better characterize cluster differences.

In Aim 3, we designed a framework using the mockseries Python package that let us rapidly generate unique MVTS datasets by sampling from a range of values for various dataset characteristics (time series length, noise, missingness, number of clusters, number of samples). We also incorporated the ability to modify time series variable properties (trend, rate of change, seasonality) by designing 5 distinct variable styles inspired by biomedical trends we observed in Aims 1 and 2 and the literature. We reported VaDER and CRLI performance on 4 external clustering validation indices (purity, RI, ARI, NMI) across 20 synthetic datasets.
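Two of the external validation indices mentioned, purity and the Rand index (RI), are simple enough to sketch directly. The following illustrative Python (not code from the dissertation) computes both for a predicted clustering against ground-truth labels:

```python
from collections import Counter
from itertools import combinations

def purity(pred, true):
    # fraction of samples that carry the majority true label of their cluster
    clusters = {}
    for p, t in zip(pred, true):
        clusters.setdefault(p, []).append(t)
    return sum(Counter(members).most_common(1)[0][1]
               for members in clusters.values()) / len(pred)

def rand_index(pred, true):
    # fraction of sample pairs on which the two labelings agree
    # (both place the pair together, or both place it apart)
    pairs = list(combinations(zip(pred, true), 2))
    agree = sum((p1 == p2) == (t1 == t2) for (p1, t1), (p2, t2) in pairs)
    return agree / len(pairs)

true = [0, 0, 0, 1, 1, 1]
perfect = [1, 1, 1, 0, 0, 0]  # same partition under different labels
assert purity(perfect, true) == 1.0 and rand_index(perfect, true) == 1.0
```

Both indices are label-permutation invariant, which is why a relabeled but identical partition scores 1.0; ARI and NMI additionally correct for chance and cluster-size effects.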

 

UPCOMING EXAMS
Final Exam

Title: Leveraging Large Language Models for Clinical Information Extraction in Radiology Reports
Student: Namu Park
Date/Time: Monday, December 8, 2025, 10:30 AM PT
In-person location: Zoom Only
Zoom: https://washington.zoom.us/my/melihay

Abstract: Medical imaging plays a central role in diagnosing, monitoring, and managing a wide spectrum of diseases, including cancer, cardiovascular disorders, neurological conditions, and musculoskeletal abnormalities. Radiologists interpret complex imaging data and summarize their findings in narrative reports, which remain largely unstructured. The rapid expansion of imaging utilization has led to an overwhelming volume of such reports, posing significant challenges for clinical decision support. Their unstructured format limits automated analysis, secondary use, and integration into downstream clinical workflows. This dissertation addresses two major barriers to the effective use of radiology reports in data-driven clinical systems: (1) the absence of publicly available, large-scale annotated corpora of radiology reports with detailed clinical findings suitable for training supervised models, and (2) the limited application of machine learning approaches, particularly large language models (LLMs), to real-world clinical tasks at scale. To overcome these challenges, the research is organized around three core aims: (1) developing a corpus of radiology reports annotated with detailed clinical findings and designing an advanced information extraction framework optimized for radiologic text; (2) evaluating the performance of diverse machine learning approaches, with emphasis on LLMs, for the practical task of identifying follow-up imaging recommendations; and (3) constructing a large-scale repository of incidental findings (incidentalomas) derived from the model outputs and proposing an NLP-based framework for automated incidentaloma detection to enhance clinical decision-making. 
Collectively, this work contributes a high-quality annotated dataset for radiologic text analysis and demonstrates the feasibility and utility of LLM-based approaches for transforming unstructured radiology reports into structured clinical intelligence, advancing the integration of medical imaging data into precision healthcare.

 

Final Exam
Title: Advancing Variant Interpretation: A Gene-Specific Framework for Prioritization, Prior Estimation, and Calibration to Enhance Evidence Strength and Clinical Significance Classification
Student: Yile Chen
Date/Time: Tuesday, December 9, 2025, 1 – 3 PM PT
In-person location: Genome Sciences S060, William H. Foege Hall, 3720 15th Ave NE, Seattle, WA 98195
Zoom: https://depts.washington.edu/gsrestrc/remote.htm


Final Exam
Title: Improving the Clinical Translation of Deep Learning Algorithms for Mammography-based Breast Cancer Screening
Student: Ojas A. Ramwala
Date/Time: Wednesday, December 10, 2025, 11:00 AM – 1:00 PM PT
In-person location: 850 Republican Street, Building C, SLU C359 (Light refreshments may be served.)
Zoom: https://washington.zoom.us/my/jhgennari?pwd=TUx0clkwKzdnS1ZQV1dXRnZqMWMzZz09


Abstract: Despite the reduction in breast cancer mortality, breast cancer remains the second leading cause of cancer-related deaths among US women. Multiple randomized controlled trials have demonstrated that screening mammography decreases breast cancer mortality by aiding the early detection of breast cancers. However, the interpretive accuracy of breast cancer screening is limited by human visual perception of mammographic abnormalities. Moreover, manual interpretation of mammograms is time-consuming and prone to inaccuracies. The variability in mammography interpretations among radiologists leads to multiple recalls, significantly increasing costs and patient stress. This has motivated the use of artificial intelligence (AI) techniques to automate the interpretation of screening mammograms. Mammography-based breast cancer screening has been a pioneering example in the development of deep learning algorithms for biomedical image processing and interpretation. However, their adoption in radiology workflows has been limited due to a lack of computational tools to validate their generalizability, methods to interpret their ‘black box’ inference mechanisms, and techniques to apply robust algorithms for challenging clinical outcomes.

Given the encouraging performance of deep learning algorithms in detecting useful imaging features and relationships not seen by breast imaging radiologists, validating their performance in target populations before integrating them into clinical workflows becomes crucial. Concerns associated with model generalizability and biases necessitate rigorous external validation of AI algorithms. We developed strategic guidelines and a computational framework, ClinValAI, for establishing robust cloud-based infrastructures to perform clinical validation of AI algorithms in medical imaging. Our framework supports health institutions in understanding the generalizability and investigating the latent biases of deep learning algorithms, thereby equipping them with an evidence-based approach to adopt the most suitable AI tool to optimize radiology workflows and improve patient outcomes.

In addition to rigorous external validation, providing clinically meaningful explainability is imperative to earning radiologists’ trust in the predictions of robust and generalizable ‘black-box’ deep learning algorithms for mammography interpretation. We advance the paradigm of developing explainability techniques faithful to the model architecture of the AI algorithm. Specifically, we explain the inference mechanism of Mirai, the state-of-the-art deep learning algorithm for mammography-based breast cancer risk prediction. By providing quantitative and qualitative explainability, we highlight the range of strategies Mirai uses to estimate breast cancer risk. Our technique and results have the potential to optimize the clinical utility of AI-based risk assessment tools and can broaden our understanding of breast cancer risk factors.

The application of deep learning techniques for mammography-based breast cancer risk assessment has primarily focused on overall risk prediction. However, the risk of advanced breast cancer is a more accurate predictor of breast cancer mortality than overall risk, since overall risk can be confounded by indolent tumors that would not affect survival if left undiagnosed. To efficiently use the limited advanced breast cancer risk data, we applied recent advances in self-supervised pretraining to develop a digital mammography foundation model and adapted it to predict the future risk of advanced breast cancers. Our work may assist breast imaging radiologists in identifying women at higher risk of advanced breast cancer, to potentially implement timely interventions that can improve breast cancer early detection and survival outcomes.

Overall, we address the key factors enabling the clinical translation of mammography-based deep learning algorithms for breast cancer screening.

Final Exam
Title: Comprehensive assessment and quantification of incoherent speech using natural language processing
Student: George (Weizhe) Xu
Date/Time: Friday, December 12, 2025, 10:00 AM – 12:00 PM PT
In-person location: 850 Republican Street, Building C, SLU C259
Zoom: https://washington.zoom.us/my/cohenta


Abstract: Coherence is a linguistic feature that is defined as the orderly and interconnected flow of ideas. The disruption of coherence is a linguistic anomaly that is commonly observed in a group of psychiatric disorders known as schizophrenia spectrum disorders (SSD), where disorganized thoughts manifest as incoherent speech. While early detection of symptoms can potentially lead to better outcomes, manual assessment of symptom severity can be time-consuming and require specialized expertise. Therefore, symptom evaluation through automated coherence assessment methods is desired.

However, gaps remain in prior research in this area, namely: 1) most prior work focuses on estimating local coherence (coherent transitions between adjacent semantic units) by computing cosine similarity between vector representations of sequential semantic units, while the estimation of global coherence (sustaining a theme or topic throughout a narrative) has received much less attention; 2) the impact of automatic speech recognition (ASR) errors has received little attention, as prior work mainly used manually transcribed data; 3) there is limited exploration of using language model perplexity to assess coherence, especially given the recent advancement of large language models (LLMs).

This work bridges these gaps through the following contributions: 1) Two new global coherence assessment methods were developed based on centroids of embeddings (vector representations of semantic space); we found that the global coherence methods align better with human judgment than local coherence methods. 2) A time-series feature extraction pipeline is used to replace the aggregation step in coherence assessment pipelines; we found that this makes the coherence evaluation process resistant to the impact of ASR errors in the text input. 3) Two sentence-level perplexity-based coherence methods were developed, and we found that combining perplexity features with traditional coherence scores (termed proximity features because they are based on cosine similarity) resulted in better prediction models than using proximity or perplexity features alone. 4) These innovations and classical approaches were combined into the Comprehensive Coherence Calculator (CCC), a software package that performs comprehensive coherence analysis with a myriad of configurations. With these contributions, a fully automated coherence assessment pipeline can be established, offering patients easy monitoring at home, clinicians the information needed to provide better care, and researchers an objective quantitative basis for the study of semantic coherence.
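As a rough illustration of the distinction between the two families of scores described above, the sketch below computes a local coherence score (mean cosine similarity between adjacent sentence embeddings) and a centroid-based global coherence score. The function names and toy vectors are hypothetical; a real pipeline would use sentence embeddings produced by a trained language model.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def local_coherence(embeddings):
    # Mean cosine similarity between adjacent sentence embeddings:
    # high when each sentence follows smoothly from the previous one.
    return sum(cosine(embeddings[i], embeddings[i + 1])
               for i in range(len(embeddings) - 1)) / (len(embeddings) - 1)

def global_coherence(embeddings):
    # Mean cosine similarity of each sentence to the centroid of all
    # sentences: high when the narrative stays near one overall theme.
    dim = len(embeddings[0])
    centroid = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    return sum(cosine(e, centroid) for e in embeddings) / len(embeddings)
```

A narrative that alternates between two unrelated topics can still score moderately on the global measure (every sentence is near the overall centroid) while scoring near zero locally, which is one reason the two measures capture different aspects of disorganized speech.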

 

November 24, 2025 – November 28, 2025

UPCOMING LECTURES AND SEMINARS
BIME 590
Presenter: Oliver Bear Don’t Walk IV, PhD
Thursday, December 4th, 11:00 – 11:50 am
850 Republican Street, Building C, Room 123 A/B
Zoom Information: https://washington.zoom.us/my/bime590
The speaker will present in person.

Title: Indigenous Knowledge and Informatics Approaches to Health and Wellbeing

Abstract:
Indigenous approaches to health and wellbeing draw on rich and nuanced knowledge systems that have been developed over millennia. Biomedical research working with Indigenous epistemologies and ethics leads to unique findings that may not have been possible with colonial approaches to science. Biomedical informatics research is one such area that can greatly benefit from expanding our paradigms to include Indigenous knowledge. During this presentation I will talk about how my research in natural language processing and machine learning for information retrieval and health outcomes prediction has been influenced by Indigenous knowledge and ethics. I’ll focus on research into retrieving social drivers of health and demographics from clinical notes, moving beyond simple identity categories in informatics, and community-engaged research to support community expertise and leadership.

Speaker Bio:
Dr. Oliver J. Bear Don’t Walk IV is a citizen of the Apsáalooke Nation and an Assistant Professor at the University of Washington in Biomedical Informatics and Medical Education. Their research lies at the intersection of clinical natural language processing (NLP), fairness, and ethics. Dr. Bear Don’t Walk’s current work focuses on collaborating with communities to describe social drivers of health and to incorporate this information into biomedical informatics, thereby enhancing the relevance and effectiveness of healthcare technologies for at-risk populations. Additionally, he incorporates intersectionality into his informatics approaches through community engagement, categorical definitions, and fairness audits.
 

ANNOUNCEMENTS
Please save the date for the big BIME Holiday Party on December 4th, 4-6pm at SLU. The amazing Chef Jason Vickers from Natoncks Metsu returns again this year. Details about games, prizes and the rest will soon follow. Please reserve this time, and reply to the RSVP, to celebrate the end of the year with your colleagues and friends.

 

UPCOMING EXAMS

General Exam
Title: Towards Trustworthy Modeling of Patient Trajectory with Longitudinal Electronic Health Records
Student: Sihang Zeng
Date/Time: Monday, December 1st, 2025, 9am PT
In-person location: 850 Republican Street, Building C, SLU C122
Zoom: https://washington.zoom.us/my/melihay

Abstract: Patient trajectory modeling, which predicts future clinical events using data from longitudinal electronic health records (EHRs), is expected to be of value for personalizing disease management. Yet the adoption of powerful deep learning models is often hindered by their “black-box” nature, creating a barrier to clinical trust. This challenge is compounded when modeling complex temporal dependencies between lab test results, treatments, and clinical events in the EHR. This dissertation proposal develops novel interpretable frameworks for patient trajectory modeling, motivated by the hypothesis that models that account for the full dynamics of a patient’s history will produce more reliable predictions than simpler models, and that these predictions can also be made transparent or interpretable. This work is structured around three aims that collectively affirm this hypothesis while innovating in terms of both deep learning methods and interpretable learning tools. Aim 1: To develop an interpretable deep learning model for predicting survival in metastatic prostate cancer from pre-metastasis serial PSA values and treatments. Aim 2: To advance from discrete-time modeling to a more precise continuous-time framework by developing a model that learns continuous latent trajectories and uses a divide-and-conquer interpretation to explain how clinical changes drive outcomes. Aim 3: To create a more generalizable framework by developing a multi-agent system that leverages large language models (LLMs) to reason over long and noisy EHR data for lung cancer risk prediction. Through these complementary aims, this research seeks to contribute to the development of more trustworthy AI tools that can support personalized clinical decision-making.

 

Final Exam
Title: Harnessing Language Models for Automated Detection of Depression Severity and Suicide Risk
Student: Xinyang Ren
Date/Time: Monday, December 1st, 2025, 1pm – 3pm PT
In-person location: 1601 NE Columbia Rd, South Campus Center 322
Zoom: https://washington.zoom.us/my/cohenta

Abstract: Depression is one of the most common mental disorders globally, and can carry an increased risk of adverse outcomes, including suicide. Suicide is one of the leading causes of death worldwide, and many more individuals attempt it or experience suicidal thoughts. Compounding these severe public health problems is a longstanding shortage of mental health professionals. There are too many patients for available professionals to monitor effectively, presenting opportunities for the use of technology to expand their capacity. Natural language processing (NLP) methods have been widely applied to psychologically related text analysis tasks to draw relationships between text and the thoughts and feelings of the person who generated it, as indicators of their mental status. In this work, I investigated how language models can be harnessed to automatically detect depression symptom severity and suicide risk. Several challenges and limitations remain in this field. There is limited research involving clinical populations that utilizes contextual embeddings from state-of-the-art language models to detect linguistic indicators of depression and suicide risk. Moreover, certain patient-generated data sources that can reveal mental status, notably text-based therapy, Google search logs, and YouTube activity, remain underexplored; existing research has primarily concentrated on electronic health record (EHR) data and social media posts, which are subject to certain limitations. Furthermore, despite the rapid development of large language models, their clinical application remains challenging due to high computational costs and ethical concerns. To fill these gaps, I conducted a series of studies. Specifically, I analyzed the use of contextual embeddings of first-person singular pronouns as predictors of depression symptom severity.
To explore the use of individualized web searches for suicide risk assessment, I have evaluated the effectiveness of anomaly detection methods in identifying search pattern changes that precede a suicide attempt using personal Google search data. The proposed framework for semantic feature construction provides a computationally efficient, tractable approach that can be applied to web search logs at scale. The methods were further applied to study participants’ YouTube activity data, which were combined with Google search logs to enhance anomaly detection performance. This work demonstrates the potential of effectively using language models for automatic prediction of depression symptom severity and detection of suicide risk using real-world datasets.
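The specific anomaly detection methods evaluated in this work are not reproduced here, but the general idea of flagging changes in a personal activity stream can be sketched as a trailing-window z-score detector over a daily feature series. The function, window size, and threshold below are illustrative assumptions, not the dissertation’s method.

```python
import statistics

def zscore_anomalies(series, window=7, threshold=3.0):
    # Flag each point whose value deviates from the mean of the
    # preceding `window` observations by more than `threshold`
    # standard deviations of that trailing window.
    flags = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = statistics.fmean(history)
        sd = statistics.pstdev(history)
        z = 0.0 if sd == 0 else (series[i] - mu) / sd
        flags.append(abs(z) > threshold)
    return flags
```

In practice the series would be a daily semantic feature derived from search or viewing activity rather than a raw count, and the detector would need to handle days with no activity at all.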

 

Final Exam
Title: Deep clustering to identify subgroups of multivariate trajectories in longitudinal biomedical datasets
Student: Bhargav Vemuri
Date/Time: Monday, December 1st, 2025, 1pm PT
In-person location: Institute for Systems Biology, 401 Terry Ave N, Room 106C
Zoom: https://washington.zoom.us/my/peter.th

Abstract:
Unsupervised patient subgrouping in longitudinal biomedical datasets enables the discovery of distinct temporal phenotypes that capture heterogeneity in disease progression, treatment response dynamics, developmental trajectories, and more. Multivariate time series (MVTS) deep clustering methods are well-suited to this task because they (1) jointly model multiple longitudinal variables and (2) integrate missing data imputation, representation learning, and clustering into a unified framework. Recent state-of-the-art MVTS deep clustering approaches include Variational Deep Embedding with Recurrence (VaDER; de Jong et al., 2019) and Clustering Representation Learning on Incomplete time-series data (CRLI; Ma et al., 2021). In this work, we apply CRLI in two real-world longitudinal biomedical contexts and evaluate its performance against VaDER using 20 synthetic MVTS datasets of our own design.

In Aim 1, we explored CRLI’s capacity to detect multivariate trajectories in the electronic health record (EHR). Temporal EHR data is marred by irregular measurement intervals, high missingness, and multiple biases (selection, measurement, and time-related). We assessed how well CRLI handles these hurdles in the context of identifying GLP-1 medication (semaglutide, dulaglutide, etc.) treatment response subgroups in the NIH All of Us Research Program.

In Aim 2, we applied CRLI to another real-world data source, the Adolescent Brain Cognitive Development (ABCD) Study, a longitudinal observational cohort with a prespecified assessment protocol, including a consistent follow-up schedule and a high retention rate (98.9%). This dataset allowed us to explore physical health trajectories (pubertal hormones, anthropometrics) as we did in Aim 1, as well as mental health trajectories, as measured by 8 Child Behavior Checklist (CBCL) syndrome scales. We calculated cluster associations with mental health outcomes to better characterize cluster differences.

In Aim 3, we designed a framework using the mockseries Python package that lets us rapidly generate unique MVTS datasets by sampling from a range of values for various dataset characteristics (time series length, noise, missingness, number of clusters, number of samples). We also incorporated the ability to modify time series variable properties (trend, rate of change, seasonality) by designing 5 distinct variable styles inspired by biomedical trends we observed in Aims 1 and 2 and in the literature. We reported VaDER and CRLI performance on 4 external clustering validation indices (purity, RI, ARI, NMI) across 20 synthetic datasets.
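Of the four external validation indices mentioned, the first two (purity and the Rand index) are simple enough to sketch directly; the helper functions below are illustrative, and ARI and NMI are available in scikit-learn as `adjusted_rand_score` and `normalized_mutual_info_score`.

```python
from collections import Counter
from itertools import combinations

def purity(labels_true, labels_pred):
    # For each predicted cluster, count its most common true label;
    # purity is the fraction of samples matching their cluster's majority class.
    clusters = {}
    for t, p in zip(labels_true, labels_pred):
        clusters.setdefault(p, []).append(t)
    majority = sum(Counter(members).most_common(1)[0][1]
                   for members in clusters.values())
    return majority / len(labels_true)

def rand_index(labels_true, labels_pred):
    # Fraction of sample pairs on which the two labelings agree
    # (the pair is grouped together in both, or apart in both).
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

Both indices are invariant to how clusters are numbered, which is why a perfect clustering with swapped labels still scores 1.0.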

 

Final Exam
Title: Leveraging Large Language Models for Clinical Information Extraction in Radiology Reports
Student: Namu Park
Date/Time: Monday, December 8, 2025, 10:30 AM PT
In-person location: Zoom Only
Zoom: https://washington.zoom.us/my/melihay

Abstract: Medical imaging plays a central role in diagnosing, monitoring, and managing a wide spectrum of diseases, including cancer, cardiovascular disorders, neurological conditions, and musculoskeletal abnormalities. Radiologists interpret complex imaging data and summarize their findings in narrative reports, which remain largely unstructured. The rapid expansion of imaging utilization has led to an overwhelming volume of such reports, posing significant challenges for clinical decision support. Their unstructured format limits automated analysis, secondary use, and integration into downstream clinical workflows. This dissertation addresses two major barriers to the effective use of radiology reports in data-driven clinical systems: (1) the absence of publicly available, large-scale annotated corpora of radiology reports with detailed clinical findings suitable for training supervised models, and (2) the limited application of machine learning approaches, particularly large language models (LLMs), to real-world clinical tasks at scale. To overcome these challenges, the research is organized around three core aims: (1) developing a corpus of radiology reports annotated with detailed clinical findings and designing an advanced information extraction framework optimized for radiologic text; (2) evaluating the performance of diverse machine learning approaches, with emphasis on LLMs, for the practical task of identifying follow-up imaging recommendations; and (3) constructing a large-scale repository of incidental findings (incidentalomas) derived from the model outputs and proposing an NLP-based framework for automated incidentaloma detection to enhance clinical decision-making. 
Collectively, this work contributes a high-quality annotated dataset for radiologic text analysis and demonstrates the feasibility and utility of LLM-based approaches for transforming unstructured radiology reports into structured clinical intelligence, advancing the integration of medical imaging data into precision healthcare.

General Exam
Title: Understanding Glioblastoma Heterogeneity through Spatial Transcriptomics
Student: Wenyu Zeng
Date/Time: Thursday, December 4, 2025, 3:30 – 5:30 PM PT
In-person location: 850 Republican Street, Building C, C259
Zoom: https://washington.zoom.us/s/93471587664

Abstract: Glioblastoma (GBM) is the most aggressive primary brain tumor in adults, characterized by profound cellular and spatial heterogeneity that drives therapeutic resistance and recurrence. Despite major advances in genomic and single-cell technologies, the spatial organization of tumor and microenvironmental cells, and how malignant, immune, and vascular populations interact across distinct tumor regions, remains poorly understood. Addressing this gap is essential for uncovering the molecular programs and signaling interactions that define aggressive GBM subclones and for identifying new therapeutic targets. This project will develop and apply a computational framework for large-scale spatial transcriptomic analysis of human GBM using NanoString’s CosMx Spatial Molecular Imager (SMI) platform. The dataset comprises 12 CosMx slides representing multiple anatomical regions from several GBM patients, totaling over 550,000 cells and 6,175 genes. Analyses will be performed using a mixture of Python and R packages, enabling efficient processing of large spatial datasets while maintaining reproducibility and scalability.

 

Final Exam
Title: Advancing Variant Interpretation: A Gene-Specific Framework for Prioritization, Prior Estimation, and Calibration to Enhance Evidence Strength and Clinical Significance Classification
Student: Yile Chen
Date/Time: Tuesday, December 9, 2025, 1 – 3 PM PT
In-person location: Genome Sciences S 060, William H. Foege Hall, 3720 15th Ave NE, Seattle, WA 98195
Zoom: https://depts.washington.edu/gsrestrc/remote.htm

Abstract: This dissertation develops gene-specific informatics frameworks to improve functional assay prioritization, pathogenicity prior estimation, and the calibration of Variant Effect Predictors (VEPs)—all aimed at reducing the high burden of Variants of Uncertain Significance (VUS) in genomic medicine. By integrating statistical modeling, positive–unlabeled learning, domain-aware clustering, and dynamic calibration strategies, the work advances the ACMG/AMP Bayesian framework toward more accurate, context-aware variant interpretation.

First, a gene prioritization framework was created to identify genes where functional assays would have the greatest clinical impact. Using Multiple Score Optimization, the method simultaneously considers VUS “movability,” potential correction of misclassified variants, and improvements achievable through computational predictors. This approach highlights high-value genes such as TSC2, providing a principled strategy for directing experimental resources.

Second, the dissertation replaces the assumption of a universal disease prior with gene-specific pathogenicity priors estimated through a refined positive–unlabeled (PU) learning method (DistCurve). These priors vary substantially across genes (roughly 1–30%). For genes lacking sufficient labeled variants, a complementary domain-based clustering method enables prior estimation at the protein-domain level, extending applicability to thousands of genes.

Third, building on these priors, a gene-aware VEP calibration framework was developed to convert raw predictor scores into calibrated PP3/BP4 evidence strengths. Because no single method performs optimally across all genes, a dynamic decision-tree workflow was designed to automatically select the best calibration strategy based on a gene’s data characteristics. This gene-specific approach outperforms global calibration methods and reduces miscalibrated evidence assignments. Additionally, a mixed-predictor strategy—choosing the best VEP per gene—further improves variant classification accuracy.
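As context for how gene-specific priors and calibrated evidence strengths combine, the sketch below applies the odds form of Bayes’ rule that underlies the quantitative ACMG/AMP framework. The function is a deliberate simplification, and the strong-evidence odds value of 350^(4/8) ≈ 18.7 follows Tavtigian et al.’s point-based formulation rather than anything specific to this dissertation.

```python
# Odds of pathogenicity in the point-based ACMG/AMP formulation are
# 350^(points/8): supporting ≈ 2.08, moderate ≈ 4.33, strong ≈ 18.7,
# very strong = 350.
ODDS_STRONG = 350 ** (4 / 8)

def posterior_pathogenicity(prior, odds):
    # Odds form of Bayes' rule: posterior odds = prior odds * evidence odds.
    prior_odds = prior / (1.0 - prior)
    post_odds = prior_odds * odds
    return post_odds / (1.0 + post_odds)

# The same strong computational evidence moves the posterior very
# differently under gene-specific priors of 1% vs. 30%.
low = posterior_pathogenicity(0.01, ODDS_STRONG)   # ~0.16
high = posterior_pathogenicity(0.30, ODDS_STRONG)  # ~0.89
```

This is why replacing a universal prior with gene-specific priors changes which evidence combinations cross classification thresholds: the identical predictor score can be decisive in one gene and insufficient in another.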

Together, these contributions establish a context-aware decision-support ecosystem that better directs functional assay investment, provides robust statistical foundations for Bayesian interpretation, and improves the reliability of computational evidence. The resulting framework enhances the accuracy, consistency, and clinical actionability of genomic variant classification.