Yujuan “Velvin” Fu
Graduated: June 13, 2025
Thesis/Dissertation Title:
Evaluating and Enhancing Large Language Models (LLMs) in the Clinical Domain
Large language models (LLMs) have recently demonstrated human-level performance on many specialized medical tasks, even without annotated training data. However, three main challenges remain: (1) because clinical narratives are sensitive and highly specialized, and human expert annotation is costly, there is a lack of high-quality, well-structured, and clinically meaningful datasets for LLM training and evaluation; (2) current medical LLMs show limited ability to generalize when interpreting and extracting complex clinical information on certain unseen natural language understanding (NLU) tasks; and (3) because LLMs are typically trained on vast amounts of data, there is a substantial risk of data contamination, in which evaluation benchmarks unintentionally overlap with training data, leading to inflated test scores and potentially degraded performance on truly novel tasks.
In this work, we address these limitations through three core aims: (1) develop benchmark datasets for clinical information extraction (IE), a key NLU subtask, across two critical medical domains, and evaluate multiple state-of-the-art (SOTA) transformer-based language models (LMs) under both fine-tuning and in-context learning settings; (2) develop a more generalizable medical NLU model via instruction tuning, demonstrating enhanced performance on previously unseen clinical NLU datasets; and (3) systematically review existing approaches for detecting data contamination and evaluate them on datasets used during the pre-training and fine-tuning of LLMs, using our own model and three other widely used open-source LLMs.
In summary, our work contributes to the development of both clinical benchmarks and robust LLMs, and highlights the ongoing challenges in benchmarking the generalizability of LLMs.