Brian Chang
Graduated: June 13, 2025
Thesis/Dissertation Title:
Leveraging multimodal models to detect osteoporotic compression fractures
Osteoporosis is a chronic disease of low bone mineral density that affects older patients, predisposing them to fractures. Although osteoporosis screening is evidence-based, it remains grossly under-utilized. Osteoporotic compression fractures (OCFs) are an early biomarker for osteoporosis but are often misclassified and under-reported on radiologist review. Opportunistic screening, i.e., leveraging pre-existing data to detect OCFs, could augment the current standard of osteoporosis screening and prompt appropriate diagnostic studies, treatment, and risk management. Current fracture detection tools show promise but depend on manually curated data inputs and lack external validation and generalizability, which limits their potential clinical utility. They are also based on unimodal models, i.e., models that leverage a single data modality. Multimodal models, which leverage more than one modality, have shown improved performance on clinical tasks and better reflect real-world clinical workflows.
In this dissertation, I focus on developing and evaluating multimodal models to detect OCFs by leveraging unstructured clinical notes, radiographs, and structured electronic health record (EHR) data. To achieve this, I used a spine radiograph dataset from previous work in our group. By matching patient IDs from this dataset, I obtained clinical notes from a quaternary healthcare enterprise database and annotated fracture events. With these datasets, I implemented and evaluated unimodal models for each modality (images only and notes only) to produce outputs for the multimodal models, as described in the following aims:
1) Aim 1: Implement and evaluate transformer models to extract fracture events from clinical notes. I also developed an ensemble algorithm that consolidates fracture events at the note and patient levels, producing both structured data representing a patient's fracture history and feature representations for separate downstream input to the multimodal models (Aim 3); see the first sketch after this list. Evaluation metrics demonstrated that fine-tuned transformer models can extract fracture events from clinical notes with good performance, albeit limited by the small training corpus.
2) Aim 2: Develop and assess an imaging analysis pipeline for detecting OCFs. Independently developed machine learning models were chained into a fully automated imaging analysis pipeline (see the second sketch below). The pipeline was evaluated on a dataset of radiographs acquired in various clinical settings to measure real-world performance. While we were able to develop a performant, fully automated pipeline, the evaluation demonstrated subpar performance in detecting positive OCF cases at the image level.
3) Aim 3: Develop and assess whether multimodal models combining NLP, imaging analysis, and structured EHR data perform better than imaging analysis alone in detecting OCFs. Using the structured data and feature representations from Aim 1, the imaging analysis pipeline predictions from Aim 2, and other structured EHR data, numerous multimodal model architectures were trained and evaluated for patient-level OCF detection (see the third sketch below). These models outperformed the unimodal models (images only and notes only) in detecting OCFs even with a small training corpus, reaching acceptable absolute performance.
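To make Aim 1 concrete, the first sketch below shows one way to fine-tune a transformer to flag fracture events in clinical note text. The model checkpoint (Bio_ClinicalBERT), the note-level binary-classification framing, and all hyperparameters are illustrative assumptions rather than the dissertation's actual configuration, and the note- to patient-level ensemble consolidation step is not shown.

```python
# Minimal sketch: fine-tune a clinical BERT encoder to flag fracture events
# in note text. Checkpoint, label framing, and hyperparameters are
# illustrative assumptions, not the dissertation's actual configuration.
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "emilyalsentzer/Bio_ClinicalBERT"  # assumed encoder choice
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME,
                                                           num_labels=2)

class NoteDataset(torch.utils.data.Dataset):
    """Pairs each note's text with a 0/1 fracture-event label."""
    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True,
                             max_length=512)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

train_ds = NoteDataset(
    ["Impression: acute L1 compression deformity.",
     "No acute osseous abnormality identified."],
    [1, 0],  # toy labels; the real corpus was annotated from matched notes
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ocf_note_model", num_train_epochs=3),
    train_dataset=train_ds,
)
trainer.train()
```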
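The second sketch illustrates the chained-pipeline pattern from Aim 2. The stage decomposition (spine localization, vertebra segmentation, per-vertebra fracture classification) and the max-over-vertebrae aggregation are assumptions for illustration; the placeholder bodies stand in for the independently developed models actually used in the pipeline.

```python
# Minimal sketch of a chained imaging pipeline: each stage wraps an
# independently trained model, and the chain runs fully automatically with
# no manual curation between stages. Stages and aggregation are assumed.
import numpy as np

def localize_spine(radiograph: np.ndarray) -> np.ndarray:
    # Placeholder for a trained spine-localization model; here the full
    # image is returned as the "spine region".
    return radiograph

def segment_vertebrae(spine_region: np.ndarray) -> list:
    # Placeholder for a trained vertebra-segmentation model; here the
    # region is split into equal horizontal bands as stand-in crops.
    return np.array_split(spine_region, 5, axis=0)

def classify_fracture(crop: np.ndarray) -> float:
    # Placeholder for a trained per-vertebra fracture classifier; a real
    # pipeline would return a CNN's predicted probability.
    return float(crop.mean())  # dummy score in [0, 1]

def detect_ocf(radiograph: np.ndarray) -> float:
    """Chain the stages; score the image by its most suspicious vertebra."""
    spine = localize_spine(radiograph)
    scores = [classify_fracture(c) for c in segment_vertebrae(spine)]
    return max(scores, default=0.0)

print(detect_ocf(np.random.rand(512, 256)))  # image-level OCF score
```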
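Finally, the third sketch shows one plausible late-fusion architecture in the spirit of Aim 3: the note model's feature vector, the imaging pipeline's image-level score, and structured EHR features are concatenated and passed to a small classification head. The feature dimensions, concatenation-based fusion, and head design are assumptions; the dissertation trained and evaluated numerous architectures.

```python
# Minimal sketch of one late-fusion multimodal architecture: concatenate
# note features (Aim 1), the imaging pipeline's score (Aim 2), and
# structured EHR features, then classify at the patient level. All
# dimensions and design choices here are illustrative assumptions.
import torch
import torch.nn as nn

class LateFusionOCF(nn.Module):
    def __init__(self, note_dim: int = 768, ehr_dim: int = 16,
                 hidden: int = 64):
        super().__init__()
        fused = note_dim + 1 + ehr_dim  # note features + image score + EHR
        self.head = nn.Sequential(
            nn.Linear(fused, hidden),
            nn.ReLU(),
            nn.Dropout(0.2),       # regularization, given the small corpus
            nn.Linear(hidden, 1),  # patient-level OCF logit
        )

    def forward(self, note_feats, image_score, ehr_feats):
        x = torch.cat([note_feats, image_score, ehr_feats], dim=-1)
        return self.head(x)

model = LateFusionOCF()
logit = model(torch.randn(4, 768),  # per-patient note representations
              torch.rand(4, 1),     # imaging pipeline scores
              torch.randn(4, 16))   # structured EHR features
print(torch.sigmoid(logit))         # per-patient OCF probabilities
```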