Ehsan Alipour

Graduated: June 13, 2025

Thesis/Dissertation Title:

Evaluating Multi-Modal Data Fusion Approaches for Predictive Clinical Models Using Multiple Medical Data Domains

Multimodal deep learning models have emerged as powerful tools in biomedical research, offering the ability to integrate diverse data sources such as clinical records, multi-omics data, imaging, survey responses, and wearable data to enhance predictive accuracy and deepen understanding of complex medical phenomena. Central to multimodal modeling is data fusion, the process by which information from different modalities is integrated in a unified model. Three primary fusion strategies exist in deep learning: early fusion (feature-level), intermediate fusion (representation-level), and late fusion (decision-level). While widely adopted in other domains, their comparative performance and implementation considerations remain underexplored in biomedical applications, where data heterogeneity, missingness, and varying dimensionality present additional challenges.
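To make the three strategies concrete, the minimal PyTorch sketch below (illustrative only, not code from the dissertation) shows how each one would wire together two generic feature vectors; all class names and dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class EarlyFusion(nn.Module):
    """Early fusion: concatenate raw features from both modalities before any encoding."""
    def __init__(self, dim_a, dim_b, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_a + dim_b, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, xa, xb):
        return self.net(torch.cat([xa, xb], dim=-1))

class IntermediateFusion(nn.Module):
    """Intermediate fusion: encode each modality separately, then fuse the learned embeddings."""
    def __init__(self, dim_a, dim_b, emb=32):
        super().__init__()
        self.enc_a = nn.Sequential(nn.Linear(dim_a, emb), nn.ReLU())
        self.enc_b = nn.Sequential(nn.Linear(dim_b, emb), nn.ReLU())
        self.head = nn.Linear(2 * emb, 1)

    def forward(self, xa, xb):
        return self.head(torch.cat([self.enc_a(xa), self.enc_b(xb)], dim=-1))

class LateFusion(nn.Module):
    """Late fusion: run a full model per modality and combine their predictions."""
    def __init__(self, dim_a, dim_b, hidden=64):
        super().__init__()
        self.model_a = nn.Sequential(
            nn.Linear(dim_a, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.model_b = nn.Sequential(
            nn.Linear(dim_b, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, xa, xb):
        # Simple averaging at the decision level; weighted or stacked
        # combinations are common variants.
        return 0.5 * (self.model_a(xa) + self.model_b(xb))
```

The structural difference is where integration happens: before any encoding (early), after modality-specific encoders (intermediate), or after each modality has produced its own prediction (late).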
This dissertation evaluates the implications of data fusion strategies for developing multimodal predictive models in medicine. Across three distinct aims, we assess the impact of early, intermediate, and late fusion techniques on predictive performance, implementation complexity, and generalizability using diverse combinations of data types, outcomes, and modeling strategies. These studies span multiple datasets and outcome types (binary vs. continuous), providing a broad view of fusion strategy utility in real-world biomedical settings.
In Aim 1, we evaluated and compared early, intermediate, and late fusion strategies for integrating longitudinal electronic health record (EHR), genomic, and survey data to predict chronic kidney disease (CKD) progression in patients with type 2 diabetes using a novel transformer-based multimodal architecture. Using data from the NIH’s All of Us initiative, we trained models on a cohort of approximately 40,000 patients. While unimodal models, particularly those based on EHR data, achieved strong baseline performance with an AUROC of 0.73 (0.71–0.75), the inclusion of multimodal data offered only marginal improvement, with an AUROC of 0.74 (0.72–0.76); the benefit was limited to the early fusion approach and did not reach statistical significance. This aim highlighted the challenges of overfitting in complex fusion architectures and emphasized the role of modality-specific predictive strength.
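As a hedged illustration of what an early-fusion transformer over these modalities could look like, the sketch below projects EHR codes, genomic features, and survey responses into a shared token sequence and attends across them jointly; the layer sizes, pooling choice, and module names are assumptions, not the dissertation's actual architecture.

```python
import torch
import torch.nn as nn

class EarlyFusionTransformer(nn.Module):
    """Hypothetical sketch: map each modality into a shared token space and
    let a transformer encoder attend across all tokens jointly."""
    def __init__(self, ehr_vocab, dim_genomic, dim_survey, d_model=128):
        super().__init__()
        self.ehr_embed = nn.Embedding(ehr_vocab, d_model)    # longitudinal EHR codes
        self.genomic_proj = nn.Linear(dim_genomic, d_model)  # genomic features -> one token
        self.survey_proj = nn.Linear(dim_survey, d_model)    # survey responses -> one token
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 1)                    # CKD-progression logit

    def forward(self, ehr_codes, genomic, survey):
        tokens = torch.cat([
            self.ehr_embed(ehr_codes),                # (B, T, d): one token per EHR code
            self.genomic_proj(genomic).unsqueeze(1),  # (B, 1, d)
            self.survey_proj(survey).unsqueeze(1),    # (B, 1, d)
        ], dim=1)
        h = self.encoder(tokens).mean(dim=1)          # mean-pool over the fused sequence
        return self.head(h)
```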
In Aim 2, we extended the fusion analysis to imaging data by combining a convolutional neural network (CNN) trained on longitudinal cross-sectional imaging with a shallow neural network trained on clinical and pathology variables to predict post-surgical margin status in patients with soft tissue sarcoma (n=202). Here, the intermediate fusion strategy significantly outperformed other approaches, achieving an AUROC of 0.80 (0.66–0.95), suggesting that cross-modal interactions between histologic features and imaging embeddings may be best captured through intermediate fusion. This result demonstrated the potential value of intermediate fusion when complementary signals exist across modalities.
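A minimal sketch of this intermediate-fusion pattern, assuming a single-channel image and a flat clinical/pathology feature vector (both hypothetical simplifications of the actual study inputs):

```python
import torch
import torch.nn as nn

class ImagingTabularFusion(nn.Module):
    """Hypothetical intermediate-fusion sketch: a small CNN encodes the image,
    a shallow MLP encodes clinical/pathology variables, and a joint head
    operates on the concatenated embeddings."""
    def __init__(self, n_tabular, emb=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, emb), nn.ReLU())
        self.tabular = nn.Sequential(nn.Linear(n_tabular, emb), nn.ReLU())
        self.head = nn.Linear(2 * emb, 1)  # margin-status logit

    def forward(self, image, clinical):
        # Fuse learned embeddings so the head can model cross-modal interactions.
        z = torch.cat([self.cnn(image), self.tabular(clinical)], dim=-1)
        return self.head(z)
```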
In Aim 3, we explored fusion strategies for estimating continuous CT-derived body composition metrics (e.g., visceral and subcutaneous fat volumes) using only chest radiographs and clinical variables in a dataset of 1,088 patients. A multitask multimodal model was developed and evaluated across early, intermediate, and late fusion strategies. Late fusion consistently delivered the best performance across most body composition metrics, followed closely by intermediate fusion. These results suggest that when individual modalities offer high independent predictive power, decision-level integration may be optimal for regression tasks.
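For illustration, a decision-level sketch of multitask regression fusion: each modality produces its own vector of body-composition estimates, which are then blended per task. The learnable per-task mixing weights are an assumption for this sketch; a simple average of the two heads would equally qualify as late fusion.

```python
import torch
import torch.nn as nn

class LateFusionMultitask(nn.Module):
    """Hypothetical late-fusion sketch: each modality gets its own multitask
    regression head (one output per body-composition metric); predictions are
    combined at the decision level with learnable per-task weights."""
    def __init__(self, dim_xray_emb, dim_clinical, n_tasks=4, hidden=128):
        super().__init__()
        self.xray_head = nn.Sequential(
            nn.Linear(dim_xray_emb, hidden), nn.ReLU(), nn.Linear(hidden, n_tasks))
        self.clinical_head = nn.Sequential(
            nn.Linear(dim_clinical, hidden), nn.ReLU(), nn.Linear(hidden, n_tasks))
        self.alpha = nn.Parameter(torch.zeros(n_tasks))  # per-task mixing weight

    def forward(self, xray_emb, clinical):
        w = torch.sigmoid(self.alpha)  # constrain each task's weight to (0, 1)
        return w * self.xray_head(xray_emb) + (1 - w) * self.clinical_head(clinical)
```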
Collectively, this work provides a comprehensive evaluation of data fusion strategies in multimodal biomedical modeling, highlighting their strengths, limitations, and practical considerations. Findings suggest that no single fusion strategy universally outperforms the others; rather, optimal fusion depends on data characteristics, model architecture, and task-specific objectives. This dissertation lays the groundwork for future research aimed at developing adaptive fusion strategies tailored to the complexities of real-world biomedical data.