Tim Wu

Graduated: December 15, 2018

Thesis/Dissertation Title:

Bicluster-Based Identification of Gene Sets Through Multivariate Meta-Analysis (MVMA)

Omics technologies have transformed biology and medicine by generating massive amounts of high-resolution data. Much of the data have been made publicly available but have not been fully explored or utilized. The current study aims to mine public gene expression data to discover gene sets that may correspond to biological pathways. The challenges of using public data include data heterogeneity, high dimensionality, and small sample sizes. The overall research questions are: (1) which data mining method is best suited for finding gene sets; and (2) how best to utilize multiple datasets in order to increase statistical strength. Aim 1 is to determine the optimal method for constructing bicluster stacks. Aim 2 is to determine the suitability of meta-analysis techniques for pooling biclusters and to assess their performance. Aim 3 is to assess the potential utility of the gene sets identified in Aim 2 using pathway analyses.
In Aim 1, we demonstrate the technique of biclustering for gene set identification, based on a number of key advantages of biclustering over traditional clustering methods. In addition, we show that synthesizing summary statistics (biclusters in this case) is a better approach to utilizing multiple datasets than simply aggregating the source datasets. For Aim 2, we adapt the framework of multivariate meta-analysis (MVMA) and a previously published two-step procedure to tackle the issue of high dimensionality, with an improvement that uses the graphical lasso algorithm to obtain a sparse estimate of the between-study covariance matrix. This improvement leads to a significant increase in the performance of MVMA in distinguishing real genes from background genes. In Aim 3, the gene sets found to be significant according to MVMA are further investigated by knowledge-based pathway analyses. The results suggest that the overall effect sizes are a predictor of the biological relevance of the gene sets, which is the most significant finding of the study.

Last Known Position:

Senior Fellow, UW Department of Anesthesia & Pain Management


Peter Tarczy-Hornoch, Brian Browning, Roger Bumgarner, Shuai Huang