Graduated: January 1, 2010
A Rule-Based Strategy for Accurately Describing Gene Content Similarities and Differences Across Multiple Genomes
A fundamental task in genome research is comparing gene content between multiple genomes. In infectious disease research, such comparisons are critical for determining the underlying parasite genetic factors that are responsible for disease transmission, pathogenicity and clinical outcome. Although numerous technologies exist for comparing gene sequences and grouping similar genes, the genomics field lacks structured methods for describing the complicated evolutionary dynamics that give rise to the differences between the compared species. In this dissertation I put forth novel technologies for accurately and precisely describing differences in gene content across multiple genomes.
First, I introduce a lightweight knowledge representation specification that allows us to aggregate gene annotation and sequence comparison data from heterogeneous sources. Next, I describe a new ontology for describing pairwise homology relationships between genes, as well as a rule-based system for applying those terms to sequence comparison results. I then detail a novel method for grouping genes based on the nature of their homology relationships. Finally, I present a technique for querying the gene groups in order to uncover interesting evolutionary trends across the compared genomes. These methods represent a significant advance in the clarity and detail with which large-scale comparative genomics can be described; furthermore, the novel techniques that I present in this work are amenable to integration with existing sequence comparison and clustering technologies.
Last Known Position:
Senior Manager - Special Programs, Intellectual Ventures
Peter J. Myler (Chair), Roger E. Bumgarner, John H. Gennari, Walter L. Ruzzo (GSR)