Graduated: November 1, 2009
Automated Learning of Protein Involvement in Pathogenesis Using Integrated Queries
Methods that attenuate a pathogen's ability to infect and propagate in a host, and thus allow the natural immune system to clear invaders more easily, have gained attention as alternatives to broad-spectrum targeting approaches. This work describes a technique for identifying proteins involved in virulence by relying on latent information gathered computationally across biological repositories. A lightweight method for data integration is introduced, which links information about a protein via a path-based query graph and supports both exploratory and logical queries; data gathered in this way is characterized with experiments on retrieving high-quality annotation data. A weighting scheme is then applied to the query graphs so that they can serve as input to various statistical classification methods, and the combination of data integration and learning methods is applied to the problem of generalized and specific virulence function prediction. This approach improves coverage of functional data for a protein, outperforms other recent approaches to identifying virulence factors, is robust to weighting schemes of varying complexity, and is found to generalize well to traditional function prediction.
Last Known Position:
Senior Data Science Manager, Microsoft
Committee:
Peter J. Myler (Chair), Ira J. Kalet, William S. Noble, Peter Tarczy-Hornoch, Evan E. Eichler (GSR)