Graduated: January 1, 2014
Data-driven Methods and Models for Predicting Protein Structure using Dynamic Fragments and Rotamers
Proteins play critical roles in cellular processes. A protein’s conformation is directly related to its biological function, so determining that structure can provide great insight into how the protein works. Using a computational technique called molecular dynamics (MD), we can simulate and observe protein dynamics at much higher temporal and spatial resolution than experimental methods allow. Dynameomics is a research endeavor that uses MD to produce thousands of protein simulations, resulting in hundreds of terabytes of data. Using novel visual analytics techniques, we mined the Dynameomics data warehouse for data on protein backbone segments and side-chain behavior, called fragments and rotamers, respectively. Knowledge derived from these dynamic fragments and rotamers was used to improve the quality of protein loop structure predictions. We created novel data models to store, analyze, and compare fragments and side-chain rotamers, then developed methods that predict loop structures using information inferred from these data models. Loop predictions built from these fragments and rotamers yield biologically relevant structures that improve upon current protein loop prediction methods. In conjunction with the fragment and rotamer research, we produced a novel visual analytics framework called DIVE, the Data Intensive Visualization Engine. This software has been instrumental in advancing our bioinformatics research, but it is a general-purpose framework applicable to a wide range of big data problems.
Last Known Position:
Senior Data Scientist, PNNL
Valerie D. Daggett (Chair), James F. Brinkley, Ira J. Kalet, Walter James Pfaendtner (GSR)