Graduated: January 1, 2011
Mining Mountains of Data: Organizing All Atom Molecular Dynamics Protein Simulation Data into SQL
A significant portion of my research has involved organizing all-atom molecular dynamics protein simulation data into a form that is both manageable and conducive to analysis. These data consist of multi-gigabyte collections of four-dimensional atomic coordinates (x, y, z and time) and secondary analyses, as well as classification data used to select and organize the proteins for simulation. The initial database design was released in 2007 and published in 2008 as the Dynameomics Data Warehouse [1], and it has been in continuous development to accommodate an ever-increasing number and length of simulations. The Consensus Domain Dictionary [2] (CDD), released in 2010, defines a rank-ordered set of globular proteins that sample the most frequently occurring protein folds found in the Protein Data Bank. Andrew's defense presented the CDD database, the dimensional model at the core of the data warehouse, and a novel method for optimizing queries involving spatial data stored in relational tables.
Last Known Position:
Principal Informatics Scientist, Cognitive Medical Systems, Inc.
Committee:
Valerie D. Daggett (Chair), James F. Brinkley, Ira J. Kalet, Peter J. Myler, Thomas R. Quinn (GSR)