Project Summary
Deep neural transformer networks have advanced Natural Language Processing (NLP) performance, but these large, heavily parameterized models are vulnerable to bias. The large data sets required to train them are often drawn from multiple study sites or clinical units, leading to confounding by provenance: models learn to make predictions from the characteristics of the data sources rather than from diagnostically relevant information, producing erroneous predictions at the point of deployment. In the proposed research we will develop validated approaches for Deconfounding Deep Transformer Networks (DecondDTN) and disseminate them as open-source tools so that these models can be applied more reliably to clinical problems.
This project also includes Serguei Pakhomov, PhD, of the University of Minnesota.