Explainable query generation for cohort discovery and biomedical reasoning using natural language.
Clinical trials play a critical role in generating medical evidence and advancing biomedical research. To identify potential participants, investigators publish eligibility criteria, such as certain conditions, treatments, or laboratory test results. Participant recruitment nonetheless remains a major barrier to successful trial completion, and manual chart review of hundreds or thousands of patients to determine a candidate pool can be prohibitively labor- and time-intensive. Cohort discovery tools such as Leaf or i2b2 can help find participants who meet eligibility criteria, but they often have steep learning curves. Moreover, certain complex queries may simply be impossible because of structural limitations on the types of queries these tools support. An alternative approach is to use natural language processing (NLP) to automatically analyze eligibility criteria and generate queries. Such approaches have the advantage of leveraging existing eligibility criteria, which are written in a free-text format researchers are already familiar with. The goal of this project is the development of a cohort discovery tool called LeafAI. In Aim 1, we created a gold-standard annotated corpus of eligibility criteria. In Aim 2, we developed methods for generating data model-agnostic SQL queries and performing multi-hop biomedical reasoning through a natural language interface, with performance rivaling that of humans. In Aim 3, we developed an interactive chatbot-like web application that enables users to dynamically query clinical databases for cohort discovery using natural language.