Graduated: June 11, 2022
The Use of Natural Language Processing and Machine Learning for the Early Diagnosis of Lung and Ovarian Cancer
Cancer is a serious diagnosis, and diagnostic delay is correlated with reduced survival rates following treatment. For many cancers, providers can rely only on symptoms and signs to diagnose patients. These details are recorded primarily in free-text clinical notes. Natural language processing (NLP) can be used to extract symptoms/signs from these notes for population-level diagnostic screening. This creates an opportunity for machine learning to alert providers earlier in the diagnostic process using existing but easily overlooked information.
Thus, the focus of this thesis was to identify opportunities for reducing diagnostic delay in ovarian and lung cancer. A symptom extraction model trained on a primarily COVID-19 population was adapted to lung and ovarian cancer populations. The model then extracted symptoms/signs from a retrospective case-control study (ovarian) developed as part of this work, as well as from a leveraged study (lung). Symptom frequencies for ovarian cancer were then explored across different routes to diagnosis. Finally, this thesis developed experiments using machine learning models to predict lung and ovarian cancer prior to diagnosis. This work showed that early prediction using symptoms was possible only on the lung cohort. Nevertheless, both cohorts had significantly more “next step” recommendations in cases than in controls, even 6 months prior to diagnosis.
Meliha Yetisgen (chair), Matthew Thompson