Figure: Flowchart of outcome prediction in NSCLC. Unstructured text from a pathology report is passed to a language model, and a downstream analysis then produces a survival prediction.

Large Language Models for Predicting Survival in Non-Small Cell Lung Cancer from Pathology Reports

Project Lead: Jie Fu, UW Medicine

Data Science Lead: Joseph Hellerstein

Pathology reports contain detailed descriptions of tumor characteristics, but much of this information is unstructured text that is difficult to analyze with traditional methods. In this project, we explore how large language models (LLMs), advanced AI systems designed to understand and interpret human language, can extract meaningful patterns from these reports to help predict patient outcomes in non-small cell lung cancer (NSCLC).

We developed and evaluated multiple AI approaches that use pathology reports to estimate each patient's likelihood of surviving beyond 2 years after diagnosis. Our results show that LLM-based methods can capture clinically relevant information directly from free-text reports, achieving performance comparable to more complex machine learning pipelines. This work highlights the potential of AI to transform routinely collected clinical text into actionable insights. Future work will focus on combining text with imaging and clinical data to further improve prediction accuracy.
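The prediction target described above is binary: did the patient survive beyond 2 years? The sketch below shows one common way such a label can be derived from follow-up data. It is a minimal illustration, not necessarily how this project defined its labels. It assumes each patient has a follow-up time in days and a death indicator (the names `FollowUp`, `days_observed`, `died`, and `two_year_label` are hypothetical), and it approximates 2 years as 730 days. Patients censored before the 2-year mark have no defined label and would typically be excluded from training and evaluation.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: 2 years approximated as 730 days.
HORIZON_DAYS = 730


@dataclass
class FollowUp:
    """Hypothetical follow-up record for one patient."""
    days_observed: float  # days from diagnosis to death or last known contact
    died: bool            # True if death was observed, False if censored


def two_year_label(f: FollowUp, horizon_days: float = HORIZON_DAYS) -> Optional[int]:
    """Return 1 if the patient is known to be alive beyond the horizon,
    0 if death occurred at or before the horizon,
    None if follow-up ended (censored) before the horizon, so the label is unknown."""
    if f.days_observed > horizon_days:
        return 1  # observed alive past the horizon, regardless of later events
    if f.died:
        return 0  # death observed within the horizon
    return None  # lost to follow-up before the horizon


if __name__ == "__main__":
    cohort = [
        FollowUp(days_observed=900, died=True),    # died after 2 years
        FollowUp(days_observed=400, died=True),    # died within 2 years
        FollowUp(days_observed=300, died=False),   # censored early
        FollowUp(days_observed=1200, died=False),  # alive at last contact
    ]
    labels = [two_year_label(f) for f in cohort]
    print(labels)  # [1, 0, None, 1]
```

Returning `None` for early-censored patients, rather than treating them as deaths or survivors, avoids silently biasing the label distribution; how those patients are handled is a design decision each study has to make explicitly.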