Methods: Reproducible & Open Science
Fields: Health Science

Authors: Jie Liu, William Noble, Jeffrey Bilmes, Anthony Blau


A model of tumor heterogeneity

Cancer is difficult to treat, not only because tumors differ from one patient to another, but also because the cells within a single tumor can differ considerably from one another. In this scenario, more than one treatment is needed to kill all the tumor cells, because any subpopulation left untreated can cause the cancer to relapse. Using high-throughput sequencing, we can read out the genetic profile of a single tumor in bulk, but we still need a way to decompose that bulk tumor genotype into subclones.

We use graphical models in which the subclones are represented as hidden variables, and we use maximum likelihood estimation to identify the different subclones automatically from the data. Moreover, we implement our model on an extensible platform, the Graphical Models Toolkit (GMTK), which makes our approach open, reproducible and easy to build on. When other people extend our model, they only have to specify the new model in a text file, and GMTK will handle the remaining computation.
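To make the idea of hidden subclone variables concrete, here is a minimal sketch in plain Python. It is an illustration only: it is not the GMTK model described above, and it does not use GMTK. It assumes a deliberately simplified setting in which each mutation belongs to exactly one hidden subclone, each subclone has a single variant allele frequency, and the number of variant reads at a mutation is binomially distributed given the read depth. The names fit_subclones, log_binom_pmf and clamp, the simulated data, and all parameter values are hypothetical. The maximum likelihood fit is done with expectation-maximization (EM).

```python
import math
import random


def log_binom_pmf(k, n, p):
    """Log-probability of k variant reads out of n reads at frequency p."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coeff + k * math.log(p) + (n - k) * math.log(1.0 - p)


def clamp(p, eps=1e-6):
    """Keep a frequency strictly inside (0, 1) so the logs stay finite."""
    return min(max(p, eps), 1.0 - eps)


def fit_subclones(variant_reads, depths, n_subclones, n_iter=100):
    """EM for a binomial mixture; the hidden variable is the subclone label.

    Returns (weights, freqs, resp), where resp[i][j] is the posterior
    probability, from the final E-step, that mutation i belongs to subclone j.
    """
    n = len(variant_reads)
    # Initialise subclone frequencies at quantiles of the observed frequencies.
    vafs = sorted(k / d for k, d in zip(variant_reads, depths))
    freqs = [clamp(vafs[int((j + 0.5) * n / n_subclones)])
             for j in range(n_subclones)]
    weights = [1.0 / n_subclones] * n_subclones
    resp = []
    for _ in range(n_iter):
        # E-step: posterior over the hidden subclone label of each mutation.
        resp = []
        for k, d in zip(variant_reads, depths):
            logs = [math.log(w) + log_binom_pmf(k, d, p)
                    for w, p in zip(weights, freqs)]
            top = max(logs)  # subtract the maximum for numerical stability
            unnorm = [math.exp(v - top) for v in logs]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: re-estimate mixing weights and subclone frequencies.
        for j in range(n_subclones):
            mass = sum(r[j] for r in resp)
            weights[j] = max(mass / n, 1e-12)
            reads = sum(r[j] * d for r, d in zip(resp, depths))
            if reads > 0:
                variants = sum(r[j] * k for r, k in zip(resp, variant_reads))
                freqs[j] = clamp(variants / reads)
    return weights, freqs, resp


# Simulated example (hypothetical values): two subclones at 10% and 40%
# variant allele frequency, 100 mutations each, read depth 100.
rng = random.Random(0)
true_freqs = [0.1, 0.4]
depth = 100
variant_reads = [sum(rng.random() < p for _ in range(depth))
                 for p in true_freqs for _ in range(100)]
depths = [depth] * len(variant_reads)

weights, freqs, resp = fit_subclones(variant_reads, depths, n_subclones=2)
# Hard assignment: the most probable subclone for each mutation.
labels = [max(range(2), key=lambda j: r[j]) for r in resp]
print("estimated weights:", weights)
print("estimated frequencies:", freqs)
```

In this sketch the number of subclones is fixed in advance and the model structure is hard-coded, so changing the model means changing the code.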

In the future, cancer care and treatment will shift fundamentally from relying on tissue-level information to relying on molecular-level information. Our modeling system will allow more people to join us in facilitating this shift, accumulating knowledge about cancer, providing targeted therapy and eventually saving people’s lives.