“The battle of two cultures: statistics versus (?) data science”

Wednesday, Apr. 3, 2019, from 4:30 to 5:20 p.m. — Physics/Astronomy Auditorium, room A118

Bhramar Mukherjee, Ph.D., John D. Kalbfleisch Collegiate Professor and Chair, Department of Biostatistics

[Find the slides from this presentation here: UWtalk.]


The title of my talk is inspired by Leo Breiman’s seminal 2001 paper, “Statistical modeling: the two cultures,” in which Breiman describes the intellectual tension between the classic stochastic modeler and the algorithmic modeler. The shock that “data science” has injected into the world of statistics has been palpable. The next generation of students is perhaps finding the job title “data scientist” more exciting than that of a good old statistician. In this talk, I will try to share the joy (and associated anxiety) of being a classically trained statistician at a time when our science and society are undergoing an unprecedented information/data revolution.

I will discuss statistical challenges and opportunities in the joint analysis of electronic health records and genomic data through “phenome-wide association studies” (PheWAS). I will posit a modeling framework that helps us understand the effects of both selection bias and outcome misclassification in assessing genetic associations across the medical phenome. I will use data from the UK Biobank and the Michigan Genomics Initiative, a longitudinal biorepository at Michigan Medicine launched in 2012, to illustrate the analytic framework. The examples illustrate that understanding sampling design and selection bias matters for big data and is at the heart of doing good science with data. This is joint work with Lauren Beesley and Lars Fritsche at the University of Michigan.



Bhramar Mukherjee is the John D. Kalbfleisch Collegiate Professor and Chair, Department of Biostatistics; professor, Department of Epidemiology; professor, Global Public Health, University of Michigan School of Public Health; and research professor and core faculty member, Michigan Institute for Data Science (MIDAS), University of Michigan. She also serves as the associate director for Cancer Control and Population Sciences at the University of Michigan Rogel Cancer Center. She is the cohort development core co-director in the University of Michigan’s institution-wide Precision Health Initiative. Her research interests include statistical methods for analysis of electronic health records, studies of gene-environment interaction, Bayesian methods, shrinkage estimation, and analysis of multiple pollutants.

Her collaborative work is mainly in cancer, cardiovascular diseases, reproductive health, exposure science, and environmental epidemiology. She has co-authored more than 200 publications in statistics, biostatistics, medicine, and public health, and serves as a principal investigator on NSF- and NIH-funded methodology grants.

She is the founding director of the University of Michigan’s summer institute on big data. Bhramar is a fellow of the American Statistical Association and the American Association for the Advancement of Science. She is the recipient of many awards for her scholarship, service, and teaching at the University of Michigan and beyond.