UW Data Science Seminar: Steve Mussmann


4:30 pm – 5:30 pm

Please join us for a UW Data Science Seminar event on Thursday, January 5th from 4:30 to 5:20 p.m. PST. The seminar will feature Steve Mussmann, Data Science Postdoctoral Fellow at the Institute for Foundations of Data Science (IFDS) at the University of Washington.

Use this Zoom link to join


“Reducing data and computational requirements in machine learning with data selection techniques”

Abstract: Building machine learning systems requires both data and computation, each of which can demand significant resources. This talk features research on techniques in active learning and data pruning that mitigate these costs by selecting informative data. For cases where labels are expensive, such as medical image annotations by trained physicians, active learning adaptively chooses which data to label. When training extremely large models on Internet-scale datasets requires weeks of computation on large clusters, trimming away useless and redundant data, known as data pruning, decreases the training time. This talk covers several empirical and theoretical analyses of the most popular active learning algorithm, uncertainty sampling, including an NLP application where uncertainty sampling requires 14x less labeled data. For data pruning, we introduce an algorithm based on machine teaching that enjoys near-optimal theoretical guarantees and achieves state-of-the-art results on several standard benchmark image classification datasets.
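
For attendees unfamiliar with uncertainty sampling, here is a minimal illustrative sketch of the general idea: ask for labels on the unlabeled points the current model is least sure about. It is not the speaker's implementation. The logistic model, the entropy score, the toy weights, and the function names are all assumptions made for illustration.

```python
import math

def predict_proba(weights, bias, x):
    """Logistic model: probability that x belongs to the positive class."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def uncertainty(p):
    """Binary entropy of a predicted probability; largest at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def select_most_uncertain(weights, bias, pool, k=1):
    """Return indices of the k pool points the model is least sure about."""
    scored = [
        (uncertainty(predict_proba(weights, bias, x)), i)
        for i, x in enumerate(pool)
    ]
    # Highest uncertainty first; ties broken by lower index.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in scored[:k]]

# Hypothetical toy model and unlabeled pool (not from the talk).
weights, bias = [1.0, -1.0], 0.0
pool = [[3.0, 0.0], [0.1, 0.0], [0.0, 2.5], [1.0, 1.0]]

# Indices of the two points that would be sent to an annotator next.
chosen = select_most_uncertain(weights, bias, pool, k=2)
print(chosen)
```

In a full active learning loop, the chosen points would be labeled, added to the training set, and the model refit before the next round of selection.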

Biography: Steve is an IFDS Postdoctoral Fellow in the Paul G. Allen School of Computer Science & Engineering at the University of Washington working with Kevin Jamieson and Ludwig Schmidt on machine learning methods that reduce the required amount of data. He received a Ph.D. in 2021 from Stanford University in computer science advised by Percy Liang and a B.S. in 2015 from Purdue University in math, statistics, and computer science.


The UW Data Science Seminar is an annual lecture series at the University of Washington that hosts scholars working across applied areas of data science, such as the sciences, engineering, humanities and arts, along with methodological areas in data science, such as computer science, applied math and statistics. Our presenters come from all domain fields and include occasional external speakers from regional partners, governmental agencies and industry.

The 2022-2023 seminars will be virtual, and are free and open to the public.