Participating departments in the IGERT program have all defined a Big Data PhD track, which articulates how the IGERT requirements map to the department’s requirements without creating any additional burden. In some departments, these Big Data tracks go beyond this mapping.

UPDATE: An official Advanced Data Science Option has been approved and qualifying coursework will now be noted on transcripts! This will replace the Big Data Tracks, which are not transcripted.

Links below will be updated as each department updates individual web pages. Contact persons remain the same.

## Participating Departments

## Recommended Data Science Courses

**AA 543 (3) Computational Fluid Dynamics**

Examines numerical discretization of the inviscid compressible equations of fluid dynamics; finite-difference and finite-volume methods; time integration, iterative methods, explicit and implicit algorithms; consistency, stability, error analysis, and properties of numerical schemes; grid generation; and applications to the numerical solution of model equations and the 2D Euler equations.

Uri Shumlak

Current Listing

**AA 544 (3) Turbulence Modeling and Simulation**

Examines numerical discretization of the incompressible Navier-Stokes equation; projection method, introduction to turbulence; Reynolds Averaged Navier-Stokes equations; algebraic, one-equation, and two-equation turbulence models; large-eddy simulation; direct numerical simulation; and applications to the numerical solution of laminar and turbulent flows in simple geometries.

Antonio Ferrante

Current Listing

**AA 545 (3) Computational Methods for Plasmas**

Develops the governing equations for plasma models: particle, kinetic, and MHD. Applies the governing equations to plasma dynamics through the PIC method and integration of fluid evolution equations. Examines numerical solution of equilibrium configurations, and linear stability by the energy principle and variational method.

Antonio Ferrante

Current Listing

**AMATH 500A (1-2) High-Performance Scientific Computing**

This course will introduce aspects of scientific computing and computational science that go beyond the Matlab-based introduction of AMath 301 or 352 and will introduce other languages (primarily Fortran 90/95/2003 and Python), debugging strategies, parallel computing (at the multi-core and cluster level), visualization tools for large data sets, and concepts such as Validation and Verification (V&V), uncertainty quantification (UQ), reproducible research, and scientific software design.

Instructor

Current Listing

**AMATH 574 (5) Conservation Laws and Finite Volume Methods**

Theory of linear and nonlinear hyperbolic conservation laws modeling wave propagation in gases, fluids, and solids. Shock and rarefaction waves. Finite volume methods for numerical approximation of solutions; Godunov’s method and high-resolution TVD methods. Stability, convergence, and entropy conditions. Prerequisite: AMATH 586 or permission of instructor.

Randy LeVeque

Current Listing

**AMATH 581 (5) Scientific Computing**

Project-oriented computational approach to solving problems arising in the physical/engineering sciences, finance/economics, medical, social and biological sciences. Problems requiring use of advanced MATLAB routines and toolboxes. Covers graphical techniques for data presentation and communication of scientific results. Prerequisite: Proficiency in basic MATLAB or AMATH 301, or permission of instructor.

Eli Shlizerman

Current Listing

**AMATH 582 (5) Computational Methods for Data Analysis**

Exploratory and objective data analysis methods applied to the physical, engineering, and biological sciences. Brief review of statistical methods and their computational implementation for studying time series analysis, spectral analysis, filtering methods, principal component analysis, orthogonal mode decomposition, and image processing and compression. Prerequisite: either MATLAB and linear algebra or permission of instructor.

Jose Nathan Kutz

Current Listing

**AMATH 583 (5) High-Performance Scientific Computing**

This class will cover a selection of topics in high-performance computing (HPC), briefly introducing many of the issues that arise when solving large scale computational problems in science and engineering.

Ulrich Hetmaniuk

Current Listing

**AMATH 584 (5) Applied Linear Algebra and Introductory Numerical Analysis**

Numerical methods for solving linear systems of equations, linear least squares problems, matrix eigenvalue problems, nonlinear systems of equations, interpolation, quadrature, and initial value ordinary differential equations.

Anne Greenbaum

Current Listing

**AMATH 585 (5) Numerical Analysis of Boundary Value Problems**

Numerical methods for steady-state differential equations. Two-point boundary value problems and elliptic equations. Iterative methods for sparse symmetric and non-symmetric linear systems: conjugate gradients, preconditioners. Prerequisite: AMATH 581 or MATH 584, which may be taken concurrently.

Anne Greenbaum

Current Listing

**AMATH 586 (5) Numerical Analysis of Time Dependent Problems**

Numerical methods for time-dependent differential equations, including explicit and implicit methods for hyperbolic and parabolic equations. Stability, accuracy, and convergence theory. Spectral and pseudospectral methods. Prerequisite: AMATH 581 or AMATH 584.

Instructor

Current Listing

**ASTR 427 (3) Numerical Methods of Astrophysics**

This is a hands-on course to learn methods for numerically solving problems that arise in astrophysics. Some programming experience is required. An emphasis is placed on high performance. Topics include ordinary differential equations, root finding, optimization, Monte-Carlo methods, basic data structures and algorithms, and parallel techniques.

Instructor

Current Listing

**ASTR 597B (3) Big Data in Astronomy: Introduction to Large Surveys**

The goal of this course is to prepare you for research with large survey data, teach you how to think about such data sets, and give you an overview of what is or soon will be available. While focused on astronomical surveys, the course may be suitable for advanced undergraduates and non-majors interested in learning about working with large scientific data sets.

Mario Juric

Current Listing

**ASTR 599 / AMATH 500 (1) Scientific Computing with Python**

This is a graduate seminar course offered jointly through the UW Astronomy and Applied Math departments. It is designed as a comprehensive introduction to scientific computing in Python, geared toward graduate students, postdocs, and researchers in scientific fields that depend on analysis of large datasets.

Jake Vanderplas

Current Listing

**BIO 419/519 (4) Data Science for Biologists**

The objective of this course is to provide students with foundational knowledge in mathematics and basic tools in computation to practice data science in broadly biologically focused fields. The course will focus on the basics of data wrangling, data analytics, statistics, and visualization. The target audience is advanced undergraduates and beginning graduate students, including students studying biology, neurobiology, microbiology, bioengineering, and other fields working with biologically relevant data.

Bing Brunton

Current Listing

**BIOST 545 (3) Biostatistical Methods for Big Omics Data**

Li Hsu

Current Listing

**BIOST 544 (4) Introduction to Biomedical Data Science**

Provides an introduction to biomedical data science with an emphasis on statistical perspectives, including the process of collecting, organizing, and integrating information toward extracting knowledge from data in public health, biology, and medicine. Prerequisite: either BIOST 511 or equivalent; either BIOST 509 or equivalent; or permission of instructor.

Noah Simon

Current Listing

**BIOST 546 (3) Machine Learning for Biomedical Big Data**

Provides an introduction to statistical learning for biomedical and public health data. Intended for graduate students in SPH/SOM. Offered: Spring

Ali Shojaie

Current Listing

**CSE 414 (4) Introduction to Database Systems**

Introduces database management systems and writing applications that use such systems; data models, query languages, transactions, database tuning, data warehousing, and parallelism. Intended for non-majors. Not open for credit to students who have completed CSE 344. Prerequisite: minimum grade of 2.5 in CSE 143.

Hal Perkins

Current Listing

**CSE 490B1 (X) Software Engineering for Biologists**

Biology has transitioned from a descriptive enterprise to a quantitative science. This transition has been driven by a combination of low-cost sequencing technology and radical reductions in the cost of computing. Indeed, modern biology relies heavily on computational artifacts, both code and data, to produce scientific results (e.g., genomic data sets and computational tools for biochemical pathways).

Joe Hellerstein

Current Listing

**CSE 527 (4) Computational Biology**

Introduces computational methods for understanding biological systems at the molecular level. Problem areas such as network reconstruction and analysis, sequence analysis, regulatory analysis and genetic analysis. Techniques such as Bayesian networks, Gaussian graphical models, structure learning, expectation-maximization. Prerequisite: graduate standing in biological, computer, mathematical or statistical science, or permission of instructor.

Su-In Lee

Current Listing

**CSE 547 / STAT 592 (4) Machine Learning for Big Data**

Machine learning and statistical techniques for analyzing datasets of massive size and dimensionality. Representations include regularized linear models, graphical models, matrix factorization, sparsity, clustering, and latent factor models. Algorithms include sketching, random projections, hashing, fast nearest-neighbors, large-scale online learning, and parallel methods (MapReduce, GraphLab). Prerequisite: either STAT 535 or CSE 546.

Emily Fox

Current Listing

**CSE 599A (4) Molecular Biology as a Computational Science**

This is a course in molecular biology for computer science students interested in computational research in the Life Sciences, such as bioinformatics and bioengineering. The premise of the course is that cell biology can be described and analyzed in much the same way as complex software systems. Indeed, this is how Systems Biology studies gene programs.

The course assumes some exposure to object-oriented design and makes use of Python (although deep knowledge of Python is not required). The course requires only a high school background in chemistry and biology.

Joseph Hellerstein

Current Listing

There are also several **undergraduate courses open to non-majors** that provide useful background. In particular:

CSE 373: Data Structures and Algorithms

CSE 374: Intermediate Programming Concepts and Tools

CSE 410: Computer Systems

CSE 415: Introduction to Artificial Intelligence

CSE 417: Algorithms and Computational Complexity

All courses are listed here.

**GENOME 540 (4) Intro to Computational Molecular Biology**

Algorithmic and probabilistic methods for DNA and protein analysis. Students must be able to write computer programs for data analysis. Prior coursework in biology and probability highly desirable. Prerequisite: permission of instructor.

Phil Green

Current Listing

**GENOME 541 (4) Intro to Computational Molecular Biology**

Provides a survey of topics within the field of computational molecular biology. Prerequisite: GENOME 540 or permission of instructor.

Bill Noble

Current Listing

**HCDE 511 (4) Information Visualization**

The design and presentation of digital information. Use of graphics, animation, sound, visualization software, and hypermedia in presenting information to the user. Vision and perception. Methods of presenting complex information to enhance comprehension and analysis. Incorporation of visualization techniques into human-computer interfaces.

Instructor

Current Listing

**HCDE 517 (4) Usability Studies**

Discusses the human-computer interface (HCI) as the communicative aspect of a computer system. Analyzes usability issues in HCI design, explores design-phase methods of predictability, and introduces evaluative methods of usability testing.

Instructor

Current Listing

**MATH 514 (3) Networks and Combinatorial Optimization**

Mathematical foundations of combinatorial and network optimization with an emphasis on structure and algorithms with proofs. Topics include combinatorial and geometric methods for optimization of network flows, matching, the traveling salesman problem, cuts, and stable sets on graphs. Special emphasis on connections to linear and integer programming, duality theory, total unimodularity, and matroids. Prerequisite: either MATH 308 or AMATH 352, and any additional 400-level mathematics course. Offered: jointly with AMATH 514.

Thomas Rothvoss

Current Listing

**MATH 515 (3) Fundamentals of Optimization**

Maximization and minimization of functions of finitely many variables subject to constraints. Basic problem types and examples of applications; linear, convex, smooth, and nonsmooth programming. Optimality conditions. Saddlepoints and dual problems. Penalties, decomposition. Overview of computational approaches. Prerequisite: linear algebra and advanced calculus. Offered: jointly with AMATH 515/IND E 515.

Instructor

Current Listing

**MATH 516 (3) Numerical Optimization**

Methods of solving optimization problems in finitely many variables, with or without constraints. Steepest descent, quasi-Newton methods. Quadratic programming and complementarity. Exact penalty methods, multiplier methods. Sequential quadratic programming. Cutting planes and nonsmooth optimization. Prerequisite: MATH 515. Offered: jointly with AMATH 516.

Instructor

Current Listing

**BIME 550 (3) Knowledge Representation and Applications**

What is a knowledge representation? Why are issues in knowledge representation important for biomedical informatics application builders? What is the relationship between knowledge and data, and between knowledge bases and databases? In addition to answering these questions, this course covers frame-based systems, description logics, automatic theorem proving, complexity vs. tractability, ontologies, rule-based systems, and a variety of applications in the biomedical domain. Although we cover a fair amount of computer science (primarily artificial intelligence), the emphasis is on the implications of these results for the biomedical and health informatics field.

Ira Kalet

Current Listing

**BIME 591C (1) BHI Research Colloquium**

“Cancer, stochastic models and mathematical biology”

This section will focus on topics related to the Clinical Target Volume research project and to related topics more broadly in the area of mathematical biology. The topics will be largely dependent on the interests of the participants, but will include discussion of symbolic and functional programming (Lisp and ML), topics from abstract mathematics leading up to category theory, basics of stochastic modeling and Markov chains, anatomy, cancer (basic ideas, metastasis, staging, surgical and radiation therapy), and other applications of abstract mathematics in biology and medicine.

Ira J. Kalet

Current Listing

**SOC 590 (3) Special Topics in Sociology**

“Big Data and Population Processes”

In this course, we will study how traditional methods used in social sciences can help us make sense of new data sources, and how these new data sources may require new approaches and research design. There will be a mix of lectures, student-led discussions, and hands-on computational activities (e.g., how to access and analyze data from social media platforms like Twitter and Facebook, how to approach large data sets, etc.).

We will discuss a number of substantive topics related to the emergence of (big) data-driven discovery in social sciences, with emphasis on population processes. By the end of the course, students will be familiar with relevant literature at the intersection of demographic research and computational social science. The main goals of the course are (i) to develop critical thinking about the emergent field of big data analysis; (ii) to learn some of the methods, approaches, and tools of big data analysis; and (iii) to identify research questions in your own area of interest that could be addressed with innovative data sources and to devise an appropriate research plan.

Emilio Zagheni

Current Listing

**SOCW1 590B (3) Interdisciplinary Research Career Development: Roadmaps & Practical Strategies**

Graduate seminar that creates a forum for students from multiple disciplines to learn about national trends increasing the need for interdisciplinary and transdisciplinary readiness in research careers, as well as translation between research and real-world application. Focuses on tools and strategies to increase one’s capacity and readiness for inter/transdisciplinary research-oriented careers; students engage collaboratively with peers from other disciplines in these aims and hone their interdisciplinary career roadmaps for graduate training and beyond.

*Course undergoing final approval; contact instructor for information; emailing about interest in course is useful.

Paula Nurius

Current Listing

**STAT 302 (3) Statistical Software and Its Applications**

Introduction to data structures and basics of implementing procedures in statistical computing packages, selected from but not limited to R, SAS, STATA, MATLAB, SPSS, and Minitab. Provides a foundation in computation components of data analysis.

Friedrich-Wilhelm Scholz

Current Listing

**STAT 391 (4) Probability and Statistics for Computer Science**

Fundamentals of probability and statistics from the perspective of the computer scientist. Random variables, distributions and densities, conditional probability, independence. Maximum likelihood, density estimation, Markov chains, classification. Applications in computer science.

Instructor

Current Listing

**STAT 403 (4) Introduction to Resampling Inference**

Introduction to computer-intensive data analysis for experimental and observational studies in empirical sciences. Students design, program, carry out, and report applications of bootstrap resampling, rerandomization, and subsampling of cases.

Instructor

Current Listing

**STAT 592 / CSE 547 (4) Machine Learning for Big Data**

Machine learning and statistical techniques for analyzing datasets of massive size and dimensionality. Representations include regularized linear models, graphical models, matrix factorization, sparsity, clustering, and latent factor models. Algorithms include sketching, random projections, hashing, fast nearest-neighbors, large-scale online learning, and parallel methods (MapReduce, GraphLab). Prerequisite: either STAT 535 or CSE 546.

Emily Fox

Current Listing

**Certificate in Cloud Computing**

Gain an in-depth understanding of cloud computing models, applications, platforms, infrastructures and technologies. Get hands-on experience in developing scalable, efficient systems for the cloud and building scalable applications. Work on projects using frameworks like Hadoop and MapReduce, which enable massive scalability for processing and analyzing large data sets. Understand the platforms of key cloud vendors as well as the decision-making process for adopting a cloud migration strategy.

**Certificate in Data Science**

Develop, in the context of practical application, the computer science, mathematics, and analytical skills needed to enter the field of data science. Discover how to use data science techniques to analyze and extract meaning from extremely large data sets, or “big data.” Become familiar with modern database systems, data models, and query interfaces. Learn how to use statistics, machine learning, text retrieval, and natural language processing to analyze data and interpret results. Practice using these tools and techniques on data sets of increasing complexity and scale.