Big Data is an evolving field whose definition is fluid and will continue to change over the years. The core of our educational approach is therefore a comprehensive, interdisciplinary, and multifaceted practical training program.

## Core Curriculum

The core curriculum is an integrated, multidisciplinary set of courses that prepares students in the algorithmic, statistical, systems, and scientific aspects of Big Data. *This curriculum is an overlay on top of the requirements of the participating departments, applied in a manner that is specific to each department. Please contact the faculty liaisons for detailed information.*

**Three out of four of the following core courses:**

**CSE 544 – Data Management.** This course focuses on how to use data management systems and how to build them, including recent advances in the field.

- Basic knowledge of data structures (e.g., tree structures). Background course = CSE 326.
- Basic knowledge of the operating system.
- Comfortable programming in Java. Background course = CSE 143.

**CSE 546/STAT 535 – Foundational Machine Learning**

- Linear algebra (eigenvectors, eigenvalues, solving linear systems). Background course = MATH 318 or 308.
- Familiarity with multivariate calculus (partial derivatives, multiple integrals). Background course = MATH 324.
- Fundamental ideas of probability. Background course = STAT 391 or STAT 394-395.
- Comfort with basic programming in Java, Python, or R. Background course = CSE 143.

**CSE 512 – Data Visualization**

- Basic programming expertise; familiarity with or willingness to learn a high-level programming language like Python or JavaScript. Background course = CSE 143.
- Comfort with fundamental data structures and algorithms. Background course = CSE 332 or CSE 373.
- Familiarity with the fundamentals of one or more of interaction design, computer graphics, statistics, databases, or natural language processing is a plus, but by no means required.

**STAT 509 or STAT 512-513 (a more in-depth version)**

- Linear algebra (eigenvectors, eigenvalues, positive definite matrices). Background course = MATH 318 or 308.
- Familiarity with multivariate calculus (partial derivatives, multiple integrals, Jacobians). Background course = MATH 324.
- Fundamental ideas of probability. Background course = STAT 394-395, or possibly STAT 391.
- Familiarity with basic statistical inference (hypothesis tests, estimators, confidence intervals) a plus. Background course = STAT 311.

Additionally, to broaden their education and build a campus-wide community, students register for **at least 4 quarters** of the weekly “Data Science Seminar,” ENGR 591.

## Steering Committee

Ginger Armbrust, Oceanography

Magdalena Balazinska, Computer Science & Engineering

David Beck, Chemical Engineering

Andrew Connolly, Astronomy

Tom Daniel, Biology

Ioana Dumitriu, Mathematics

Ione Fine, Psychology

Emily Fox, Statistics

Carlos Guestrin, Computer Science & Engineering

William Noble, Genome Sciences