Data Incubator FAQ

What is a Data Science Incubator?

The Data Science Incubator is a program that brings together data science experts and domain experts from across the campus. The goal is to advance science by solving problems related to data scale, data visualization, machine learning, and similar challenges. The incubator works on the principle of direct collaboration: each project has a project lead who works with the incubator staff the equivalent of two days per week.

Who is this for? Can I apply if I am a: professor, grad student, post-doc, undergrad, etc.?

Anyone is welcome to apply. We simply require that each project have a project lead who is willing to spend the equivalent of 16 hours per week working on the project with the incubator staff.

How long does an incubator project last?

Our initial plan is to schedule incubator projects on winter quarter boundaries. See our main incubator page for the current schedule.

Does my data need to be ‘clean’ before submitting a proposal?

Not necessarily. We frequently find that data preparation tasks take an inordinate amount of project resources. We are interested in exploring solutions to these problems that allow researchers to focus on “doing science” instead of munging around with data formats. It is helpful to have the data “in hand” or easily accessible before starting a project.

I’ve heard that I need to move my work to the ‘cloud.’ Can you help with this?

Yes. We have extensive expertise with moving code and data to cloud infrastructures such as Amazon Web Services, Microsoft Azure, and the Google Cloud Platform. A reasonable incubator project would be to move an existing workflow to the cloud in a cost-effective manner.

Is there a way to get informal help before submitting an incubator proposal?

We are currently holding office hours online. Please see our Office Hours page for more information.

I work with sensitive data. Can I still work with the incubator staff?

Yes. Our preference is to publish our code and data to the web, as this promotes transparency and reproducibility. That said, we understand that not all data sets are suitable for publication. We are flexible.