Data Science Incubator FAQs
Learn more about proposing an Incubator project
What is the Data Science Incubator program?
The Data Science Incubator is a program that pairs data science experts with domain experts from across campus. The goal is to advance science by solving problems related to data scale, data visualization, machine learning, and more. The Incubator works on the principle of direct collaboration: each project has a project lead who works with the Incubator staff for the equivalent of two days per week.
Who is this for? Can I apply if I’m a professor, grad student, postdoc, or undergrad?
Anyone is welcome to apply. We simply require that each project have a project lead who is willing to spend the equivalent of 16 hours per week working on their project with the Incubator staff.
How long does an Incubator program last?
Our initial plan is to schedule incubator projects on winter quarter boundaries. See our main incubator page for the current schedule.
Does my data need to be “clean” before submitting a proposal?
Not necessarily. We frequently find that data preparation tasks consume an inordinate share of project resources. We are interested in exploring solutions to these problems that allow researchers to focus on doing science instead of wrangling data formats. That said, it is helpful to have the data in hand, or easily accessible, before starting a project.
I need to move my work to the cloud. Can you help with that?
Yes. We have extensive expertise in moving code and data to cloud infrastructures such as Amazon Web Services, Microsoft Azure, and Google Cloud Platform. A reasonable Incubator project would be to move an existing workflow to the cloud in a cost-effective manner.
Is there a way to get informal help before submitting an Incubator proposal?
We are currently holding office hours online. Please see our Office Hours page for more information.
I work with sensitive data. Can I still work with Incubator staff?
Yes. Our preference is to publish our code and data to the web, as this promotes transparency and reproducibility. That said, we understand that not all data sets are suitable for publication. We are flexible.