By Valentina Staneva, Oceanhackweek co-organizer
The recent publication of a Proceedings of the National Academy of Sciences article describing hackweeks as a model for data science education and collaboration was very timely: it came out just as Oceanhackweek, a new hackweek, commenced at the University of Washington. Oceanhackweek was a week-long workshop (August 20 – 24) in which over 50 participants from three continents gathered to learn state-of-the-art data science tools and practices and to work collaboratively on projects exploring open oceanographic data sets.
This was not the typical summer school crowd: a rare mix of students, postdocs, professors, educators, instrument engineers, software engineers, data managers and analysts sat down together to master new technologies and address open problems. The challenge: networks of ocean observing systems have amassed terabytes of heterogeneous data from diverse sensors, which have yet to be analyzed. Two of those systems are the Ocean Observatories Initiative (OOI) and the Integrated Ocean Observing System (IOOS). Participants were introduced to the sheer range of observations being collected (chemical, seismic, sonar, hydrophone, video, etc.) and learned about Python libraries that allow users to efficiently retrieve data from these systems.
Apart from data access lectures, the hackweek included tutorials on different data science topics (data mining, visualization, cloud computing, etc.), as well as tools for collaboration and sharing such as Jupyter notebooks, virtual environments, and GitHub. Data managers were present to answer questions and provide context, and also to obtain valuable feedback on the data systems and how they can be improved to facilitate scientific research.
The newly learned skills were quickly put to use during afternoon project work. As problems in oceanography are by nature interdisciplinary, project teams formed around addressing common needs for the integrated study of the ocean. Projects explored interactive visualization and mapping of large concurrent data streams, discovery and interoperability of data sets and sensors, standardization of data and metadata formats, automatic data quality assurance, correlations across variables, and model validation and prediction using machine learning methods.
The diversity of the participants was reflected in the diversity of the organizing team: a joint effort by members of the UW Applied Physics Lab, the eScience Institute, UW-IT, the School of Oceanography, and the Rutgers University OOI Cyberinfrastructure Team. The team included Anthony Arendt, Rob Fatland, Deborah Kelley, Friedrich Knuth, Wu-Jung Lee, Aaron Marburg, Rachael Murray, Don Setiawan, Valentina Staneva, and Amanda Tan.
Earlier this year they teamed up and organized a smaller three-day Cabled Array Hackweek with predominantly local participants focused on mining data from the OOI cabled instruments. In fact, some tutorials and projects of Oceanhackweek were built on top of what was developed during that earlier workshop. With the generous financial and logistical support of the Consortium for Ocean Leadership, Oceanhackweek could accommodate participants from all U.S. coasts as well as several international researchers. All tutorials were recorded and shared online (videos, materials) to provide access to oceanographers who could not attend, and to build a culture of collaborative development and sharing of educational data science resources within the oceanography field.
Oceanhackers seem to have embraced the open and reproducible research practices taught at the event (read some of their reflections here). As they were catching flights home (hopefully to share their experiences in their home labs), more code commits were popping up in their repositories, and we hope that the pulse stays strong until the next hacking event, when more oceanographers join the community.
You can learn more about this event by reading the UW-IT article “Hacking the ocean’s mysteries.”