University of Washington eScience scientists Nicoleta Cristea, Anthony Arendt, and Scott Henderson, together with UW Earth and Space Science Assistant Professor Marine Denolle and George Mason University Research Assistant Professor Ziheng Sun, have received an award from the NSF to develop an educational program that aims to accelerate the adoption of open-source and machine learning workflows in the geosciences.

The team will develop a research workforce capable of tackling fundamental geoscience challenges through the widespread adoption of Machine Learning (ML) tools. The GeoScience MAchine Learning Resources and Training (GeoSMART) framework will provide an educational pathway that begins with the foundations of open-source scientific ecosystems and progresses through general ML theory, toolkits, and deployment on cloud computing. The project will help develop discipline-specific ML libraries, workflows, and communities of practice capable of sustaining the future growth of ML cybertraining opportunities. These materials can be integrated into university courses to further broaden the impact of emerging ML communities. The GeoSMART implementation plan will guide participants through fundamental training in open-source ML toolkits and data science skills. Once they have mastered the basics, participants will take part in interactive events such as hackweeks, project-led and peer-to-peer mentoring activities, and incubators.

ML can discover patterns and trends in large amounts of data because it is based on a bottom-up approach in which algorithms learn relationships between input data and output. ML models are also highly efficient and, in some cases, more accurate because of their flexibility to accommodate nonlinearity and/or non-Gaussianity. Geoscience has entered an era in which both in-situ and remote sensing observations are dense and global (Fig. 1): advances in observational techniques have dramatically increased spatial and temporal resolution and, with it, data quality and volume. ML tools can now assist in handling large volumes of observations and in modeling, analyzing, and forecasting the environment by increasing the speed and accuracy of computations.

By building tools on open-source and cloud-accessible platforms, and by partnering with colleges and institutions that currently lack computing resources for ML workflows, GeoSMART will democratize access to cybertraining materials and ensure that more people can help solve geoscience challenges.

The project will create diverse and inclusive communities of practice that have access to advanced computing tools, so that not only well-funded institutions benefit from the latest technologies. Members of these communities, having worked through the cybertraining offerings, will be empowered to address major bottlenecks in geoscience related to global water resources and volcanic or seismic hazards. Through community engagement and interactive workshops, GeoSMART will create strong links between geoscience research priorities and the tools and infrastructure needed to work with large and complex Earth system data.