Since the industrial revolution, 25-30% of human-created carbon and 90% of the excess heat in the Earth system have been sequestered in the deep ocean. These tracers (heat, carbon, and oxygen, among others) are transported from the surface into the interior in narrow filaments (sub-mesoscale flows), which then merge and mix at depth, producing a net increase in the amount of tracer there. Studying the dynamics of these structures requires observations that span the depth of the water column and resolve scales of a few kilometers and hours. This is possible with gliders, which profile the ocean along a zig-zag path (scattered in space and time) as they move up and down through the water column. The goal of this project was to develop tools to better explore these glider datasets. In particular we developed:
A mapping algorithm that interpolates the scattered space-time observations collected by the glider onto a regular grid, which is easier to visualize and analyze and which respects the structural properties of the fields. We used Gaussian Process Regression for this.
A visualization dashboard for the glider that allows interactive analysis of the data, such as co-locating multiple variables to gain deeper insight into how observed structures might be generated. We used the Holoviz ecosystem in Python for this.
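As a sketch of the mapping approach, scattered observations can be fed to scikit-learn's Gaussian Process Regression with an anisotropic kernel (separate correlation scales in time and depth) plus a white-noise term, then predicted onto a regular grid. The synthetic data, kernel choice, and length scales below are illustrative placeholders, not the project's actual configuration:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Synthetic stand-in for glider observations: scattered (time, depth) points
rng = np.random.default_rng(0)
X_obs = rng.uniform([0, 0], [10, 1000], size=(200, 2))   # (hours, meters)
y_obs = np.sin(X_obs[:, 0] / 2) + X_obs[:, 1] / 1000 + rng.normal(0, 0.05, 200)

# Anisotropic RBF: separate correlation scales for time and depth,
# plus a white-noise term representing measurement error
kernel = RBF(length_scale=[2.0, 100.0]) + WhiteKernel(noise_level=0.05)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X_obs, y_obs)

# Predict onto a regular time-depth grid, with uncertainty
t_grid, z_grid = np.meshgrid(np.linspace(0, 10, 25), np.linspace(0, 1000, 40))
X_grid = np.column_stack([t_grid.ravel(), z_grid.ravel()])
y_mean, y_std = gpr.predict(X_grid, return_std=True)
print(y_mean.shape)  # one estimate (and std) per grid node
```

A useful by-product of GPR is the posterior standard deviation, which flags grid cells far from any glider profile where the mapped field should be trusted less.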
The greatest challenges of the 21st century are cross-national, including climate change, migration, epidemics, inequality, and financial corruption. As a result, it is critical that we better understand the factors that endanger international cooperation. Despite a wealth of research on how the design of international treaties affects treaty commitment and compliance, we have only snapshots of how delegation to third parties, enforcement, and precision differ across treaties. Because most treaties are publicly available text documents, this research area provides a veritable goldmine for applying cutting-edge NLP/machine learning tools, trained on highly curated datasets, to the messy, real-world data of most interest and value for addressing pressing social science questions.
We will work to identify the frequency of these different elements of treaty design and legalization with the help of a stratified sample of 2,000 human-labeled treaties. We might use these human labels to train a supervised machine learning model that can then predict labels for the universe of 55,000 treaties. Alternatively, we may use natural language preprocessing strategies to handle the idiosyncrasies of these data. This project will help future treaty negotiators better understand the features of legalization that improve treaty durability and compliance, and thus draft treaties that better contribute to cooperative outcomes. In addition, detailed data on treaty design will enable other researchers in political science, sociology, economics, and international law to pursue questions and test hypotheses that are currently impossible to explore due to limited data.
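To illustrate the supervised route, one simple baseline is a TF-IDF bag-of-words classifier trained on the human labels. The treaty snippets, the "delegation to a third party" label, and the model choice below are invented placeholders for this sketch, not the project's actual data or pipeline:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for treaty texts labeled for one design feature:
# whether the treaty delegates dispute resolution to a third party
texts = [
    "disputes shall be referred to the international court of justice",
    "disputes shall be submitted to binding arbitration by a tribunal",
    "the parties shall consult to resolve any disagreement amicably",
    "any dispute shall be settled through diplomatic negotiation",
]
labels = [1, 1, 0, 0]  # 1 = delegation to a third party

# TF-IDF features (unigrams and bigrams) feeding a linear classifier
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

# Predict the same design feature for an unlabeled treaty
pred = model.predict(["claims are submitted to an arbitral tribunal"])
print(pred)
```

In practice the labeled sample would be far larger and the model validated with held-out treaties before labeling the full universe of 55,000.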
An example showing the lightning strokes located by WWLLN over the U.S. Red stars indicate the WWLLN sensors; blue dots indicate the lightning strokes detected by WWLLN.
Dry thunderstorms (DTs) are convective storms that generate lightning flashes without significant rainfall at the ground. The frequent occurrence of DTs has long been an important safety concern in the western United States due to their connection to wildfire events. Accurate forecasting of dry thunderstorms is therefore critically important, but remains difficult because the corresponding physical mechanisms are not yet fully understood. In addition, traditional lightning parameterization methods are often based on simplified physical intuition and are limited by a small number of free parameters. Machine learning (ML) techniques enable a more ambitious, data-driven approach to developing parameterizations and are therefore more flexible than traditional methods. By utilizing the recently developed lightning observation dataset from the World-Wide Lightning Location Network (WWLLN) together with atmospheric observation/reanalysis products from ERA5 and TRMM, this study aims to improve dry thunderstorm forecast skill in the western United States. Several ML methods are tested, including random forest and neural network models, applied both to binary classification of thunderstorm occurrence and to prediction of total lightning stroke counts. Using 10 years of data, our results show that, even with the same input variables, the ML-based lightning parameterizations outperform an empirical lightning parameterization in capturing the spatial/temporal variability of both normal and dry lightning.
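A minimal sketch of the binary-classification setup follows, using synthetic stand-ins for the predictors (the real study draws predictors from ERA5/TRMM and labels from WWLLN; the variable names, labeling rule, and values here are invented for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for gridded predictors (illustrative names only):
# CAPE, column relative humidity, and surface precipitation rate
rng = np.random.default_rng(42)
n = 2000
cape = rng.gamma(2.0, 500.0, n)
rh = rng.uniform(0.1, 1.0, n)
precip = rng.exponential(1.0, n)

# Toy labeling rule: lightning occurs when instability and moisture are high
lightning = (cape > 1000) & (rh > 0.3)
X = np.column_stack([cape, rh, precip])
X_train, X_test, y_train, y_test = train_test_split(X, lightning, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.2f}")
print(dict(zip(["CAPE", "RH", "precip"], clf.feature_importances_.round(2))))
```

Feature importances from the fitted forest give a first look at which predictors drive the classification, which is one advantage of tree ensembles over hand-tuned empirical parameterizations.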
Maps that delineate and classify forest conditions remain indispensable prerequisites for forest stewardship planning. By developing new reproducible and open-source methods to automate forest mapping, our effort is designed to facilitate conservation and management planning among the 40,000+ non-industrial forest landowners in Oregon and Washington who control over 3 million acres of land. Social science research suggests more than 70% of these owners have strong stewardship attitudes, but are not (yet) engaged in any conservation or management activities. Developing a written stewardship plan is a critical bottleneck for many of these landowners before they can adopt new practices or access state and federal cost-share and incentive programs for implementing new conservation practices.
Through public records requests, we have gathered forest stand boundaries hand-drawn by state and federal foresters across several million acres spanning the Pacific Northwest's diverse ecoregions. These data provide the targets we will use to train modern computer vision models to delineate and classify forest conditions from publicly available aerial and satellite imagery. The maps we generate will be served to landowners and forest managers across Oregon and Washington through an open-source web app designed to auto-populate Forest Management Plans following widely used state and federal templates.
Over the Winter Quarter in the Incubator, we organized and formatted a 2 TB dataset for easy loading and model training, and set up a framework for model iteration and comparison using TensorBoard and Neptune. With the support of an AI for Earth grant from Microsoft, we spun up a virtual machine on Azure and began training convolutional neural networks built with PyTorch to segment land cover types. We were thrilled to see the model learning to generalize, detecting streams and roads that are absent from the human-drawn annotations but apparent in the aerial imagery and terrain models.
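One common way to quantify segmentation performance of this kind (a standard metric, not the project's actual evaluation code) is per-class intersection-over-union between predicted and annotated label maps:

```python
import numpy as np

def per_class_iou(pred, target, num_classes):
    """Intersection-over-union for each land-cover class in a label map."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        ious.append(np.logical_and(p, t).sum() / union if union else np.nan)
    return ious

# Toy 4x4 label maps with classes {0: forest, 1: road, 2: stream}
pred   = np.array([[0, 0, 1, 1], [0, 0, 1, 1], [2, 2, 0, 0], [2, 2, 0, 0]])
target = np.array([[0, 0, 1, 1], [0, 0, 0, 1], [2, 2, 0, 0], [2, 0, 0, 0]])
print([round(v, 2) for v in per_class_iou(pred, target, 3)])  # → [0.8, 0.75, 0.75]
```

Because hand-drawn stand boundaries can themselves miss features such as streams and roads, IoU against those annotations understates performance where the model is actually correcting the labels.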
Climate Adaptation for Future Maize – Novel Plant Traits and New Management
Project Lead: Jennifer Hsiao, Department of Biology
Over the next three decades, rising population and changing dietary preferences are expected to increase food demand by 25–75%. At the same time, the climate is also changing, with potentially drastic impacts on food production. Breeding for new crop characteristics and adjusting management practices have the potential to mitigate yield loss under a changing climate. However, identifying optimum plant traits and management options for different growing regions through traditional breeding practices and agronomic field trials is time- and resource-intensive. Mechanistic crop simulation models can serve as powerful tools to help synthesize cropping information, set breeding targets, and develop adaptation strategies to sustain food production.
In this project, we use a mechanistic crop simulation model to explore how different crop traits and management options affect maize growth and yield, with the hope to identify ideal trait and management combinations that maximize yield and minimize risk for different agro-climate regions in the US.
We identified various sites across the US maize growing region that had several years of hourly growing season climate information available as environmental drivers for the model (Fig. 1). We then set up an ensemble simulation for each simulated site-year that perturbed physiological, phenological, and morphological aspects of the modeled maize plant and recorded the final yield and various growth processes throughout the growing season. We identified key plant traits that were important for high yield, and observed how the importance of traits differed amongst growing regions.
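The ensemble setup can be sketched as a full-factorial cross of trait perturbations with site-years. The trait names, value ranges, and sites below are placeholders for illustration, not the crop model's actual parameter set:

```python
from itertools import product

# Illustrative trait perturbations spanning the three categories named above
# (names and ranges are placeholders, not the model's real parameters)
perturbations = {
    "leaf_number": [18, 20, 22],            # phenological
    "max_photosynthesis": [0.9, 1.0, 1.1],  # physiological (scaling factor)
    "leaf_angle": [30, 45, 60],             # morphological (degrees)
}
site_years = [("Ames_IA", 2010), ("Lubbock_TX", 2011)]

# Full-factorial ensemble: one model run per site-year x trait combination
runs = [
    {"site": s, "year": y, **dict(zip(perturbations, combo))}
    for (s, y), combo in product(site_years, product(*perturbations.values()))
]
print(len(runs))  # 2 site-years x 3^3 trait combinations = 54 runs
```

Each generated dictionary specifies one simulation, so the full ensemble can be dispatched to the model and its outputs keyed back to the perturbed traits.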
This quarter-long collaboration yielded four main deliverables:
We overhauled and modularized much of the original project code to set up a well-documented and tested data-model pipeline, which allowed us to easily pre-process input data required for model runs and build reproducible workflows for different simulation experiments;
We developed a SQL relational database that stores all ensemble model outputs, allowing easy querying, analysis, and visualization of the large amounts of model output spanning time, space, and model parameter perturbations;
We incorporated a sensitivity analysis framework to identify the key model parameters (plant traits) that contribute to high yield, either by calculating partial correlation coefficients or by performing the Fourier amplitude sensitivity test;
Finally, we identified plant trait combinations that led to high yield and low yield volatility, and showed which plant trait combinations performed best across different maize growing regions within the US. We found that plant traits linked to crop phenology and development had the greatest impact on achieving high yield under current climate conditions. We also identified some regional differences in best-performing cultivars between cooler northern regions and warmer southern regions.
We are currently setting up new simulations that further perturb the identified key plant traits together with management practices such as planting date and planting density, both to identify ideal plant trait and management combinations and to investigate how these ideal combinations could shift across growing regions under a changing climate. We hope this work will shed light on region-specific adaptation strategies for US maize.
Alpine wildflowers are an integral part of montane ecosystems; they provide a wide variety of ecosystem services such as pollination and nutrient recycling. Numerous studies have found that these wildflower species are sensitive to climate warming, as their flowering phenology (developmental stage) is strongly tied to snowmelt. Understanding the effects of climate change on these vulnerable wildflowers requires records of their various stages of development; MeadoWatch, a citizen science initiative run by the Janneke Hille Ris Lambers (JHRL) lab at UW, has spearheaded the effort of documenting these stages for the past 8 years. Volunteers visit sites along two popular trails on the south and east sides of Mt. Rainier from bud break to post-flowering. The program has so far been successful in raising awareness of climate change impacts on wildflowers and serving as a natural history conduit to staff at Mt. Rainier National Park.
At the beginning of 2020, a related initiative was started in which field images of meadow flowers were captured alongside hyperspectral imagery from drones. The goal was twofold: first, to improve remotely sensed phenology detection of these meadows, and second, to complement citizen science observations by capturing finer spectral signatures of meadow flowers that can then be cross-evaluated against imagery from satellite providers (e.g., Planet, Sentinel-2, and Landsat 8). However, deriving field-observed spectral signatures first requires delineating and demarcating flowers from heterogeneous backgrounds (e.g., trees, leaves, soil, and rocks).
We applied Convolutional Neural Networks (CNNs) to delineate meadow flowers from complex backgrounds such as rocks, soil, and leaves. Two architectures, Mask R-CNN and YOLO, were used. The results were promising: both algorithms performed well in detecting the unique meadow species present, but differed in how many individual instances they captured. Mask R-CNN was the final choice because it detected relatively more instances while still satisfactorily identifying the unique species.
These results pave the way for next steps analyzing wildflower occurrences and co-occurrences across space and time.
As public land use increases, accurate visitation numbers are paramount for managers and researchers interested in mitigating and understanding anthropogenic effects. Alpine water quality has recently come under exceptionally high pressure because human waste mitigation is not keeping up with increases in backcountry use. Data on backcountry visitation are sparse, and modeling it accurately in National Parks is an essential next step in quantifying use. This incubator project is testing approaches that use publicly available, geotagged social media data and other variables to predict visitation to sites in Mount Rainier National Park (MORA). Collaborating with MORA officials, this project delineated park regions with useful existing trail-count data, prioritizing locations that are isolated and remote with only one access route. Social media posts that fell within these designated spatial zones were combined with other predictive factors, such as precipitation, institutional closures, and week-of-year, to estimate visitation based on relationships between these variables and on-site counts of hikers. Tying this model of visitation to my work quantifying human enteric waste in alpine waterways may offer other National Parks and public lands methods to evaluate potential impacts of their backcountry use.
Study sites in MORA were selected on two primary conditions: (1) the trail was monitored with an infrared pedestrian counter that captured all traffic (essentially out-and-back trails), and (2) data were collected within the last six years (2015-2020). Polygons were drawn to reflect the areas that would capture all visitors logged by the infrared pedestrian counters; these polygons then provided the boundaries for associating social media posts with counter data. The on-site counts and social media posts for each site within MORA were then used to test the ability of a visitation model, parameterized in Mount Baker-Snoqualmie National Forest (MBS) (Wood et al. 2020), to predict visitation at similar sites within the National Park. Model 1 was created using MBS data and tested using only MORA on-site counts. Model 2 was built using MBS data and ⅓ of the available MORA data, and tested with the other ⅔ of the MORA count data (Table 1). Model 1 had an average Pearson's r of 0.58 and an R-squared of 0.34, and Model 2 had a Pearson's r of 0.64 and an R-squared of 0.41 (n = 1000). The next steps to help explain error will be the inclusion of an indicator variable to capture unknown categorical differences between MBS and MORA, and the addition of random effects to create a mixed-effects model. For the time being there is no difference between the models, but there is room for improvement. The hope is to further develop this model into a tool capable of explaining enough error to complement my work assessing the effects of human interactions with aquatic environments in remote locations.
Table 1: Model 1 was created using MBS data and tested using only MORA on-site counts. Model 2 was built using MBS data and ⅓ of the available MORA data, and tested with the other ⅔ of the MORA count data.
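The two fit statistics reported above can be computed directly from paired on-site counts and model predictions; the count values below are invented for illustration, not the MORA or MBS data:

```python
import numpy as np

def pearson_r(obs, pred):
    """Linear correlation between observed and predicted counts."""
    return np.corrcoef(obs, pred)[0, 1]

def r_squared(obs, pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return 1 - ss_res / ss_tot

# Toy stand-ins for daily on-site hiker counts and model predictions
obs = np.array([12, 40, 55, 8, 90, 33, 70, 25], dtype=float)
pred = np.array([20, 35, 60, 15, 70, 40, 65, 30], dtype=float)
print(round(pearson_r(obs, pred), 2), round(r_squared(obs, pred), 2))
```

Note that Pearson's r rewards predictions that track the ups and downs of visitation, while R-squared also penalizes systematic over- or under-prediction, which is why the two statistics can diverge for a biased model.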