For an overview of the Incubator Program click here.
Deer Fear: Using Accelerometers and Video Camera Collars to Understand if Wolves Change Deer Behavior
Project Lead: Apryl Craig, UW Department of Environmental & Forest Sciences PhD Candidate
eScience Liaison: Valentina Staneva
Animal behavior can provide insight into underlying processes that drive population and ecosystem dynamics. Accelerometers are small, inexpensive biologgers that can be used to identify animal behaviors remotely. Tri-axial accelerometers measure an animal’s acceleration in each of the three dimensions, frequently recording 10-100 measurements per second. These fine-scale data provide an opportunity to study nuanced behaviors, but have historically posed challenges for storage and analysis. However, animal behavior researchers have been slow to adopt accelerometers, perhaps owing to the rigorous calibration required to infer behavior from acceleration data. Calibration involves time-synchronizing behavioral observations with their associated accelerometer readings, which often necessitates the use of captive animals, surrogate species, or field observations on instrumented individuals. Alternatively, animal-borne video cameras may be used to directly calibrate or validate accelerometers. My goal is to use video from animal-borne cameras to assess the capacity of collar-mounted tri-axial accelerometers and machine learning to accurately classify foraging, vigilance, resting and traveling behavioral states in free-ranging deer. Deer were collared in areas of Washington that were recolonized by wolves and areas without wolves. I hope to use the resulting behavioral classifications to determine whether wolf recolonization is changing deer behavior.
Systems level analysis of metabolic pathways across a marine oxygen deficient zone
Project Lead: Gabrielle Rocap, UW School of Oceanography Professor
eScience Liaison: Bryna Hazelton
Marine Oxygen Deficient Zones (ODZs) are naturally-occurring mid-layer oxygen poor regions of the ocean, sandwiched between oxygenated surface and deep layers. In the absence of oxygen, microorganisms in ODZs use a variety of other elements as terminal electron acceptors, most notably oxidized forms of nitrogen, reducing the amount of bio-available nitrogen in the global marine system through the production of N2O and N2 gas. These elemental transformations mean that marine ODZs have an outsized contribution to global biogeochemical cycling relative to the volume of ocean they occupy. As ODZs are expanding as the ocean warms, understanding the metabolic potential of the microbial communities within them is key to predicting global elemental cycles. The goal of this project is to use existing metagenomic data from ODZ microbial communities to quantify the metabolic pathways utilized by microorganisms in differently oxygenated water layers. We are using a set of 14 metagenomic libraries from different depths within the ODZ water column representing different oxygen levels (oxic, hypoxic, anoxic etc..) that have been assembled both individually and together. We will use the frequency of genes in microbial populations in each water sample to identify genetic signatures of different water regimes, with a particular focus on genes encoding enzymes mapped in the Kyoto Encyclopedia of Genes and Genomes (KEGG).
Predicting a drought with a flood of data: Evaluating the utility of data-driven approaches to seasonal hydrologic forecasts
Project Lead: Oriana Chegwidden, UW Civil & Environmental Engineering Department PhD Candidate and Staff Scientist
eScience Liaison: Nicoleta Cristea
Climate change is likely to exacerbate droughts in the future, compromising water availability around the world. Those changes in water availability may not be uniform across the land surface, with changes in precipitation, snowpack, and increased losses due to evapotranspiration. The resulting combined changes to surface water availability are an active area of research. These potential changes are of global significance, particularly in transboundary river basins. Given that earth systems and river basins are agnostic of political boundaries, the potential impacts of changes in water availability, particularly when in a river basin that straddles a political boundary, are significant. In this project we evaluate an ensemble of newly released global climate model (GCM) simulations from the Coupled Model Intercomparison Project Phase 6 (CMIP6), investigating the global impact of climate change on surface water availability. We evaluate these projected changes across river basins, evaluating the extent to which river basins respond uniformly, or whether transboundary river basins will experience greater inequity in water availability. We perform the analysis on the Pangeo platform, using CMIP6 data housed on Google Cloud. We validate the results against ERA5, a global reanalysis product which serves as a gridded observational dataset available at similar resolutions and spatial extents appropriate for comparison with GCM outputs. For example, the mean annual runoff from this dataset for the period 1985-2014 is shown in the figure at right. Ultimately, we provide an analysis of changes in water availability in transboundary river basins. This provides a global study of projected climate change impacts on international water security.
British Justifications for Internment without Trial: NLP Approaches to Analyzing Government Archives
Project Lead: Sarah Dreier, UW Department of Political Science and Paul G. Allen School of Computer Science Engineering Postdoctoral Fellow
eScience Liaison: Jose Hernandez
How do liberal democracies justify policies that violate the rights of targeted citizens? When facing real or perceived national security threats, democratic states routinely frame certain citizens as “enemies of the state” and subsequently undermine those citizens’ freedom and liberties. This Incubator project uses natural language processing (NLP) techniques on digitized archive documents to identify and model how United Kingdom government officials internally justified their decisions to intern un-convicted Irish Catholics without trial during its “Troubles with Northern Ireland.” This project uses three NLP approaches—dictionary methods, word vectors, and adaptions of pre-trained models—to examine if/how government justifications can be identified in text. Each approach is based on, validated by, and/or trained on hand-coded annotation and classification of all justifications in the corpus (the “ground truth”), which was executed prior to the start of this project. In doing so, this project seeks to advance knowledge about government human rights violations and to explore the use of NLP on rich, nuanced, and “messy” archive text. More broadly, this project models the promise of combining archive text, qualitative coding, and computational techniques in social science. This project is funded by NSF Award #1823547; Principal Investigators: Emily Gade, Noah Smith, and Michael McCann.
Automated monitoring and analysis of slow earthquake activity
Project Lead: Ariane Ducellier, UW Department of Earth & Space Sciences PhD Candidate
eScience Liaison: Scott Henderson
Number and location of low-frequency earthquakes recorded on April 13th 2008 in northern California.
Low-frequency earthquakes (LFEs) are small magnitude earthquakes, with typical magnitude less than 2,and reduced amplitudes at frequencies greater than 10 Hz relative to ordinary small earthquakes. Their occurrence is often associated with tectonic tremor and slow slip events along the plate boundary in subduction zones and occasionally transform
fault zones. They are usually grouped into families of events, with all the earthquakes of a given family originating from the same small patch on the plate interface, and recurring more or less episodically in a bursty manner. Currently, many research papers analyze seismic data for a finite period of time, and produce a catalog of low-frequency earthquakes for this given period of time. However, there is little continuous monitoring of these phenomena.
We are currently using data from seismic stations in northern California to detect low-frequency earthquakes and produce a catalog during the period 2007-2019. However, the seismic stations that we are using are still installed and recording new data every day. Thus, we want to develop an application that will carry out the same analysis (we have been conducting offline so far) now automatically and continuously on the future data to be recorded during the year 2020 and after. Therefore, an increase of low-frequency earthquake activity will be automatically detected and reported as soon as it has started.
Developing a relational database for acoustic detections and locations of baleen whales in the Northeast Pacific Ocean
Project Lead: Rose Hilmo, UW School of Oceanography PhD Candidate
eScience Liaison: Joseph Hellerstein
The health and recovery of whale populations is a major concern in ocean ecosystems. This project is about using data science to improve the monitoring of whale populations, an ongoing area of research in ocean ecology.
Lower) Spectrogram showing 20 minutes of repeating blue whale B-calls stereotyped by a 10 second downsweep centered on 15 Hz. Upper) Plot showing output of our B-call spectrogram cross-correlation detector (blue) and peak detections (orange x’s) of calls.
Our focus is acoustic monitoring, a very effective tool for monitoring the presence and behavior of whales in a region over extended time periods. Ocean bottom seismometers (OBSs) that are used to record earthquakes on the seafloor can also be used to detect blue and fin whale calls. We take advantage of a large 4-year OBS deployment spanning the coast of the Pacific northwest to investigate spatial and temporal trends in fin and blue whale calling, data that provide an unprecedented scale for whale monitoring. Our main research question is: How does whale call activity vary in time (e.g., seasonally and annually) and space in the Northwest Pacific? Additionally, how does call variability relate to other parameters such as environmental conditions and anthropogenic noise such as ship noise and seismic surveys? This information will provide considerable insight into whale populations and ultimately into ocean ecology.
Over the past decade, our lab group has implemented many methods of blue and fin whale acoustic detection and location. This has generated large volumes of data on temporal and spatial calling patterns of these species in the Northeast Pacific. Our main goal of the data science incubator is to build and publish a SQL relational database of our compiled whale data. This will not only improve our own ability to work with our current data and easily integrate new data but will also allow others in our community to utilize our framework and incorporate their own data. Additionally, we will re-implement our whale detection codes (currently in MATLAB) in Python. These codes will be open source (on github), make use of the relational database, and incorporate software engineering best practices. It is our hope other researchers will apply our methods to study fin and blue whales using large OBS deployments in other key ecological regions such as Alaska, Hawaii, and Bransfield Strait (Antarctica).
Data analytics for demixing and decoding patterns of population neural activity underlying addiction behavior
Project Lead: Charles Zhou, Anesthesiology & Pain Medicine Staff Scientist
eScience Liaison: Ariel Rokem
In 2017, 1.7 million people in the United States reported addiction to opioid pain relievers (Center for behavioral Health Statistics and Quality, 2017) while 47,000 individuals died from opioid overdose (CDC, 2018). Understanding the mechanisms of substance use disorders and developing targeted treatments are monumental challenges due to the facts that the responsible brain regions are situated deep within the brain and possess highly diverse neuron populations and circuitry. To tackle this challenge, laboratories at UW’s NAPE (Neurobiology of Addiction, Pain, and Emotion) center utilize 2-photon calcium imaging to record from hundreds of neurons in animal deep brain structures simultaneously during drug seeking behaviors. Briefly, this method combines high temporal and spatial resolution microscopy with cell-type specific fluorescent neural activity readout to produce videos of brain activity where single neurons can be resolved. As a result, for a given animal subject one can track over a thousand neurons over the course of several days of behavior and drug administration assays; however, sophisticated data analysis techniques to dissect how activity patterns across hundreds of thousands of neurons relate to behavior and addiction remain underdeveloped. The aim of this project is to apply novel statistical and machine learning analysis techniques to large-scale 2-photon calcium imaging data with respect to addiction-related behaviors and assays.The project plan is to first perform dimensionality reduction on the mouse calcium imaging videos using tensor component analysis (Williams AH et al., 2018, Neuron) then to use those data to predict behavioral conditions using a convolutional neural network. Once the neural network is able to discriminate behavioral conditions, I can examine the spatial maps that are learned by the neural network nodes. The overall significance of this project is to gain insight to spatially distributed neural patterns that underlie addiction behaviors, allowing for targeted development of drug addiction therapies.
Calcium imaging data with experimental condition labels will be used to train a convolutional neural network. Latent cell activation patterns will be identified from the model. Panel on the right represents a cartoon sample cell pattern identified by feature extraction.