
Deer Fear: Using Accelerometers and Video Camera Collars to Understand if Wolves Change Deer Behavior

Project Lead: Apryle Craig, UW Department of Environmental & Forest Sciences PhD Candidate

eScience Liaison: Valentina Staneva

Deer were outfitted with animal-borne video camera collars and accelerometers, which measured acceleration in three dimensions: sway, surge, and heave. Plotting the acceleration recorded during a 10-second video clip reveals distinct patterns for the different behaviors observed in the video.

Animal behavior can provide insight into the underlying processes that drive population and ecosystem dynamics. Accelerometers are small, inexpensive biologgers that can be used to identify animal behaviors remotely. Tri-axial accelerometers measure an animal’s acceleration in each of three dimensions, often recording 10-100 measurements per second. These fine-scale data offer an opportunity to study nuanced behaviors, but they have historically posed challenges for storage and analysis. Animal behavior researchers have also been slow to adopt accelerometers, perhaps owing to the rigorous calibration required to infer behavior from acceleration data. Calibration involves time-synchronizing behavioral observations with their associated accelerometer readings, which often necessitates the use of captive animals, surrogate species, or field observations of instrumented individuals. Alternatively, animal-borne video cameras can be used to calibrate or validate accelerometers directly. My goal is to use video from animal-borne cameras to assess how accurately collar-mounted tri-axial accelerometers and machine learning can classify foraging, vigilance, resting, and traveling behavioral states in free-ranging deer. Deer were collared in areas of Washington that have been recolonized by wolves and in areas without wolves. I hope to use the resulting behavioral classifications to determine whether wolf recolonization is changing deer behavior.
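A minimal sketch of the classification step, assuming fixed-length windows of sway, surge, and heave samples, each labeled with a behavior from the collar video. The features (per-axis mean and standard deviation plus overall dynamic body acceleration, ODBA) and the nearest-centroid classifier are illustrative stand-ins chosen for brevity; the project's actual features and machine learning model are not described here.

```python
import math
from statistics import mean, pstdev


def window_features(sway, surge, heave):
    """Summary features for one window of tri-axial acceleration.

    Returns per-axis means (static/posture component), per-axis population
    standard deviations (movement intensity), and ODBA. Simplification:
    the static component is the window mean, not a running mean.
    """
    axes = (sway, surge, heave)
    means = [mean(a) for a in axes]
    sds = [pstdev(a) for a in axes]
    # ODBA: average over samples of the summed absolute dynamic acceleration.
    odba = mean(
        sum(abs(a[i] - m) for a, m in zip(axes, means)) for i in range(len(sway))
    )
    return means + sds + [odba]


def fit_centroids(feature_rows, labels):
    """Average feature vector per behavior label (video-annotated windows)."""
    grouped = {}
    for row, label in zip(feature_rows, labels):
        grouped.setdefault(label, []).append(row)
    return {
        label: [mean(col) for col in zip(*rows)] for label, rows in grouped.items()
    }


def classify(features, centroids):
    """Assign the behavior whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


# Usage with hypothetical, synthetic windows (10 samples each).
rest = window_features([0.0] * 10, [0.0] * 10, [1.0] * 10)
travel = window_features([-0.2, 0.2] * 5, [0.0] * 10, [0.5, 1.5] * 5)
centroids = fit_centroids([rest, travel], ["resting", "traveling"])
print(classify(window_features([-0.1, 0.1] * 5, [0.0] * 10, [0.6, 1.4] * 5), centroids))
```

In practice the features would be standardized before computing distances, and a richer classifier (for example, a random forest) trained on many video-labeled windows would replace the centroids.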

Systems level analysis of metabolic pathways across a marine oxygen deficient zone

Project Lead: Gabrielle Rocap, UW School of Oceanography Professor

eScience Liaison: Bryna Hazelton

Marine Oxygen Deficient Zones (ODZs) are naturally occurring, oxygen-poor mid-depth regions of the ocean, sandwiched between oxygenated surface and deep layers. In the absence of oxygen, microorganisms in ODZs use a variety of other compounds as terminal electron acceptors, most notably oxidized forms of nitrogen; this reduces the amount of bioavailable nitrogen in the global marine system through the production of N2O and N2 gas. These elemental transformations give marine ODZs an outsized contribution to global biogeochemical cycling relative to the volume of ocean they occupy. Because ODZs are expanding as the ocean warms, understanding the metabolic potential of their microbial communities is key to predicting global elemental cycles. The goal of this project is to use existing metagenomic data from ODZ microbial communities to quantify the metabolic pathways used by microorganisms in water layers with different oxygen levels. We are using a set of 14 metagenomic libraries from different depths in the ODZ water column, representing different oxygen levels (e.g., oxic, hypoxic, anoxic), that have been assembled both individually and together. We will use the frequency of genes in the microbial populations of each water sample to identify genetic signatures of different water regimes, with a particular focus on genes encoding enzymes mapped in the Kyoto Encyclopedia of Genes and Genomes (KEGG).
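As a minimal sketch of the gene-frequency idea, assume a table of read counts per gene (for example, per KEGG ortholog) for each library, plus an oxygen-regime label for each library. Counts are first converted to relative abundances so that libraries sequenced to different depths are comparable, then averaged within each regime. The gene names and counts below are hypothetical, and real pipelines often also normalize by gene length.

```python
from collections import defaultdict


def relative_abundance(counts):
    """Convert raw read counts per gene to fractions of the library total."""
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


def mean_by_regime(libraries, regimes):
    """Average relative abundance of each gene across libraries in one regime.

    libraries: {library_id: {gene: read_count}}
    regimes:   {library_id: regime_label}, e.g. "oxic", "hypoxic", "anoxic"
    A gene missing from a library contributes zero to that regime's mean.
    """
    sums = defaultdict(lambda: defaultdict(float))
    n_libs = defaultdict(int)
    for lib, counts in libraries.items():
        regime = regimes[lib]
        n_libs[regime] += 1
        for gene, frac in relative_abundance(counts).items():
            sums[regime][gene] += frac
    return {
        regime: {gene: total / n_libs[regime] for gene, total in genes.items()}
        for regime, genes in sums.items()
    }


# Usage with hypothetical libraries and genes.
libraries = {
    "lib1": {"gene_A": 30, "gene_B": 70},
    "lib2": {"gene_A": 10, "gene_B": 90},
    "lib3": {"gene_A": 80, "gene_B": 20},
}
regimes = {"lib1": "oxic", "lib2": "oxic", "lib3": "anoxic"}
print(mean_by_regime(libraries, regimes))
```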

Predicting a drought with a flood of data: Evaluating the utility of data-driven approaches to seasonal hydrologic forecasts

Project Lead: Oriana Chegwidden, UW Civil & Environmental Engineering Department PhD Candidate and Staff Scientist

eScience Liaison: Nicoleta Cristea  

Climate change is likely to exacerbate droughts in the future, compromising water availability around the world. These changes in water availability may not be uniform across the land surface, since they reflect the combined effects of shifting precipitation, changing snowpack, and increased losses to evapotranspiration. How these effects combine to alter surface water availability is an active area of research. The potential changes are of global significance, particularly in transboundary river basins: because earth systems and river basins are agnostic of political boundaries, changes in water availability in a basin that straddles a border can have significant consequences for the countries that share it. In this project we evaluate an ensemble of newly released global climate model (GCM) simulations from the Coupled Model Intercomparison Project Phase 6 (CMIP6) to investigate the global impact of climate change on surface water availability. We evaluate these projected changes across river basins, assessing whether river basins respond uniformly or whether transboundary river basins will experience greater inequity in water availability. We perform the analysis on the Pangeo platform, using CMIP6 data hosted on Google Cloud. We validate the results against ERA5, a global reanalysis product that serves as a gridded observational dataset at resolutions and spatial extents appropriate for comparison with GCM outputs. For example, the figure at right shows the mean annual runoff from this dataset for the period 1985-2014. Ultimately, this project provides a global analysis of projected changes in water availability in transboundary river basins and of the resulting climate change impacts on international water security.
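One simple way to ask whether a transboundary basin responds uniformly is to compare the projected percent change in mean annual runoff across its national portions. The sketch below assumes annual runoff series already averaged over each portion of a basin; the 1985-2014 baseline matches the period mentioned above, while the 2070-2099 future window and the spread metric are hypothetical choices for illustration.

```python
from statistics import mean


def period_mean(annual_runoff, start, end):
    """Mean of a {year: runoff} series over an inclusive range of years."""
    values = [v for year, v in annual_runoff.items() if start <= year <= end]
    if not values:
        raise ValueError(f"no data between {start} and {end}")
    return mean(values)


def percent_change(annual_runoff, hist=(1985, 2014), future=(2070, 2099)):
    """Percent change in mean annual runoff from the historical to the future window."""
    base = period_mean(annual_runoff, *hist)
    return 100.0 * (period_mean(annual_runoff, *future) - base) / base


def change_spread(parts):
    """Percent change per national portion of one basin, and their range.

    parts: {country: {year: runoff}}. A spread near zero suggests a uniform
    response; a large spread flags potential inequity between countries.
    """
    changes = {country: percent_change(series) for country, series in parts.items()}
    return changes, max(changes.values()) - min(changes.values())


# Usage with two hypothetical portions of one basin.
upstream = {**{y: 100.0 for y in range(1985, 2015)}, **{y: 110.0 for y in range(2070, 2100)}}
downstream = {**{y: 200.0 for y in range(1985, 2015)}, **{y: 180.0 for y in range(2070, 2100)}}
print(change_spread({"upstream": upstream, "downstream": downstream}))
```

On gridded CMIP6 output the same calculation would first require area-weighted averaging over each basin portion, for example with xarray on Pangeo.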

British Justifications for Internment without Trial: NLP Approaches to Analyzing Government Archives

Project Lead: Sarah Dreier, UW Department of Political Science and Paul G. Allen School of Computer Science & Engineering Postdoctoral Fellow

eScience Liaison: Jose Hernandez

Our text corpus comprises digitized archive image files retrieved from the UK National Archives. Left: A top-secret communiqué from the UK Ministry of Defence to the Prime Minister considering the use of internment (archive file PREM 15 477). Right: A confidential communiqué considering whether the UK’s internment policies violate European human rights law (archive file PREM 15 485).

How do liberal democracies justify policies that violate the rights of targeted citizens? When facing real or perceived national security threats, democratic states routinely frame certain citizens as “enemies of the state” and subsequently undermine those citizens’ freedoms and liberties. This Incubator project applies natural language processing (NLP) techniques to digitized archive documents to identify and model how United Kingdom government officials internally justified their decisions to intern unconvicted Irish Catholics without trial during the “Troubles” in Northern Ireland. The project uses three NLP approaches (dictionary methods, word vectors, and adaptations of pre-trained models) to examine whether and how government justifications can be identified in text. Each approach is based on, validated by, and/or trained on a hand-coded annotation and classification of all justifications in the corpus (the “ground truth”), which was completed before this project began. In doing so, this project seeks to advance knowledge about government human rights violations and to explore the use of NLP on rich, nuanced, and “messy” archive text. More broadly, it demonstrates the promise of combining archive text, qualitative coding, and computational techniques in social science. This project is funded by NSF Award #1823547; Principal Investigators: Emily Gade, Noah Smith, and Michael McCann.
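The dictionary approach, and its validation against hand-coded labels, can be sketched as follows. The two categories and their word stems are hypothetical placeholders, not the project's dictionary; the point is the mechanics of tagging text by stem matches and scoring those tags with precision and recall against the ground truth.

```python
import re

# Hypothetical dictionary: justification category -> word stems.
JUSTIFICATION_TERMS = {
    "security": ["threat", "terror", "violen", "danger"],
    "legal": ["law", "legal", "court", "convention"],
}


def tag_sentence(sentence, terms=JUSTIFICATION_TERMS):
    """Return the categories whose stems begin any word in the sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return {
        category
        for category, stems in terms.items()
        if any(word.startswith(stem) for word in words for stem in stems)
    }


def precision_recall(predicted, gold, category):
    """Score one category against hand-coded labels (parallel lists of label sets)."""
    pairs = list(zip(predicted, gold))
    tp = sum(category in p and category in g for p, g in pairs)
    fp = sum(category in p and category not in g for p, g in pairs)
    fn = sum(category not in p and category in g for p, g in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Usage with invented sentences and invented hand-coded labels.
sentences = [
    "Internment is necessary to counter the terrorist threat.",
    "The policy may breach the European Convention.",
    "Ministers met on Tuesday.",
]
gold = [{"security"}, {"legal"}, {"security"}]
predicted = [tag_sentence(s) for s in sentences]
print(predicted, precision_recall(predicted, gold, "security"))
```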


Automated monitoring and analysis of slow earthquake activity

Project Lead: Ariane Ducellier, UW Department of Earth & Space Sciences PhD Candidate

eScience Liaison: Scott Henderson  

Number and location of low-frequency earthquakes recorded on April 13, 2008, in northern California.

Low-frequency earthquakes (LFEs) are small earthquakes, typically of magnitude less than 2, with reduced amplitudes at frequencies above 10 Hz relative to ordinary small earthquakes. Their occurrence is often associated with tectonic tremor and slow slip events along the plate boundary in subduction zones and, occasionally, in transform fault zones. LFEs are usually grouped into families, with all the earthquakes of a given family originating from the same small patch on the plate interface and recurring episodically, in bursts. Many research papers analyze seismic data over a finite period of time and produce a catalog of LFEs for that period. However, there is little continuous monitoring of these phenomena.

We are currently using data from seismic stations in northern California to detect LFEs and produce a catalog for the period 2007-2019. These seismic stations are still installed and recording new data every day. We therefore want to develop an application that carries out the same analysis, which we have so far conducted offline, automatically and continuously on data recorded from 2020 onward. An increase in LFE activity would then be detected and reported automatically as soon as it starts.
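The automatic alert could work along the lines of this sketch, which assumes the daily pipeline yields a count of detected LFEs per day for one family. The rule used here (flag a day whose count exceeds the trailing 30-day mean by three standard deviations, with a floor of five events) is a hypothetical choice for illustration, not the project's criterion.

```python
from statistics import mean, pstdev


def flag_activity_increase(daily_counts, window=30, k=3.0, min_count=5):
    """Indices of days whose LFE count exceeds a trailing-window baseline.

    daily_counts: LFE detections per day, oldest first.
    A day is flagged when its count is above both mean + k * SD of the
    preceding `window` days and the floor `min_count`, which prevents
    alerts during very quiet periods where the SD is close to zero.
    """
    flagged = []
    for day in range(window, len(daily_counts)):
        history = daily_counts[day - window:day]
        threshold = max(mean(history) + k * pstdev(history), min_count)
        if daily_counts[day] > threshold:
            flagged.append(day)
    return flagged


# Usage: 40 quiet days of hypothetical counts followed by a burst.
counts = [1, 0, 2, 1] * 10 + [25, 30]
print(flag_activity_increase(counts))
```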

LFEs detected in the last two months with the new application for an LFE family located in northern California.



We have created a Python package using the Python packaging tool Poetry and made it publicly available on GitHub. On GitHub, we have also set up a workflow that runs every day: it downloads the seismic data recorded three days earlier, analyzes the data, and finds the LFEs. The resulting catalog for that day is stored in a CSV file, which is then uploaded to Google Drive. The last step, which we are currently developing, is to download all the CSV files stored on Google Drive and use them to plot the LFE catalog.
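The daily bookkeeping around the detection code can be sketched with the Python standard library: computing which day's data the job processes, writing that day's catalog to a CSV file, and merging all daily CSV files into one catalog before plotting. The file-naming pattern and the `time`/`family` columns are hypothetical, and the Google Drive upload and download steps are left out.

```python
import csv
from datetime import date, timedelta
from pathlib import Path


def target_day(run_day: date, lag_days: int = 3) -> date:
    """Day whose seismic data the daily job processes (three days earlier by default)."""
    return run_day - timedelta(days=lag_days)


def write_daily_catalog(folder: Path, day: date, detections) -> Path:
    """Write one day's LFE detections to <folder>/catalog_YYYY-MM-DD.csv.

    detections: iterable of (origin_time_iso, family_id) tuples (hypothetical columns).
    """
    path = folder / f"catalog_{day.isoformat()}.csv"
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["time", "family"])
        writer.writerows(detections)
    return path


def merge_catalogs(folder: Path):
    """Concatenate all daily CSV catalogs into one list of rows sorted by time."""
    rows = []
    for path in sorted(folder.glob("catalog_*.csv")):
        with path.open(newline="") as f:
            rows.extend(csv.DictReader(f))
    return sorted(rows, key=lambda row: row["time"])


print(target_day(date(2020, 6, 10)))
```

Sorting on ISO 8601 timestamp strings works because that format orders lexicographically in time, as long as all entries share the same format and time zone.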


Developing a relational database for acoustic detections and locations of baleen whales in the Northeast Pacific Ocean

Project Lead: Rose Hilmo, UW School of Oceanography PhD Candidate

eScience Liaison: Joseph Hellerstein

The health and recovery of whale populations is a major concern in ocean ecosystems. This project is about using data science to improve the monitoring of whale populations, an ongoing area of research in ocean ecology.

Lower: Spectrogram showing 20 minutes of repeating blue whale B-calls, each characterized by a stereotyped 10-second downsweep centered on 15 Hz. Upper: Output of our B-call spectrogram cross-correlation detector (blue) and peak detections of calls (orange x’s).

Our focus is acoustic monitoring, a very effective tool for tracking the presence and behavior of whales in a region over extended time periods. Ocean bottom seismometers (OBSs) used to record earthquakes on the seafloor can also detect blue and fin whale calls. We take advantage of a large, 4-year OBS deployment spanning the coast of the Pacific Northwest to investigate spatial and temporal trends in fin and blue whale calling, providing data at an unprecedented scale for whale monitoring. Our main research question is: How does whale call activity vary in time (e.g., seasonally and annually) and space in the Northeast Pacific? Additionally, how does call variability relate to other parameters, such as environmental conditions and anthropogenic noise from ships and seismic surveys? This information will provide considerable insight into whale populations and, ultimately, into ocean ecology.

Over the past decade, our lab group has implemented many methods of blue and fin whale acoustic detection and location. This has generated large volumes of data on the temporal and spatial calling patterns of these species in the Northeast Pacific. Our main goal for the data science incubator is to build and publish a SQL relational database of our compiled whale data. This will not only improve our own ability to work with our current data and easily integrate new data, but will also allow others in our community to use our framework and incorporate their own data. Additionally, we will re-implement our whale detection code (currently in MATLAB) in Python. This code will be open source (on GitHub), make use of the relational database, and incorporate software engineering best practices. We hope other researchers will apply our methods to study fin and blue whales using large OBS deployments in other key ecological regions such as Alaska, Hawaii, and the Bransfield Strait (Antarctica).


This project yielded two main deliverables: well-documented Python code for detecting whale calls, hosted in an accessible GitHub repository, and the framework of a SQL relational database for storing whale call and location data.

The Python code package we developed during the incubator detects blue and fin whale calls recorded on ocean bottom seismometers. However, the code is flexible and can also detect calls on other instruments, such as pressure sensors and hydrophones. We use a spectrogram cross-correlation method: a kernel image matching the spectral dimensions of the call is constructed and then cross-correlated with a spectrogram of an instrument’s time-series data. Where the kernel and the spectrogram match, the detection score peaks, and these peaks are recorded as calls (Figure 1). Call metrics of interest to whale ecologists, such as signal-to-noise ratio, call duration, and call times, are stored in a pandas DataFrame and then written to our database.
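The spectrogram cross-correlation idea can be sketched in pure Python, assuming the spectrogram has already been computed as a frequency-by-time grid of power values (rows are frequency bins, columns are time steps). The linear-downsweep kernel, its zero-mean normalization, and the fixed threshold are illustrative assumptions rather than the parameters used in the project’s package; a production version would use NumPy/SciPy instead of explicit loops.

```python
def downsweep_kernel(n_freq, n_time, f_hi, f_lo):
    """Kernel image of a linear downsweep from bin f_hi to bin f_lo.

    Cells on the sweep are 1 and all others 0; subtracting the mean makes
    the kernel zero-sum, so uniform broadband energy does not raise the score.
    """
    kernel = [[0.0] * n_time for _ in range(n_freq)]
    for k in range(n_time):
        # Frequency bin of the sweep at this time column (linear interpolation).
        f = round(f_hi + (f_lo - f_hi) * k / max(n_time - 1, 1))
        kernel[f][k] = 1.0
    kernel_mean = sum(map(sum, kernel)) / (n_freq * n_time)
    return [[v - kernel_mean for v in row] for row in kernel]


def detection_score(spec, kernel):
    """Cross-correlate the kernel with the spectrogram along the time axis."""
    n_freq, n_k = len(kernel), len(kernel[0])
    scores = []
    for t in range(len(spec[0]) - n_k + 1):
        s = 0.0
        for f in range(n_freq):
            for k in range(n_k):
                s += kernel[f][k] * spec[f][t + k]
        scores.append(s)
    return scores


def pick_peaks(scores, threshold):
    """Time offsets of local maxima in the score that exceed the threshold."""
    peaks = []
    for i, s in enumerate(scores):
        left = scores[i - 1] if i > 0 else float("-inf")
        right = scores[i + 1] if i + 1 < len(scores) else float("-inf")
        if s >= threshold and s > left and s >= right:
            peaks.append(i)
    return peaks


# Usage: a synthetic 8 x 30 spectrogram containing one downsweep starting at column 10.
kernel = downsweep_kernel(n_freq=8, n_time=5, f_hi=6, f_lo=2)
spec = [[0.0] * 30 for _ in range(8)]
for k in range(5):
    spec[6 - k][10 + k] = 1.0
scores = detection_score(spec, kernel)
print(pick_peaks(scores, threshold=2.0))
```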

A central part of this project is the relational database. The database is structured using an information model that relates stations, channels, detections, and calls. We developed a Python implementation of the database. This database structure was essential for two reasons. First, it streamlines data storage and use: referencing and filtering associated information from different instruments, calls, and whale locations for analysis is simple using the relational database tables. Second, the open source nature of all tools used to build and access the database makes it more accessible to others who want to use these data in their own research. As of the end of the incubator, we have filled the database only with test detections and locations from small portions of data. It will be filled more completely with 4 years of detection and location data from arrays of ocean bottom seismometers off the coast of the Pacific Northwest as we apply our methods at large scale (Figure 2b).
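One way to express the stations, channels, detections, and calls information model is sketched below with Python’s built-in `sqlite3` module. The table and column names are hypothetical, as is the choice of SQLite; the project’s actual schema and database engine may differ. The example query joins all four tables to count calls of a given species and call type per station.

```python
import sqlite3

# Hypothetical schema: stations -> channels -> detections -> calls.
SCHEMA = """
CREATE TABLE stations (
    station_id TEXT PRIMARY KEY,
    latitude   REAL,
    longitude  REAL,
    depth_m    REAL
);
CREATE TABLE channels (
    channel_id INTEGER PRIMARY KEY,
    station_id TEXT NOT NULL REFERENCES stations(station_id),
    component  TEXT NOT NULL
);
CREATE TABLE detections (
    detection_id INTEGER PRIMARY KEY,
    channel_id   INTEGER NOT NULL REFERENCES channels(channel_id),
    detector     TEXT NOT NULL,
    peak_time    TEXT NOT NULL,
    snr          REAL
);
CREATE TABLE calls (
    call_id      INTEGER PRIMARY KEY,
    detection_id INTEGER NOT NULL REFERENCES detections(detection_id),
    species      TEXT NOT NULL,
    call_type    TEXT,
    duration_s   REAL
);
"""


def create_database(path=":memory:"):
    """Open a SQLite database, enforce foreign keys, and create the tables."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def calls_per_station(conn, species, call_type):
    """Count calls of one species and call type per station via a four-table join."""
    query = """
        SELECT s.station_id, COUNT(*) AS n_calls
        FROM calls c
        JOIN detections d ON d.detection_id = c.detection_id
        JOIN channels ch ON ch.channel_id = d.channel_id
        JOIN stations s ON s.station_id = ch.station_id
        WHERE c.species = ? AND c.call_type = ?
        GROUP BY s.station_id
        ORDER BY s.station_id
    """
    return conn.execute(query, (species, call_type)).fetchall()
```

Keeping detections separate from calls lets several detectors, or several channels at one station, point at the same station metadata without duplicating it.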

Figure 2: a) Histogram of monthly blue whale B-call detections on a subset of ocean bottom seismometers for the 2011-2012 calling season. b) Map showing ocean bottom seismometers deployed off the Pacific Northwest between 2011 and 2015, with the subset stations highlighted.

So far, we have run our blue whale detector on only one year of ocean bottom seismometer data from the large Cascadia Initiative array, as a proof of concept. We did this to test the quality of our detector and to consult with whale experts about any additional useful call metrics we should add to our database. We will improve our detector and expand the database to include additional metrics, such as frequency measurements and background noise levels, before running the code on the full dataset.

Figure 2a shows a monthly histogram of total blue whale calls from our test dataset, detected on a subset of 5 stations of interest. Blue whale calls on these stations show strong seasonality, occurring only from late fall through early spring. Call counts vary by location. Calls on stations in shallow water near the coast (FN14A and M08A) peak in November, earlier in the season than at the other stations in deep water, which peak in December-January. Much deeper analysis of spatial and temporal trends in blue whale calling will be possible once our method is run on the full dataset.
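A monthly histogram like Figure 2a, and the peak month per station, can be tallied from (station, detection time) pairs with the standard library, as in this sketch. The input format is an assumption, and the station codes and times in the usage are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime


def monthly_counts(detections):
    """Count call detections per station and month ("YYYY-MM").

    detections: iterable of (station_code, iso_timestamp) pairs.
    """
    counts = defaultdict(Counter)
    for station, time_iso in detections:
        month = datetime.fromisoformat(time_iso).strftime("%Y-%m")
        counts[station][month] += 1
    return counts


def peak_month(counts):
    """Month with the most detections at each station."""
    return {station: months.most_common(1)[0][0] for station, months in counts.items()}


# Usage with hypothetical detections at two hypothetical stations.
detections = [
    ("STA1", "2011-11-03T01:00:00"),
    ("STA1", "2011-11-20T02:00:00"),
    ("STA1", "2011-12-01T00:00:00"),
    ("STA2", "2011-12-10T00:00:00"),
    ("STA2", "2012-01-05T00:00:00"),
    ("STA2", "2012-01-06T00:00:00"),
]
print(peak_month(monthly_counts(detections)))
```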



Data analytics for demixing and decoding patterns of population neural activity underlying addiction behavior

Project Lead: Charles Zhou, Anesthesiology & Pain Medicine Staff Scientist 

eScience Liaison: Ariel Rokem