
Developing a Workflow for Managing Large Hydrologic Spatial Datasets to Assist Water Resources Management and Research

Figure 1. a) – l) Spatial patterns of snow distribution in 2014

Project Lead: Nicoleta Cristea, Civil and Environmental Engineering, University of Washington
Project Collaborators: Jessica Lundquist,  Ryan Currier, Karl Lapo
eScience Liaisons: Anthony Arendt, Rob Fatland

Large, spatially distributed datasets are increasingly abundant, but there is currently no workflow that efficiently manages, analyzes and visualizes them, which limits their usefulness for water resources management and research. Within the incubator, we envision creating a workflow based on existing weather-model-generated meteorology files (1 TB, spatio-temporal) and LiDAR-derived snow depth spatial datasets (1-9 GB; Figure 1) that will also be applicable to other existing or incoming data. The workflow will help integrate high-resolution spatio-temporal datasets with hydrologic modeling to improve water resources management.

The main goal of this project during the incubator is to explore different methods for computing and cloud storage that may increase data processing efficiency. Besides performance issues around minimizing processing time for large datasets, practical issues of migrating software licenses are also being explored. The group has been working on determining how to partition data processing between the cloud and local machines. To evaluate this, the team is testing two cases: cloud computing with new code (Python), which does not require a license, and cloud computing with existing code (Matlab), which does. To use the Xarray Python package, it was necessary to convert the data from the original format to a contiguous time series for the region of interest. This code is currently being tested on Azure.
The next steps for this project are to finish testing cloud computing methods for the Python code, and to use existing Matlab code to downscale coarse-resolution fractional snow covered area datasets to high-resolution binary snow data (presence or absence) for further testing. Additionally, the team will determine what sorts of data are good candidates for cloud computing, and at what point the time and cost of cloud computing make it a better option than local computing resources.
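
The downscaling step mentioned above can be illustrated with a minimal sketch. The source does not describe how the existing Matlab code performs the downscaling, so the approach here is an assumption: the fine pixels inside one coarse cell are ranked by a hypothetical snow-likelihood pattern (for example, a LiDAR snow depth map), and the highest-ranked pixels are marked as snow until the coarse fraction is matched. The function name and inputs are illustrative only.

```python
def downscale_fsca(fsca, likelihood_pattern):
    """Convert one coarse cell's fractional snow cover into a binary fine grid.

    fsca: fraction of the coarse cell that is snow covered (0 to 1).
    likelihood_pattern: 2-D list with one value per fine pixel in the cell;
        larger values mean snow is assumed more likely there (hypothetical input).
    Returns a 2-D list of 0 (no snow) and 1 (snow) with the same shape.
    """
    # Flatten the pattern while remembering each pixel's row and column.
    pixels = [
        (value, r, c)
        for r, row in enumerate(likelihood_pattern)
        for c, value in enumerate(row)
    ]
    # Number of fine pixels that must be snow to match the coarse fraction.
    n_snow = round(fsca * len(pixels))
    # Most snow-prone pixels first.
    pixels.sort(key=lambda item: item[0], reverse=True)

    binary = [[0] * len(row) for row in likelihood_pattern]
    for _, r, c in pixels[:n_snow]:
        binary[r][c] = 1
    return binary
```

For a 2 x 2 pattern and a coarse fraction of 0.5, the two pixels with the largest pattern values are marked as snow and the other two as snow-free.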


Methods for Characterizing Human Centromeres

An example representation of similarities of centromeric repeat units on a single-molecule long read.

Project Lead: Siva Kasinathan, UW School of Medicine
eScience Liaisons: Andrew Fiore-Gartland, Bryna Hazelton

Despite an explosion in DNA sequencing technology, many genome projects, including the Human Genome Project, remain fundamentally unfinished. Gaps in genome assemblies occur in regions composed of repeated sequences. Human centromeres, which are loci that ensure proper partitioning of genetic material at each cell division, are one such class of unassembled sequence and account for an estimated 60 million base pairs of a genome that is 3 billion base pairs in length. Centromere dysfunction may be associated with cancers and developmental disorders such as Down syndrome; however, the inability to exactingly interrogate centromere sequence has impeded a clear understanding of centromere biology in human health and disease.

Genome sequencing is carried out by ‘reading’ chunks of the genome at a time, and then piecing those chunks back together, much like putting together the pieces of a jigsaw puzzle. Unfortunately, regions of the genome that contain a large number of repeated patterns are particularly difficult to reassemble. This incubator project is focused on developing methods for reassembling these parts of the genome. In the first half of this incubator period we developed a ‘fake’ genome that allows us to test which methods have the potential to be successful, and we examined whether piecing together sequences based on cross-correlation patterns is likely to be effective.
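
As a rough illustration of why correlation patterns carry information about repeats, the sketch below builds a small synthetic repeat array and measures how often the sequence matches a shifted copy of itself. This is a simplified stand-in, not the project's method: the unit length, copy number, mutation rate and all function names are assumptions chosen for the example.

```python
import random


def make_fake_repeat_array(unit_len, copies, mutation_rate, rng):
    """Build a synthetic tandem repeat: one random unit copied with point mutations."""
    unit = [rng.choice("ACGT") for _ in range(unit_len)]
    bases = []
    for _ in range(copies):
        for base in unit:
            if rng.random() < mutation_rate:
                base = rng.choice("ACGT")  # may redraw the same base
            bases.append(base)
    return "".join(bases)


def match_fraction(seq, lag):
    """Fraction of positions where the sequence equals itself shifted by `lag`."""
    pairs = len(seq) - lag
    hits = sum(1 for i in range(pairs) if seq[i] == seq[i + lag])
    return hits / pairs


def estimate_repeat_period(seq, max_lag):
    """Return the lag (1..max_lag) with the highest self-match fraction."""
    return max(range(1, max_lag + 1), key=lambda lag: match_fraction(seq, lag))


rng = random.Random(0)
fake = make_fake_repeat_array(unit_len=25, copies=20, mutation_rate=0.05, rng=rng)
period = estimate_repeat_period(fake, max_lag=40)
```

In this toy setting the self-match fraction peaks at the lag equal to the repeat unit length, because each base lines up with its counterpart in the next copy. Real centromeric reads are far noisier, so this only conveys the idea.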


Target Detection for Advanced Environmental Monitoring of Marine Renewable Energy

Figure 1: The Adaptable Monitoring Package and deployment system.

Project Lead: Emma Cotter, Mechanical Engineering, University of Washington
Project Collaborators: Brian Polagye, Paul Murphy
eScience Liaison: Bernease Herman

For the marine renewable energy industry to advance, the uncertainty surrounding its environmental effects must be reduced. The Adaptable Monitoring Package (AMP) is an instrumentation platform that combines sonar, cameras, and hydrophones in a centrally controlled package. The AMP must be able to detect infrequent but severe events, as well as frequent events that, when considered cumulatively, may be biologically significant. Detection of rare events requires continuous monitoring, which generates over 250 GB of data per hour, presenting a challenge for both data storage and processing. During the incubator, we will develop real-time target detection algorithms to control data acquisition, so that only relevant data is stored.
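
The idea of letting detection control what gets stored can be sketched as follows. This is not the AMP's algorithm, which the source does not describe. It assumes each frame is a list of intensity values, that a target shows up as a large deviation from a slowly updated background estimate, and that a few frames preceding a detection are worth keeping. The class name, threshold and buffer size are hypothetical.

```python
from collections import deque


class TriggeredRecorder:
    """Store frames only when they deviate strongly from a running background."""

    def __init__(self, threshold, alpha=0.1, pre_frames=2):
        self.threshold = threshold      # deviation that counts as a target
        self.alpha = alpha              # background update rate
        self.background = None          # running estimate of the empty scene
        self.pre_buffer = deque(maxlen=pre_frames)  # recent target-free frames
        self.stored = []                # frames actually kept

    def process(self, frame):
        """Handle one frame; return True if a target was detected in it."""
        if self.background is None:
            self.background = list(frame)
        deviation = max(abs(x - b) for x, b in zip(frame, self.background))
        detected = deviation > self.threshold
        if detected:
            # Keep the lead-up frames and the frame containing the target.
            self.stored.extend(self.pre_buffer)
            self.pre_buffer.clear()
            self.stored.append(frame)
        else:
            self.pre_buffer.append(frame)
            # Update the background only with target-free frames.
            self.background = [
                (1 - self.alpha) * b + self.alpha * x
                for x, b in zip(frame, self.background)
            ]
        return detected
```

With a stream of quiet frames followed by a single frame containing a strong return, only the last two quiet frames and the target frame end up in `stored`; everything else is discarded.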


Improved Stimulation Protocols for Sight Restoration Technologies

An example of the sort of coding model that might be able to improve prosthetic patient vision.

Project Leads: Ione Fine, Professor of Psychology, University of Washington
Geoffrey M. Boynton, Professor of Psychology, University of Washington
eScience Liaison: Ariel Rokem

Our goal is to develop a neurophysiologically inspired algorithm for improved electrical stimulation protocols in patients implanted with electronic prostheses. By 2020 roughly 200 million people will suffer from retinal diseases. Electronic prostheses, which stimulate remaining retinal cells using electrical current in a manner analogous to a cochlear implant, are currently being implanted in patients and show promise in restoring some vision. These prostheses require a way to translate the visual input into an electrical stimulation protocol, and the current methods of translation are known to be inadequate. We aim to develop better coding schemes and test whether they have the potential to improve the vision produced by these devices.

A major challenge with these prostheses is developing electrical stimulation protocols that properly convey a visual percept. Previous work involved the development of a ‘forward’ model that predicts a perceived image given a set of electrical stimuli; however, a ‘reverse’ model that predicts the appropriate sequence of electrical stimuli given a desired percept is required for informing improvements of visual prostheses. Data and insights from this modeling may also be useful to regulators. As a part of the eScience Incubator, the ‘forward’ model of a retina with implanted electrodes has been implemented in Python. Work on speeding up convolution steps (challenging due to high sampling rates) will facilitate development and implementation of the ‘reverse’ model.
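
To give a sense of why convolution cost matters at high sampling rates, the sketch below compares a direct convolution with a recursive filter that produces the same output in a single pass. It relies on an assumption the source does not state: that the impulse response is a single decaying exponential. The project's model and its actual speed-up strategy may differ, and the function names are illustrative.

```python
import math


def exponential_kernel(decay, length):
    """Impulse response decay**k for k = 0..length-1 (assumed kernel shape)."""
    return [decay ** k for k in range(length)]


def convolve_direct(stimulus, kernel):
    """Causal convolution; cost grows with len(stimulus) * len(kernel)."""
    out = [0.0] * len(stimulus)
    for t in range(len(stimulus)):
        for k in range(min(t + 1, len(kernel))):
            out[t] += stimulus[t - k] * kernel[k]
    return out


def leaky_integrator(stimulus, decay):
    """Same output as convolving with decay**k, computed in one linear pass."""
    out = []
    state = 0.0
    for x in stimulus:
        state = decay * state + x  # y[t] = decay * y[t-1] + x[t]
        out.append(state)
    return out


# Hypothetical pulse train: one unit pulse every 20 samples.
pulse_train = [1.0 if t % 20 == 0 else 0.0 for t in range(200)]
decay = math.exp(-1.0 / 10.0)  # time constant of 10 samples (assumed)
slow = convolve_direct(pulse_train, exponential_kernel(decay, len(pulse_train)))
fast = leaky_integrator(pulse_train, decay)
```

The two results agree to floating-point precision, but the recursive form avoids the inner loop over the kernel. For kernels that are not exponential, FFT-based convolution is a common alternative.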


AralDIF: A Cloud-based Dynamic Information Framework for the Aral Sea Basin

Example of a DIF flowchart

Project Lead: Amanda Tan, Department of Oceanography, University of Washington
eScience Liaisons: Rob Fatland, Anthony Arendt

A dynamic information framework (DIF) is a decision support structure centered on earth system science models and data resources. It is used to address policy challenges in water resources caused by population growth, socio-economic development and climate change, particularly in developing nations. Such a framework is especially useful in developing nations because it encourages open data sources and transparency, and because hosting it in the cloud eliminates the need for standalone infrastructure.

The AralDIF project aims to inform policy challenges in water resources in the Aral Sea Basin by bridging the information gap between decision makers and scientists. To do so, it provides a scalable and flexible framework for data storage, along with tools for data visualization and analysis.

The AralDIF tool integrates hydrologic model and climate data, correlating and subsetting the large global datasets to the region of interest in Python (using the Pandas and Xarray modules). The cropped datasets are currently stored using Azure Blob storage, although other cloud-based services are also being considered. The data is served through an API developed using the Django Python web framework, allowing the user to download data for certain locations within the basin during specific timeframes. Currently the user can download the data as comma-separated text files, raw JSON output or hydrographs. These tools allow the user to navigate and interpret the large datasets to answer specific questions that can inform policy decisions.
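
The selection and export step described above can be sketched with the Python standard library alone. This is not the AralDIF API, whose endpoints and schema are not given in the source; the station names, field names and values below are made up for illustration, and a Django view would wrap functions like these rather than replace them.

```python
import csv
import io
import json

# Hypothetical records; field names and values are illustrative only.
RECORDS = [
    {"station": "A", "date": "2001-01-01", "discharge_m3s": 12.5},
    {"station": "A", "date": "2001-01-02", "discharge_m3s": 13.1},
    {"station": "A", "date": "2001-01-03", "discharge_m3s": 12.9},
    {"station": "B", "date": "2001-01-01", "discharge_m3s": 4.2},
]

FIELDS = ["station", "date", "discharge_m3s"]


def select(records, station, start, end):
    """Rows for one station within [start, end]; ISO dates sort as strings."""
    return [
        r for r in records
        if r["station"] == station and start <= r["date"] <= end
    ]


def to_csv(rows):
    """Serialize rows as comma-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialize rows as a raw JSON array."""
    return json.dumps(rows)


subset = select(RECORDS, "A", "2001-01-01", "2001-01-02")
csv_text = to_csv(subset)
json_text = to_json(subset)
```

Keeping selection separate from serialization means the same subset can feed each output format the user chooses.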

The next steps of the project are to tie the hydrologic data to geospatial information using web-based data visualization tools such as the Bing Maps platform, harness existing online datasets (e.g. CUAHSI HIS) and provide a demonstration client in the browser.


Damage Speaks: Acoustical Monitoring Framework for Structures Subjected to Earthquakes

Scaled bridge tested at the University of Nevada, Reno.

Project Lead: Travis Thonstad, Civil & Environmental Engineering, University of Washington
Project Collaborators: Marc O. Eberhard, John F. Stanton, Civil and Environmental Engineering, University of Washington
Islam Mantawy, David H. Sanders, Civil and Environmental Engineering, University of Nevada, Reno
eScience Liaison: Valentina Staneva

Currently, the assessment of the integrity and safety of structures subjected to earthquakes relies on a combination of visual inspection and engineering judgement. This process is time consuming and can miss critical damage that is not visible. New approaches are needed to rapidly and reliably inform decisions about closures and restrictions in service for bridges and buildings after seismic events.

The project aims to use recorded audio signals to detect structural damage during earthquakes; specifically, damage heard during shake table testing of a quarter-scale concrete bridge. Several challenges have been encountered: the audio signals are not synchronized with one another, multiple bar fractures tend to occur at nearly the same instant, and the audio signals include high levels of noise from machinery. Several different methods have been employed to separate the noise from the sounds of damage, including principal component analysis (PCA), independent component analysis (ICA), median filtering, continuous wavelet transform (CWT) and feature-based clustering using spectral and statistical measures of the signal. Thus far, decomposition (PCA and ICA) and CWT have proven ineffective at isolating damage signals.
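
Of the methods listed above, median filtering is the simplest to illustrate. The sketch below is not the project's implementation. It assumes that a damage event appears as a short impulse on top of a more slowly varying background, so that subtracting a running median leaves the impulse standing out. The window size, threshold and function names are hypothetical.

```python
from statistics import median


def median_filter(signal, window):
    """Running median over a centered window (shortened at the edges)."""
    half = window // 2
    return [
        median(signal[max(0, i - half): i + half + 1])
        for i in range(len(signal))
    ]


def impulsive_events(signal, window, threshold):
    """Indices where the signal departs from its running median by > threshold."""
    baseline = median_filter(signal, window)
    return [
        i for i, (x, b) in enumerate(zip(signal, baseline))
        if abs(x - b) > threshold
    ]


# Toy example: a slow ramp with one sharp spike added at sample 12.
recording = [0.1 * i for i in range(20)]
recording[12] += 5.0
events = impulsive_events(recording, window=5, threshold=2.0)
```

On this toy signal only sample 12 is flagged. Machinery noise in real recordings is unlikely to be this well behaved, which is presumably why several methods are being compared.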

Future work will include using maxima in structural response as peak predictors, incorporating prior information from fatigue models and estimating damage locations within the structure using time of arrival differences between microphone locations. The methodology will be tested on other experiments available through the Network for Earthquake Engineering Simulation (NEES) data repository.
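
Locating a source from time of arrival differences can be sketched for the simplest case of two microphones on a line. This is a sketch under strong assumptions, not the project's planned method: the two recordings are taken to be synchronized (which the source notes is not currently the case), the sound travels through air at roughly 343 m/s, and the source lies between the microphones. The function names are illustrative.

```python
def best_lag(sig_a, sig_b, max_lag):
    """Lag in samples by which sig_b trails sig_a, maximizing cross-correlation."""
    def xcorr(lag):
        return sum(
            sig_a[i] * sig_b[i + lag]
            for i in range(len(sig_a))
            if 0 <= i + lag < len(sig_b)
        )
    return max(range(-max_lag, max_lag + 1), key=xcorr)


def locate_on_line(lag_samples, sample_rate, mic_spacing, sound_speed=343.0):
    """Distance of the source from microphone A, for mics at 0 and mic_spacing.

    With the source at x: t_a = x / c and t_b = (L - x) / c,
    so t_b - t_a = (L - 2x) / c and x = (L - c * dt) / 2.
    """
    dt = lag_samples / sample_rate  # arrival at B minus arrival at A, in seconds
    return (mic_spacing - sound_speed * dt) / 2


# Toy signals: the same short pulse arrives 4 samples later at microphone B.
pulse = [1.0, 3.0, 1.0]
mic_a = [0.0] * 40
mic_b = [0.0] * 40
mic_a[11:14] = pulse
mic_b[15:18] = pulse

lag = best_lag(mic_a, mic_b, max_lag=10)
position = locate_on_line(lag, sample_rate=1000.0, mic_spacing=4.0)
```

Here the pulse reaches microphone B four samples after microphone A, which at 1000 samples per second and a 4 m spacing places the source about 1.31 m from microphone A. Locating damage in two or three dimensions would require more microphones and a solver for the resulting set of equations.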