For an overview of the Incubator Program, click here.

Applying Machine Learning to the Analysis of the Large-Scale Structure of Turbulence

Project Lead: Owen Williams, Department of Aeronautics and Astronautics
eScience Liaison: Jake VanderPlas

Representation of swirling turbulent boundary layer motions using Proper Orthogonal Decomposition (POD). Data thumbnails were centered on swirling motions (top). First two POD modes (bottom).

Turbulence dominates many flows of engineering interest, regulating mixing, heat transfer, and drag on vehicles. While hard to define concisely, turbulence is noisy and stochastic, containing eddies over a wide range of scales that combine to create a chaotic set of motions that is difficult to decipher. Relatively recently, however, it has been discovered that turbulent flows can be decomposed into sets of coherent structures that explain many previous statistical observations. While such structures can be identified by eye, this identification is subjective, and new measurement and simulation techniques now produce such significant quantities of data that manual methods are impractical. New methods for automated detection and analysis of turbulent structure are desperately needed to enable more in-depth statistical analysis. Many other important questions about the nature of coherent structures also remain to be answered, with the full range of structure geometries, their prevalence, and the interactions between them still poorly understood.

We aim to develop a new tool for automated detection and analysis of turbulent structures through the application of established machine learning (ML) techniques. Leveraging advances in machine learning for the study of turbulent boundary layer structure will standardize the recognition of turbulent structures and allow for significantly greater yield from the largest and most modern computational and experimental datasets, some of which can require a terabyte of storage just to describe a single snapshot of the flow.
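The decomposition shown in the figure can be sketched in a few lines. The following is a minimal, hypothetical illustration of POD computed via the singular value decomposition on synthetic snapshot data; it is not the project's actual pipeline, and all names and sizes are illustrative:

```python
import numpy as np

# Hypothetical illustration: POD (via SVD) of a small set of flow snapshots.
# Each column of X is one velocity-field snapshot flattened to a vector.
rng = np.random.default_rng(0)
n_points, n_snapshots = 200, 30
X = rng.standard_normal((n_points, n_snapshots))

# Subtract the temporal mean so modes describe fluctuations about the mean flow.
X_fluct = X - X.mean(axis=1, keepdims=True)

# Thin SVD: columns of U are the spatial POD modes; s**2 ranks their energy.
U, s, Vt = np.linalg.svd(X_fluct, full_matrices=False)
energy = s**2 / np.sum(s**2)   # fraction of fluctuation energy per mode

# The first two modes (cf. the figure) capture the most energetic motions.
mode1, mode2 = U[:, 0], U[:, 1]
```

For real terabyte-scale snapshots, the same idea is typically applied via the method of snapshots or randomized SVD rather than a dense decomposition.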

 

Cloud-Enabled Tools for the Analysis of Subsea HD Camera Data

Project Lead: Aaron Marburg, Applied Physics Laboratory
eScience Liaisons: Bernease Herman and Valentina Staneva

The Subsea HD Camera (CAMHD) on the Ocean Observatories Initiative (OOI) Cabled Array observes the volcanic vent Mushroom on Axial volcano, 300 miles off the Oregon coast. Every three hours it produces a 12-minute (15 GB) video, which is archived at Rutgers University. Cloud-based elastic storage and computation allow flexible, low-cost, high-performance processing of these large video files to produce scientific insights into the geological and biological functioning of these unique subsea ecosystems.

The publicly available data generated by CamHD have enormous scientific potential, with the capacity to support a wide range of geological, biological, hydrological, and oceanographic investigations using image analysis methods. However, the large size of the video archive and the lack of co-located computing infrastructure at the CI constitute a significant barrier to CamHD science. For end users, downloading this immense dataset for local analysis represents a significant burden in terms of time, bandwidth, and data storage costs. To fully realize the potential of the CamHD system for long-time-series investigations using image analysis, a co-located storage and computing solution must be developed.
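As a rough illustration of the kind of per-frame computation such a co-located pipeline might run, the sketch below flags an abrupt scene change (e.g., a camera move or a venting event) from consecutive-frame differences. Synthetic grayscale frames stand in for decoded CamHD video, and the function name is hypothetical:

```python
import numpy as np

# Hypothetical sketch of frame-level change detection over CamHD-like video;
# synthetic grayscale frames stand in for decoded video here.
rng = np.random.default_rng(1)
frames = rng.normal(0.5, 0.01, size=(20, 64, 64))
frames[12:] += 0.2  # simulate a sudden scene change at frame 12

def change_scores(frames):
    """Mean absolute difference between consecutive frames."""
    diffs = np.abs(np.diff(frames, axis=0))
    return diffs.mean(axis=(1, 2))

scores = change_scores(frames)
change_at = int(np.argmax(scores)) + 1  # index of the first changed frame
```

Run in the cloud next to the archive, a pass like this can reduce each 15 GB video to a handful of numbers before anything is transferred to the end user.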

 

3D Visualization of Prostate Cancer Using Light-Sheet Microscopy

Project Lead: Dr. Nicholas Reder, Pathology
eScience Liaison: Ariel Rokem

Our light-sheet microscope, raw images (up to 1 terabyte of data), and final reconstructed image with comparison to standard H&E pathology.

Advances in microscopic imaging enable visualization of heretofore unseen 3D microanatomical features, which have the potential to transform cancer diagnostics. These novel imaging techniques have led to improved diagnostics in kidney biopsies, brain structure, and embryonic development. Light-sheet microscopy and tissue clarification techniques are a particularly intriguing combination because large volumes of cancer tissue can be imaged with high spatial resolution. However, data processing, data management, and visualization of 3D structures have lagged behind the advances in data acquisition. Currently, our data processing steps require 12-24 hours of computing time and are accomplished using fragments of code written in MATLAB and Miji (MATLAB + Fiji). In addition, our existing software code is stored on multiple servers and is not well annotated, limiting the reproducibility of our work. Finally, the visualization of 3D structures is suboptimal and hinders our ability to provide diagnostic insights. Thus, our current data science limitations have impeded potentially groundbreaking discoveries in cancer microanatomy.

To solve this problem, we will create a software package to optimize the processing, storage, and visualization of 3D microscopic data. Key morphologic features will be extracted from the images and displayed in a simple, easy-to-navigate format. All code will be well annotated and stored on GitHub to ensure reproducibility.
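To make the feature-extraction step concrete, here is a minimal sketch, assuming an already-segmented binary volume, of counting 3D structures and measuring their voxel volumes with SciPy's connected-component labeling. The synthetic volume is illustrative, not real light-sheet data:

```python
import numpy as np
from scipy import ndimage

# Hypothetical sketch: simple morphologic features (count and volume of
# connected structures) from a 3D volume; a synthetic binary volume
# stands in for segmented light-sheet data.
volume = np.zeros((32, 32, 32), dtype=bool)
volume[2:6, 2:6, 2:6] = True        # one small structure (64 voxels)
volume[10:20, 10:20, 10:20] = True  # one larger structure (1000 voxels)

# Label 3D connected components, then measure each component's voxel count.
labels, n_structures = ndimage.label(volume)
volumes = ndimage.sum(volume, labels, index=range(1, n_structures + 1))
```

The same labeled array can feed directly into per-structure statistics or a 3D renderer, which is the kind of pipeline stage the package would standardize.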

 

Detecting Small Particles in Low-Contrast Images to Aid in Particle Tracking

Project Lead: Alicia Clark, Mechanical Engineering
eScience Liaisons: Bernease Herman and Valentina Staneva

Side view of the experimental setup showing bubbles in flow under ultrasound excitation. The imaging area is outlined to show where the high-speed images were taken.

Ultrasound (US) is a safe and non-invasive imaging method commonly used in healthcare and clinical applications due to its high spatial and temporal resolution. However, there are areas of the body where low contrast makes it difficult to obtain high quality images needed for medical diagnosis. This limitation led to the development of microbubbles that can be injected into the circulation to increase contrast between tissue and surrounding vasculature. These microbubbles, with diameters typically between 1 and 10 micrometers, are known as ultrasound contrast agents (UCAs).

UCAs have a potential application in targeted drug delivery because the fluctuating pressure field associated with the ultrasound waves exerts a net force on the microbubbles that can be used to manipulate them inside the human body. This phenomenon, known as the Bjerknes force, needs to be further explored and quantified since it can potentially be used to direct the microbubbles toward a targeted area. The microbubbles could then be used to help image small blood vessels that support the growth of a tumor, and they could potentially be used to suffocate the tumor by expanding in these small vessels (embolism). It is also possible that these bubbles could be used for intracellular gene delivery. Previous theoretical and experimental work explored the dynamics of UCAs under ultrasound excitation and showed that the Bjerknes force, which arises from the phase difference between incoming US pressure waves and bubble volume oscillations, can be used to manipulate the trajectories of microbubbles. This work has contributed significantly to our understanding of microbubble behavior in quiescent or uniform flows; however, it has not focused on microbubbles in physiologically realistic flows. Our work explores the behavior of microbubbles in medium-sized blood vessels under both uniform and pulsatile flows at a range of physiologically relevant Reynolds and Womersley numbers.
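A simple baseline for detecting small, faint particles of this kind combines Gaussian smoothing with a local-maximum search. The sketch below is a hypothetical illustration on a synthetic low-contrast frame, not the project's actual tracking method, and the threshold is tuned to the synthetic data:

```python
import numpy as np
from scipy import ndimage

# Hypothetical sketch: detect small bright particles (microbubble-like
# blobs) in a noisy, low-contrast image; synthetic data stand in for
# high-speed camera frames.
rng = np.random.default_rng(2)
img = rng.normal(0.0, 0.05, size=(100, 100))
for y, x in [(20, 30), (60, 70), (80, 15)]:   # three faint "bubbles"
    img[y - 1:y + 2, x - 1:x + 2] += 0.5

smoothed = ndimage.gaussian_filter(img, sigma=1.5)  # suppress pixel noise
# A pixel is a detection if it equals the local maximum in its 7x7
# neighborhood and clears an intensity threshold.
local_max = ndimage.maximum_filter(smoothed, size=7)
detections = np.argwhere((smoothed == local_max) & (smoothed > 0.15))
```

Linking such per-frame detections across frames is then the particle-tracking step proper; the detector here only illustrates the first stage.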

 

National Water Watch: Monitoring Freshwater Vulnerability to Climate Change and Human Activity

Project Lead: Catherine Kuhn, School of Environmental and Forest Sciences 
eScience Liaisons: Amanda Tan and Rob Fatland

Map showing counts of water quality sampling sites returned from the National Water Quality Portal for the mainstem of the Mississippi from 1984 to 2017. Our project will be mining these data to look for long-term trends in water quality in areas that might be experiencing dramatic shifts as a result of land use transitions and climate change.

Rivers, lakes, and streams are considered sentinels of environmental change. Deforestation, urbanization, and nutrient runoff are increasingly recognized as drivers of change for freshwaters, yet most research analyzing the impact of these forcings occurs at the watershed scale. While smaller-scale studies provide valuable insight into physical processes, few studies describe the vulnerability of inland water quality (WQ) to climate change and anthropogenic activities at larger scales. This type of synthesis knowledge is crucial for informing policy-making, water resources management, and conservation, yet is lacking at a national scale. However, advances in cloud-based data analytics have created a new research landscape, making possible the rapid analysis of public datasets to monitor changes in surface waters at large spatial and temporal scales.

This project seeks to create a national tool for relating changes in water quality signals to land use and precipitation change, representing a significant step forward for understanding the impact of human activities and climate change on US surface waters. In our data synthesis and visualization system, large datasets will be queried to create simple geovisualizations of historic WQ changes and to establish foundations for distilling broad national patterns related to satellite remote sensing. We hypothesize that regions with rapid land use change will also experience shifts in WQ signals.
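A per-site trend estimate of the kind such a tool might compute can be sketched with an ordinary least-squares fit over the 1984-2017 record. The data below are synthetic stand-ins for Water Quality Portal records, and the variable names are illustrative:

```python
import numpy as np

# Hypothetical sketch: fit a long-term linear trend to one site's
# water-quality time series; synthetic yearly values stand in for
# National Water Quality Portal records.
years = np.arange(1984, 2018)
rng = np.random.default_rng(3)
# Simulated site with a gradual increase of 0.1 units/year plus noise.
values = 5.0 + 0.1 * (years - years[0]) + rng.normal(0.0, 0.2, size=years.size)

slope, intercept = np.polyfit(years, values, deg=1)  # slope in units/year
```

Mapping such slopes across thousands of sites is what would turn point records into the national geovisualization the project describes.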

 

 

Discovering Marine Trophic Interaction Patterns Using Sonar Time Series from Ocean Observatories

Project Lead: Wu-Jung Lee, Applied Physics Laboratory
eScience Liaisons: Bernease Herman and Valentina Staneva

The Ocean Observatories Initiative (OOI) has established a continuous flow of data from echosounders deployed at diverse global locations. By collecting echoes reflected from animals in the water column, these data provide an unprecedented opportunity to study long-term trophic interactions in marine ecosystems. However, this large, complex data set presents new challenges, since conventional echo analysis methods are non-adaptive and not effective in scenarios with limited biological ground truth (e.g., species composition from net samples). Our goals are to construct a suite of methods for discovering temporal (e.g., daily, monthly, seasonal) patterns in the echograms and to interpret the animal composition of the segmented aggregations. The long-range goal of this project is to enable information extraction for large-scale, long-term monitoring of marine ecosystems.

The temporal and spatial occurrence of predator-prey interactions, and the associated biomass change across the food chain, are of central importance in the marine ecosystem. Compared to net-based sampling methods, sonar systems (echosounders) offer promising potential for quantifying such interactions by delivering synoptic observation of the whole water column at each ensonification (echogram; Fig. 1). The Ocean Observatories Initiative (OOI) recently deployed numerous such systems, with an ambitious goal of cross-trophic observation at a significantly longer time scale than previously possible. However, there are immediate challenges, since the traditional subjective and non-adaptive echo analysis methods are not effective for analyzing the continuous ocean observatory sonar data flow with limited biological ground truth information (e.g., species composition) at the majority of locations.

To overcome these challenges, we will develop a suite of data-driven machine learning and inverse methods for objective segmentation and interpretation of OOI sonar echo time series. This is a crucial step toward delivering foundational biological information for understanding marine ecosystems under the changing climate, a highlight of the OOI program. We will use multi-dimensional echo features, including the mean echo strengths, the distribution of echo fluctuations, and their joint variation across frequency as data descriptors. Since these features are strong functions of the size and identity of marine organism aggregations, by learning patterns based on echo features we will parse the incoming sonar data stream into biologically meaningful groups (e.g., fish-zooplankton foraging assemblages) useful for ecological research. The analysis framework will be structured such that the data parsing rules are updated adaptively based on the temporal evolution of echoes in the data stream.
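One way to realize the pattern-learning step is a simple clustering of per-aggregation echo features. The sketch below uses a toy two-feature k-means on synthetic "fish-like" and "zooplankton-like" echoes; it is illustrative only, with hypothetical feature values, and is not the project's adaptive method:

```python
import numpy as np

# Hypothetical sketch: group echo returns by simple features (mean echo
# strength in dB, and a fluctuation measure) with a tiny k-means;
# synthetic features stand in for multi-frequency OOI echo descriptors.
rng = np.random.default_rng(4)
fish = rng.normal([-35.0, 2.0], 0.5, size=(50, 2))         # strong, steady echoes
zooplankton = rng.normal([-70.0, 6.0], 0.5, size=(50, 2))  # weak, fluctuating echoes
X = np.vstack([fish, zooplankton])

def kmeans(X, k, iters=20, seed=0):
    """Minimal Lloyd's algorithm: assign to nearest center, then re-center."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(np.linalg.norm(X[:, None] - centers, axis=2), axis=1)
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

labels, centers = kmeans(X, k=2)
```

In practice the project's adaptive framework would update such groupings as the data stream evolves, rather than fitting a fixed model once.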