
Machine-learning-based detection of offshore earthquakes

Project Lead: Zoe Krauss, School of Oceanography, College of the Environment

Seismometers in the Pacific Northwest, land-based vs. offshore

eScience Liaison: Scott Henderson

The fault zones that cause the most devastating earthquakes and tsunamis on Earth lie beneath our oceans, but offshore seismic observations are severely limited by a lack of instrumentation and by noisy data. To fully understand Earth’s geodynamics and the hazard that offshore fault zones present, we need to overcome these limitations and produce offshore earthquake catalogs that capture signals as small as possible. In recent years, deep neural networks have performed well at building earthquake catalogs from land-based data. For our eScience incubator project, we developed a Python-based workflow to test the performance of these pre-trained, largely land-trained deep neural networks on offshore seismic data. We emphasized parallelization in our code to leverage multiple CPUs, and developed infrastructure to use GPU instances and scalable storage on Azure cloud computing. Preliminary results show overwhelmingly poor performance of the pre-trained neural networks, with high rates of false positives due to mischaracterization of noise. Our results strongly indicate the need to retrain machine learning models on ocean-bottom seismometer data.
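The parallelized detection step can be sketched roughly as below. This is an illustrative stand-in, not the project's actual code: `pick_events` is a hypothetical placeholder for running a pre-trained deep-learning picker on one file of waveform data, and the file names are invented.

```python
from concurrent.futures import ProcessPoolExecutor

def pick_events(waveform_file):
    # Placeholder for running a pre-trained picker on one waveform file;
    # the real workflow would load the data and run model inference here.
    # Returns a list of candidate (file, phase, time) picks.
    return [(waveform_file, "P", 0.0)]

def build_catalog(waveform_files, workers=4):
    # Fan detection out across multiple CPUs and merge the resulting picks.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        per_file_picks = pool.map(pick_events, waveform_files)
    return [pick for picks in per_file_picks for pick in picks]

if __name__ == "__main__":
    print(build_catalog(["day_001.mseed", "day_002.mseed"], workers=2))
```

The same fan-out pattern extends to cloud resources: the placeholder is swapped for GPU inference and the file list points at scalable storage.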

User-friendly Tools for Oceanic Plankton Image Analysis (UTOPIA)

Project Lead: Ali Chase, Washington Research Foundation Postdoctoral Fellow, Applied Physics Laboratory 

eScience Liaison: Valentina Staneva

Example plankton images collected with the Imaging FlowCytobot from open ocean surface waters

Thanks to recent advances in instrumentation, we can now observe phytoplankton – the single-celled autotrophs that form the base of the marine food web – using automated, high-throughput microscopy. Millions of phytoplankton images have been collected from oceans and seas across the globe using an instrument called the Imaging FlowCytobot (IFCB), which is deployed onboard oceanographic research vessels and captures thousands of individual particle images every hour. Use of these novel plankton imagery data to address a wide range of oceanographic and marine ecosystem questions is currently limited by the time required to analyze and categorize images: the sheer quantity of data necessitates automated classification, so the need for open-source, efficient, and effective classification tools is high. The primary objective of the UTOPIA incubator project is to develop machine learning methods for IFCB image classification, and to produce an open-source, user-friendly tool that allows for broad application within the oceanographic research community.

Climate Refuge in Urban Areas: Using Spatial Data to Identify Risk and Benefit Tradeoffs

Project Lead: Rebecca Neumann, Civil & Environmental Engineering

eScience Liaison: Spencer Wood and Scott Henderson


In Washington State, climate change is causing more frequent summer water shortages, wildfires, flooding, poor air quality, heat-related illnesses, respiratory illnesses, and mental health stress. Socially and economically disadvantaged people are disproportionately impacted by these changes. Given these disproportionate impacts, regional managers and planners need to understand and prepare for the changes that will occur within different neighborhoods. In urban areas, greenspaces and blue spaces (i.e., water bodies) are important features, providing refuge from hot temperatures and even helping to mitigate heat-island effects. However, many areas in the Puget Sound region are impacted by legacy contamination (e.g., emissions from the former Asarco Smelter), air pollution, and water pollution. Efforts to create new outdoor climate-refuge space must therefore consider the potentially negative health effects of exposure to pollution and contaminants as use of these impacted areas increases. In addition, some existing green and blue urban spaces are presently contaminated, so that certain activities, like fishing for food, pose a health risk. In this project we created a Python workflow that can help generate the knowledge managers need to evaluate the benefits and risks associated with existing and planned spaces for climate adaptation. We created Python scripts that ingest relevant spatial data about existing green and blue spaces, including populations within a 10-minute walk and drive of parks, air pollution, and water quality; clean and align those data; assign population and environmental-quality characteristics to individual parks; and create maps and other visualizations that show park environmental quality and how it relates to the populations served. Results are summarized and presented through a public GitHub page.
Ultimately, after community and agency input, the tool will provide information needed for city managers and planners to identify areas in need of new green and blue spaces, understand the environmental risk involved with creation of these spaces, and to determine if existing green and blue spaces require investment for lowering environmental exposures to pollution.
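The align-and-flag step of such a workflow can be illustrated with a minimal sketch. Everything here is invented for illustration – the park names, pollution values, and the 9.0 µg/m³ screening threshold are hypothetical, and the real workflow operates on geospatial layers rather than flat records:

```python
# Hypothetical inputs standing in for the real spatial layers.
parks = [
    {"park_id": 1, "name": "Park A", "pop_10min_walk": 12000},
    {"park_id": 2, "name": "Park B", "pop_10min_walk": 8500},
]
air_quality = {1: 6.1, 2: 9.8}  # invented annual-mean PM2.5 (ug/m3) by park

def summarize(parks, air_quality, pm25_threshold=9.0):
    # Attach the air-quality value to each park record and flag parks
    # whose exposure exceeds the screening threshold.
    rows = []
    for park in parks:
        pm25 = air_quality.get(park["park_id"])
        rows.append({**park, "pm25": pm25,
                     "high_exposure": pm25 is not None and pm25 > pm25_threshold})
    return rows

for row in summarize(parks, air_quality):
    print(row["name"], row["pop_10min_walk"], row["high_exposure"])
```

Joining population served with an exposure flag in this way is what lets the maps show where heavily used refuge spaces coincide with environmental burdens.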

Geometry of Color: Connecting spectral topography of the central cone photoreceptor mosaic to functional limits of the human trichromatic visual system

Project Lead: Sierra Schleufer, Neuroscience

eScience Liaison: Bryna Hazelton

L, M, and S cones artificially colored red, green, blue over image of photoreceptors in the central retina

Humans enjoy remarkable visual acuity among mammals thanks to a retinal mosaic in which cone photoreceptors are increasingly concentrated toward the central visual field. Our trichromatic vision allows us to discriminate hues along two spectrally opponent axes in addition to luminance, thanks to an interleaved mosaic of L, M, and S cone photoreceptors sensitive to long-, middle-, and short-wavelength spans of the visual spectrum, respectively. Resolution and detection along these three axes are fundamentally limited by cone spatial and spectral topography (i.e., the ratio, density, and arrangement of the three cone types in the photoreceptor mosaic). In some respects these arrangements are known to vary widely between people (e.g., the global ratio of L to M cones), with surprisingly little to no effect on standard color vision metrics, while other features are thought to be more consistent (e.g., the percentage of S cones as a function of retinal eccentricity).

Our team explored how to take advantage of extant analyses best suited to evaluating how cone arrangement influences and constrains spatial and color vision. We began by applying classic two-point correlation techniques to study the intra-cone spacing characteristics of the L-, M-, and S-cone sub-mosaics within regions spanning the central ~5° radius of the retina in four human subjects (a classified human mosaic dataset of thus far unprecedented size in the field). Monte Carlo simulations were used to generate cone mosaics with random spacing characteristics as a basis of comparison for the spatial arrangement of the real cone data. For every cone sub-mosaic analyzed, a population of 1,000 uniform Monte Carlo mosaics was generated with cones randomly placed within the rectangular bounds of the mosaic, providing a distribution representative of uniform random arrangement. A second series of 1,000 Monte Carlo mosaics was generated in which cones could only occupy actual positions of cones from the total mosaic (essentially shuffling the cone types of the real data). For each real and simulated mosaic, a matrix of inter-cone distances was calculated and represented as a histogram, then normalized by the mean of the uniform Monte Carlo histograms. Overlaying the real cone mosaic’s result on the mean ± 2 standard deviations of the two Monte Carlo distributions allows us to determine whether a cone-type sub-mosaic’s arrangement contains isotropic structure beyond what can be explained by occupying the hexagonally packed cone mosaic. The distribution of real intra-cone distances is then assessed for whether cones are more widely spaced (crystalline) or more clustered (clumped) than the cone-position-locked Monte Carlo distribution, as a function of distance from any given cone within the mosaic.
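In outline, the two Monte Carlo baselines and the normalization step can be sketched as follows. This is a simplified stand-in for the actual pipeline – the fixed-width histogram, bin counts, and trial numbers are illustrative parameters, not the values used in the study:

```python
import math
import random

def pairwise_distances(points):
    # All inter-cone distances within a mosaic (list of (x, y) tuples).
    return [math.dist(points[i], points[j])
            for i in range(len(points))
            for j in range(i + 1, len(points))]

def histogram(dists, bin_width=0.1, n_bins=20):
    # Fixed-width distance histogram; distances beyond range are dropped.
    counts = [0] * n_bins
    for d in dists:
        b = int(d / bin_width)
        if b < n_bins:
            counts[b] += 1
    return counts

def uniform_mc_histograms(n_cones, width, height, n_trials=1000, **kw):
    # Baseline 1: cones placed uniformly at random in the mosaic bounds.
    hists = []
    for _ in range(n_trials):
        pts = [(random.uniform(0, width), random.uniform(0, height))
               for _ in range(n_cones)]
        hists.append(histogram(pairwise_distances(pts), **kw))
    return hists

def shuffled_mc_histograms(all_positions, n_subtype, n_trials=1000, **kw):
    # Baseline 2: sub-mosaics drawn from the real cone positions
    # (equivalent to shuffling cone-type labels across the mosaic).
    hists = []
    for _ in range(n_trials):
        pts = random.sample(all_positions, n_subtype)
        hists.append(histogram(pairwise_distances(pts), **kw))
    return hists

def normalize(hist, uniform_hists):
    # Normalize a histogram by the per-bin mean of the uniform MC runs.
    means = [sum(h[b] for h in uniform_hists) / len(uniform_hists)
             for b in range(len(hist))]
    return [c / m if m else 0.0 for c, m in zip(hist, means)]
```

Comparing a real sub-mosaic's normalized histogram against the spread of both baselines is what separates structure imposed by the hexagonal packing of the full mosaic from structure specific to a cone type.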
We built this analysis into a fledgling package of open-source software tools designed to be highly modular, so that we (and, aspirationally, future collaborators) can expand our battery of analyses and datasets over time with minimal friction.

Preliminary results corroborate previous findings we have made using other analyses: the central S-cone sub-mosaic features a degree of crystallinity statistically distinct from what we would see if the same number of S cones were randomly arranged within the cone mosaic. This suggests both that biological processes enforce this spacing and that there is evolutionary pressure to do so. Meanwhile, the L- and M-cone sub-mosaics are nearly indistinguishable from randomly arranged mosaics. However, additional work is required to estimate error within our classified cone mosaic data before results like these can be fully interpreted. We will continue developing our analysis and the software framework with the goal of better understanding the physiology that supports color vision. In the near future, this will include error estimation, assessment of inter-cone-type spacing characteristics, and analyses sensitive to anisotropic topography. Furthermore, we will continue to prioritize making these tools accessible and collaborative. This effort will produce the most detailed characterization of human spectral topography to date, in addition to bringing powerful techniques from other fields to retinal physiology. These findings will provide the basis for future experiments testing structure-to-function hypotheses regarding biophysical models and human psychophysics.

Patterns of COVID-19-related mis/disinformation on Twitter: themes of mis/disinformation and data visualisations

Project Lead: Katie Gonser, Jackson School of International Studies

eScience Liaison: Jose Hernandez

This project looks at COVID-19-related mis/disinformation in Louisiana and Washington state during the first two surges of the pandemic. Part of a broader collaborative study between social scientists at the University of Washington and computer scientists at Louisiana State University (LSU), this research focuses on Twitter users’ sentiment and language use to make multi-way comparisons across and between the two states, at different stages of the pandemic, between COVID- and non-COVID-related content, and across users’ age and gender.

For the eScience Incubator Program, we worked on building a dashboard of data visualisations that all members of the research team can use to more readily view our comparisons and help with initial data exploration. We used the shiny package in R to build this dashboard. The dashboard displays interactive graphs that allow users to view variations in tweet sentiment across time and by users’ age, gender, or whether the tweet content is COVID-related or not. In addition, the dashboard shows maps of both Louisiana and Washington state with options to view tweet density or average sentiment score by county among different age groups, genders, COVID- or non-COVID-related content, and across the first and second surges of the pandemic. After selecting a subset of the data to display on the maps, the dashboard also shows a sample of the tweets corresponding to the selected subset and allows users to search for content within that sample. Moving forward, we plan to expand this dashboard with external datasets that help contextualize our Twitter data, such as timelines of related policy rollouts and local COVID-19 infection and death rates. This dashboard will then be made available online using RStudio Connect. We plan to link to this website in our future publications so that our data are more accessible.

Another project we worked on during the Incubator Program was the detection of mis/disinformation using n-gram analysis. We scraped the Poynter CoronaVirusFacts Alliance Database for all misinforming claims fact-checked during our sample period (first surge: 2/23/20 – 4/30/20; second surge: 6/15/20 – 8/15/20). We then manually labeled all 673 claims according to theme categories; this was an iterative process during which we continually refined the definitions of each theme. We then applied Term Frequency–Inverse Document Frequency (TF-IDF) weighting, based on unigrams and bigrams, to the terms used in the claims under each theme to determine which words had the strongest relationship with their corresponding theme. Searching for these terms in our Twitter sample then allowed us to begin identifying themes within the sample. After the incubator, we plan to continue refining our n-gram analysis and our theme categories. The results of this analysis will be used in an article on themes of COVID-related mis/disinformation in Washington and Louisiana that we will submit for publication.
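The weighting step can be sketched as below, with each theme's pooled claims treated as one TF-IDF "document" so that terms concentrated in a single theme score highest for it. This is a simplification of the actual procedure – the example claims are invented, not drawn from the Poynter database, and no stop-word handling is shown:

```python
import math
from collections import Counter

def ngrams(text):
    # Unigrams and bigrams from a whitespace-tokenized claim.
    tokens = text.lower().split()
    return tokens + [" ".join(tokens[i:i + 2]) for i in range(len(tokens) - 1)]

def top_terms_per_theme(claims_by_theme, k=3):
    # Pool each theme's claims into one document, then score every n-gram
    # by term frequency times inverse document frequency across themes.
    docs = {theme: Counter(g for claim in claims for g in ngrams(claim))
            for theme, claims in claims_by_theme.items()}
    df = Counter()  # number of themes each n-gram appears in
    for counts in docs.values():
        df.update(set(counts))
    top = {}
    for theme, counts in docs.items():
        total = sum(counts.values())
        tfidf = {g: (c / total) * math.log(len(docs) / df[g])
                 for g, c in counts.items()}
        top[theme] = sorted(tfidf, key=tfidf.get, reverse=True)[:k]
    return top

# Invented example claims (not from the Poynter database):
claims = {
    "cures": ["garlic cures covid", "bleach cures covid"],
    "origins": ["the virus leaked from a lab", "a lab made the virus"],
}
print(top_terms_per_theme(claims))
```

The highest-scoring terms per theme then become search terms for tagging tweets with candidate themes.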