Student fellows at the University of Washington Data Science for Social Good (DSSG) program presented the results of their 10-week summer projects on August 17th. The fellows worked in four interdisciplinary teams alongside project leads and data scientists.
The DSSG program at the eScience Institute brings together students from universities around the country with data and domain researchers to work on collaborative projects for societal benefit. The projects are informed by program-wide workshops and tutorials on technical and ethical issues, in addition to team-based meetings with stakeholders, such as domain experts and organizations that plan to use the project outcomes to support future work and inform policy decisions. This year the program was run in a hybrid format, with two teams working remotely and two teams working in person in the eScience Institute’s Data Science Studio on the UW Seattle campus.
This was the eighth year of the DSSG program. The sponsors were UW and the Micron Foundation, which funded fellow stipends and resources and, through the Micron Opportunity Award, supported students who needed additional financial resources to participate.
Over 100 people attended the event, hosted by DSSG Program Director Sarah Stone, which took place via Zoom and in person. The full recording is available here. Summaries of the final presentations, with links to individual project presentations, are provided below.
DSSG Program Chair Anissa Tanweer noted that, “Ten weeks go by incredibly fast. While the teams make tremendous progress in that time, what’s really impactful is that their work is going to be used and continued by their project partners who have long term investment in the important issues these projects are tackling.”
Heating Loads in Alaska and Beyond
This project developed improved methods for estimating the electricity demand for heating buildings in Alaska, where a significant amount of energy is used for heating, yet there is a lack of centralized energy use data. The team created a supervised learning model to calculate the heating load per building by combining local climate data with geospatial data on the age, height and footprint of more than 100,000 buildings, derived from open-source satellite data in Google Earth Engine. The team used data that tracks changes in land vegetation to estimate building age, and land elevation data to estimate building height. They mapped land-surface temperature data over building outlines and aggregated them over time. The project focused on the Railbelt electrical grid, which extends from Fairbanks to Anchorage and the Kenai Peninsula, with a goal to create methods that can be expanded to the greater Arctic region, to inform decarbonization policy efforts. The use of geospatial data provided a faster method than the on-the-ground data collection methods that are more common in the region. The team found that building age, which may be correlated with air leakage, was the most important predictor of heating load, compared with square footage and building height. This finding could be used to inform policy about whether buildings should be retrofitted or replaced.
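To illustrate how a supervised model can rank predictors such as building age, height and footprint, here is a minimal sketch using ordinary least squares on standardized inputs. It is not the team's model: the source does not say which learning algorithm was used, and the feature names, the six toy buildings and the formula that generates the target are all invented for illustration. The toy target is deliberately built so that age dominates, so the ranking shows the mechanics only and is not evidence for the team's finding.

```python
from statistics import mean, pstdev


def standardize(column):
    """Rescale a list of numbers to zero mean and unit variance."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma for x in column]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        partial = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - partial) / m[i][i]
    return x


def fit_standardized_ols(features, target):
    """Return {feature name: coefficient} for least squares on standardized data."""
    names = list(features)
    cols = [standardize(features[name]) for name in names]
    y = standardize(target)
    # Normal equations (X^T X) beta = X^T y; centering removes the intercept.
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    return dict(zip(names, solve(xtx, xty)))


# Hypothetical toy inputs, not project data.
buildings = {
    "age_years": [5, 12, 20, 33, 41, 58],
    "height_m": [6, 9, 4, 12, 7, 5],
    "footprint_m2": [120, 300, 90, 450, 200, 150],
}
# Synthetic target from an invented formula in which age carries most weight.
heating_load = [
    2.0 * age + 0.5 * height + 0.01 * footprint
    for age, height, footprint in zip(*buildings.values())
]

coefs = fit_standardized_ols(buildings, heating_load)
ranked = sorted(coefs, key=lambda name: abs(coefs[name]), reverse=True)
print(ranked)
```

Standardizing the inputs puts the coefficients on a common scale, which is what makes their magnitudes comparable across features measured in different units. Tree-based models with built-in importance scores are a common alternative.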
Future directions for the project identified by the team include:
- Gaining access to a comprehensive database on heating loads from the Alaska Housing Finance Corporation to explore a wider range of modeling approaches.
- Exploring hourly heating load estimates, to provide more granularity than yearly aggregates, and exploring the effects of heating individual floors within a building.
- Incorporating a public energy retrofit database to better understand the effects of building retrofitting on heating loads.
The student fellows for this project were Vidisha Chowdhury, a master’s student at Heinz College of Information Systems and Public Policy at Carnegie Mellon University; Maddie Gaumer, a master’s student in the Department of Applied Mathematics at the UW; Philippe Schicker, a master’s student in the Heinz College of Information Systems and Public Policy at Carnegie Mellon University; and Shamsi Soltani, a doctoral student in the Department of Epidemiology and Population Health at the Stanford University School of Medicine. The project lead was Erin Trochim, Research Assistant Professor at the Alaska Center for Energy and Power at the University of Alaska Fairbanks. The team’s data scientist was Nicholas Bolten, a Data Science Postdoctoral Fellow at the eScience Institute.
Quantifying the Impact of Satellite Streaks in Astronomical Images
This project established a new analytical tool to address the exponential growth of low Earth orbit satellites, which leave bright streaks in the sky that impact astronomical research, nature, and cultures. For example, satellite streaks compromise the relationship between indigenous communities and the night sky, and streaks in astronomical images make it harder to study stars, the galaxy and the universe. To address this, the team created Satmetrics, a Python library to detect and validate satellite streaks in astronomical images from telescopes across the world and measure their basic properties. This generalizable tool for quantifying the emerging impacts of satellite streaks at an aggregate level expands upon existing work to measure streaks at the individual telescope level. The resulting data can be used to support the comparability of brightness across different satellites, provide brightness information to satellite operators, and examine the extent of damage to an image, to support larger efforts to mitigate streak brightness.
To build this tool, the team explored the current body of work, conducted image processing tasks such as clustering lines to avoid over-estimations, and applied an algorithm to count pixels, validate lines, and distinguish streaks from noise. They used images from Trailblazer, an open data repository for astronomical images affected by satellites. Next steps for further developing the Python library are to create functions that evaluate whether detected streaks would be visible to the naked eye, measure accuracy, create test datasets, convert pixel intensity to units that are understood in astronomical research, and establish integration with the Trailblazer dataset.
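The line-clustering step can be sketched as follows. This is a simplified illustration and not the Satmetrics implementation: it assumes candidate lines arrive in the (rho, theta) normal form that a Hough-style detector typically returns, and the function name and both thresholds are hypothetical.

```python
import math


def cluster_lines(lines, max_rho_gap=10.0, max_theta_gap=math.radians(3)):
    """Greedily merge near-duplicate (rho, theta) lines into one line per group.

    Assumption: a single streak often yields several almost identical
    candidate lines, so counting every candidate would over-estimate streaks.
    Angle wrap-around near 0 and pi is not handled in this sketch.
    """
    clusters = []  # each cluster is a list of (rho, theta) tuples
    for rho, theta in sorted(lines):
        for cluster in clusters:
            mean_rho = sum(r for r, _ in cluster) / len(cluster)
            mean_theta = sum(t for _, t in cluster) / len(cluster)
            close_in_rho = abs(rho - mean_rho) <= max_rho_gap
            close_in_theta = abs(theta - mean_theta) <= max_theta_gap
            if close_in_rho and close_in_theta:
                cluster.append((rho, theta))
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters.append([(rho, theta)])
    # Represent each cluster by the average of its members.
    return [
        (
            sum(r for r, _ in cluster) / len(cluster),
            sum(t for _, t in cluster) / len(cluster),
        )
        for cluster in clusters
    ]


# Hypothetical candidates: three near-duplicates plus one distinct line.
candidates = [(100.0, 0.50), (103.0, 0.51), (98.0, 0.49), (300.0, 1.20)]
merged = cluster_lines(candidates)
print(len(merged))
```

Merging before counting means a wide or slightly curved streak that produces several detections is reported once rather than several times.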
The student fellows for this project were Abhilash Biswas, a graduate student at Heinz College at Carnegie Mellon University; Kilando Chambers, an undergraduate student in Applied Mathematics and Psychology at Harvard University; and Ashley Santos, an undergraduate student in the Computing, Data Science and Society Department at the University of California, Berkeley. The project leads were Dino Bektešević, a graduate student in the UW Department of Astronomy, and Meredith Rawls, a research scientist in the Department of Astronomy and the Institute for Data Intensive Research in Astrophysics & Cosmology (DiRAC) at UW. The team’s data scientist was Vaughn Iverson, a research scientist at the eScience Institute.
Exploring New Understandings of the Cost of Living at a Basic Needs Level Using the Self-Sufficiency Standard Database
This team produced and tested a concept design for a new database to improve the workflow for using data in the Self-Sufficiency Standard, which was created by the Center for Women’s Welfare at the UW School of Social Work in the 1990s. The Standard is a budget-based living wage measure that determines the amount of income required for working families to meet their basic needs at a minimally adequate level. It provides an alternative to the Official Poverty Measure by accounting for regional differences in the cost of living across 42 states. The Standard is used by community organizations, academic researchers, policy institutes, legal advocates, state and local officials, and others for purposes such as research and analysis, benchmarking for wage setting, program evaluation, and financial counseling. The new database consolidates files using automated procedures, which reduces human error and increases speed and efficiency. This change expands potential avenues of research by enabling richer and more novel analyses, such as easily exploring the data across different years, states and family types, and producing data visualizations of the results.
The team completed the following steps as part of their work:
- Gained an understanding of the data columns and their relationships.
- Pre-processed files in Python, including standardizing the column names, in preparation for storing all of the files in one database.
- Created a relational model schema that arranges the data in a series of tables, rows and columns, with relationships between the tables.
- Created an interactive map of the U.S. to show comparisons between the percentage of the population living under the Standard vs. the poverty line, by state and family type.
- Documented their code in a GitHub repository, for transparency and reproducibility, and handed off the database for further development to the Center for Women’s Welfare.
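The column standardization and relational schema steps above can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions, not the team's actual design: the table names, column names and the naming rule are hypothetical, and the source does not say which database engine was used.

```python
import re
import sqlite3


def standardize_column(name):
    """Lower-case a header and collapse non-alphanumeric runs into underscores."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


# Hypothetical schema: one row per state, family type and year.
SCHEMA = """
CREATE TABLE us_state (
    state_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE family_type (
    family_type_id INTEGER PRIMARY KEY,
    adults INTEGER NOT NULL,
    children INTEGER NOT NULL,
    UNIQUE (adults, children)
);
CREATE TABLE standard (
    state_id INTEGER NOT NULL REFERENCES us_state(state_id),
    family_type_id INTEGER NOT NULL REFERENCES family_type(family_type_id),
    year INTEGER NOT NULL,
    annual_wage REAL NOT NULL,
    PRIMARY KEY (state_id, family_type_id, year)
);
"""


def build_database(path=":memory:"):
    """Create the tables and return an open connection with foreign keys enforced."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn


conn = build_database()
print(standardize_column(" Housing Costs ($) "))
```

Keeping states and family types in their own tables means each is stored once and referenced by key, which is what makes comparisons across years, states and family types a matter of a single join instead of merging separate files.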
The student fellows for this project were Azizakhon Mirsaidova, a master’s student in Artificial Intelligence at the McCormick School of Engineering at Northwestern University; Priyana Patel, a master’s student in Human Centered Design and Engineering at the UW; Cheng Ren, a doctoral student in the Berkeley School of Social Welfare at the University of California; and Hector Joel Sosa, a Ph.D. student in Social Psychology at the University of Massachusetts – Amherst. The project leads were Annie Kucklick, Research Coordinator, and Lisa Manzer, Director of the Center for Women’s Welfare. The team’s data scientist was Bryna Hazelton, a Senior Research Scientist in the UW Physics Department and the eScience Institute.
Building Households and Families out of Individual Level Administrative Data
This project utilized administrative data from multiple state agencies to generate new data for exploring the impacts of the Seattle minimum wage policy on poverty at the household level. This work expanded upon the ongoing UW Minimum Wage Study, which had previously focused on individual workers’ experiences, by considering how resource sharing within families and homes can provide new insights for measuring poverty. For example, looking at individual wages alone obscures whether a minimum wage worker may be financially independent, dependent on others with higher incomes, or supporting a family. Grouping individuals based on co-residence and other factors, to infer their membership within resource-sharing units, allows for new avenues of analysis.
To tackle this issue, the team utilized overlapping address data from multiple sources to identify indicators and generate methods for estimating a diverse range of household and family compositions. They used the Washington Merged Longitudinal Administrative Data (WMLAD), which is held in a secure enclave and contains data for 10 million people from 2010 to 2016, drawn from records such as unemployment insurance, birth certificates, voter registrations, social service benefit receipts, and drivers’ licenses. To further protect privacy, the team worked with codes to represent last names and addresses. Then they used probability and regional statistics to determine whether households are likely to be families versus roommates, including whether residents with two sets of last names within a single residence are likely to constitute one or two households.
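As a rough illustration of the first grouping step, the sketch below groups coded records by address and then by last-name code, and flags residences where more than one last-name code appears. It is a deterministic simplification and not the team's method, which also applied probability and regional statistics; the record layout, codes and function names are hypothetical.

```python
from collections import defaultdict


def group_by_address(records):
    """Map each address code to {last-name code: [person ids]}.

    Each record is a (person_id, address_code, surname_code) tuple.
    """
    residences = defaultdict(lambda: defaultdict(list))
    for person_id, address_code, surname_code in records:
        residences[address_code][surname_code].append(person_id)
    # Convert nested defaultdicts to plain dicts for predictable lookups.
    return {addr: dict(groups) for addr, groups in residences.items()}


def flag_ambiguous(residences):
    """Return address codes whose residents carry more than one last-name code."""
    return sorted(addr for addr, groups in residences.items() if len(groups) > 1)


# Hypothetical coded records; no real names or addresses are involved.
records = [
    ("p1", "A17", "S3"),
    ("p2", "A17", "S3"),
    ("p3", "A17", "S9"),
    ("p4", "B02", "S5"),
]
residences = group_by_address(records)
print(flag_ambiguous(residences))
```

Flagged residences are the cases where a further rule or statistical model would be needed to decide whether the residents form one household or two.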
The team’s contributions to this ongoing work included creating a relational database with well-structured data to improve work efficiency for future WMLAD users, and applying both point-in-time and longitudinal approaches to derive and improve household and family identifiers. Plans for future work include applying the longitudinal approach to households of larger sizes, and incorporating other information, such as social networks, demographics, and anti-poverty programs, to capture a greater diversity of households.
The student fellows for this project were Zhaowen Guo, a Ph.D. candidate in Political Science at UW; Ihsan Kahveci, a Ph.D. student in Sociology at UW; Betelhem Aklilu Muno, a Master of Public Health student in Epidemiology in the UW School of Public Health; and Eliot Stanton, a recent graduate in Data Science and Analytics at Simmons University. The project lead was Jennie Romich, a Professor of Social Welfare at the UW School of Social Work and Faculty Director of the West Coast Poverty Center. The data scientist was Jessica Godwin, Statistical Demographer and Training Director for the UW Center for Studies in Demography & Ecology.