Authors: Fabliha Ibnat (UW eScience Institute DSSG intern), Chris Suberlak (UW eScience Institute DSSG intern), Jason Portenoy (UW eScience Institute DSSG intern), Joan Wang (UW eScience Institute DSSG intern), Xitlalit Sanchez (UW ALVA student intern), Cameron Holt (UW ALVA student intern), Neil Roche (BMGF, Data Scientist), Anjana Sundaram (BMGF, Data Officer), Bryna Hazelton (UW eScience Institute Research Scientist), Ariel Rokem (UW eScience Institute Data Scientist).

We are living in an age where data plays a role in almost everything we do. While the power of big data is already being harnessed in science and technology, the question remains how to bring the same disruptive impact to public policy and social good. Companies like Microsoft, Google and Facebook are using massive amounts of user data to create products that engage and entertain users (and to find the most effective advertisement to display to them). In a variety of scientific fields, new measurement devices are producing larger and larger quantities of data about everything from remote galaxies to our own DNA, accelerating our progress towards a better understanding of the universe. Some have even gone so far as to say that data is “unreasonably effective.” But how does one use data to promote social good? How does one harness the lessons learned in analyzing data from the internet, or data from scientific measurements, to address a social problem as challenging and complex as family homelessness?

This summer, the University of Washington’s eScience Institute is hosting the first installment of a Data Science for Social Good program to address this question. Based on programs at the University of Chicago and at Georgia Tech, this 10-week summer program aims to identify organizations devoted to social good and to use data science to increase each organization’s reach and impact. Teams of student interns and data scientists are applying advanced data science techniques to questions pertaining to social good. During the 10 weeks of the program, the student interns receive instruction in programming and other data science methods through a variety of hands-on tutorials and lectures, while spending long days crunching data in the eScience Data Science Studio on the 6th floor of the Physics/Astronomy tower on the UW campus.

In partnership with the Bill & Melinda Gates Foundation, one of these teams is describing, analyzing, and providing insights to reduce family homelessness locally. In the Puget Sound region, there are over 4,000 homeless families with children spending an average of eight months moving from shelter to shelter. The foundation is working together with Building Changes and officials from Pierce, Snohomish and King Counties to reduce family homelessness so that it is rare and brief. To achieve these goals, it is necessary to understand the factors that determine whether families find permanent housing. To this end, relevant data across the Puget Sound region is aggregated, organized and analyzed.

Led by Neil Roche and Anjana Sundaram from the Gates Foundation, the student interns are working side-by-side with data scientists Bryna Hazelton and Ariel Rokem to search for factors that are associated with increased probabilities of families finding permanent housing, and factors that help understand whether a family will return to homelessness. In addition, they are joined by Xitlalit Sanchez and Cameron Holt, two participants in UW ALVA, a program for high-school student interns interested in STEM research.

The student interns on the team come from a diverse set of backgrounds, which has created a blended, interdisciplinary approach to problem-solving on this project. In a post introducing himself on the team blog, Chris Suberlak, a PhD student intern in the UW Astronomy Department, says: “Though my passion has always been Astronomy, I have deeply held beliefs about the value of persons and the need to provide for the dignity of human lives and communities. I would be thrilled if the outcome of my research conducted during this summer program could provide input that might affect the way local government identifies and appropriately responds to observed data patterns, and thus the needs of persons and communities.”

Fabliha Ibnat, a Senior in Economics at UW, reflects on the potential impact of this program: “I am not unique in saying that I want to lend myself to social good initiatives, and in the past I’ve volunteered in Bangladesh, teaching English and working at a health clinic in an effort to help. However I realized that my contributions do not always have to be global, and in fact may be more useful if I channel them towards issues here in Seattle, my hometown. The eScience Institute provides an opportunity to contribute to a social good initiative using skills that I already have, in an environment that supports further development of those skills without hindering project goals.”

Joan Wang, also originally from the area (Bellevue, WA), is studying Spatial Planning and Environmental Policy at Cardiff University (Wales, UK). She says: “I always keep an eye out for innovative projects that are sprouting up in the Greater Seattle area—especially those that connect with surrounding urban communities.” Her background in environmental economics and urban planning dovetails nicely with the DSSG program theme, and she is excited to contribute to this multi-stakeholder collaboration while learning more about cutting-edge data analysis tools.

Jason Portenoy, a PhD student intern in the UW Information School, points to the important gap filled by this kind of program: “As I’ve developed my interest in data science, I’ve worked on projects that address the digital divide, the widening gap in the availability of things like data science and technical skills between those organizations with resources (such as tech companies and financial institutions) and those without (especially those in the nonprofit and government sectors).”

The project has now reached its halfway point, and there has already been exciting progress in understanding and analyzing the data. For example, one of the challenges in interpreting the data is the very definition of a family. This information is sometimes only implicitly present in the data, and additional analysis is needed to tease it out. In a post on the project’s team blog, Chris described how an algorithm that has been used in astronomy to group distant collections of galaxies into clusters can be applied to this data to group different individuals into families. Chris explains “that vastly different scales can be united by an algorithmically similar approach, and therefore galaxy clustering is connected to searching for families in a dataset.”
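To give a flavor of how a clustering idea from astronomy might carry over, here is a minimal Python sketch. It is not the team's actual method, and the post does not name the algorithm Chris used. The sketch assumes a "friends-of-friends" style approach, in which two records are linked when they are close enough, and groups are formed from chains of links. The record fields (`person`, `program`, `entry`), the `group_records` function and the `max_gap_days` threshold are all hypothetical, and the data is made up for illustration.

```python
from datetime import date
from itertools import combinations


def find(parent, i):
    # Follow parent pointers to the root, shortening the path as we go.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def group_records(records, max_gap_days=1):
    """Group records that are linked directly or through a chain of links.

    Assumption for this sketch: two people are linked if they entered the
    same program within max_gap_days of each other.
    """
    parent = list(range(len(records)))
    for i, j in combinations(range(len(records)), 2):
        a, b = records[i], records[j]
        same_program = a["program"] == b["program"]
        close_in_time = abs((a["entry"] - b["entry"]).days) <= max_gap_days
        if same_program and close_in_time:
            # Merge the two groups that these records belong to.
            parent[find(parent, i)] = find(parent, j)
    groups = {}
    for i, rec in enumerate(records):
        groups.setdefault(find(parent, i), []).append(rec["person"])
    return sorted(sorted(g) for g in groups.values())


# Made-up example records, not real data.
records = [
    {"person": "A", "program": "shelter_1", "entry": date(2015, 1, 5)},
    {"person": "B", "program": "shelter_1", "entry": date(2015, 1, 5)},
    {"person": "C", "program": "shelter_1", "entry": date(2015, 1, 6)},
    {"person": "D", "program": "shelter_2", "entry": date(2015, 1, 5)},
    {"person": "E", "program": "shelter_1", "entry": date(2015, 3, 1)},
]
families = group_records(records)
print(families)  # [['A', 'B', 'C'], ['D'], ['E']]
```

In this toy example the threshold plays the role that a linking length plays when grouping galaxies: a tighter threshold splits groups apart, and a looser one merges them. Choosing it for real records would require care and domain knowledge.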