Rebeca de Buen Kalman (right) and fellows work in the WRF Data Science Studio

Social Good Summer Blog, issue two

This summer series will highlight weekly blog posts from this year’s UW Data Science for Social Good Fellows.

“Learning to code and coding to learn”*, by Rebeca de Buen Kalman, 2018 Data Science for Social Good Fellow


This summer, I am a Student Fellow in the University of Washington’s Data Science for Social Good Program. I am working on the Seattle Mobility Index project, where I get to apply my relatively recent interest in data science to my longtime passion for urban mobility.

I started my Ph.D. in Public Policy at the University of Washington in 2015. Among my many reasons for pursuing a Ph.D. was a desire to acquire new skills and frameworks to understand the social world, and to be better equipped to make a positive contribution to the issues that matter to me.

Through exposure to statistical programming languages like R and Stata in some of the core classes I took for my Ph.D., I quickly realized that becoming a proficient programmer would open a world of opportunity.

Learning to code late in my career was not a straightforward choice. However, I had underestimated how much my training as a social scientist would benefit from this new skill, which has allowed me to venture into the fascinating world of data science.

I have come to realize that programming and data analysis are mostly about problem-solving rather than writing in a specific language. To successfully do something to your data with code, you have to be able to explain it clearly in plain English. By learning to write code, you gain the ability to break problems into smaller pieces that you can solve. Writing the actual program is a small part.

Programming has given me access to vast and varied sources of data and the ability to bring them together to explore topics and answer questions that interest me. This includes traditional open data sources like administrative data and census data as well as more novel collections of digital data like bikeshare trip data or Google trip data. By learning more about how software is written and designed, I have even gained a lot of insight into how many of the platforms that generate digital data work behind the scenes.

I have also learned how to explore, manipulate and visualize data in ways that lead me to ask better questions and think in more depth about my data and my subject. For example, this summer my team is analyzing trip data that we have obtained from Google. To write part of the code that we will use in our project, we had to sit down together and think about what defines a trip, identify its core attributes and distill its main characteristics so that we could clearly conceptualize a trip in our Mobility Index.
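To make that exercise concrete, here is a minimal sketch in Python of how a trip could be written down as a small data structure. It is not the project's actual code: the field names (origin, destination, mode, departure, duration, distance_km) and the sample values are hypothetical, chosen only to illustrate the kind of decisions involved.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Trip:
    """A hypothetical trip record; every field name here is an assumption."""
    origin: str           # where the trip starts (made-up zone label)
    destination: str      # where the trip ends (made-up zone label)
    mode: str             # e.g. "transit", "bike", "car"
    departure: datetime   # when the trip starts
    duration: timedelta   # how long the trip takes
    distance_km: float    # distance travelled, in kilometers

    @property
    def arrival(self) -> datetime:
        # Arrival time is derived from departure and duration, not stored.
        return self.departure + self.duration

    @property
    def average_speed_kmh(self) -> float:
        # Average speed in km/h; assumes the duration is greater than zero.
        hours = self.duration.total_seconds() / 3600
        return self.distance_km / hours


# Made-up sample values, for illustration only.
trip = Trip(
    origin="Zone A",
    destination="Zone B",
    mode="transit",
    departure=datetime(2018, 7, 9, 8, 30),
    duration=timedelta(minutes=30),
    distance_km=6.0,
)

print(trip.arrival)
print(trip.average_speed_kmh)
```

Even in a toy version like this, each field forces a question: is the mode part of the trip or a separate choice, and is arrival time something you store or something you compute? Those are the conversations that matter more than the syntax.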

The main thing I have learned is to let go of my previous idea of what it means to program and how this skill should be learned. It is true that there is a learning curve, but the skill is accessible, and the benefits are multiplicative. Once you have learned a few basic principles, you can start to solve problems. As you code, you continue to learn new things and refine your skills. There are many open-source tools, courses, books, forums and other resources on the web where you can even learn from some of the people developing the most cutting-edge tools that exist today.

*The title is a phrase by Mitchel Resnick.