Data scientist leads: Jake VanderPlas (primary), Bryna Hazelton (secondary)
DSSG fellows: Mayuree Binjolkar, Daniel Dylewsky, Andrew Ju, Wenhao Zhang
Project Summary: Seven regional transportation agencies in the greater Puget Sound region use a common electronic fare payment system, called One Regional Card for All (ORCA). ORCA data provide travel behavior information that can be used to improve regional transportation system planning and decision making. The Data Science for Social Good (DSSG) program will be using two nine-week ORCA data sets, each consisting of over 20 million transit boarding records, to determine the changes in transit behavior that occurred when light rail stations were opened in the Seattle Capitol Hill and University District neighborhoods.
For example, we know that transfers from the Sounder commuter train to Link light rail more than doubled in spring 2016 when two new light rail stations opened. How many of these individuals are headed to the University District? Are they new transit riders? Or did they previously take buses to the University of Washington (UW)? If they took buses, did they take the limited direct bus service, or take the bus routes that operate more frequently to downtown, and then transfer to buses headed to the UW? We will also look at service characteristics before and after the light rail opens (e.g., are transfers to and from trains faster and more reliable than transfers to the UW-bound buses?) to get a better understanding of why these behavior changes are occurring. The project goal is not only to describe the changes in travel behavior, but also to develop scalable algorithms and procedures from our analysis that can be applied throughout the region. Can we develop reliable models that predict the different behaviors based on the different service levels of each type of trip? And if so, how different are those models from the ones currently used by Puget Sound Regional Council to predict transit use?
Project Outcomes: The ORCA DSSG group made two main advances. The first is an improved determination of origin/destination travel patterns from ORCA data. The second is an improved estimate of total transit ridership, based on ORCA ridership.
The DSSG team developed a semi-supervised machine learning approach to improve the origin/destination matrices developed using ORCA fare card boarding data. The algorithm identifies which “transfers” marked by ORCA are direct transfers from one transit vehicle to another, and which are actually transit boardings that took place after an activity was performed. In other words, it identifies when the ORCA financial system is treating two short trips as one trip with a long-duration “transfer” in the middle. This additional data processing step improves the accuracy of the ORCA data in describing where and when people are traveling via transit.
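As a rough illustration of the idea, and not the team's actual algorithm, the sketch below uses a much simpler self-training scheme on a single assumed feature: the number of minutes between two successive boardings that ORCA has linked as a transfer. A handful of hand-labeled gaps seed a threshold, unlabeled gaps are pseudo-labeled with it, and the threshold is re-estimated until it stops moving. The function names, the labels "transfer" and "activity", and all numbers are hypothetical.

```python
from statistics import mean


def self_train_gap_threshold(labeled, unlabeled, rounds=10):
    """Learn a gap threshold (minutes) separating direct transfers from
    boardings that follow an activity.

    labeled:   list of (gap_minutes, label), label in {"transfer", "activity"}
    unlabeled: list of gap_minutes with no label
    """
    transfer = [g for g, y in labeled if y == "transfer"]
    activity = [g for g, y in labeled if y == "activity"]
    # Start from the midpoint between the two labeled class means.
    threshold = (mean(transfer) + mean(activity)) / 2
    for _ in range(rounds):
        # Pseudo-label the unlabeled gaps with the current threshold.
        t = transfer + [g for g in unlabeled if g <= threshold]
        a = activity + [g for g in unlabeled if g > threshold]
        new_threshold = (mean(t) + mean(a)) / 2
        if abs(new_threshold - threshold) < 1e-9:
            break  # assignments have stabilized
        threshold = new_threshold
    return threshold


def split_linked_trip(boardings, threshold):
    """Split one ORCA-linked trip into separate trips wherever the gap
    between successive boardings exceeds the threshold.

    boardings: list of (minute_of_day, stop_id), sorted by time
    """
    if not boardings:
        return []
    trips = [[boardings[0]]]
    for prev, cur in zip(boardings, boardings[1:]):
        if cur[0] - prev[0] > threshold:
            trips.append([cur])  # long gap: treat as a new trip
        else:
            trips[-1].append(cur)  # short gap: same trip, direct transfer
    return trips


# Hypothetical data: four hand-labeled gaps and eight unlabeled ones.
labeled = [(5, "transfer"), (8, "transfer"), (60, "activity"), (90, "activity")]
unlabeled = [4, 6, 10, 12, 45, 70, 80, 100]
threshold = self_train_gap_threshold(labeled, unlabeled)
trips = split_linked_trip([(0, "A"), (10, "B"), (100, "C"), (110, "D")], threshold)
```

A real classifier would likely draw on more than one feature (for example route, stop location, and time of day); the single-threshold version is only meant to show how a few labeled examples and many unlabeled ones can be combined.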
The second advancement is a better understanding of how to estimate the number of cash-paying customers using transit. ORCA payment records do not capture cash-paying customers, so the DSSG team explored multiple techniques for using ORCA data to estimate total ridership. The most promising results from the summer involved the use of a Hidden Markov Model, which continues to undergo refinement.
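The text above does not describe how the Hidden Markov Model was set up, so the following is only a generic sketch of the technique under assumed inputs. It takes a hidden state that stands for the unobserved total ridership level ("light" or "heavy") and an observation that stands for a binned ORCA boarding count ("few" or "many"), then runs the standard forward filtering recursion to get the probability of each hidden state after every observation. The state names, observation bins, and probabilities are all hypothetical.

```python
def forward_filter(observations, states, start_p, trans_p, emit_p):
    """Return, for each time step, the posterior probability of every
    hidden state given the observations seen so far (normalized forward pass)."""
    posteriors = []
    prev = None
    for obs in observations:
        if prev is None:
            # First step: prior times emission probability.
            unnorm = {s: start_p[s] * emit_p[s][obs] for s in states}
        else:
            # Later steps: propagate through transitions, then weight by emission.
            unnorm = {
                s: emit_p[s][obs] * sum(prev[r] * trans_p[r][s] for r in states)
                for s in states
            }
        total = sum(unnorm.values())
        prev = {s: v / total for s, v in unnorm.items()}
        posteriors.append(prev)
    return posteriors


# Hypothetical model parameters, chosen only for illustration.
states = ["light", "heavy"]
start_p = {"light": 0.5, "heavy": 0.5}
trans_p = {
    "light": {"light": 0.8, "heavy": 0.2},
    "heavy": {"light": 0.2, "heavy": 0.8},
}
emit_p = {
    "light": {"few": 0.9, "many": 0.1},
    "heavy": {"few": 0.2, "many": 0.8},
}

posteriors = forward_filter(["few", "many", "many"], states, start_p, trans_p, emit_p)
```

With these made-up numbers, a run of "many" observations shifts the posterior toward the "heavy" state. In a real application the transition and emission probabilities would have to be estimated from data rather than set by hand.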
Strengthening capacities, knowledge and data sharing platforms for sustainable development
Photo courtesy of Vital Signs.
Project leads: Matt Cooper, data manager, Vital Signs and Tabby Njunge, technical operations manager, Vital Signs
Data scientist leads: Anthony Arendt (primary) and Joe Hellerstein (secondary)
DSSG fellows: Cara Arizmendi, Mitchell Goist, Krista Jones, Robert Shaffer
Project Summary: Meeting the food security and nutrition challenges of today, with nearly one billion chronically hungry people worldwide, and of tomorrow will require an estimated 70 to 100% increase in food production. Millions of small-holder farmers will need to play an important role in meeting this need, particularly across Africa. Unfortunately, agricultural activities are degrading ecosystems and the benefits they provide for people faster now than ever before. We need to find new ways of growing food that can simultaneously deliver food security, environmental sustainability, and economic opportunity. There is an urgent need for better data and risk management approaches to guide sustainable agricultural development and ensure healthy and resilient ecosystems and livelihoods. Vital Signs aims to meet this need for informed policy by providing better data and risk management tools to optimize agricultural development decisions for the needs of the human beings they serve and the ecosystems upon which they depend. Headquartered in Nairobi, Kenya, Vital Signs has worked in Kenya, Ghana, Tanzania, Rwanda and Uganda.
Vital Signs collects data on the ground using a peer-reviewed monitoring system, integrates data from national governments and third-party data sources, and builds online platforms for data exploration and decision support. This monitoring system collects data on agricultural practices and yields, the environment and biodiversity, land cover, soil health and human well-being, in several 10 x 10 km landscapes in each country. This data is analyzed to show spatial and temporal trends, as well as to create multivariate models. Vital Signs does all of this in close collaboration with national and multinational stakeholders and policymakers. The data is visualized on platforms like indicators.vitalsigns.org, and is freely available for download on the Vital Signs website, as it is intended to be a global public good and a resource for any interested party.
Project Outcomes: The team used data focusing on female-headed households’ access to productive resources and ecosystem services; how natural resources supplement household expenditures on food; how benefits from agricultural intensification relate to household income, level of education and gender; and access to and use of extension services.
The following blogs were posted to the Vital Signs website:
Data scientist leads: Valentina Staneva (primary) and Vaughn Iverson (secondary)
DSSG fellows: Brett Bejcek, Anamol Pundle, Orysya Stus, Michael Vlah
Project Summary: Vehicles that have arrived at their destination but are driving around looking for a place to park, and for-hire and transportation network company vehicles that are queued in traffic, have a significant impact on congestion. The Cruising Traffic Analysis project will develop algorithms to quantify aggregated levels of vehicle traffic cruising. The research intends to apply data science techniques to a sample of anonymous travel sensor data, paid parking transaction information, and parking occupancy surveys conducted by the City of Seattle. We hope to generate heat maps depicting the relative prevalence of cruising and propose measurement standards for cruising activity, such as a “cruising index” that could pertain to various methods of data collection and processing.
We will attempt to differentiate between the aggregated footprint of vehicles trying to find on-street parking and the amount due to trip deadheading. If successful, this research could help transportation agencies, technology companies, and car companies predict the availability of parking and more accurately direct travelers with online, mobile, and connected tools, thereby reducing congestion impacts, emissions, and fuel costs.
Project Outcomes: The team created a processing and classification pipeline that labeled 35% of total discernible data as cruising. Of that amount, activity attributed to vehicles-for-hire was in the range of 10% or less. These preliminary results were aggregated to block segments and hourly time periods to generate cruising heat maps showing spatial and temporal variance. Steps consisted of: 1) Estimating Paths, 2) Metadata Collection, 3) Vehicle-for-Hire Labeling, 4) Multi-Step Classification, Semi-Supervised Machine Learning, 5) Aggregation and Heat Map. The team found that the intensity of cruising for parking as a proportion of total traffic fluctuates with time of day, and that cruising in the Central Business District exhibited weekday spikes during the morning commute, lunchtime, and the evening commute, as expected. The City of Seattle will analyze the results and review cruising patterns over a longer period to determine next steps for utilizing the data.
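The aggregation step can be pictured with a minimal sketch. It assumes each classified record has already been reduced to a block segment identifier, an hour of day, and a cruising flag; that record layout and the function name are hypothetical and are not taken from the team's pipeline. The function returns the share of records labeled as cruising for each block and hour, which is the kind of value a heat map cell could display.

```python
from collections import defaultdict


def cruising_share(records):
    """Aggregate classified records to (block, hour) cells.

    records: iterable of (block_id, hour, is_cruising)
    Returns {(block_id, hour): fraction of records labeled as cruising}.
    """
    totals = defaultdict(int)
    cruising = defaultdict(int)
    for block_id, hour, is_cruising in records:
        totals[(block_id, hour)] += 1
        if is_cruising:
            cruising[(block_id, hour)] += 1
    # Cells with no cruising records get a share of 0.0.
    return {key: cruising[key] / totals[key] for key in totals}


# Hypothetical classified records: (block segment, hour of day, cruising?)
records = [
    ("A", 8, True),
    ("A", 8, False),
    ("A", 9, False),
    ("B", 8, True),
]
shares = cruising_share(records)
```

Using a share rather than a raw count keeps busy and quiet blocks comparable, which matches the description of cruising intensity as a proportion of total traffic.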
Data scientist leads: Bernease Herman (primary) and Amanda Tan (secondary)
DSSG fellows: Hillary Dawkins, Jacob Kovacs, Yahui Ma, Jacob Rich
Project Summary: In recent years, Seattle has seen unprecedented population growth, record construction activity, and an increase in housing cost, creating an affordability crisis for a large portion of the urban population. The “Equity Modeler” team is investigating the ongoing gentrification process and inequitable access to opportunities across many of Seattle’s neighborhoods. The project uses publicly available data for GIS-based mapping of equity indicators – related to housing and development, income, mobility, and education – on the city and neighborhood scale. It will develop a structural equation model to establish and predict relationships between indicators and analyze policies intended to initiate positive change.
The team’s goal is to create a tool that brings clarity and direction to an impassioned public discussion and allows stakeholders in the city’s development process to analyze, model, and visualize existing trends and the impact of potential changes in the built environment.
Project Outcomes: The team conducted a literature review and identified a set of indicators for each of several themes considered fundamental to the study of urban equity, including housing, mobility, health, environment, socio-economic wellbeing, education, development, and neighborhoods. For each cluster of indicators, the team conducted a factor analysis to determine the most crucial indicators. The selected indicators were combined into a structural equation model, which allowed the team to evaluate correlation among the themes and to pilot a predictive function in the model that lets the viewer toggle values higher and lower to test different development scenarios.
Using D3, the team developed a web-based interface for the modeler that allows users to examine analytical layers at three different scales (Census block group, Census tract, and neighborhood) as well as by individual indicator, thematic cluster, and overall rating. The website also includes a prototype feature to zoom into the neighborhood and view data at the lot level.