As states, counties, cities and school boards across the country prepare to redraw their voting districts in 2021, a software package that is being updated through the Data Science for Social Good (DSSG) program will help to ensure equal representation among constituents.
Voting district boundaries are redrawn after each Census to account for changes in population, ensuring that districts have an equal number of voters and equal representation by racial and ethnic group, in accordance with Section 2 of the federal Voting Rights Act of 1965. When voting patterns show that minority and majority racial or ethnic groups are primarily voting in opposite directions, a pattern known as “racially polarized voting,” district boundaries may be challenged in court.
However, measuring racially polarized voting can be a challenge. The eiCompare software package was created in 2016 to help government, courts and legal advocates identify racially polarized voting patterns and consequent vote dilution, which occurs when minority voters are districted in a way that reduces, or dilutes, their ability to elect preferred candidates. Ecological inference, a statistical method used by sociologists and political scientists, estimates the voting patterns of each racial or ethnic group in a jurisdiction. However, there are methodological debates about the most precise way to assess the race and ethnicity of voters and how they voted. Neither is directly known: most states do not require voters to report their race, and in all states, how they voted is private. Using public voter files and precinct election results, program participants are updating the software to improve accuracy, incorporate new methods and improve usability for non-data scientists.
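The idea behind ecological inference can be illustrated with one of its simplest forms, Goodman's ecological regression, which fits a straight line through precinct-level data. The sketch below is only an illustration under stated assumptions: it is written in Python rather than the R that the team uses, the function name and precinct figures are hypothetical, and it is not a description of how eiCompare itself works.

```python
# Minimal sketch of Goodman's ecological regression (illustrative only).
# All precinct figures below are hypothetical, not real election data.

def goodman_regression(minority_share, candidate_share):
    """Fit candidate_share = a + b * minority_share by ordinary least squares.

    The intercept a estimates the candidate's support among majority-group
    voters (a precinct with 0% minority voters); a + b estimates support
    among minority-group voters (a precinct with 100% minority voters).
    """
    n = len(minority_share)
    if n != len(candidate_share) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 precincts")
    mean_x = sum(minority_share) / n
    mean_y = sum(candidate_share) / n
    sxx = sum((x - mean_x) ** 2 for x in minority_share)
    if sxx == 0:
        raise ValueError("minority share must vary across precincts")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(minority_share, candidate_share))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return {"majority_support": intercept,
            "minority_support": intercept + slope}

# Hypothetical precincts: share of voters who are minority, and the
# share of the vote won by one candidate in each precinct.
minority_share = [0.1, 0.3, 0.5, 0.7, 0.9]
candidate_share = [0.35, 0.45, 0.55, 0.65, 0.75]

estimates = goodman_regression(minority_share, candidate_share)
# A large gap between the two groups' estimated support for the same
# candidate is the kind of pattern described as racially polarized voting.
polarization_gap = estimates["minority_support"] - estimates["majority_support"]
print(estimates, polarization_gap)
```

A known limitation of this simple approach is that its estimates can fall below 0 or above 100 percent, which is one reason more elaborate ecological inference methods exist.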
The DSSG summer program at the eScience Institute brings together students, stakeholders, data scientists and domain researchers to work on project teams for a 10-week period. “Detection of Vote Dilution: New tools and methods for protecting voting rights” is one of two projects hosted by the program this year. The project team consists of four student fellows from universities around the country, two eScience data scientists, and the project leads: Matt A. Barreto, professor of political science and Chicana/o Studies at UCLA, and Loren Collingwood, associate professor of political science at UC Riverside.
Multidisciplinary Approaches
The project presents complex methodological challenges. Determining whether minority vote dilution exists in a particular area depends on the breakdown of voter race and ethnicity by precinct, information that is not readily available and requires multiple data sources to estimate. Voter registration records provide turnout data but identify individuals only by name and address in most states. Census records contain the race and ethnicity of residents, but results are aggregated into block groups (areas that contain up to 3,000 people) with different boundaries than voting precincts, and the records do not distinguish which residents are registered to vote.
Team members are working to improve how the software pre-processes and merges these data sets through various statistical methods and then compares the accuracy and consistency of results through data visualization. They are coding in the R programming language, creating tutorials called “vignettes” to link pieces of code with different end products, geocoding voter addresses into Census block groups, and mapping the demographic results onto voting precincts. A wide range of academic backgrounds is beneficial, team members said: it provides different problem-solving perspectives, reduces the use of jargon in discussions, and encourages the project leads to explain the details of their work more clearly.
Fellow Juandalyn Burke, a doctoral candidate at the University of Washington, described how her training in biomedical informatics and public health has informed her approach. “Our work currently focuses on having accurate US census data. However, due to COVID-19, the census data collected may not represent the general population and thus, after speaking with one of our stakeholders, it became apparent to me that the process of integrating a variety of data sources (which is done commonly in biomedical informatics and public health for issues such as food-borne illness, mental health, and infectious disease) may be an avenue that is necessary for collecting and analyzing voting rights data,” she said.
Real World Context
eiCompare is already being used in the real world. The project draws largely on data from a recent court win by the NAACP against the East Ramapo Central School District in New York. In May 2020, a federal judge ruled that the school board’s “at-large” voting system, in which the entire district voted for all seats on the board, gave an unfair advantage to candidates preferred by the white, Orthodox Jewish majority, and must be changed to a “ward” voting system, in which voters choose their representatives by district. In the ward system, at least some of the districts contain a majority of Black and Latino residents.
Fellow Pratik Sachdeva, a doctoral student in physics at UC Berkeley, said the project highlights the power of leveraging data science toward justice at the local level. “For example, our analyses to detect vote dilution have been applied in East Ramapo School District, NY, which might not be the first place to come to mind when we think about gerrymandering. However, the impact of these results is that communities of color in East Ramapo, who have been underrepresented on the school board for over a decade, will finally have their voices heard,” he said.
Project lead Loren Collingwood noted that the East Ramapo case marks the first time that a Section 2 Voting Rights Act case used Bayesian Improved Surname Geocoding (BISG), a method that estimates the probability that a voter belongs to each racial or ethnic group by combining the link between last names and race in Census data with the demographics of the area where the voter lives. BISG was recently incorporated into the software to improve accuracy in determining minority vote dilution. “We are further expanding and streamlining functions that implement this methodology. Many voting rights specialists and lawyers think BISG is the future methodology in this area, so eiCompare is likely to become the most significant statistical package used in Section 2 voting rights cases,” he said. Fellows are currently working on how to account for complexities such as unknown name changes and hyphenated or uncommon names.
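The Bayesian update at the heart of BISG can be sketched in a few lines. The example below is a simplified illustration, not the eiCompare implementation: it is written in Python rather than R, the function name is hypothetical, and every probability is an invented placeholder rather than a real Census figure. It assumes the standard formulation in which the surname-based probability of each group is multiplied by the share of that group's population living in the voter's block group, then rescaled so the results sum to one.

```python
# Minimal sketch of the BISG update (illustrative only).
# Every number below is a made-up placeholder, not real Census data.

def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Combine surname and location evidence with Bayes' rule.

    p_race_given_surname: probability of each group given the surname.
    p_geo_given_race: share of each group's total population that lives
        in the voter's Census block group.
    Returns the probability of each group given both surname and location.
    """
    unnormalized = {
        race: p_race_given_surname[race] * p_geo_given_race[race]
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("no group has nonzero probability")
    return {race: value / total for race, value in unnormalized.items()}

# Hypothetical surname table entry for one voter.
surname_prior = {"white": 0.05, "black": 0.01, "hispanic": 0.92, "other": 0.02}

# Hypothetical share of each group's population living in this block group.
geo_likelihood = {"white": 0.0001, "black": 0.0004,
                  "hispanic": 0.0002, "other": 0.0001}

posterior = bisg_posterior(surname_prior, geo_likelihood)
most_likely = max(posterior, key=posterior.get)
print(posterior, most_likely)
```

The output is a set of probabilities rather than a single label, so a voter contributes fractionally to each group's estimated count in a precinct.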
To incorporate the perspectives of project stakeholders who plan to utilize the results of their work, the team met with a lawyer from the Voting Rights Project at UCLA, which was co-founded by project lead Matt A. Barreto; data scientists at the National Democratic Redistricting Committee; and an attorney at the New York Civil Liberties Union who worked on the East Ramapo case. Once the next round of redistricting begins, the software package will be used to analyze current districts as well as new districts proposed by opposing parties in court.
Team members said the stakeholder meetings provided a narrative frame for how the analysis from eiCompare contributes to building evidence in court cases and offered strategic guidance in setting project goals. Fellow Ari Decter-Frain, a doctoral student in Policy Analysis and Management at Cornell University, said, “Our conversations have helped us identify key deliverables and prioritize the features we want to include in our toolkit. It’s been inspiring to see our stakeholders’ passion for ensuring equal political representation of minorities, and the central role they see our tools playing in future voting rights litigation.”
Scott Henderson, a research scientist in the Department of Earth and Space Sciences and data science fellow at the eScience Institute, mentioned one concrete takeaway. “It’s so important to stay connected with end-users of software to prioritize tasks. For example, lawyers prefer 2D print figures over interactive web-based visualizations because printed reports are required,” he said.
Future Impacts
The resulting software package and the skills developed through this DSSG project will have wide application after the summer ends. Fellow Hikari Murayama, a master’s student in the Energy and Resources Group at UC Berkeley, said the tools she is learning in the program will inform her work on the policy implications of human-climate interactions. “My home program emphasizes the need to be an activist-scholar. In order to be an effective, responsible, and ethical one, using data science techniques, it’s vital to be able to expand your mind beyond the research question in front of you,” she said.
Spencer Wood, a research scientist at the eScience Institute and senior research scientist with EarthLab, said, “The experience is teaching me analytical approaches that I look forward to applying in other research projects. Questions about the representativeness of voting districts have analogues in research about where to place urban parks in order to serve a broad audience.”
Project lead Matt A. Barreto said the team’s work has been critical to incorporating recent developments in the field. “As more data on voting and elections becomes available, it is crucial that the models and approaches to understanding vote dilution and minority representation keep pace with the latest developments. Working with the DSSG summer fellows and data scientists has transformed our software eiCompare from a basic package to a cutting edge, sophisticated software suite that can clean and process voting data, identify and detect patterns of vote dilution, map and layout districting solutions and much more,” he said.
The final DSSG 2020 presentations will take place on Wednesday, August 19th via Zoom from 1:00 to 2:30 p.m. The event is open to the public. RSVP is required.