Identifying Disinformation Risk on News Websites

A person reads a New York Times article called "Coronavirus in California: Map and Case Count"

Project leads: Maggie Engler, Lead Data Scientist, and Lucas Wright, Senior Researcher, at the Global Disinformation Index (GDI)

Data scientists: Noah Benson and Vaughn Iverson

DSSG fellows: George Hope Chidziwisano, Richa Gupta, Kseniya Husak, Maya Luetke

Participant bios available here.

Project Summary: Online disinformation has been used as a tool to weaponize mass influence and disseminate propaganda. To combat disinformation, we need to understand efforts to disinform – both upstream (where disinformation starts) and downstream (where and how it spreads). For the nonprofit organization Global Disinformation Index (GDI, https://disinformationindex.), financial motivation is a connecting point that links together the upstream and downstream components of disinformation. To reduce disinformation, we need to disrupt its funding. The ad tech industry has inadvertently thrown a funding line to disinformation domains through online advertising. Until now, there has been no way for advertisers to know the disinformation risk of the domains carrying their ads, as programmatic advertising means that advertisers might show ads all over the web without making a conscious decision about each location.

Additionally, the rise of the coronavirus crisis has further strained the online publishing industry. Controversy surrounding the coverage of the pandemic has spooked ad buyers into blocking websites using excessively broad keywords. In April, “coronavirus” overtook “Trump” as the keyword blocked by the most brands, and since news coverage has almost entirely focused on the pandemic for months, trustworthy media outlets are being starved of revenue as a result.

GDI’s solution enables ad providers to automatically detect disinformation in the form of adversarial narratives, and it is increasingly being adopted across the ad tech ecosystem. GDI researchers with expertise in disinformation have identified a list of adversarial narratives, such as the claims that vaccines cause autism and that the coronavirus was created as a bioweapon. Our work to date includes a series of classifiers that detect articles discussing these narratives; automated collection of metadata from tens of thousands of news websites; and manual reviews by media experts of the content, context and operations of a subset of these sites.

In this project, we will use this data to construct open-source topic models that classify news articles in real time according to their risk of containing disinformation about the coronavirus. Much of the ad tech industry has already expressed a desire to integrate such models into their brand safety services, so there is a clear path to implementation. This will help ensure that reliable information about the virus receives ample funding and that the publication of harmful disinformation is disincentivized.
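To make the idea of flagging articles that discuss a known narrative more concrete, here is a minimal sketch in Python. It is not GDI's classifier and not a topic model: it only compares the words in an article with a short keyword list per narrative using cosine similarity. The keyword lists, the narrative names and the function names are all hypothetical and chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical keyword lists, one per narrative (assumption for illustration;
# a real system would rely on expert-curated narratives and trained models).
NARRATIVE_KEYWORDS = {
    "vaccines_cause_autism": ["vaccine", "vaccines", "autism"],
    "coronavirus_bioweapon": ["coronavirus", "bioweapon", "lab", "engineered"],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(counts_a, counts_b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(counts_a[term] * counts_b[term] for term in counts_a if term in counts_b)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def narrative_scores(article_text):
    """Score how strongly an article's vocabulary overlaps each narrative."""
    article_counts = Counter(tokenize(article_text))
    return {
        name: cosine_similarity(article_counts, Counter(keywords))
        for name, keywords in NARRATIVE_KEYWORDS.items()
    }


example_scores = narrative_scores(
    "Claims that the coronavirus was engineered as a bioweapon in a lab keep spreading"
)
print(example_scores)
```

A score like this only indicates that an article discusses a narrative; it cannot tell whether the article promotes or debunks it, which is one reason a keyword match alone is not a risk rating.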

Detection of Vote Dilution: New tools and methods for protecting voting rights


A data visualization shows the percent of white and non-white voters over a map of Rockland County, New York

Nominating petitions plotted by address and race to identify patterns of support for candidates (2015 School Board Election, Rockland County, NY)

Project leads: Matt A. Barreto, Professor of Political Science and Chicana/o Studies, and Faculty Director of the Voting Rights Project at University of California, Los Angeles; and Loren Collingwood, Associate Professor in the Department of Political Science at University of California, Riverside

Data scientists: Scott Henderson and Spencer Wood

DSSG fellows: Juandalyn Burke, Ari Decter-Frain, Hikari Murayama, Pratik Sachdeva

Participant bios available here.

Project Summary: Section 2 of the Voting Rights Act (VRA) allows voters to challenge district boundaries if they believe gerrymandering has been used to dilute their vote and block them from electing their community’s candidates of choice. To win a VRA lawsuit, plaintiffs must prove that voting patterns in their community are “racially polarized,” with Whites and minorities voting in opposite directions for different candidates. However, most states do not collect racial data on voters, and the voter’s ballot is secret. To analyze voting patterns, social scientists use a statistical method called Ecological Inference (EI) to determine how different groups vote. But this method relies on imprecise census data and often creates biased estimates of voting patterns. Recently, a new methodology has been developed for estimating voters’ race and ethnicity, which offers great promise for improving voting estimates and for upholding minority voting rights.
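The core problem of inferring group-level voting from precinct-level totals can be illustrated with Goodman's ecological regression, a classical and much simpler technique than the EI models discussed here. The sketch below is not the eiCompare implementation, and the precinct data are made up. It fits a straight line to each precinct's candidate vote share as a function of the group's share of voters, then reads off the estimated support at 0% and 100% group share.

```python
def goodman_regression(group_share, candidate_share):
    """Estimate candidate support inside and outside a group from precinct data.

    group_share[i]     : fraction of voters in precinct i who belong to the group
    candidate_share[i] : fraction of votes in precinct i won by the candidate
    Returns (support_outside_group, support_inside_group).
    """
    n = len(group_share)
    if n != len(candidate_share) or n < 2:
        raise ValueError("need at least two precincts with matching data")

    mean_x = sum(group_share) / n
    mean_y = sum(candidate_share) / n
    sxx = sum((x - mean_x) ** 2 for x in group_share)
    if sxx == 0:
        raise ValueError("group share must vary across precincts")
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(group_share, candidate_share)
    )

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The fitted line at x = 0 is the estimate for voters outside the group;
    # at x = 1 it is the estimate for voters inside the group.
    return intercept, intercept + slope


# Made-up precincts constructed so that the candidate wins 80% of the
# group's votes and 30% of everyone else's votes.
precinct_group_share = [0.1, 0.3, 0.5, 0.7, 0.9]
precinct_candidate_share = [0.35, 0.45, 0.55, 0.65, 0.75]

outside_support, inside_support = goodman_regression(
    precinct_group_share, precinct_candidate_share
)
print(outside_support, inside_support)
```

With real data the fitted estimates can fall below 0 or above 1, and the group shares themselves come from imprecise sources, which is part of why more careful EI models and better race estimates matter.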

This project uses the existing eiCompare R software package (Collingwood et al. 2016) to update and modernize ecological inference (EI) analysis for use in voting rights and redistricting efforts. We propose numerous methodological, programming, and statistical advancements to the EI models in eiCompare so that they capture racial voting patterns more accurately and precisely. In particular, we will incorporate Bayesian Improved Surname Geocoding (BISG; see Imai and Khanna 2016), which uses voters’ surnames and addresses to estimate the probability that each voter belongs to a given racial or ethnic group; these probabilities can then be used in EI models. We will also troubleshoot and fix bugs in the code for the various EI models to ensure that estimates of voter preferences are calculated accurately and can be better used by state and federal courts when evaluating voting rights claims.
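The Bayesian update at the heart of BISG can be sketched in a few lines of Python. This is an illustration of the general formula, not the project's code: it assumes surname and location are independent given race, so that P(race | surname, location) is proportional to P(race | surname) × P(location | race). The category names and all probabilities below are made up, and the function name is hypothetical.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Combine surname and location evidence with Bayes' rule.

    p_race_given_surname[r] : P(race = r | surname), e.g. from a surname list
    p_geo_given_race[r]     : P(location | race = r), e.g. from census counts
    Returns P(race = r | surname, location) for every race category r.
    """
    unnormalized = {
        race: p_race_given_surname[race] * p_geo_given_race[race]
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("no race category is compatible with both inputs")
    return {race: value / total for race, value in unnormalized.items()}


# Illustrative, made-up inputs for a single voter (assumption, not real data).
surname_prior = {"white": 0.05, "black": 0.01, "hispanic": 0.92, "other": 0.02}
geo_likelihood = {"white": 0.0020, "black": 0.0005, "hispanic": 0.0010, "other": 0.0010}

posterior = bisg_posterior(surname_prior, geo_likelihood)
for race, probability in posterior.items():
    print(f"{race}: {probability:.3f}")
```

In this example the location is relatively more common among white residents, so the posterior shifts some probability toward "white" compared with the surname evidence alone. The resulting per-voter probabilities are the kind of input that could then be aggregated by precinct and passed to an EI model.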

The project will be led by Matt Barreto and Loren Collingwood, two political scientists with experience in voting rights litigation who have developed several R software packages to assist with voting rights analysis. Barreto and Collingwood are currently involved in multiple efforts across the country to promote and uphold equal voting rights, and team members will engage with voting rights lawyers from the ACLU, NAACP, MALDEF and more about the real-world application of statistical analysis to these efforts.