Project Lead: Katie Gonser, Jackson School of International Studies
eScience Liaison: Jose Hernandez
This project looks at COVID-19-related mis/disinformation in Louisiana and Washington state during the first two surges of the pandemic. Part of a broader collaborative study between social scientists at the University of Washington and computer scientists at Louisiana State University (LSU), this research focuses on Twitter users’ sentiment and language use. It makes multi-way comparisons: between the two states, across different stages of the pandemic, between COVID- and non-COVID-related content, and across users’ age and gender.
For the eScience Incubator Program, we built a dashboard of data visualizations that all members of the research team can use to view our comparisons more readily and to support initial data exploration. We built the dashboard with the shiny package in R. It displays interactive graphs that show how tweet sentiment varies over time and by users’ age, gender, or whether the tweet content is COVID-related. The dashboard also shows maps of Louisiana and Washington state, with options to view tweet density or average sentiment score by county for different age groups, genders, COVID- or non-COVID-related content, and the first or second surge of the pandemic. After a user selects a subset of the data to display on the maps, the dashboard shows a sample of the corresponding tweets and lets the user search within that sample. Moving forward, we plan to expand the dashboard with external datasets that help contextualize our Twitter data, such as timelines of related policy rollouts and local COVID-19 infection and death rates. The dashboard will then be made available online using RStudio Connect, and we plan to link to it in our future publications so that our data are more accessible.
Another project we worked on at the Incubator Program was detecting mis/disinformation using n-gram analysis. We scraped the Poynter CoronaVirusFacts Alliance Database for all misinformation claims that were fact-checked during our sample period (1st surge: 2/23/20 – 4/30/20; 2nd surge: 6/15/20 – 8/15/20). We then manually labeled all 673 claims by theme; this was an iterative process in which we repeatedly revised each theme’s definition. Next, we applied term frequency–inverse document frequency (TF-IDF), based on unigrams and bigrams, to weight the terms in each theme’s claims and identify the terms most strongly associated with that theme. Searching our Twitter sample for these terms then allowed us to begin identifying these themes in the tweets. After the incubator, we plan to continue refining our n-gram analysis and our theme categories. The results of this analysis will be used for an article on themes of COVID-related mis/disinformation in Washington state and Louisiana, which we will submit for publication.
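The weighting and search steps above can be illustrated with a short Python sketch. This is not the team's pipeline; it rests on several assumptions. All claims in one theme are concatenated into a single document, since the text does not say whether a document is an individual claim or a whole theme. Weighting is plain tf × log(N/df) with no smoothing or normalization. A tweet is matched to a theme if it contains any of that theme's top-weighted n-grams. The function names (`ngrams`, `tfidf_by_theme`, `top_terms`, `match_themes`) and the two toy themes and claims are hypothetical and are not drawn from the Poynter database.

```python
import math
import re
from collections import Counter


def ngrams(text, n_max=2):
    """Lowercase, tokenize, and return all unigrams and bigrams (n-grams up to n_max)."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    grams = list(tokens)
    for n in range(2, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def tfidf_by_theme(claims_by_theme):
    """Score each n-gram per theme, treating one theme's claims as one document (assumption).

    tf  = n-gram count in the theme / total n-grams in the theme
    idf = log(number of themes / number of themes containing the n-gram)
    """
    counts = {
        theme: Counter(g for claim in claims for g in ngrams(claim))
        for theme, claims in claims_by_theme.items()
    }
    n_themes = len(counts)
    doc_freq = Counter(g for c in counts.values() for g in c)  # themes containing each n-gram
    scores = {}
    for theme, c in counts.items():
        total = sum(c.values())
        scores[theme] = {
            g: (k / total) * math.log(n_themes / doc_freq[g]) for g, k in c.items()
        }
    return scores


def top_terms(scores, k=5):
    """Up to k positive-weight n-grams per theme, highest first (ties broken alphabetically)."""
    result = {}
    for theme, s in scores.items():
        ranked = sorted(s.items(), key=lambda item: (-item[1], item[0]))
        result[theme] = [g for g, w in ranked if w > 0][:k]
    return result


def match_themes(tweet, theme_terms):
    """Hypothetical matching rule: a tweet gets every theme whose top terms it contains."""
    grams = set(ngrams(tweet))
    return sorted(t for t, terms in theme_terms.items() if grams & set(terms))


# Hypothetical toy claims, not taken from the Poynter database.
claims = {
    "cures": ["drinking hot water cures the virus", "garlic cures coronavirus"],
    "5g": ["5g towers spread the virus", "5g networks cause covid symptoms"],
}
terms = top_terms(tfidf_by_theme(claims), k=3)
print(terms["cures"])  # ['cures', 'coronavirus', 'cures coronavirus']
print(terms["5g"])     # ['5g', '5g networks', '5g towers']
print(match_themes("new 5g towers near me", terms))          # ['5g']
print(match_themes("does garlic cure coronavirus?", terms))  # ['cures']
```

Terms that occur in every theme (here "the", "virus", and "the virus") receive an IDF of zero, so they drop out of each theme's top terms. This is what lets a simple lookup tell the themes apart. For real data, scikit-learn's `TfidfVectorizer(ngram_range=(1, 2))` provides the same unigram/bigram weighting with built-in smoothing and normalization options.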