Please join us for a UW Data Science Seminar featuring research teams from the Humanities Data Science Summer Institute on Tuesday, November 18th from 4:30 to 5:20 p.m. PT. The seminar will be held in IEB G109.
“Multimodal LLM Categorization for Humanities Research on Social Media: A Case Study with Pepe the Frog”
Abstract: Our project for HDSSI considered the utility of multimodal LLMs for humanities research, specifically concerning politically sensitive social media posts. Building on research on the use of AI for qualitative image captioning and analysis, we compared how LLMs versus human raters assessed the political positionality, hate content, and sexually explicit content in social media posts. Our dataset comprises 3407 tweets with imagery pertaining to the Pepe the Frog meme, drawn from the months surrounding the January 6, 2021 riot at the US Capitol. Using a zero-shot prompt, we queried a range of multimodal models including Claude Opus 4 and Sonnet 4, Mistral Pixtral 12B, and LLaVA-NeXT to obtain their categorizations of the material. We show that while the Claude models are extremely precise and consistent in their assessments, Mistral and LLaVA are considerably less capable of capturing hate content. But strikingly, even the high-performing models tend to tag in ways that downplay individual responsibility for the circulation of politically sensitive material, a shift that has implications for notions of responsibility as they circulate in the digital public sphere. We conclude that, given both LLMs’ still-growing ability to categorize qualitative, multimodal material and the guardrails built into the models, scholars and students of the humanities should exercise caution while engaging LLMs to understand politically explosive social media material at scale.
Speakers: Adair Rounthwaite (Professor and Chair of Art History), Niko Nadirashvili (PhD student in Art History), Yuanxi Li (undergrad in Informatics and Sociology)
“The Canon in Circulation: Tracking the Reception of Norton Anthology Authors in Library Checkout Data”
Abstract: Which canonical American authors are the public reading, and why? We explore this question by analyzing nearly two decades of book circulation data from the Seattle Public Library (SPL), one of the few public libraries in the United States to make anonymized checkout data publicly available. Focusing on the 93 authors included in the post-1945 volume of The Norton Anthology of American Literature (NAAL), we examine 1.6k unique works and almost one million checkouts to better understand contemporary literary reception beyond the classroom. We present a novel dataset that can support future reception research and serve as a benchmark for future work-level clustering approaches. Our findings suggest that the few genre fiction authors in the NAAL, particularly writers of science fiction, dominate the checkouts, and that circulation spikes are often triggered by high-profile media adaptations, the death of an author, and potentially even scandal.
Speaker: Neel Gupta is a PhD student in the iSchool at UW working in the fields of cultural analytics and the digital humanities. He received his undergraduate degree from Swarthmore College in English and Mathematics. He’s interested in how economic changes in society have affected cultural and aesthetic production, specifically in the realm of prose fiction. More broadly, Neel is interested in how digital and computational methods can be used alongside humanistic methodologies to study culture.
