The 2025 Humanities Data Science Summer Institute: Celebrating Three Years of Interdisciplinary Collaboration

This summer, the UW Data Science Minor, in collaboration with eScience, offered the third iteration of the Humanities Data Science Summer Institute (HDSSI). Undergraduates, graduate students, and faculty from several university departments gathered to work together on humanities data science research projects. During this collaborative process, students attended training sessions with guidance from HDSSI co-founders and co-chairs English Assistant Professor Anna Preus and iSchool Assistant Professor Melanie Walsh, as well as from eScience Technical Educational Specialist Naomi Alterman (who recently started as an Assistant Teaching Professor of Computer Science). Then, the students broke out into four teams, each led by a faculty member, to pursue research projects ranging from the study of historical archives to contemporary social media. Each team also included a graduate research assistant who helped the undergraduates use computational tools such as Python, ArcGIS maps, R, and multiple AI models for humanistic inquiry.

Under the directorship of Professor Ben Marwick, the UW Data Science Minor has supported the hiring of digital humanities faculty and the development of new data science and data studies courses. His support allowed for the development of the HDSSI, a program model that the co-chairs were familiar with from their time together as graduate students at Washington University in St. Louis. Anna Preus, Assistant Professor of English and Director of the Humanities Lab, shares that “our humanistic methods are incredibly important to responsible data work with cultural artifacts and human productions. We really emphasize throughout the program that all data exists in a social and cultural context, is gathered and produced and organized by humans, that it always contains biases and is never comprehensive.” Data science can also help humanities scholars expand the scope and shape of their research. As Preus further elaborated, “if I’m in literary studies, I’m usually limited to the books I can read myself, but computational methods that work with larger scale data allow me to ask different types of questions and combine larger scale analysis of the publishing industry with literary analysis.”

Now, three years of HDSSI projects have come to represent the exciting variety of places where computational methods and humanistic inquiry intersect. In 2025, groups worked across a wide range of topic areas and types of data. Here is an overview of this year’s projects:

Parsing the English Catalogue of Books

Faculty Mentor: Anna Preus, English Assistant Professor

Graduate Research Assistants: Hannah Lee Scherr and Siddharth Bhogra

Undergraduate Fellows: Annapurna Iyer, Tanisha Deka, Valeria Fierro, and Zhiming Huang

Parsing the English Catalogue of Books is an effort to extract bibliographic data from digitized editions of early 20th-century publishing catalogues. The English Catalogue of Books (ECB) was issued yearly for almost a century by the trade publication Publishers’ Circular, and it provides a list of books published in England and Ireland each year along with basic information about them (price, format, size, publisher, etc.). Our goal is to parse the catalogues from the years 1902 to 1922 and to create an open access dataset of books published in England and Ireland during these years.

The Afterlives of Postwar American Authors

Faculty Mentor: Melanie Walsh, iSchool Assistant Professor 

Graduate Research Assistant: Neel Gupta

Undergraduate Fellows: Daniella Maor, Emily Backstrom, Hongyuan Dong, and Karalee Harris

This project tracks the afterlives of several influential postwar American authors (including James Baldwin, Kurt Vonnegut, Sandra Cisneros, Chris Kraus, and David Foster Wallace) through data, particularly social media posts and library checkout records. This summer, we collected and analyzed 4chan posts about David Foster Wallace, examining how the author and his mega-novel Infinite Jest were discussed in the context of gender, misogyny, and reading. We also analyzed library checkouts for all authors in the Norton Anthology of American Literature (1945-), drawing on unique public data from the Seattle Public Library, and we created an interactive explorer where users can track trends for specific authors and works. A paper based on our research was also accepted to the Computational Humanities Research (CHR) conference, which will take place in Luxembourg in December 2025.

Dream Palaces — Black Cinemas Spaces

Faculty Mentor: Chrystel Oloukoï, Geography Assistant Professor

Graduate Research Assistant: Althea Rao

Undergraduate Fellows: Edwin Bai, Sofia Geherin, Sophie Alexandra Cooper, and Sydney Astillero

Dream Palaces — Black Cinemas Spaces is a collaborative research project looking at Black cinema spaces across Africa and the diaspora as sites of cinematic innovation, cultural autonomy, and community organizing. For the 2025 Humanities Data Science Summer Institute, interns analyze Black independent cinemas in the United States specifically, though the geographical scope of the Dream Palaces project is broader. Parsing out and mapping the locations of Black cinema spaces from digitized Black newspapers such as The Washington Tribune (1921-1946) or The Detroit Tribune (1935-1966), we ask: what does the location of these Black cinema spaces tell us about Black urban life in the United States?

AI Experimentation with the Pepe the Frog Meme

Faculty Mentor: Adair Rounthwaite, Art History Professor

Graduate Research Assistant: Nikoloz Nadirashvili

Undergraduate Fellows: Noor Fatima Hasan, Trisha Agrawal, Yuanxi Li, and Stephanie Nguyen

Our group’s work centers on the question: how good are LLMs at interpreting politically sensitive social media material, and specifically posts that contain both images and text? This summer’s work builds on a dataset of roughly 3,500 tweets pertaining to the Pepe the Frog meme that our RA hand-culled and tagged to establish the presence or absence of hateful content, political references, and political positionality. During HDSSI, we query a range of AI models to see how well they read this material compared with its interpretation by two human coders. Our technical challenges involve prepping our dataset to query the models, using models hosted both locally on Hyak and in the cloud by AWS, and engineering our prompt. Our methodological and philosophical challenges include discussions about politically sensitive material, about the ethics of using this data for research, and about how to represent intercoder reliability in our results.

The 2025 Humanities Data Science Summer Institute may be done, but these projects are far from over. Program participants plan to present at the UW Undergraduate Research Symposium. Additionally, a paper stemming from The Afterlives of Postwar American Authors was recently accepted to the Computational Humanities Research Conference. Congratulations to all involved with another successful HDSSI and these ongoing research projects!