AI Hub – eScience Institute

Picture of a panel discussion at the Seattle AI week event.

Events

This fall, SSEC hosted a community-driven meetup during Seattle AI Week focused on how AI is reshaping research workflows in academia, industry, and startups. The meetup included an “Agentic AI for Research Workflows” panel moderated by SSEC Head of Engineering Vani Mandava and featuring Bodhisattwa Majumder from AI2, Shamsi Iqbal from Microsoft, Luke Kim from Spice AI, and Carlos Garcia Jurado Suarez from SSEC. Around sixty audience members listened as the panelists discussed the potential and the challenges of applying artificial intelligence to research.

NAIRR Award

SSEC won a National AI Research Resource (NAIRR) Award to build LLMaven, a tool library that uses a generative AI approach. We will use Retrieval-Augmented Generation (RAG) techniques to extend LLMs with privacy-sensitive data in a way that is safe and cost-effective for individual researchers who lack the resources to develop their own models or purchase expensive equipment. LLMaven will leverage diverse, publicly available datasets and disparate academic knowledge bases.
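As a rough illustration of the RAG idea described above (not LLMaven's actual implementation), the following minimal sketch retrieves the most relevant local passage for a question and places it in a prompt, so the private text never has to be used for model training. The sample documents, the function names, and the simple word-count similarity are all assumptions made for illustration. A real system would typically use learned embeddings and a vector store, and would send the resulting prompt to an LLM.

```python
import math
import re
from collections import Counter

# Hypothetical private notes standing in for data that should stay local.
DOCUMENTS = [
    "Sample A was collected at site 3 and stored at minus 80 degrees.",
    "The draft manuscript reports results from the spring field campaign.",
    "Instrument calibration is repeated before every measurement run.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, top_k=1):
    """Return the top_k documents most similar to the query."""
    query_vec = Counter(tokenize(query))
    scored = [
        (cosine_similarity(query_vec, Counter(tokenize(doc))), doc)
        for doc in documents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def build_prompt(query, documents, top_k=1):
    """Combine the retrieved context and the question into one prompt string."""
    context = "\n".join(retrieve(query, documents, top_k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    print(build_prompt("How often is instrument calibration repeated?", DOCUMENTS))
```

The retrieval step is what keeps this approach inexpensive: the language model itself is unchanged, and only the few passages relevant to each question are supplied at query time.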

RAG Office Hours

As part of the eScience Institute’s Office Hours program, SSEC offers office hours every Tuesday from 10 AM to 11 AM at the eScience Institute’s Data Science Studio on the UW campus to support the UW community with Retrieval-Augmented Generation (RAG) workflows for generative AI. Researchers who are curious about using generative AI tools with private or pre-publication data are welcome to sign up here and stop by with their questions.

Conceptual graphic showing a subsection of a Retrieval Augmented Generation system.

Picture of someone writing equations on a computer tablet.

Projects

AutoDoc: SSEC worked with researchers from Brown University and University of Osnabruck to build a pipeline and train a freely available large language model (LLM) that translates research processes implemented in AutoRA (a collection of Python packages that together form a framework for closed-loop empirical research) into descriptions. Such descriptions provide the basis for automated and transparent documentation of the empirical research process. More details are available here.

Tutorials

SciPy 2024 tutorial: The SSEC team presented a tutorial at the annual SciPy conference in Tacoma, WA on July 9, 2024. It covered (1) the basics of language models, (2) setting up an environment for using open source LLMs without the expensive compute resources needed for training or fine-tuning, (3) using a technique such as Retrieval-Augmented Generation (RAG) to improve LLM output, and (4) building an app to demonstrate how researchers could turn disparate knowledge bases into special-purpose AI-powered tools.

Picture of someone delivering a lecture to a group of people.

Models

While experimenting with the limits of LLM inference for science use cases, SSEC worked on building useful science applications with open data and open models. The team started with Ai2’s OLMo models, which are built on open datasets and have published checkpoints. However, inference run locally on a CPU was too slow, so the next logical step was to speed up the model using the llama.cpp approach. This required an intermediate step to convert the model to GGUF format. After posting the model to Hugging Face, the team saw it pass 1.2K downloads in just one day. There are now over 6K downloads to date for the two models.
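The two steps described above, converting a checkpoint to GGUF and then running it on a CPU, can be sketched roughly as follows. This is an assumption-laden illustration rather than the team's actual procedure: it assumes a local checkout of llama.cpp (which ships a `convert_hf_to_gguf.py` script) and the `llama-cpp-python` package, and the directory and file names are hypothetical placeholders.

```python
def build_gguf_convert_command(model_dir, outfile, outtype="f16",
                               script="convert_hf_to_gguf.py"):
    """Build the argument list for llama.cpp's Hugging Face to GGUF converter.

    model_dir and outfile are hypothetical paths supplied by the caller.
    The list can be passed to subprocess.run() from inside a llama.cpp checkout.
    """
    return ["python", script, model_dir, "--outfile", outfile, "--outtype", outtype]


def run_local_inference(gguf_path, prompt, max_tokens=64):
    """Run CPU inference on a GGUF file using llama-cpp-python.

    The import is deferred so the command helper above works even when
    llama-cpp-python is not installed.
    """
    from llama_cpp import Llama  # assumes: pip install llama-cpp-python

    llm = Llama(model_path=gguf_path, n_ctx=2048)
    result = llm.create_completion(prompt, max_tokens=max_tokens)
    return result["choices"][0]["text"]


if __name__ == "__main__":
    # Hypothetical paths, shown only to illustrate the shape of the command.
    print(" ".join(build_gguf_convert_command("olmo-hf-checkpoint", "olmo.gguf")))
```

Once a GGUF file exists, a call such as `run_local_inference("olmo.gguf", "What is a GGUF file?")` would load it and generate text on the CPU, assuming the model architecture is supported by the installed llama.cpp version.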

Workshops

2025 Schmidt Sciences Interdisciplinary Science Summit: SSEC offered an interactive session to over 90 researchers at the Schmidt Sciences Interdisciplinary Science Summit to help shape a grounded, practical, and scientist-centered vision for AI in scientific discovery. We used a participatory design approach grounded in real research challenges. Then we explored how generative models and agentic AI might (or might not) enhance the processes, problem-solving, and insights that drive science.

Picture of some of the UW eScience Software Engineering team.