SSEC Workshop: Generative AI Copilot for Scientific Software – a RAG-Based Approach

When

06/18/2024
12:30 pm – 4:30 pm

Where

Kincaid Hall, room 102/108
University of Washington, Seattle

The Scientific Software Engineering Center (SSEC) will host a beta demo of its hands-on tutorial, Generative AI Copilot for Scientific Software – a RAG-Based Approach, which teaches attendees how to leverage open language models for scientific exploration with diverse input data, both public and private.

Generative AI systems built upon large language models (LLMs) have shown great promise as tools that let people access information through natural conversation. Scientists can build on these breakthroughs to create advanced tools that help accelerate their research. This tutorial will cover: (1) the basics of language models, (2) setting up an environment for using open source LLMs without the expensive compute resources needed for training or fine-tuning, (3) applying a technique such as Retrieval-Augmented Generation (RAG) to improve LLM output, and (4) building a “production-ready” app that demonstrates how researchers can turn disparate knowledge bases into special-purpose AI-powered tools. The tutorial is intended for scientists and research engineers who want to use LLMs in their work.

The language model used in the tutorial is the Allen Institute for AI (AI2) Open Language Model (OLMo), an LLM with open data, code, weights, and evaluation benchmarks. OLMo is purpose-built for scientific discovery: it was trained on Dolma, an open dataset of 3 trillion tokens collected from diverse web content, academic publications, code, books, and encyclopedic materials. LangChain is a Python and JavaScript framework for developing applications powered by LLMs. Using LangChain, we’ll create a context-aware question-answering agent by implementing a RAG chain. Using a simple example from the astronomy community, we demonstrate that the tool answers correctly with RAG-enabled context and incorrectly without it. By the end of the tutorial, attendees will have created an AI-powered question-answering application that they can use to advance their research.
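A RAG chain follows a retrieve-then-augment pattern: find the passages most relevant to a question, then place them in the prompt ahead of the question before the LLM generates an answer. The sketch below illustrates only that pattern. It does not use LangChain or OLMo. It substitutes a dependency-free bag-of-words cosine similarity for the embedding model and vector store a real chain would typically use, and it stops before the generation step. The sample documents and the names `retrieve` and `build_prompt` are hypothetical, chosen for illustration.

```python
import math
import re
from collections import Counter
from typing import List, Optional

# Hypothetical mini knowledge base; in practice these would be chunks of
# your own documents (papers, notes, data descriptions).
DOCUMENTS = [
    "The Vera C. Rubin Observatory will conduct the Legacy Survey of Space and Time (LSST).",
    "Dolma is an open dataset of 3 trillion tokens used to train OLMo.",
    "LangChain is a framework for developing applications powered by language models.",
]


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, documents: List[str], k: int = 1) -> List[str]:
    """Retrieval step: return the k documents most similar to the question."""
    q_vec = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(q_vec, Counter(tokenize(doc))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, context: Optional[List[str]] = None) -> str:
    """Augmentation step: prepend retrieved passages to the question, if any."""
    if not context:
        return f"Question: {question}\nAnswer:"
    joined = "\n".join(f"- {passage}" for passage in context)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{joined}\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    question = "Which survey will the Rubin Observatory conduct?"
    context = retrieve(question, DOCUMENTS, k=1)
    # The augmented prompt would then be passed to an LLM such as OLMo.
    print(build_prompt(question, context))
```

Calling `build_prompt(question)` with no context yields the bare prompt an LLM would see without RAG, which is the situation the with/without comparison probes. Swapping the bag-of-words scorer for an embedding model changes how relevance is measured but not the overall shape of the chain.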

This is a hands-on tutorial, and attendees are expected to bring their own laptops. If you’d like to bring your own data, please limit it to 500 MB.
