Date/Time

Date(s) - 01/20/2022
4:30 pm - 5:30 pm

Please use this Zoom link for the event.

Please join us for a UW Data Science Seminar event on Thursday, January 20th from 4:30 to 5:30 p.m. The seminar will feature data scientists Megha Subramanian and Alejandro Zuniga from the Pacific Northwest National Laboratory (PNNL).

“Artificial Judgement Assistance from teXt (AJAX): Applying Open Domain Question Answering to Nuclear Non-proliferation Analysis”

Abstract: Nuclear non-proliferation analysis is complex and subjective, as the data is sparse, and examples are rare and diverse. When analysing non-proliferation data, it is often desirable that the findings be completely auditable, so that any claim or assertion can be traced directly to the reference material from which it was derived. Currently this is accomplished by analysts thoroughly documenting underlying assumptions and clearly referencing details to source documents. This is a labour-intensive and time-consuming process that can be difficult to scale with geometrically increasing quantities of data. In this talk, we describe an approach to leverage bi-directional language models for nuclear non-proliferation analysis. It has been shown recently that these models capture not only language syntax but also some of the relational knowledge present in the training data. We have devised a unique Salt and Pepper strategy for testing the knowledge present in the language models, while also introducing an auditability function into our pipeline. We demonstrate that fine-tuning the bi-directional language models on a domain-specific corpus improves their ability to answer domain-specific factoid questions. Our hope is that the results presented in this paper will further the natural language processing (NLP) field by introducing the ability to audit the answers provided by the language models and bring forward the source of that knowledge.

Biographies: Megha Subramanian is currently a data scientist at Pacific Northwest National Laboratory (PNNL). She holds a Master’s degree in Electrical Engineering from RWTH Aachen University in Germany. Prior to joining PNNL, she worked as an R&D Engineer at Sivantos GmbH (formerly known as Siemens Audiology Solutions), a hearing aid manufacturing company in Germany. At PNNL she splits her time between projects involving core natural language processing research and those that involve signal processing, sensor data analysis and firmware development. Her research interests in the NLP domain include open-domain and domain-specific question answering as well as text generation.

Alejandro Zuniga is a data scientist at Pacific Northwest National Laboratory (PNNL). At the moment, Alejandro is spending much of his time working within the natural language processing (NLP) domain, with work including knowledge graph development, NLP social media analysis, and machine explainability. Prior to working at PNNL, Alejandro studied at Florida State University, where he obtained both a Bachelor of Science in statistics, with a minor in mathematics, and a Master of Science in statistical data science. Alejandro’s research interests include machine learning, deep learning, and NLP.

The UW Data Science Seminar is an annual lecture series at the University of Washington that hosts scholars working across applied areas of data science, such as the sciences, engineering, humanities and arts, along with methodological areas in data science, such as computer science, applied math and statistics. Our presenters come from all domain fields and include occasional external speakers from regional partners, governmental agencies and industry.

The 2021-2022 seminars will be hybrid virtual and in-person events, and are free and open to the public.