Partners: Sebastian Musslick and Younes Strittmatter
SSEC Engineer: Carlos Garcia Jurado Suarez
Research Goals and Domain
Reproducibility is a foundational pillar of the scientific process. However, numerous empirical studies in behavioral research are difficult to replicate due to inadequate and opaque documentation of their research steps. AutoRA is a collection of Python packages that together form a framework for closed loop empirical research. The packages allow users to set variables, weights, and actions to perform closed-loop empirical research studies.
Software Problem
After experiments are outlined by packages such as AutoRA, there is an opportunity to further automate the process, converting the steps written as code into academic descriptions in natural language.. create a pipeline and train a freely available large language model (LLM).
Software Solution
Autodoc is a Python library that contains inference CLI to generate a documentation draft from an input python code file. It also builds out the training and fine-tuning pipeline for the Llama-2-7b-chat LLM. Autodoc elucidates crucial steps of the research process in an automated fashion, driving great accessibility and reproducibility.
Impact
This software is a translator tool that allows users to turn their entire research code, expressed across multiple code files and in terms of scientific computing packages such as AutoRA into an automatically generated methods description describing the research process. Researchers are also able to upload their generated documentation to the Open Science Framework for everyone to review. The fine-tuned LLM model is now publicly accessible; empowering anyone to leverage its capabilities.