eScience Data Science Fellow Ben Marwick, who is also an Associate Professor of Archaeology at the University of Washington, has an article published today in The Conversation about the problem that computers pose to reproducibility in science.

“For most of the history of science, researchers have reported their methods in a way that enabled independent reproduction of their results. But, since the introduction of the personal computer – and the point-and-click software programs that have evolved to make it more user-friendly – reproducibility of much research has become questionable, if not impossible. […] Preparing data, analyzing it, visualizing the results – these are tasks done on the computer, in private. […] How, then, can another researcher judge the reliability of the results, or reproduce the analysis?”

It’s an interesting problem and one that the eScience Institute, through our Reproducibility and Open Science Working Group, sees as an important challenge as data-intensive scientific discovery becomes more common.

Marwick’s recommendation? Open formats, free software, and other tools and methods that “make it easier to keep track of files and analyses done on computers.”

“All those private files on our personal computers and the private analysis tasks we do as we work toward preparing for publication should be made public along with the journal article,” writes Marwick. “Currently, these are the tools and methods of the avant-garde, and many midcareer and senior researchers have only a vague awareness of them. But many undergraduates are learning them now. Many graduate students, seeing personal advantages to getting organized, using open formats, free software and streamlined collaboration, are seeking out training and tools from volunteer organizations to fill the gaps in their formal training.”

Marwick’s full article is available in The Conversation.

Example of a script used to analyze data.