SeaFlow, a research instrument developed in the lab of UW School of Oceanography director Ginger Armbrust, analyzes 15,000 marine microorganisms per second, generating up to 15 gigabytes of data every day of a typical multi-week oceanographic research cruise.
UW professor of astronomy Andy Connolly is preparing for the unveiling of the Large Synoptic Survey Telescope (LSST), which will map the entire night sky every three days and produce about 100 petabytes of raw data about our universe over the course of 10 years. (One petabyte of music in MP3 format would take 2,000 years to play.)
What scientists like Armbrust and Connolly have is popularly known as “big data,” and as rich and exciting as it can be, big data can also be a big problem.
“Every field of discovery is transitioning from data-poor to data-rich, and the people doing the research don’t have the wherewithal to cope with this data deluge,” says Ed Lazowska, director of the UW’s eScience Institute.
And now the eScience team (its core includes faculty from 12 departments representing five schools and colleges) is poised to scale way up. Last year, the UW won a five-year, $37.8 million grant from the Gordon and Betty Moore Foundation and the Alfred P. Sloan Foundation that will be shared with New York University and the University of California, Berkeley, to foster a data science culture at the three universities.
“We don’t want this to be a magic trick that only computer scientists know how to do,” says eScience Institute Associate Director Bill Howe. “It should be something that everybody can do.”