Methods: Software
Fields: Computer Science

Joseph L. Hellerstein (eScience Institute, Computer Science & Engineering)

Figure: a “spaghetti” spreadsheet crossed out with a red X, an undesirable result of current spreadsheet systems.

Digital spreadsheets are arguably the most pervasive environment for data analysis on the planet. This is largely because spreadsheets provide a conceptually simple way to do calculations that (a) closely associates data with the calculations that produce the data and (b) avoids the mental burdens of programming such as control flow, data dependencies, and data structures.

Unfortunately, spreadsheets have notorious failings as well. First, they lack expressivity in that (a) spreadsheets only permit a limited set of functions to be used in formulas (e.g., so that static dependency checking can be done); and (b) they only support formulas that are expressions, not scripts, and so calculations cannot be expressed as algorithms. Second, spreadsheet formulas cannot be reused in other spreadsheet formulas or in software systems. Third, spreadsheet systems cannot handle complex data, such as data that are hierarchically structured or data that have n-to-m relationships. Finally, spreadsheets scale poorly with the size of the data and the number of formulas.

SciSheets (from “scientific spreadsheets”) is an open source project that provides three novel features to address these shortcomings: (1) formulas can be arbitrary Python scripts as well as expressions (formula scripts), which addresses expressivity by allowing calculations to be written as algorithms; (2) spreadsheets can be exported as functions in a Python module (function export), which addresses reuse, since exported code can be called from formulas or from external programs, and improves performance, since calculations can execute in a low-overhead environment; and (3) tables can have columns that are themselves tables (subtables), which addresses complex data such as hierarchically structured data and n-to-m relationships.
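To make the three features concrete, here is a minimal plain-Python sketch of the ideas behind them. It does not use SciSheets itself, and every name in it (`balance_formula`, `sheet_as_function`, `students`, `enrollment_by_course`) is a hypothetical chosen for illustration, not part of the SciSheets API.

```python
# Illustrative sketch only: these names are assumptions, not SciSheets APIs.

# (1) Formula script: a column computed by a multi-statement algorithm
# (loop plus running state) rather than by a single expression.
def balance_formula(deposits, rate):
    """Return the running balance after each deposit, compounding each period."""
    balances = []
    balance = 0.0
    for deposit in deposits:
        balance = balance * (1.0 + rate) + deposit
        balances.append(round(balance, 2))
    return balances


# (2) Function export: the whole sheet is packaged as one ordinary function
# that other formulas or external programs can import and call.
def sheet_as_function(deposits, rate=0.1):
    """Evaluate the 'sheet' and return its columns as a dictionary."""
    balances = balance_formula(deposits, rate)
    return {"deposits": list(deposits), "balances": balances, "final": balances[-1]}


# (3) Subtables: a column whose cells are themselves tables. Here each
# student row holds a nested table of courses (an n-to-m relationship).
students = [
    {"name": "Ana", "courses": [{"code": "CSE 101", "grade": 3.7},
                                {"code": "STAT 201", "grade": 3.9}]},
    {"name": "Ben", "courses": [{"code": "CSE 101", "grade": 3.2}]},
]


def enrollment_by_course(rows):
    """Invert the nesting to view the same relationship from the course side."""
    by_course = {}
    for row in rows:
        for course in row["courses"]:
            by_course.setdefault(course["code"], []).append(row["name"])
    return by_course


if __name__ == "__main__":
    print(sheet_as_function([100, 100, 100]))
    print(enrollment_by_course(students))
```

In a conventional spreadsheet, the running balance would need a helper column of chained cell references, and the student-course data would have to be flattened into repeated rows. The sketch suggests why scripts, exported functions, and nested tables remove those workarounds.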

SciSheets is hosted on GitHub (https://github.com/ScienceStacks/SciSheets) and is described in detail in a 2017 SciPy paper (http://conference.scipy.org/proceedings/scipy2017/pdfs/joseph_hellerstein.pdf). At present, SciSheets is robust enough for demos, but it is not yet beta-quality code.