Adventures in little data

Jan. 18, 2017 from 3:30 to 4:20 — Johnson Hall, room 102

Paul Ginsparg

Professor of Physics and Information Science, Cornell University


I will give a very brief sociological overview of the current metastable state of scholarly research communication, and then a technical discussion of the practical implications of literature and usage data considered as computable objects, using arXiv as exemplar. From the physics standpoint, there is a surprising amount of statistical mechanics in text-mining and machine learning.


Paul Ginsparg has been a professor of physics and information science at Cornell University since 2001. He received a B.A. in physics from Harvard University (1977) and a doctorate in theoretical particle physics from Cornell University (1981). He was in the Society of Fellows at Harvard from 1981 to 1984, then a faculty member in the physics department at Harvard University until 1990, and a staff member in the theoretical division of Los Alamos National Laboratory from 1990 to 2001. He has authored papers in quantum field theory, string theory, conformal field theory, and quantum gravity. While visiting Aspen in the summer of 1991, he started the e-print archives (now arXiv). He has served on many committees, including the U.S. National Committee for CODATA, other N.R.C., N.A.S., and AAAS committees, the NIH PubMed Central national advisory board, the American Physical Society publications oversight committee, and the Public Library of Science advisory board. He has received awards including the P.A.M. (physics, astronomy, math) award from the Special Libraries Association, the Council of Science Editors (CSE) Award for Meritorious Achievement, and the Paul Evans Peters Award from Educause, ARL, and CNI. He was elected a Fellow of the American Physical Society and has been named a MacArthur Fellow, a Radcliffe Institute Fellow, a “White House Champion of Change”, and a Simons Fellow in Theoretical Physics.