Making data science training resources FAIR
John Darrell Van Horn, M.Eng., Ph.D., the USC Mark and Mary Stevens Neuroimaging and Informatics Institute and the Laboratory of Neuro Imaging

Apr. 11, 2018, 3:30 p.m., Physics/Astronomy Auditorium (PAA), A102

[Watch a recording of this seminar on YouTube.]


In our rapidly evolving information era, methods for handling large quantities of data obtained in biomedical research have emerged as powerful tools for confronting critical research questions. These methods are having significant impacts in diverse domains ranging from genomics, to health informatics, to environmental research, and beyond. The NIH’s Big Data to Knowledge (BD2K) Training Consortium, in particular, has worked to empower current and future generations of researchers with a comprehensive understanding of the data science ecosystem, giving them the ability to explore, prepare, analyze, visualize, and interpret Big Data.

To this end, the BD2K Training Coordinating Center (TCC) was funded to facilitate in-person and online learning, and to open the concepts of data science to the widest possible audience. In this presentation, I will describe the activities of the BD2K TCC, particularly the construction of the Educational Resource Discovery Index (ERuDIte). ERuDIte identifies, collects, describes, and organizes over 10,000 data science training resources, including: online data science materials from BD2K awardees; open online courses; and videos from scientific lectures and tutorials. Given the richness of online training materials and the constant evolution of biomedical data science, computational methods applying information retrieval, natural language processing, and machine learning techniques are required.

In effect, data science is being used to inform training in data science, and the FAIR principles (findable, accessible, interoperable, reusable) apply as much to these training resources as to the data types and methods they describe. As a result, the work of the TCC has aimed to democratize novel insights and discoveries brought forth via large-scale data science training. This presentation will be of interest to anyone seeking to personalize their own data science education, craft unique online training curricula, and/or share their own online training content.


Dr. Van Horn is an associate professor of neurology with additional appointments in neuroscience and in electrical engineering at the University of Southern California in Los Angeles, California. He received his bachelor’s degree in psychology from Eastern Washington University in Cheney, WA, a master’s in electrical engineering and computer science from the University of Maryland, College Park, and his PhD from the University of London in the United Kingdom.

He conducted a post-doctoral fellowship at the National Institute of Mental Health on the National Institutes of Health main campus in Bethesda, MD, specializing in human neuroimaging investigation of brain function. He has held faculty positions at Dartmouth College, the University of California, Los Angeles (UCLA), and now USC. He is an accomplished author (over 150 publications, h-index > 45) and university-level educator, and is known internationally as an expert in neuroinformatics and data sharing.

Among his research articles are many publications on multimodal neuroimaging of the brain and the characterization of mild and severe traumatic brain injury (TBI). This work includes using MRI and diffusion tensor imaging to model the morphological effects of brain injury as well as the effect on white matter fiber pathways. He is the past education chair and program chair of the Organization for Human Brain Mapping and the current vice-president of the Society for Claustrum Research. He directs the Master of Science in Neuroimaging and Informatics program at USC, a one-of-its-kind program covering the spectrum of human neuroimaging research and practice, and contributes to other USC graduate programs.

He is also the principal investigator of the National Institutes of Health Big Data to Knowledge (BD2K) Training Coordinating Center, an effort to synthesize data science educational content from around the internet, index it into a common database framework, and make the information searchable, sortable, and openly available for users to organize into personalized training plans. He directs a unique series of five-day mentored and facilitated Data Science Innovation Lab events on specific topics such as mobile health, the microbiome, and single-cell dynamics.

Finally, Dr. Van Horn oversees a data science scholarly rotation program which pairs junior biomedical researchers with more senior data scientists to work on projects of mutual interest. He enjoys traveling, road cycling, and mountaineering, is a private pilot, and lives in Los Angeles, CA, with his wife and two daughters.

This seminar will be co-sponsored by the UW Institute for Neuroengineering.
