Project lead: Gregoire Lurton, UW Institute for Health Metrics and Evaluation
Advisors: Abie Flaxman and Emmanuela Gakidou, UW Institute for Health Metrics and Evaluation
eScience Liaison: Daniel Halperin, Director of Research – Scalable Analytics, UW eScience Institute
Every year, millions of dollars are spent collecting data on health services in developing countries. This data then typically sits unused because of problems with data access, reliability, and management. In this project, we worked with a set of over 5,000 monthly reports collected from 2008 to 2012 by the Kenyan Ministry of Health. These reports are part of the Kenyan Health Management Information System (HMIS), through which hospitals regularly report the main pathologies they treated and the activities they carried out. The dataset was collated manually and stored in a variety of Excel files, which makes it difficult to process and analyse. As a result, this type of routine data is seldom used for policy making or health system management.
Our aim was to make this data easily usable for analysis. We developed a series of methods to 1) programmatically extract the data from Excel, automating access to thousands of spreadsheets while handling the quirks of manually entered data across a variety of report templates; 2) test the reliability of the data using a set of new spreadsheet-level and data-level features; and 3) import the data into SQLShare to provide SQL querying over the spreadsheet data, using the Excel files' metadata to cluster and classify the data during the translation to SQL.
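The summary does not show the extraction code itself, so the following is only a minimal sketch of what steps 1) to 3) can look like in miniature. It assumes each worksheet has already been read into a list of rows (for example with a spreadsheet library such as openpyxl), with a text label in the first column and a count in the second. The "no data" tokens, the `total` row, the file names, and every function name are hypothetical. Note also that the project clustered spreadsheets using Excel file metadata; this sketch swaps in a simpler stand-in, a template signature built from each sheet's ordered row labels, and adds one illustrative reliability check (the reported total must equal the sum of the other rows).

```python
import re
from collections import defaultdict

# Placeholder strings treated as "no data" (an assumption, not taken from the HMIS reports).
MISSING_TOKENS = {"", "-", "--", "n/a", "na", "nil", "none"}


def normalize_label(text):
    """Lower-case a row label and collapse internal whitespace."""
    return " ".join(text.lower().split())


def clean_count(value):
    """Turn a manually entered cell into a non-negative int, or None if missing/unparseable."""
    if value is None or isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        return int(value) if value >= 0 and float(value).is_integer() else None
    text = str(value).strip().lower()
    if text in MISSING_TOKENS:
        return None
    # Drop thousands separators and stray spaces, e.g. "1,204" or "1 204".
    text = text.replace(",", "").replace(" ", "")
    match = re.fullmatch(r"(\d+)(?:\.0+)?", text)
    return int(match.group(1)) if match else None


def labeled_rows(rows, label_col=0):
    """Yield (normalized label, row) for rows whose label cell holds text."""
    for row in rows:
        if len(row) > label_col and isinstance(row[label_col], str) and row[label_col].strip():
            yield normalize_label(row[label_col]), row


def template_signature(rows, label_col=0):
    """Identify a report template by its ordered sequence of row labels."""
    return tuple(label for label, _ in labeled_rows(rows, label_col))


def group_by_template(sheets):
    """Cluster sheets (name -> rows) that share the same template signature."""
    groups = defaultdict(list)
    for name, rows in sheets.items():
        groups[template_signature(rows)].append(name)
    return dict(groups)


def extract_counts(rows, label_col=0, value_col=1):
    """Map each row label to its cleaned count (None when missing)."""
    return {
        label: clean_count(row[value_col]) if len(row) > value_col else None
        for label, row in labeled_rows(rows, label_col)
    }


def total_is_consistent(counts, total_label="total"):
    """Reliability check: the reported total must equal the sum of the other rows."""
    total = counts.get(total_label)
    parts = [v for k, v in counts.items() if k != total_label]
    if total is None or any(v is None for v in parts):
        return False
    return sum(parts) == total


if __name__ == "__main__":
    # Hypothetical sheets mimicking inconsistent manual entry.
    sheets = {
        "facility_a_2009_01.xls": [["Malaria", "1,204"], ["Pneumonia", 310], ["Total", 1514]],
        "facility_b_2009_01.xls": [["  malaria ", "87 "], ["Pneumonia", "-"], ["TOTAL", "90"]],
        "facility_c_2009_01.xls": [["Outpatient visits", 400.0], ["Inpatient admissions", 25]],
    }
    for signature, names in group_by_template(sheets).items():
        print(signature, "->", names)
    for name, rows in sheets.items():
        counts = extract_counts(rows)
        print(name, counts, "consistent" if total_is_consistent(counts) else "check")
```

Grouping by an exact label sequence is the simplest possible clustering: the two malaria/pneumonia sheets fall into one group despite differences in case and spacing, while the outpatient sheet forms its own. Templates that differ only slightly would need fuzzier matching, and the cleaned counts per group could then be loaded as one table each.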