Methods: Machine Learning, Visualization
Fields: Economics, Informatics, Social Science

Collaborators: Joshua Blumenstock (UW CSE), Gabriel Cadamuro (UW CSE), Robert On (U.C. Berkeley)

Accurate and timely estimates of population characteristics are a critical input to social and economic research and policy. In industrialized economies, novel sources of passively-collected data are enabling new approaches to population modeling and measurement. In developing countries, however, fewer sources of such “big data” exist. The notable exception is the mobile phone, which now has roughly 90% global penetration. In this paper, we show that an individual’s past history of phone use can be used to accurately infer his or her socioeconomic status, as well as a broad range of other demographic characteristics. Our approach uses a small number of phone surveys to establish ground truth within a population of millions of anonymized subscribers. The fitted model can then be used to generate predictions for the entire population, construct geographic aggregates that correspond very closely to official government statistics (r = 0.93), or infer the characteristics of micro-regions that are much smaller than the most fine-grained administrative units of the country. In resource-constrained environments where censuses and household surveys are rare, this creates an option for gathering timely information on population statistics at a tiny fraction of the cost of traditional methods.
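The workflow described above (survey a small sample, fit a model on phone-use features, predict for every subscriber, then aggregate by region and compare against reference statistics) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the data are synthetic, the single `phone_use` feature is hypothetical, and ordinary least squares stands in for whatever model the authors actually fit, which the abstract does not specify.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def region_means(values, regions, n_regions):
    """Average a per-subscriber value within each region."""
    totals, counts = [0.0] * n_regions, [0] * n_regions
    for value, region in zip(values, regions):
        totals[region] += value
        counts[region] += 1
    return [t / c for t, c in zip(totals, counts)]


# Synthetic population (all values are made up for illustration).
rng = random.Random(0)
N_SUBSCRIBERS, N_REGIONS, N_SURVEYED = 5000, 10, 200

regions = [rng.randrange(N_REGIONS) for _ in range(N_SUBSCRIBERS)]
# Hidden "true" wealth differs by region, with individual noise.
wealth = [region + rng.gauss(0, 1) for region in regions]
# Hypothetical phone-use feature that is noisily related to wealth.
phone_use = [2 * w + rng.gauss(0, 1) for w in wealth]

# Step 1: a small survey supplies ground truth for a subset of subscribers.
surveyed = rng.sample(range(N_SUBSCRIBERS), N_SURVEYED)
a, b = fit_line([phone_use[i] for i in surveyed],
                [wealth[i] for i in surveyed])

# Step 2: the fitted model predicts wealth for the entire population.
predicted = [a + b * x for x in phone_use]

# Step 3: aggregate predictions by region and compare with the reference.
pred_means = region_means(predicted, regions, N_REGIONS)
true_means = region_means(wealth, regions, N_REGIONS)
r = pearson_r(pred_means, true_means)
print(f"correlation of regional aggregates: r = {r:.3f}")
```

The correlation printed here reflects only how the synthetic data were generated; it says nothing about the r = 0.93 reported above, which comes from real survey and government data.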

Predicting Poverty and Wealth