Center for Social Information Sciences (CSIS) Seminar

Friday, February 28, 2020

12:00pm to 1:00pm

Baxter B125

Multiple Imputation for Large Multiscale Data with Linear Constraints

Jian Cao, Postdoctoral Scholar in Data Science and Election Integrity, Division of the Humanities and Social Sciences, Caltech

Abstract: We present a new method that handles both missing and suppressed values in large multiscale data sets, such as the Quarterly Census of Employment and Wages (QCEW) from the U.S. Bureau of Labor Statistics. Existing multiple imputation methods scale poorly to such data sets. The problem is particularly acute for the QCEW, which has as many as 1.5 billion observations aggregated along three different scales (industry structure, geographic levels, and time). Our method incorporates three innovations. First, we improve the accuracy of the Bootstrapping-based Expectation Maximization method (King et al. 2010), a state-of-the-art multiple imputation method, by using the extra information from the singular covariance matrix and taking into account the multiscale data structure. Second, we introduce a quasi-Monte Carlo technique to accelerate convergence. Third, we develop a parallel sequential approach that partitions the large data set into quasi-independent small data sets according to the data structure and the patterns of suppressed and missing observations. We demonstrate that our new method improves both speed and accuracy. Moreover, it can be applied to large data sets with complicated multiscale structures.

For more information, please contact Mary Martin by phone at 626-395-4571 or by email at [email protected].

Event Series

Center for Social Information Sciences (CSIS) Seminar Series

Event Sponsors

Computing and Mathematical Sciences (CMS)

Division of the Humanities and Social Sciences

Ronald and Maxine Linde Institute of Economic and Management Sciences