Ulric B. and Evelyn L. Bray Social Sciences Seminar
Abstract: Multivariate regression has been the go-to method for data analysis for generations of scholars. Although the method is transparent and interpretable and has desirable theoretical properties, its simplicity precludes the discovery of complex heterogeneities in the data. We introduce a method that embraces these potential complexities, is interpretable, has desirable theoretical guarantees, and is tailored to causal effect estimation. The proposed method uses a machine learning regression methodology to estimate the observation-level effect of a treatment variable, whether the treatment is binary, categorical, or continuous. We illustrate with an instrumental variable analysis, estimating heterogeneities in both the first and second stages. We use our theoretical results to estimate which observations have a treatment that is not affected by the instrument, and for which therefore no causal effect is identified. We provide multiple pedagogic introductions to new concepts from the machine learning literature. A technical appendix and extensive simulation evidence establish the method's utility and use.