Keller Colloquium in Computing & Mathematical Sciences
Abstract: An enormous number of useful statistical tasks can be carried out subject to the protections of differential privacy, including both learning and the generation of synthetic data sets that are useful for a wide variety of analyses. Unfortunately, most of these problems are computationally hard in the worst case. This situation is reminiscent of machine learning more broadly, where most learning tasks are also hard in the worst case. Worst-case hardness has not stopped the practice of machine learning, however: a variety of practical heuristics routinely solve hard learning problems. But machine learning has an advantage in its use of heuristics: we can simply try a heuristic out and evaluate its accuracy empirically. In contrast, differential privacy cannot be verified empirically and must be reasoned about analytically.
In this work, we begin to develop a theory of oracle-efficient differentially private computation. We show how to solve a number of learning and synthetic data generation tasks under the assumption that we have an "oracle" that can solve the non-private version of a corresponding learning problem. We give a simple oracle-efficient learning algorithm whose privacy guarantees depend on the correctness of the oracle. We then give a generic reduction that takes any algorithm which is provably private assuming access to a perfect oracle, and produces an algorithm that remains private even if the oracle suffers adversarially chosen failures. Our results hold only for a restricted class of learning and synthetic data generation problems; we also give a barrier result that presents an obstacle to obtaining oracle-efficient algorithms for all private learning tasks.
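To make the oracle-efficiency idea concrete, the following is a minimal toy sketch (not the algorithm from the talk) of the general "perturb, then call a non-private oracle once" pattern the abstract alludes to. All names here (`erm_oracle`, `oracle_efficient_private_learner`, `public_pool`, `n_fake`) are illustrative inventions: the oracle is simulated by brute-force empirical risk minimization over a small hypothesis class, and the perturbation is randomly labeled fake examples drawn from a data-independent pool. The actual noise distribution and privacy analysis are in the paper and are not captured by this sketch; as the abstract notes, any privacy guarantee would hinge on the oracle genuinely returning a minimizer.

```python
import random

def erm_oracle(data, hypotheses):
    # Stand-in for a non-private learning oracle: brute-force empirical
    # risk minimization over a small hypothesis class. In practice this
    # would be a fast heuristic solver whose worst-case behavior is unknown.
    def err(h):
        return sum(1 for x, y in data if h(x) != y)
    return min(hypotheses, key=err)

def oracle_efficient_private_learner(data, hypotheses, public_pool, n_fake, rng):
    # Toy sketch of perturb-then-optimize: append randomly labeled "fake"
    # examples drawn from a data-independent public pool, then make a single
    # call to the non-private oracle. The random examples play the role of
    # noise in the objective; the real noise distribution and privacy proof
    # are NOT reproduced here.
    fake = [(rng.choice(public_pool), rng.choice([0, 1])) for _ in range(n_fake)]
    return erm_oracle(data + fake, hypotheses)
```

The point of the pattern is that the private learner does no optimization of its own: all the computational work is delegated to one oracle call on a randomized objective, which is why the privacy analysis must account for the oracle possibly failing.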
Joint work with Seth Neel and Steven Wu.