# CMI Seminar

Consider a system consisting of a binary "stimulus" variable X, (e.g. whether an individual consumes tobacco), a binary "response" variable Y, (e.g. whether the individual develops a particular disease) and a "hidden" confounding variable U, (e.g. a hypothetical genetic factor that affects the individual's propensity to both consume tobacco as well as to develop the disease). The system can be modeled by a directed graphical model with X, Y and U as nodes, and U ? X, U ? Y and X ? Y as edges. Given P, the observed probability distribution of the pair (X, Y), we ask: what is the causal effect of X on Y? In the early 1990s, Pearl proposed that an appropriate formalization of the above question is to compute the conditional probability of Y given X in a different graphical model in which the incoming edges to X have been deleted, given only the observed distribution P on the original model. (In the above example, this new model will only have the edges X ? Y and U ? Y.) This deletion simulates an experimental intervention on X while not making any changes to the other interactions in the model. In the general case, the graphical model can comprise several other nodes, and X and Y themselves might be disjoint subsets of these nodes instead of being singleton vertices.

It is not hard to see that the above problem is not solvable for all graphical models and all pairs X and Y of subsets of nodes. However, a long of line of work by several researchers culminated in 2006 in a complete algorithmic characterization of graphical models in which the problem is solvable. Further, this characterization also included an algorithmic procedure which takes the observed distribution and outputs the requisite conditional distribution in those cases where the problem is solvable [Huang and Valtorta, 2006 and Shpitser and Pearl, 2006].

These characterizations assumed an exact knowledge of both the underlying graphical model and the observed distribution P. However, in practice, we may expect imprecision both in our description of the model (which may arise from failing to include an edge that should be present), and in our knowledge of the observed distribution P (which may arise due to round-off errors in storage and due to statistical noise). In our work, we study the stability of the above causal inference procedures with respect to such imprecisions in the input.

On the positive side, we show that the effect of removing an edge on the accuracy of these procedures is not large, as long as the removed edges are themselves "weak"; in particular, the error introduced in the output is proportional to the "strength" of the edge for a natural measure of strength. We also show that examples solvable by some sound but "incomplete" algorithms that preceded the complete algorithm described above are well-conditioned with respect to errors in the observed distribution P.

On the negative side, we construct examples solvable by the complete algorithm described above that are remarkably ill-conditioned. More formally, we give a sequence of instances such that for any algorithm that solves the causal inference problem on an instance with n nodes, the relative error in the output is larger than the relative error in the input by a factor that is at least O(exp(n^{1/3})).

Joint work with Leonard J. Schulman.