Sample Complexity of Distinguishing Cause from Effect

Sourbh Bhadane, University of Amsterdam

We study the sample complexity of causal structure learning on a two-variable system with observational and experimental data. Specifically, for two variables X and Y , we consider the classical scenario where either X causes Y , Y causes X, or there is an unmeasured confounder between X and Y . We show that if X and Y are over a finite domain of size k and are significantly correlated, the minimum number of interventional samples needed is sublinear in k. We give a tight characterization of the tradeoff between observational and interventional data when the number of observational samples is sufficiently large. We build upon techniques for closeness testing and for non-parametric density estimation in different regimes of observational data. Our hardness results are based on carefully constructing causal models whose marginal and interventional distributions form hard instances of canonical results on property testing.

Recorded Talk