Long Story Short: Omitted Variable Bias in Causal Machine Learning
- URL: http://arxiv.org/abs/2112.13398v5
- Date: Sun, 26 May 2024 18:43:02 GMT
- Title: Long Story Short: Omitted Variable Bias in Causal Machine Learning
- Authors: Victor Chernozhukov, Carlos Cinelli, Whitney Newey, Amit Sharma, Vasilis Syrgkanis,
- Abstract summary: We develop a theory of omitted variable bias for a wide range of common causal parameters.
We show how simple plausibility judgments on the maximum explanatory power of omitted variables are sufficient to bound the magnitude of the bias.
We provide flexible and efficient statistical inference methods for the bounds, which can leverage modern machine learning algorithms for estimation.
- Score: 26.60315380737132
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We develop a general theory of omitted variable bias for a wide range of common causal parameters, including (but not limited to) averages of potential outcomes, average treatment effects, average causal derivatives, and policy effects from covariate shifts. Our theory applies to nonparametric models, while naturally allowing for (semi-)parametric restrictions (such as partial linearity) when such assumptions are made. We show how simple plausibility judgments on the maximum explanatory power of omitted variables are sufficient to bound the magnitude of the bias, thus facilitating sensitivity analysis in otherwise complex, nonlinear models. Finally, we provide flexible and efficient statistical inference methods for the bounds, which can leverage modern machine learning algorithms for estimation. These results allow empirical researchers to perform sensitivity analyses in a flexible class of machine-learned causal models using very simple, and interpretable, tools. We demonstrate the utility of our approach with two empirical examples.
Related papers
- An Effective Theory of Bias Amplification [18.648588509429167]
Machine learning models may capture and amplify biases present in data, leading to disparate test performance across social groups.
We propose a precise analytical theory in the context of ridge regression, where the former models neural networks in a simplified regime.
Our theory offers a unified and rigorous explanation of machine learning bias, providing insights into phenomena such as bias amplification and minority-group bias.
arXiv Detail & Related papers (2024-10-07T08:43:22Z) - Revisiting Optimism and Model Complexity in the Wake of Overparameterized Machine Learning [6.278498348219108]
We revisit model complexity from first principles, by first reinterpreting and then extending the classical statistical concept of (effective) degrees of freedom.
We demonstrate the utility of our proposed complexity measures through a mix of conceptual arguments, theory, and experiments.
arXiv Detail & Related papers (2024-10-02T06:09:57Z) - Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z) - A Theoretical Study of The Effects of Adversarial Attacks on Sparse
Regression [28.776950569604026]
We use the primal-dual witness paradigm to provide provable performance guarantees for the support of the estimated regression parameter vector.
Our theoretical analysis shows the counter-intuitive result that an adversary can influence sample complexity by corrupting the irrelevant features.
arXiv Detail & Related papers (2022-12-21T17:30:17Z) - Data-Driven Influence Functions for Optimization-Based Causal Inference [105.5385525290466]
We study a constructive algorithm that approximates Gateaux derivatives for statistical functionals by finite differencing.
We study the case where probability distributions are not known a priori but need to be estimated from data.
arXiv Detail & Related papers (2022-08-29T16:16:22Z) - Bias-inducing geometries: an exactly solvable data model with fairness
implications [13.690313475721094]
We introduce an exactly solvable high-dimensional model of data imbalance.
We analytically unpack the typical properties of learning models trained in this synthetic framework.
We obtain exact predictions for the observables that are commonly employed for fairness assessment.
arXiv Detail & Related papers (2022-05-31T16:27:57Z) - The Variational Method of Moments [65.91730154730905]
conditional moment problem is a powerful formulation for describing structural causal parameters in terms of observables.
Motivated by a variational minimax reformulation of OWGMM, we define a very general class of estimators for the conditional moment problem.
We provide algorithms for valid statistical inference based on the same kind of variational reformulations.
arXiv Detail & Related papers (2020-12-17T07:21:06Z) - Understanding Double Descent Requires a Fine-Grained Bias-Variance
Decomposition [34.235007566913396]
We describe an interpretable, symmetric decomposition of the variance into terms associated with the labels.
We find that the bias decreases monotonically with the network width, but the variance terms exhibit non-monotonic behavior.
We also analyze the strikingly rich phenomenology that arises.
arXiv Detail & Related papers (2020-11-04T21:04:02Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $varepsilon*$, which deviates substantially from the test error of worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
empirical optimization is central to modern machine learning, but its role in its success is still unclear.
We show that it commonly arises in parameters of discrete multiplicative noise due to variance.
A detailed analysis is conducted in which we describe on key factors, including recent step size, and data, all exhibit similar results on state-of-the-art neural network models.
arXiv Detail & Related papers (2020-06-11T09:58:01Z) - An Investigation of Why Overparameterization Exacerbates Spurious
Correlations [98.3066727301239]
We identify two key properties of the training data that drive this behavior.
We show how the inductive bias of models towards "memorizing" fewer examples can cause over parameterization to hurt.
arXiv Detail & Related papers (2020-05-09T01:59:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.