Can predictive models be used for causal inference?
- URL: http://arxiv.org/abs/2306.10551v1
- Date: Sun, 18 Jun 2023 13:11:36 GMT
- Title: Can predictive models be used for causal inference?
- Authors: Maximilian Pichler and Florian Hartig
- Abstract summary: Supervised machine learning (ML) and deep learning (DL) algorithms excel at predictive tasks.
It is commonly assumed that they often do so by exploiting non-causal correlations.
We show that this trade-off between explanation and prediction is not as deep and fundamental as expected.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Supervised machine learning (ML) and deep learning (DL) algorithms excel at
predictive tasks, but it is commonly assumed that they often do so by
exploiting non-causal correlations, which may limit both interpretability and
generalizability. Here, we show that this trade-off between explanation and
prediction is not as deep and fundamental as expected. Whereas ML and DL
algorithms will indeed tend to use non-causal features for prediction when fed
indiscriminately with all data, it is possible to constrain the learning
process of any ML and DL algorithm by selecting features according to Pearl's
backdoor adjustment criterion. In such a situation, some algorithms, in
particular deep neural networks, can provide near unbiased effect estimates
under feature collinearity. Remaining biases are explained by the specific
algorithmic structures as well as hyperparameter choice. Consequently, optimal
hyperparameter settings are different when tuned for prediction or inference,
confirming the general expectation of a trade-off between prediction and
explanation. However, the effect of this trade-off is small compared to the
effect of a causally constrained feature selection. Thus, once the causal
relationship between the features is accounted for, the difference between
prediction and explanation may be much smaller than commonly assumed. We also
show that such causally constrained models generalize better to new data with
altered collinearity structures, suggesting generalization failure may often be
due to a lack of causal learning. Our results not only provide a perspective
for using ML for inference of (causal) effects but also help to improve the
generalizability of fitted ML and DL models to new data.
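The abstract's central idea — constraining a learner's features with Pearl's backdoor adjustment criterion before fitting — can be sketched with a small simulation. The DAG and coefficients below are hypothetical, not the paper's actual simulation setup: a confounder `C` opens a backdoor path, and a collider `M` is highly predictive of `Y` but biases the effect estimate when conditioned on.

```python
import numpy as np

# Hypothetical DAG (not the paper's setup):
#   C -> X, C -> Y   (C is a confounder on the backdoor path X <- C -> Y)
#   X -> Y           (true causal effect of X on Y is 1.0)
#   X -> M <- Y      (M is a collider: predictive, but conditioning on it biases)
rng = np.random.default_rng(0)
n = 50_000
C = rng.normal(size=n)
X = 0.8 * C + rng.normal(size=n)
Y = 1.0 * X + 0.8 * C + rng.normal(size=n)
M = 0.7 * X + 0.7 * Y + rng.normal(size=n)

def effect_of_x(features):
    """OLS coefficient on X when Y is regressed on an intercept + features."""
    A = np.column_stack([np.ones(n)] + features)
    beta, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return beta[1]  # X is always the first feature after the intercept

naive = effect_of_x([X])            # ignores the open backdoor path -> biased upward
all_feats = effect_of_x([X, C, M])  # fed indiscriminately, conditions on collider M -> biased
backdoor = effect_of_x([X, C])      # adjusts for the backdoor set {C} -> ~1.0
```

A plain linear model stands in here for the ML/DL estimators discussed in the paper; the point is only that the same feature set `{X, C}` satisfying the backdoor criterion, rather than all available features, is what makes the estimate approximately unbiased.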
Related papers
- Mechanism learning: Reverse causal inference in the presence of multiple unknown confounding through front-door causal bootstrapping [0.8901073744693314]
A major limitation of machine learning (ML) prediction models is that they recover associational, rather than causal, predictive relationships between variables.
This paper proposes mechanism learning, a simple method which uses front-door causal bootstrapping to deconfound observational data.
We test our method on fully synthetic, semi-synthetic and real-world datasets, demonstrating that it can discover reliable, unbiased, causal ML predictors.
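The front-door adjustment that this method builds on can be illustrated on simulated binary data. This is only the classical front-door formula, not the paper's bootstrapping procedure, and the structural equations and probabilities below are made up for illustration:

```python
import numpy as np

# Hypothetical structure: U -> X, U -> Y (U is an *unobserved* confounder),
# and X -> M -> Y (M mediates the entire effect of X on Y).
rng = np.random.default_rng(1)
n = 200_000
U = rng.random(n) < 0.5
X = rng.random(n) < 0.2 + 0.6 * U
M = rng.random(n) < 0.1 + 0.7 * X
Y = rng.random(n) < 0.1 + 0.5 * M + 0.3 * U

# Naive conditional estimate P(Y=1 | X=1): confounded by U.
naive = Y[X].mean()

# Front-door formula, using only the observed (X, M, Y):
#   P(y | do(x)) = sum_m P(m | x) * sum_{x'} P(x') * P(y | m, x')
fd = 0.0
for m in (0, 1):
    p_m_given_x1 = (M[X] == m).mean()
    for xp in (0, 1):
        sel = (M == m) & (X == xp)
        fd += p_m_given_x1 * (X == xp).mean() * Y[sel].mean()

# Direct calculation from the structural equations gives
# P(Y=1 | do(X=1)) = 0.65; the front-door estimate recovers it,
# while the naive conditional (~0.74) is biased by U.
```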
arXiv Detail & Related papers (2024-10-26T03:34:55Z)
- Revisiting Optimism and Model Complexity in the Wake of Overparameterized Machine Learning [6.278498348219108]
We revisit model complexity from first principles, by first reinterpreting and then extending the classical statistical concept of (effective) degrees of freedom.
We demonstrate the utility of our proposed complexity measures through a mix of conceptual arguments, theory, and experiments.
arXiv Detail & Related papers (2024-10-02T06:09:57Z)
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
- Efficient and Differentiable Conformal Prediction with General Function Classes [96.74055810115456]
We propose a generalization of conformal prediction to multiple learnable parameters.
We show that it achieves approximately valid population coverage and near-optimal efficiency within the class.
Experiments show that our algorithm is able to learn valid prediction sets and improve the efficiency significantly.
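The baseline this paper generalizes — plain split conformal prediction with fixed (non-learnable) parameters — can be sketched in a few lines. The data and base predictor below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def make_data(n):
    x = rng.uniform(-2, 2, size=n)
    y = 1.5 * x + rng.normal(scale=0.5, size=n)
    return x, y

x_tr, y_tr = make_data(2_000)    # fit the base predictor
x_cal, y_cal = make_data(1_000)  # calibrate the interval width
x_te, y_te = make_data(10_000)   # evaluate coverage

# Base predictor: simple least-squares line.
A = np.column_stack([np.ones_like(x_tr), x_tr])
beta, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
predict = lambda x: beta[0] + beta[1] * x

# Conformity scores on the calibration set: absolute residuals.
scores = np.abs(y_cal - predict(x_cal))
alpha = 0.1
level = min(np.ceil((len(scores) + 1) * (1 - alpha)) / len(scores), 1.0)
q = np.quantile(scores, level)

# Prediction sets [f(x) - q, f(x) + q] have >= 1 - alpha marginal coverage.
coverage = (np.abs(y_te - predict(x_te)) <= q).mean()
```

The paper's contribution is to make parameters of this pipeline (here, only the scalar width `q`) learnable and differentiable while retaining approximate coverage.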
arXiv Detail & Related papers (2022-02-22T18:37:23Z)
- Predicting Unreliable Predictions by Shattering a Neural Network [145.3823991041987]
Piecewise linear neural networks can be split into subfunctions.
Subfunctions have their own activation pattern, domain, and empirical error.
Empirical error for the full network can be written as an expectation over subfunctions.
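The decomposition described above is a law-of-total-expectation identity and can be verified directly on a tiny random ReLU network (the network and targets below are illustrative, not the paper's experiments): each sample's activation pattern identifies its linear subfunction, and the full empirical error equals the count-weighted average of per-pattern errors.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, h = 5_000, 2, 6
X = rng.normal(size=(n, d))
y = rng.normal(size=n)  # arbitrary targets; the identity holds regardless

# One hidden ReLU layer with random weights.
W1, b1 = rng.normal(size=(d, h)), rng.normal(size=h)
W2, b2 = rng.normal(size=h), rng.normal()
pre = X @ W1 + b1
pred = np.maximum(pre, 0) @ W2 + b2
sq_err = (pred - y) ** 2

# Each sample's activation pattern selects one linear subfunction.
patterns = pre > 0
keys = np.unique(patterns, axis=0)

weighted = 0.0
for key in keys:
    mask = np.all(patterns == key, axis=1)
    weighted += mask.mean() * sq_err[mask].mean()  # P(region) * error in region

full = sq_err.mean()  # equals `weighted` up to floating-point rounding
```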
arXiv Detail & Related papers (2021-06-15T18:34:41Z)
- Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- The Predictive Normalized Maximum Likelihood for Over-parameterized Linear Regression with Norm Constraint: Regret and Double Descent [12.929639356256928]
We show that modern machine learning models do not obey a trade-off between the complexity of a prediction rule and its ability to generalize.
We use the recently proposed predictive normalized maximum likelihood (pNML) which is the min-max regret solution for individual data.
We demonstrate the use of the pNML regret as a point-wise learnability measure on synthetic data and show that it successfully predicts the double-descent phenomenon.
arXiv Detail & Related papers (2021-02-14T15:49:04Z)
- Benign overfitting in ridge regression [0.0]
We provide non-asymptotic generalization bounds for overparametrized ridge regression.
We identify when small or negative regularization is sufficient for obtaining small generalization error.
arXiv Detail & Related papers (2020-09-29T20:00:31Z)
- CASTLE: Regularization via Auxiliary Causal Graph Discovery [89.74800176981842]
We introduce Causal Structure Learning (CASTLE) regularization and propose to regularize a neural network by jointly learning the causal relationships between variables.
CASTLE efficiently reconstructs only the features in the causal DAG that have a causal neighbor, whereas reconstruction-based regularizers suboptimally reconstruct all input features.
arXiv Detail & Related papers (2020-09-28T09:49:38Z)
- Predictive Complexity Priors [3.5547661483076998]
We propose a functional prior that is defined by comparing the model's predictions to those of a reference model.
Although originally defined on the model outputs, we transfer the prior to the model parameters via a change of variables.
We apply our predictive complexity prior to high-dimensional regression, reasoning over neural network depth, and sharing of statistical strength for few-shot learning.
arXiv Detail & Related papers (2020-06-18T18:39:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.