The Implicit Bias of Heterogeneity towards Invariance and Causality
- URL: http://arxiv.org/abs/2403.01420v1
- Date: Sun, 3 Mar 2024 07:38:24 GMT
- Title: The Implicit Bias of Heterogeneity towards Invariance and Causality
- Authors: Yang Xu, Yihong Gu, Cong Fang
- Abstract summary: It is observed that large language models (LLMs) trained with a variant of regression loss can unveil causal associations to some extent.
This is contrary to the traditional wisdom that "association is not causation" and the paradigm of traditional causal inference.
In this paper, we claim the emergence of causality from association-oriented training can be attributed to the coupling effects of the heterogeneity of the source data, the stochasticity of training algorithms, and the over-parameterization of the learning models.
- Score: 10.734620509375144
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It is observed empirically that large language models (LLMs), trained with
a variant of regression loss on large corpora from the Internet, can
unveil causal associations to some extent. This is contrary to the traditional
wisdom that "association is not causation" and the paradigm of traditional
causal inference in which prior causal knowledge should be carefully
incorporated into the design of methods. It is a mystery why causality, in a
higher layer of understanding, can emerge from the regression task that pursues
associations. In this paper, we claim the emergence of causality from
association-oriented training can be attributed to the coupling effects from
the heterogeneity of the source data, stochasticity of training algorithms, and
over-parameterization of the learning models. We illustrate such an intuition
using a simple but insightful model that learns invariance, a quasi-causality,
using regression loss. To be specific, we consider multi-environment low-rank
matrix sensing problems where the unknown rank-r ground-truth d×d matrices
diverge across the environments but contain a lower-rank invariant, causal
part. In this case, running pooled gradient descent generally yields biased
solutions that learn only associations. We show that running
large-batch stochastic gradient descent, where each batch consists of linear
measurement samples randomly drawn from a single environment, can
successfully drive the solution towards the invariant, causal solution under
certain conditions. This effect is tied to the relatively strong heterogeneity
of the environments, the large step size and noise in the optimization
algorithm, and the over-parameterization of the model. In summary, we unveil
another implicit bias that is a result of the symbiosis between the
heterogeneity of data and modern algorithms, which, to the best of our
knowledge, is the first such finding in the literature.
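As a toy illustration of the first half of this argument (pooled training picks up associations), the sketch below simulates a hypothetical two-environment version of the matrix sensing setup with d = 2. Each environment's ground truth is M_e = M_inv + S_e, where M_inv is a rank-1 invariant part and S_e is an environment-specific part whose strength differs between environments. It assumes isotropic Gaussian sensing matrices and a full (non-factorized) matrix model; the function names `make_environment`, `pooled_gradient_descent`, and `run_demo` and all numeric values are hypothetical. In this convex setting, full-batch gradient descent on the pooled squared loss converges to the least-squares fit, which sits near the average of the M_e, so the spurious component survives. It does not implement the paper's environment-wise large-batch SGD on an over-parameterized model and does not reproduce its result.

```python
import random


def inner(a, b):
    """Frobenius inner product <A, B> of two square matrices stored as lists of lists."""
    d = len(a)
    return sum(a[i][j] * b[i][j] for i in range(d) for j in range(d))


def make_environment(m_inv, s_e, n, noise, rng):
    """Draw n measurements y = <X, M_e> + eps, with M_e = M_inv + S_e.

    Assumption: X has i.i.d. N(0, 1) entries and eps ~ N(0, noise^2).
    """
    d = len(m_inv)
    m_e = [[m_inv[i][j] + s_e[i][j] for j in range(d)] for i in range(d)]
    samples = []
    for _ in range(n):
        x = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d)]
        samples.append((x, inner(x, m_e) + rng.gauss(0.0, noise)))
    return samples


def pooled_gradient_descent(samples, d, lr=0.5, steps=50):
    """Full-batch GD on the pooled loss (1 / 2n) * sum_i (<X_i, M> - y_i)^2,
    which ignores environment labels entirely."""
    n = len(samples)
    m = [[0.0] * d for _ in range(d)]
    for _ in range(steps):
        grad = [[0.0] * d for _ in range(d)]
        for x, y in samples:
            r = inner(x, m) - y  # residual of one measurement
            for i in range(d):
                for j in range(d):
                    grad[i][j] += r * x[i][j] / n
        m = [[m[i][j] - lr * grad[i][j] for j in range(d)] for i in range(d)]
    return m


def run_demo(seed=0, n_per_env=1500):
    rng = random.Random(seed)
    m_inv = [[1.0, 0.0], [0.0, 0.0]]  # hypothetical rank-1 invariant part
    # Hypothetical environment-specific parts: one shared direction whose
    # strength differs across the two environments (mean strength 0.75).
    spurious = [[[0.0, 0.0], [0.0, 1.0]], [[0.0, 0.0], [0.0, 0.5]]]
    samples = []
    for s_e in spurious:
        samples += make_environment(m_inv, s_e, n_per_env, 0.1, rng)
    return pooled_gradient_descent(samples, d=2)


if __name__ == "__main__":
    m_hat = run_demo()
    # Expected to land near the environment average [[1, 0], [0, 0.75]],
    # not near the invariant part [[1, 0], [0, 0]].
    print([[round(v, 2) for v in row] for row in m_hat])
```

Running the script should print a matrix close to [[1, 0], [0, 0.75]]: the pooled fit recovers the invariant entry but also absorbs the average spurious strength. Per the abstract, the remedy studied in the paper replaces this pooled update with large-batch SGD whose batches each come from one environment, combined with a large step size and an over-parameterized model; that part is not shown here.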
Related papers
- Mechanism learning: Reverse causal inference in the presence of multiple unknown confounding through front-door causal bootstrapping [0.8901073744693314]
A major limitation of machine learning (ML) prediction models is that they recover associational, rather than causal, predictive relationships between variables.
This paper proposes mechanism learning, a simple method which uses front-door causal bootstrapping to deconfound observational data.
We test our method on fully synthetic, semi-synthetic and real-world datasets, demonstrating that it can discover reliable, unbiased, causal ML predictors.
arXiv Detail & Related papers (2024-10-26T03:34:55Z)
- Sample, estimate, aggregate: A recipe for causal discovery foundation models [28.116832159265964]
We train a supervised model that learns to predict a larger causal graph from the outputs of classical causal discovery algorithms run over subsets of variables.
Our approach is enabled by the observation that typical errors in the outputs of classical methods remain comparable across datasets.
Experiments on real and synthetic data demonstrate that this model maintains high accuracy in the face of misspecification or distribution shift.
arXiv Detail & Related papers (2024-02-02T21:57:58Z)
- Theoretical Characterization of the Generalization Performance of Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z)
- Principled Knowledge Extrapolation with GANs [92.62635018136476]
We study counterfactual synthesis from a new perspective of knowledge extrapolation.
We show that an adversarial game with a closed-form discriminator can be used to address the knowledge extrapolation problem.
Our method enjoys both elegant theoretical guarantees and superior performance in many scenarios.
arXiv Detail & Related papers (2022-05-21T08:39:42Z)
- Estimation of Bivariate Structural Causal Models by Variational Gaussian Process Regression Under Likelihoods Parametrised by Normalising Flows [74.85071867225533]
Causal mechanisms can be described by structural causal models.
One major drawback of state-of-the-art artificial intelligence is its lack of explainability.
arXiv Detail & Related papers (2021-09-06T14:52:58Z)
- Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning [76.00395335702572]
A central goal for AI and causality is the joint discovery of abstract representations and causal structure.
Existing environments for studying causal induction are poorly suited for this objective because they have complicated task-specific causal graphs.
In this work, our goal is to facilitate research in learning representations of high-level variables as well as causal structures among them.
arXiv Detail & Related papers (2021-07-02T05:44:56Z)
- Causal Discovery in Knowledge Graphs by Exploiting Asymmetric Properties of Non-Gaussian Distributions [3.1981440103815717]
We define a hybrid approach that allows us to discover cause-effect relationships in Knowledge Graphs.
The proposed approach is based on finding the instantaneous causal structure of a non-experimental matrix using a non-Gaussian model.
We use two different pre-existing algorithms, one for the causal discovery and the other for decomposing the Knowledge Graph.
arXiv Detail & Related papers (2021-06-02T09:33:05Z)
- Disentangling Observed Causal Effects from Latent Confounders using Method of Moments [67.27068846108047]
We provide guarantees on identifiability and learnability under mild assumptions.
We develop efficient algorithms based on coupled tensor decomposition with linear constraints to obtain scalable and guaranteed solutions.
arXiv Detail & Related papers (2021-01-17T07:48:45Z)
- Understanding Double Descent Requires a Fine-Grained Bias-Variance Decomposition [34.235007566913396]
We describe an interpretable, symmetric decomposition of the variance into terms associated with the labels.
We find that the bias decreases monotonically with the network width, but the variance terms exhibit non-monotonic behavior.
We also analyze the strikingly rich phenomenology that arises.
arXiv Detail & Related papers (2020-11-04T21:04:02Z)
- A Critical View of the Structural Causal Model [89.43277111586258]
We show that one can identify the cause and the effect without considering their interaction at all.
We propose a new adversarial training method that mimics the disentangled structure of the causal model.
Our multidimensional method outperforms the literature methods on both synthetic and real world datasets.
arXiv Detail & Related papers (2020-02-23T22:52:28Z)