The Implicit Bias of Heterogeneity towards Invariance and Causality
- URL: http://arxiv.org/abs/2403.01420v1
- Date: Sun, 3 Mar 2024 07:38:24 GMT
- Title: The Implicit Bias of Heterogeneity towards Invariance and Causality
- Authors: Yang Xu, Yihong Gu, Cong Fang
- Abstract summary: It is observed that large language models (LLMs) trained with a variant of regression loss can unveil causal associations to some extent.
This is contrary to the traditional wisdom that "association is not causation" and to the paradigm of traditional causal inference.
In this paper, we claim the emergence of causality from association-oriented training can be attributed to the coupling effects of the heterogeneity of the source data, the stochasticity of training algorithms, and the over-parameterization of the learning models.
- Score: 10.734620509375144
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It is observed empirically that large language models (LLMs), trained with
a variant of regression loss on vast corpora from the Internet, can
unveil causal associations to some extent. This runs contrary to the traditional
wisdom that "association is not causation" and to the paradigm of traditional
causal inference, in which prior causal knowledge must be carefully
incorporated into the design of methods. It is a mystery why causality, a
higher layer of understanding, can emerge from a regression task that pursues
only associations. In this paper, we claim that the emergence of causality from
association-oriented training can be attributed to the coupling effects of the
heterogeneity of the source data, the stochasticity of training algorithms, and
the over-parameterization of the learning models. We illustrate this intuition
with a simple but insightful model that learns invariance, a quasi-causality,
from regression loss. Specifically, we consider multi-environment low-rank
matrix sensing problems in which the unknown rank-r ground-truth d×d matrices
differ across environments but share a lower-rank invariant, causal
part. In this case, running pooled gradient descent yields biased
solutions that, in general, learn only associations. We show that running
large-batch stochastic gradient descent, in which each batch consists of linear
measurement samples drawn from a single randomly selected environment, can
successfully drive the solution towards the invariant, causal solution under
certain conditions. This behavior depends on sufficiently strong heterogeneity
across the environments, a large step size and noise in the optimization
algorithm, and the over-parameterization of the model. In summary, we unveil
another implicit bias, arising from the symbiosis between the
heterogeneity of data and modern algorithms, which is, to the best of our
knowledge, the first of its kind in the literature.
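The multi-environment setup described in the abstract can be sketched numerically. The following is a minimal illustration, not the paper's actual experiment: the dimensions, environment sizes, step sizes, and rank-1 spurious components are arbitrary choices, and the symmetric factorization M = U Uᵀ stands in for the over-parameterized model. It contrasts pooled gradient descent over all environments with large-batch SGD whose each batch is the sample from one randomly chosen environment.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_env, n_per_env = 6, 3, 400  # illustrative sizes, not from the paper

# Ground truth per environment: shared invariant (causal) part M_inv plus
# an environment-specific spurious rank-1 perturbation.
u = rng.normal(size=(d, 1))
M_inv = u @ u.T
M_envs = [M_inv + 0.5 * (v := rng.normal(size=(d, 1))) @ v.T
          for _ in range(n_env)]

def make_data(M, n):
    """Linear measurements y_i = <A_i, M> + noise."""
    A = rng.normal(size=(n, d, d))
    y = np.einsum('nij,ij->n', A, M) + 0.01 * rng.normal(size=n)
    return A, y

data = [make_data(M, n_per_env) for M in M_envs]

def grad(U, A, y):
    """Gradient of 0.5 * mean residual^2 w.r.t. U, where M = U U^T."""
    resid = np.einsum('nij,ij->n', A, U @ U.T) - y
    G = np.einsum('n,nij->ij', resid, A) / len(y)  # dL/dM
    return (G + G.T) @ U                           # chain rule through U U^T

def pooled_gd(steps=300, lr=0.01):
    """Plain GD on the pooled data from all environments."""
    U = 0.1 * rng.normal(size=(d, d))  # over-parameterized: full width d
    A = np.concatenate([a for a, _ in data])
    y = np.concatenate([t for _, t in data])
    for _ in range(steps):
        U -= lr * grad(U, A, y)
    return U @ U.T

def env_batched_sgd(steps=300, lr=0.01):
    """SGD where each large batch is the sample of one random environment."""
    U = 0.1 * rng.normal(size=(d, d))
    for _ in range(steps):
        A, y = data[rng.integers(n_env)]
        U -= lr * grad(U, A, y)
    return U @ U.T

M_pooled = pooled_gd()
M_sgd = env_batched_sgd()
```

One can then compare `np.linalg.norm(M_pooled - M_inv)` against `np.linalg.norm(M_sgd - M_inv)`; the paper's theory predicts the environment-batched SGD solution is driven toward the invariant part under conditions (strong heterogeneity, large step size) that this toy sketch does not attempt to verify.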
Related papers
- Raising the Bar in Graph OOD Generalization: Invariant Learning Beyond Explicit Environment Modeling [58.15601237755505]
Real-world graph data often exhibit diverse and shifting environments that traditional models fail to generalize across.
We propose a novel method termed Multi-Prototype Hyperspherical Invariant Learning (MPHIL).
MPHIL achieves state-of-the-art performance, significantly outperforming existing methods across graph data from various domains and with different distribution shifts.
arXiv Detail & Related papers (2025-02-15T07:40:14Z)
- Learning Divergence Fields for Shift-Robust Graph Representations [73.11818515795761]
In this work, we propose a geometric diffusion model with learnable divergence fields for the challenging problem with interdependent data.
We derive a new learning objective through causal inference, which can guide the model to learn generalizable patterns of interdependence that are insensitive across domains.
arXiv Detail & Related papers (2024-06-07T14:29:21Z)
- Counterfactual Fairness through Transforming Data Orthogonal to Bias [7.109458605736819]
We propose a novel data pre-processing algorithm, Orthogonal to Bias (OB).
OB is designed to eliminate the influence of a group of continuous sensitive variables, thus promoting counterfactual fairness in machine learning applications.
OB is model-agnostic, making it applicable to a wide range of machine learning models and tasks.
arXiv Detail & Related papers (2024-03-26T16:40:08Z)
- Effective Causal Discovery under Identifiable Heteroscedastic Noise Model [45.98718860540588]
Causal DAG learning has recently achieved promising performance in terms of both accuracy and efficiency.
We propose a novel formulation for DAG learning that accounts for the variation in noise variance across variables and observations.
We then propose an effective two-phase iterative DAG learning algorithm to address the increasing optimization difficulties.
arXiv Detail & Related papers (2023-12-20T08:51:58Z)
- Unleashing the Power of Graph Data Augmentation on Covariate Distribution Shift [50.98086766507025]
We propose a simple-yet-effective data augmentation strategy, Adversarial Invariant Augmentation (AIA)
AIA aims to extrapolate and generate new environments, while concurrently preserving the original stable features during the augmentation process.
arXiv Detail & Related papers (2022-11-05T07:55:55Z)
- On the Strong Correlation Between Model Invariance and Generalization [54.812786542023325]
Generalization captures a model's ability to classify unseen data.
Invariance measures consistency of model predictions on transformations of the data.
From a dataset-centric view, we find that a model's accuracy and invariance are linearly correlated across different test sets.
arXiv Detail & Related papers (2022-07-14T17:08:25Z)
- Differentiable Invariant Causal Discovery [106.87950048845308]
Learning causal structure from observational data is a fundamental challenge in machine learning.
This paper proposes Differentiable Invariant Causal Discovery (DICD) to avoid learning spurious edges and wrong causal directions.
Extensive experiments on synthetic and real-world datasets verify that DICD outperforms state-of-the-art causal discovery methods up to 36% in SHD.
arXiv Detail & Related papers (2022-05-31T09:29:07Z)
- Learning Conditional Invariance through Cycle Consistency [60.85059977904014]
We propose a novel approach to identify meaningful and independent factors of variation in a dataset.
Our method involves two separate latent subspaces for the target property and the remaining input information.
We demonstrate on synthetic and molecular data that our approach identifies more meaningful factors which lead to sparser and more interpretable models.
arXiv Detail & Related papers (2021-11-25T17:33:12Z)
- Kernelized Heterogeneous Risk Minimization [25.5458915855661]
We propose a Kernelized Heterogeneous Risk Minimization (KerHRM) algorithm, which achieves both the latent exploration and invariant learning in kernel space.
We theoretically justify our algorithm and empirically validate the effectiveness of our algorithm with extensive experiments.
arXiv Detail & Related papers (2021-10-24T12:26:50Z)
- Heterogeneous Risk Minimization [25.5458915855661]
Invariant learning methods for out-of-distribution generalization have been proposed by leveraging multiple training environments to find invariant relationships.
Modern datasets are assembled by merging data from multiple sources without explicit source labels.
We propose Heterogeneous Risk Minimization (HRM) framework to achieve joint learning of latent heterogeneity among the data and invariant relationship.
arXiv Detail & Related papers (2021-05-09T02:51:36Z)
- Nonlinear Invariant Risk Minimization: A Causal Approach [5.63479133344366]
We propose a learning paradigm that enables out-of-distribution generalization in the nonlinear setting.
We show identifiability of the data representation up to very simple transformations.
Extensive experiments on both synthetic and real-world datasets show that our approach significantly outperforms a variety of baseline methods.
arXiv Detail & Related papers (2021-02-24T15:38:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.