Nonlinear Invariant Risk Minimization: A Causal Approach
- URL: http://arxiv.org/abs/2102.12353v1
- Date: Wed, 24 Feb 2021 15:38:41 GMT
- Title: Nonlinear Invariant Risk Minimization: A Causal Approach
- Authors: Chaochao Lu, Yuhuai Wu, José Miguel Hernández-Lobato, Bernhard Schölkopf
- Abstract summary: We propose a learning paradigm that enables out-of-distribution generalization in the nonlinear setting.
We show identifiability of the data representation up to very simple transformations.
Extensive experiments on both synthetic and real-world datasets show that our approach significantly outperforms a variety of baseline methods.
- Score: 5.63479133344366
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to spurious correlations, machine learning systems often fail to
generalize to environments whose distributions differ from the ones used at
training time. Prior work addressing this, either explicitly or implicitly,
attempted to find a data representation that has an invariant causal
relationship with the target. This is done by leveraging a diverse set of
training environments to reduce the effect of spurious features and build an
invariant predictor. However, these methods have generalization guarantees only
when both the data representation and the classifier come from a linear model class.
We propose Invariant Causal Representation Learning (ICRL), a learning paradigm
that enables out-of-distribution (OOD) generalization in the nonlinear setting
(i.e., nonlinear representations and nonlinear classifiers). It builds upon a
practical and general assumption: the prior over the data representation
factorizes when conditioning on the target and the environment. Based on this,
we show identifiability of the data representation up to very simple
transformations. We also prove that all direct causes of the target can be
fully discovered, which further enables us to obtain generalization guarantees
in the nonlinear setting. Extensive experiments on both synthetic and
real-world datasets show that our approach significantly outperforms a variety
of baseline methods. Finally, in the concluding discussion, we further explore
the aforementioned assumption and propose a general view, called the Agnostic
Hypothesis: there exists a set of hidden causal factors affecting both inputs
and outcomes. The Agnostic Hypothesis can provide a unifying view of machine
learning in terms of representation learning. More importantly, it can inspire
a new direction to explore the general theory for identifying hidden causal
factors, which is key to enabling the OOD generalization guarantees in machine
learning.
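The central assumption admits a compact statement. The following is a paraphrase in our own notation (z for the learned representation, y for the target, e for the environment), not necessarily the paper's exact formulation:

```latex
% Conditionally factorized prior (paraphrased; notation ours):
p(\mathbf{z} \mid y, e) \;=\; \prod_{i=1}^{n} p(z_i \mid y, e)
```

That is, the components of the representation become mutually independent once both the target and the environment are observed; the identifiability claim in the abstract rests on this conditional-independence structure.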
Related papers
- Demystifying amortized causal discovery with transformers [21.058343547918053]
Supervised learning approaches for causal discovery from observational data often achieve competitive performance.
In this work, we investigate CSIvA, a transformer-based model that promises to train on synthetic data and transfer to real data.
We bridge the gap with existing identifiability theory and show that constraints on the training data distribution implicitly define a prior on the test observations.
arXiv Detail & Related papers (2024-05-27T08:17:49Z)
- Identifiable Latent Neural Causal Models [82.14087963690561]
Causal representation learning seeks to uncover latent, high-level causal representations from low-level observed data.
We determine the types of distribution shifts that do contribute to the identifiability of causal representations.
We translate our findings into a practical algorithm, allowing for the acquisition of reliable latent causal representations.
arXiv Detail & Related papers (2024-03-23T04:13:55Z)
- The Implicit Bias of Heterogeneity towards Invariance: A Study of Multi-Environment Matrix Sensing [9.551225697705199]
This paper studies the implicit bias of stochastic gradient descent (SGD) over heterogeneous data and shows that the implicit bias drives the model learning towards an invariant solution.
Specifically, we theoretically investigate the multi-environment low-rank matrix sensing problem where in each environment, the signal comprises (i) a lower-rank invariant part shared across all environments; and (ii) a significantly varying environment-dependent spurious component.
The key insight is that simply running large-step-size, large-batch SGD sequentially in each environment, without any explicit regularization, induces oscillations from the heterogeneity that provably prevent the model from learning the spurious signals (a toy numpy sketch follows this entry).
arXiv Detail & Related papers (2024-03-03T07:38:24Z)
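A minimal numpy sketch of the procedure as described; the data construction, dimensions, and step size below are our toy choices rather than the paper's setup, and full-batch gradient descent per environment stands in for "large-batch SGD":

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, m, n_envs = 10, 2, 200, 4

# Shared low-rank invariant signal plus per-environment spurious components.
U = rng.normal(size=(d, r)) / np.sqrt(d)
M_star = U @ U.T                                   # invariant part (rank r)
spurious = [0.3 * rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(n_envs)]
spurious = [(S + S.T) / 2 for S in spurious]       # keep targets symmetric

def make_env(S):
    A = rng.normal(size=(m, d, d))
    A = (A + A.transpose(0, 2, 1)) / 2             # symmetric sensing matrices
    y = np.einsum('kij,ij->k', A, M_star + S)      # linear measurements
    return A, y

envs = [make_env(S) for S in spurious]

# Sequential full-batch gradient descent with a large step size, cycling
# through environments; the only stochasticity is the environment switching.
X = 0.1 * rng.normal(size=(d, r))                  # factorized parameter, M = X X^T
eta = 0.05                                         # kept large for this toy scale
for step in range(500):
    A, y = envs[step % n_envs]
    resid = np.einsum('kij,ij->k', A, X @ X.T) - y
    # gradient of (1/(2m)) * sum(resid**2) w.r.t. X, for symmetric A_k
    X -= eta * (2.0 / m) * np.einsum('k,kij->ij', resid, A) @ X

# Per the paper, the oscillation induced by environment switching is what
# biases X @ X.T toward the invariant part rather than any spurious component.
print('distance to invariant part:', np.linalg.norm(X @ X.T - M_star, 'fro'))
```

The only randomness in the optimization is the cycling over environments, which is exactly the heterogeneity-driven oscillation the paper analyzes.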
- Diagnosing and Rectifying Fake OOD Invariance: A Restructured Causal Approach [51.012396632595554]
Invariant representation learning (IRL) encourages the prediction from invariant causal features to labels de-confounded from the environments.
Recent theoretical results verified that some causal features recovered by IRL merely appear domain-invariant in the training environments but fail in unseen domains.
We develop an approach based on conditional mutual information with respect to the RS-SCM, then rigorously rectify the spurious and fake invariant effects.
arXiv Detail & Related papers (2023-12-15T12:58:05Z)
- Learning Linear Causal Representations from Interventions under General Nonlinear Mixing [52.66151568785088]
We prove strong identifiability results given unknown single-node interventions without access to the intervention targets.
This is the first instance of causal identifiability from non-paired interventions for deep neural network embeddings.
arXiv Detail & Related papers (2023-06-04T02:32:12Z)
- Nonparametric Identifiability of Causal Representations from Unknown Interventions [63.1354734978244]
We study causal representation learning, the task of inferring latent causal variables and their causal relations from mixtures of the variables.
Our goal is to identify both the ground truth latents and their causal graph up to a set of ambiguities which we show to be irresolvable from interventional data.
arXiv Detail & Related papers (2023-06-01T10:51:58Z)
- Causal Discovery in Heterogeneous Environments Under the Sparse Mechanism Shift Hypothesis [7.895866278697778]
Machine learning approaches commonly rely on the assumption of independent and identically distributed (i.i.d.) data.
In reality, this assumption is almost always violated due to distribution shifts between environments.
We propose the Mechanism Shift Score (MSS), a score-based approach amenable to various empirical estimators (a toy rendering follows this entry).
arXiv Detail & Related papers (2022-06-04T15:39:30Z)
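The blurb is terse, so here is one hedged, toy rendering of the idea; the function name, the pooled-regression-plus-KS-test estimator, and the demo data are our invention, and the paper defines MSS more generally and studies several empirical estimators:

```python
import numpy as np
from scipy import stats

def mechanism_shift_score(data_by_env, parents, alpha=0.01):
    """Toy Mechanism Shift Score for a candidate DAG.

    For each node i, fit one pooled linear model x_i ~ parents(i) across all
    environments, then KS-test the per-environment residual distributions
    against each other; a detected difference suggests the mechanism
    p(x_i | pa_i) shifted. The score counts shifted mechanisms; under the
    sparse mechanism shift hypothesis, the DAG with the lowest score wins.
    """
    n_nodes = data_by_env[0].shape[1]
    pooled = np.vstack(data_by_env)
    score = 0
    for i in range(n_nodes):
        pa = parents[i]
        if pa:
            beta, *_ = np.linalg.lstsq(pooled[:, pa], pooled[:, i], rcond=None)
            resid = [X[:, i] - X[:, pa] @ beta for X in data_by_env]
        else:
            mu = pooled[:, i].mean()
            resid = [X[:, i] - mu for X in data_by_env]
        shifted = any(stats.ks_2samp(resid[0], r_e).pvalue < alpha
                      for r_e in resid[1:])
        score += int(shifted)
    return score

# Demo: the mechanism for x1 (its coefficient on x0) shifts across
# environments while x0's marginal stays fixed, so the true graph x0 -> x1
# should score 1 and the reversed graph x1 -> x0 should score 2.
rng = np.random.default_rng(0)
data = []
for coef in (1.0, 2.0, 3.0):
    x0 = rng.normal(size=500)
    data.append(np.column_stack([x0, coef * x0 + 0.1 * rng.normal(size=500)]))
print(mechanism_shift_score(data, {0: [], 1: [0]}),   # x0 -> x1
      mechanism_shift_score(data, {0: [1], 1: []}))   # x1 -> x0
```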
- CC-Cert: A Probabilistic Approach to Certify General Robustness of Neural Networks [58.29502185344086]
In safety-critical machine learning applications, it is crucial to defend models against adversarial attacks.
It is important to provide provable guarantees for deep learning models against semantically meaningful input transformations.
We propose a new universal probabilistic certification approach based on Chernoff-Cramér bounds (a simplified sketch follows this entry).
arXiv Detail & Related papers (2021-09-22T12:46:04Z)
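A simplified sketch of this style of certificate: the paper applies more general Chernoff-Cramér bounds to neural networks, whereas this toy uses Hoeffding's inequality (a special case of the Chernoff-Cramér method) on a stand-in classifier; all names, transformations, and parameters here are ours:

```python
import numpy as np

rng = np.random.default_rng(1)

def model(x):
    # Stand-in classifier on 2-D inputs (a fixed linear rule, for illustration).
    return int(x[0] + 0.5 * x[1] > 0.0)

def random_transform(x):
    # A random semantic-style transformation: rotation by a random angle.
    theta = rng.uniform(-0.3, 0.3)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([c * x[0] - s * x[1], s * x[0] + c * x[1]])

def certify(x, label, n=10_000, delta=1e-3):
    """Monte-Carlo failure estimate plus a Hoeffding upper confidence bound:
    with probability >= 1 - delta over the n samples, the true failure
    probability under random_transform is at most the returned bound."""
    failures = sum(model(random_transform(x)) != label for _ in range(n))
    p_hat = failures / n
    bound = p_hat + np.sqrt(np.log(1.0 / delta) / (2.0 * n))
    return p_hat, min(bound, 1.0)

p_hat, bound = certify(np.array([1.0, 0.2]), label=1)
print(f"empirical failure rate {p_hat:.4f}, high-confidence bound {bound:.4f}")
```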
- Learning Invariant Representations and Risks for Semi-supervised Domain Adaptation [109.73983088432364]
We propose the first method that aims to simultaneously learn invariant representations and risks under the setting of semi-supervised domain adaptation (Semi-DA).
We introduce the LIRR algorithm for jointly Learning Invariant Representations and Risks.
arXiv Detail & Related papers (2020-10-09T15:42:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.