Nonlinear Invariant Risk Minimization: A Causal Approach
- URL: http://arxiv.org/abs/2102.12353v1
- Date: Wed, 24 Feb 2021 15:38:41 GMT
- Title: Nonlinear Invariant Risk Minimization: A Causal Approach
- Authors: Chaochao Lu, Yuhuai Wu, José Miguel Hernández-Lobato, Bernhard
Schölkopf
- Abstract summary: We propose a learning paradigm that enables out-of-distribution generalization in the nonlinear setting.
We show identifiability of the data representation up to very simple transformations.
Extensive experiments on both synthetic and real-world datasets show that our approach significantly outperforms a variety of baseline methods.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to spurious correlations, machine learning systems often fail to
generalize to environments whose distributions differ from the ones used at
training time. Prior work addressing this has, either explicitly or implicitly,
attempted to find a data representation that has an invariant causal
relationship with the target. This is done by leveraging a diverse set of
training environments to reduce the effect of spurious features and build an
invariant predictor. However, these methods have generalization guarantees only
when both the data representation and the classifier come from a linear model class.
We propose Invariant Causal Representation Learning (ICRL), a learning paradigm
that enables out-of-distribution (OOD) generalization in the nonlinear setting
(i.e., nonlinear representations and nonlinear classifiers). It builds upon a
practical and general assumption: the prior over the data representation
factorizes when conditioning on the target and the environment. Based on this,
we show identifiability of the data representation up to very simple
transformations. We also prove that all direct causes of the target can be
fully discovered, which further enables us to obtain generalization guarantees
in the nonlinear setting. Extensive experiments on both synthetic and
real-world datasets show that our approach significantly outperforms a variety
of baseline methods. Finally, in the concluding discussion, we further explore
the aforementioned assumption and propose a general view, called the Agnostic
Hypothesis: there exists a set of hidden causal factors affecting both inputs
and outcomes. The Agnostic Hypothesis can provide a unifying view of machine
learning in terms of representation learning. More importantly, it can inspire
a new direction to explore the general theory for identifying hidden causal
factors, which is key to enabling the OOD generalization guarantees in machine
learning.
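As a compact rendering of the paper's key assumption (in notation chosen here for illustration; the paper's own symbols may differ), let Z be the learned data representation, Y the target, and E the environment. The assumption that the prior over the representation factorizes when conditioning on the target and the environment can be written in LaTeX as

    p(z \mid y, e) = \prod_{i=1}^{n} p(z_i \mid y, e)

i.e., the components of Z are mutually independent once Y and E are given. This is the kind of conditional-independence structure on which nonlinear-ICA-style identifiability arguments typically rest, consistent with the abstract's claim that the representation is identifiable up to very simple transformations.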
Related papers
- Demystifying amortized causal discovery with transformers
Supervised learning approaches for causal discovery from observational data often achieve competitive performance.
In this work, we investigate CSIvA, a transformer-based model that promises to train on synthetic data and transfer to real data.
We bridge the gap with existing identifiability theory and show that constraints on the training data distribution implicitly define a prior on the test observations.
arXiv Detail & Related papers (2024-05-27T08:17:49Z)
- Identifiable Latent Neural Causal Models
Causal representation learning seeks to uncover latent, high-level causal representations from low-level observed data.
We determine the types of distribution shifts that contribute to the identifiability of causal representations.
We translate our findings into a practical algorithm for acquiring reliable latent causal representations.
arXiv Detail & Related papers (2024-03-23T04:13:55Z)
- Diagnosing and Rectifying Fake OOD Invariance: A Restructured Causal Approach
Invariant representation learning (IRL) encourages predicting labels from invariant causal features that are de-confounded from the environments (for the basic invariance-penalty idea, see the sketch after this list).
Recent theoretical results verified that some causal features recovered by IRL methods merely appear domain-invariant in the training environments but fail in unseen domains.
We develop an approach based on conditional mutual information with respect to the RS-SCM, and then rigorously rectify the spurious and fake invariant effects.
arXiv Detail & Related papers (2023-12-15T12:58:05Z)
- Effect-Invariant Mechanisms for Policy Generalization
It has been suggested to exploit invariant conditional distributions to learn models that generalize better to unseen environments.
We introduce a relaxation of full invariance called effect-invariance and prove that it is sufficient, under suitable assumptions, for zero-shot policy generalization.
We present empirical results using simulated data and a mobile health intervention dataset to demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-19T14:50:24Z)
- Learning Linear Causal Representations from Interventions under General Nonlinear Mixing
We prove strong identifiability results given unknown single-node interventions without access to the intervention targets.
This is the first instance of causal identifiability from non-paired interventions for deep neural network embeddings.
arXiv Detail & Related papers (2023-06-04T02:32:12Z)
- Nonparametric Identifiability of Causal Representations from Unknown Interventions
We study causal representation learning, the task of inferring latent causal variables and their causal relations from mixtures of the variables.
Our goal is to identify both the ground-truth latents and their causal graph up to a set of ambiguities which we show to be irresolvable from interventional data.
arXiv Detail & Related papers (2023-06-01T10:51:58Z)
- Causal Discovery in Heterogeneous Environments Under the Sparse Mechanism Shift Hypothesis
Machine learning approaches commonly rely on the assumption of independent and identically distributed (i.i.d.) data.
In reality, this assumption is almost always violated due to distribution shifts between environments.
We propose the Mechanism Shift Score (MSS), a score-based approach amenable to various empirical estimators.
arXiv Detail & Related papers (2022-06-04T15:39:30Z)
- CC-Cert: A Probabilistic Approach to Certify General Robustness of Neural Networks
In safety-critical machine learning applications, it is crucial to defend models against adversarial attacks.
It is important to provide provable guarantees for deep learning models against semantically meaningful input transformations.
We propose a new universal probabilistic certification approach based on Chernoff-Cramer bounds.
arXiv Detail & Related papers (2021-09-22T12:46:04Z)
- Learning Invariant Representations and Risks for Semi-supervised Domain Adaptation
We propose the first method that aims to simultaneously learn invariant representations and risks under the setting of semi-supervised domain adaptation (Semi-DA).
We introduce the LIRR algorithm for jointly Learning Invariant Representations and Risks.
arXiv Detail & Related papers (2020-10-09T15:42:35Z)
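Several of the entries above, like the main paper, start from the idea of using multiple training environments to learn an invariant predictor. As a concrete point of reference, and not the method of any specific paper listed here, below is a minimal PyTorch sketch of the widely used IRMv1 penalty (Arjovsky et al., 2019); the model, variable names, and hyperparameters are illustrative assumptions.

import torch
import torch.nn.functional as F

def irm_penalty(logits, y):
    # IRMv1 trick: multiply the logits by a frozen scalar "dummy classifier"
    # w = 1.0; the squared gradient of the per-environment risk w.r.t. w
    # measures how far this environment is from being simultaneously optimal.
    w = torch.tensor(1.0, requires_grad=True)
    loss = F.binary_cross_entropy_with_logits(logits * w, y)
    grad = torch.autograd.grad(loss, [w], create_graph=True)[0]
    return grad.pow(2)

def irm_objective(model, envs, lam=1.0):
    # envs: list of (x, y) batches, one per training environment
    # (y is a float tensor of 0/1 labels for binary classification).
    risk, penalty = 0.0, 0.0
    for x, y in envs:
        logits = model(x).squeeze(-1)
        risk = risk + F.binary_cross_entropy_with_logits(logits, y)
        penalty = penalty + irm_penalty(logits, y)
    return (risk + lam * penalty) / len(envs)

With lam = 0 this reduces to pooled empirical risk minimization, while a large lam forces the representation to make a single classifier simultaneously optimal across all training environments, which is the invariance property these works exploit.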
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.