Out-of-distribution Generalization with Causal Invariant Transformations
- URL: http://arxiv.org/abs/2203.11528v3
- Date: Thu, 24 Mar 2022 02:47:43 GMT
- Title: Out-of-distribution Generalization with Causal Invariant Transformations
- Authors: Ruoyu Wang, Mingyang Yi, Zhitang Chen, Shengyu Zhu
- Abstract summary: In this work, we tackle the OOD problem without explicitly recovering the causal feature.
Under the setting of invariant causal mechanism, we theoretically show that if all such transformations are available, then we can learn a minimax optimal model.
Noticing that knowing a complete set of these causal invariant transformations may be impractical, we further show that it suffices to know only a subset of these transformations.
- Score: 17.18953986654873
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In real-world applications, it is important and desirable to learn a model
that performs well on out-of-distribution (OOD) data. Recently, causality has
become a powerful tool to tackle the OOD generalization problem, with the idea
resting on the causal mechanism that is invariant across domains of interest.
To leverage the generally unknown causal mechanism, existing works assume a
linear form of causal feature or require sufficiently many and diverse training
domains, which are usually restrictive in practice. In this work, we obviate
these assumptions and tackle the OOD problem without explicitly recovering the
causal feature. Our approach is based on transformations that modify the
non-causal feature but leave the causal part unchanged, which can be either
obtained from prior knowledge or learned from the training data in the
multi-domain scenario. Under the setting of invariant causal mechanism, we
theoretically show that if all such transformations are available, then we can
learn a minimax optimal model across the domains using only single domain data.
Noticing that knowing a complete set of these causal invariant transformations
may be impractical, we further show that it suffices to know only a subset of
these transformations. Based on the theoretical findings, a regularized
training procedure is proposed to improve the OOD generalization capability.
Extensive experimental results on both synthetic and real datasets verify the
effectiveness of the proposed algorithm, even with only a few causal invariant
transformations.
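The regularized training procedure described in the abstract can be sketched as a consistency penalty: the model's predictions should not change when a causal invariant transformation is applied to the input. The sketch below is illustrative only; the toy data, the logistic model, and the sign-flip transformation are assumptions for this example, not the paper's actual construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: x = [causal feature, non-causal feature];
# the label depends only on the causal coordinate.
n = 200
x = rng.normal(size=(n, 2))
y = (x[:, 0] > 0).astype(float)

def predict(w, x):
    # Logistic model over both coordinates.
    return 1.0 / (1.0 + np.exp(-(x @ w)))

def transform(x):
    # Hypothetical causal invariant transformation: flip the sign of
    # the non-causal coordinate, leave the causal coordinate unchanged.
    t = x.copy()
    t[:, 1] = -t[:, 1]
    return t

def loss(w, lam=1.0):
    p = predict(w, x)
    # Standard logistic risk on the training data.
    risk = -np.mean(y * np.log(p + 1e-9) + (1.0 - y) * np.log(1.0 - p + 1e-9))
    # Regularizer: predictions should be invariant under the transformation.
    penalty = np.mean((p - predict(w, transform(x))) ** 2)
    return risk + lam * penalty

# Plain gradient descent with finite-difference gradients
# (adequate for a two-parameter illustration).
w = np.zeros(2)
for _ in range(300):
    grad = np.array([(loss(w + 1e-4 * e) - loss(w - 1e-4 * e)) / 2e-4
                     for e in np.eye(2)])
    w -= 0.5 * grad

# The penalty drives the weight on the non-causal coordinate toward zero,
# so the learned model relies on the causal feature.
```

The key design point, mirroring the paper's theory, is that the penalty never requires identifying the causal feature explicitly: it only requires access to transformations known to leave the causal part unchanged, and even a small subset of such transformations contributes a useful invariance constraint.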
Related papers
- DIGIC: Domain Generalizable Imitation Learning by Causal Discovery [69.13526582209165]
Causality has been combined with machine learning to produce robust representations for domain generalization.
We make a different attempt by leveraging the demonstration data distribution to discover causal features for a domain generalizable policy.
We design a novel framework, called DIGIC, to identify the causal features by finding the direct cause of the expert action from the demonstration data distribution.
arXiv Detail & Related papers (2024-02-29T07:09:01Z)
- Diagnosing and Rectifying Fake OOD Invariance: A Restructured Causal Approach [51.012396632595554]
Invariant representation learning (IRL) encourages predicting labels from invariant causal features that are de-confounded from the environments.
Recent theoretical results verified that some causal features recovered by IRL methods merely appear domain-invariant in the training environments but fail in unseen domains.
We develop an approach based on conditional mutual information with respect to the RS-SCM, which rigorously rectifies the spurious and fake invariant effects.
arXiv Detail & Related papers (2023-12-15T12:58:05Z)
- Domain Generalization In Robust Invariant Representation [10.132611239890345]
In this paper, we investigate the generalization of invariant representations on out-of-distribution data.
We show that the invariant model learns unstructured latent representations that are robust to distribution shifts.
arXiv Detail & Related papers (2023-04-07T00:58:30Z)
- Score-based Causal Representation Learning with Interventions [54.735484409244386]
This paper studies the causal representation learning problem when latent causal variables are observed indirectly.
The objectives are: (i) recovering the unknown linear transformation (up to scaling) and (ii) determining the directed acyclic graph (DAG) underlying the latent variables.
arXiv Detail & Related papers (2023-01-19T18:39:48Z)
- Transfer learning with affine model transformation [18.13383101189326]
This paper presents a general class of transfer learning regression called affine model transfer.
It is shown that the affine model transfer broadly encompasses various existing methods, including the most common procedure based on neural feature extractors.
arXiv Detail & Related papers (2022-10-18T10:50:24Z)
- Towards Principled Disentanglement for Domain Generalization [90.9891372499545]
A fundamental challenge for machine learning models is generalizing to out-of-distribution (OOD) data.
We first formalize the OOD generalization problem as constrained optimization, called Disentanglement-constrained Domain Generalization (DDG).
Based on the transformation, we propose a primal-dual algorithm for joint representation disentanglement and domain generalization.
arXiv Detail & Related papers (2021-11-27T07:36:32Z)
- Discovering Latent Causal Variables via Mechanism Sparsity: A New Principle for Nonlinear ICA [81.4991350761909]
Independent component analysis (ICA) refers to an ensemble of methods which formalize this goal and provide estimation procedures for practical application.
We show that the latent variables can be recovered up to a permutation if one regularizes the latent mechanisms to be sparse.
arXiv Detail & Related papers (2021-07-21T14:22:14Z)
- Nonlinear Invariant Risk Minimization: A Causal Approach [5.63479133344366]
We propose a learning paradigm that enables out-of-distribution generalization in the nonlinear setting.
We show identifiability of the data representation up to very simple transformations.
Extensive experiments on both synthetic and real-world datasets show that our approach significantly outperforms a variety of baseline methods.
arXiv Detail & Related papers (2021-02-24T15:38:41Z)
- The Risks of Invariant Risk Minimization [52.7137956951533]
Invariant Risk Minimization (IRM) is an objective based on the idea of learning deep, invariant features of data.
We present the first analysis of classification under the IRM objective, as well as under recently proposed alternatives, in a fairly natural and general model.
We show that IRM can fail catastrophically unless the test data are sufficiently similar to the training distribution; this is precisely the issue it was intended to solve.
arXiv Detail & Related papers (2020-10-12T14:54:32Z)
- Disentanglement by Nonlinear ICA with General Incompressible-flow Networks (GIN) [30.74691299906988]
A central question of representation learning asks under which conditions it is possible to reconstruct the true latent variables of an arbitrarily complex generative process.
Recent breakthrough work by Khemakhem et al. on nonlinear ICA has answered this question for a broad class of conditional generative processes.
We extend this important result in a direction relevant for application to real-world data.
arXiv Detail & Related papers (2020-01-14T16:25:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.