Invariance Principle Meets Information Bottleneck for
Out-of-Distribution Generalization
- URL: http://arxiv.org/abs/2106.06607v1
- Date: Fri, 11 Jun 2021 20:42:27 GMT
- Title: Invariance Principle Meets Information Bottleneck for
Out-of-Distribution Generalization
- Authors: Kartik Ahuja, Ethan Caballero, Dinghuai Zhang, Yoshua Bengio, Ioannis
Mitliagkas, Irina Rish
- Abstract summary: We show that for linear classification tasks we need stronger restrictions on the distribution shifts, or otherwise OOD generalization is impossible.
We prove that a form of the information bottleneck constraint along with invariance helps address key failures when invariant features capture all the information about the label and also retains the existing success when they do not.
- Score: 77.24152933825238
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The invariance principle from causality is at the heart of notable approaches
such as invariant risk minimization (IRM) that seek to address
out-of-distribution (OOD) generalization failures. Despite the promising
theory, invariance principle-based approaches fail in common classification
tasks, where invariant (causal) features capture all the information about the
label. Are these failures due to the methods failing to capture the invariance?
Or is the invariance principle itself insufficient? To answer these questions,
we revisit the fundamental assumptions in linear regression tasks, where
invariance-based approaches were shown to provably generalize OOD. In contrast
to the linear regression tasks, we show that for linear classification tasks we
need much stronger restrictions on the distribution shifts, or otherwise OOD
generalization is impossible. Furthermore, even with appropriate restrictions
on distribution shifts in place, we show that the invariance principle alone is
insufficient. We prove that a form of the information bottleneck constraint
along with invariance helps address key failures when invariant features
capture all the information about the label and also retains the existing
success when they do not. We propose an approach that incorporates both of
these principles and demonstrate its effectiveness in several experiments.
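To make the proposal concrete, below is a minimal sketch of how an invariance penalty and an information bottleneck constraint can be combined in one training objective. It uses the standard IRMv1 dummy-scale penalty for invariance and, as an assumed surrogate for the bottleneck, a penalty on the variance of the learned representation; the toy data, architecture, and penalty weights are illustrative choices, not the authors' exact setup.

```python
# Minimal sketch (not the authors' exact objective): environment risks plus an
# IRMv1-style invariance penalty and an information-bottleneck surrogate,
# here taken to be the variance of the learned representation.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

featurizer = nn.Linear(10, 8)          # representation Phi(x)
classifier = nn.Linear(8, 1)           # linear head on top of Phi
params = list(featurizer.parameters()) + list(classifier.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

# Two toy training environments with binary labels (illustrative data).
envs = [
    (torch.randn(64, 10), torch.randint(0, 2, (64, 1)).float()),
    (torch.randn(64, 10), torch.randint(0, 2, (64, 1)).float()),
]

def irm_penalty(logits, y):
    # IRMv1 surrogate: squared gradient of the risk w.r.t. a dummy scale of 1.0.
    scale = torch.tensor(1.0, requires_grad=True)
    loss = F.binary_cross_entropy_with_logits(logits * scale, y)
    grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
    return grad.pow(2)

lambda_irm, lambda_ib = 1.0, 0.1       # assumed penalty weights
for step in range(100):
    risk, inv_pen, ib_pen = 0.0, 0.0, 0.0
    for x, y in envs:
        phi = featurizer(x)
        logits = classifier(phi)
        risk = risk + F.binary_cross_entropy_with_logits(logits, y)
        inv_pen = inv_pen + irm_penalty(logits, y)
        ib_pen = ib_pen + phi.var(dim=0).mean()   # bottleneck surrogate
    loss = (risk + lambda_irm * inv_pen + lambda_ib * ib_pen) / len(envs)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Other bottleneck surrogates (for example, an explicit norm or entropy penalty on the representation) would slot into the same place as the variance term.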
Related papers
- Diagnosing and Rectifying Fake OOD Invariance: A Restructured Causal
Approach [51.012396632595554]
Invariant representation learning (IRL) encourages predicting labels from invariant causal features that are de-confounded from the environments.
Recent theoretical results show that some causal features recovered by IRL methods merely appear domain-invariant in the training environments but fail in unseen domains.
We develop an approach based on conditional mutual information with respect to the RS-SCM, which rigorously rectifies the spurious and fake-invariant effects.
arXiv Detail & Related papers (2023-12-15T12:58:05Z) - On genuine invariance learning without weight-tying [6.308539010172309]
We analyze invariance learning in neural networks without weight-tying constraints.
We show that learned invariance is strongly conditioned on the input data, rendering it unreliable if the input distribution shifts.
arXiv Detail & Related papers (2023-08-07T20:41:19Z) - Learning Linear Causal Representations from Interventions under General
Nonlinear Mixing [52.66151568785088]
We prove strong identifiability results given unknown single-node interventions without access to the intervention targets.
This is the first instance of causal identifiability from non-paired interventions for deep neural network embeddings.
arXiv Detail & Related papers (2023-06-04T02:32:12Z) - A data variation robust learning model based on importance sampling [11.285259001286978]
We propose an importance-sampling-based, data-variation-robust loss (ISloss) that minimizes the worst-case loss under a constraint on the distribution deviation (a minimal importance-weighting illustration appears after the related-papers list below).
We show that the proposed method is robust under large distribution deviations.
arXiv Detail & Related papers (2023-02-09T04:50:06Z) - Unimodal-Concentrated Loss: Fully Adaptive Label Distribution Learning
for Ordinal Regression [32.35098925000738]
We argue that existing adaptive label distribution learning (ALDL) algorithms do not fully exploit the intrinsic properties of ordinal regression.
We propose a novel loss function for fully adaptive label distribution learning, namely unimodal-concentrated loss.
arXiv Detail & Related papers (2022-04-01T09:40:11Z) - Nonlinear Invariant Risk Minimization: A Causal Approach [5.63479133344366]
We propose a learning paradigm that enables out-of-distribution generalization in the nonlinear setting.
We show identifiability of the data representation up to very simple transformations.
Extensive experiments on both synthetic and real-world datasets show that our approach significantly outperforms a variety of baseline methods.
arXiv Detail & Related papers (2021-02-24T15:38:41Z) - The Risks of Invariant Risk Minimization [52.7137956951533]
Invariant Risk Minimization (IRM) is an objective based on the idea of learning deep, invariant features of data.
We present the first analysis of classification under the IRM objective, as well as these recently proposed alternatives, under a fairly natural and general model.
We show that IRM can fail catastrophically unless the test data are sufficiently similar to the training distribution, which is precisely the issue it was intended to solve.
arXiv Detail & Related papers (2020-10-12T14:54:32Z) - Learning Invariant Representations and Risks for Semi-supervised Domain
Adaptation [109.73983088432364]
We propose the first method that aims to simultaneously learn invariant representations and risks under the setting of semi-supervised domain adaptation (Semi-DA).
We introduce the LIRR algorithm for jointly Learning Invariant Representations and Risks.
arXiv Detail & Related papers (2020-10-09T15:42:35Z) - GenDICE: Generalized Offline Estimation of Stationary Values [108.17309783125398]
We show that effective estimation can still be achieved in important applications.
Our approach is based on estimating a ratio that corrects for the discrepancy between the stationary and empirical distributions.
The resulting algorithm, GenDICE, is straightforward and effective.
arXiv Detail & Related papers (2020-02-21T00:27:52Z)
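As referenced in the importance-sampling entry above, here is a small, self-contained illustration of the general idea: importance-weight a training loss toward a shifted distribution and take the worst case over a bounded family of candidate shifts. The Gaussian training and shift densities, the linear model, and the candidate shift set are assumptions made for the example; the paper's actual ISloss formulation may differ.

```python
# Illustrative sketch only (not the paper's ISloss): importance-sample the
# training loss under candidate shifted distributions and keep the worst case.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=500)       # training inputs ~ p(x)
y = 2.0 * x + rng.normal(scale=0.1, size=500)      # noisy linear targets
theta = 1.5                                        # candidate model y_hat = theta * x

per_sample_loss = (y - theta * x) ** 2

def shifted_risk(shift_mean):
    """Importance-weighted risk under a mean-shifted test distribution q(x)."""
    p = norm.pdf(x, loc=0.0, scale=1.0)            # training density
    q = norm.pdf(x, loc=shift_mean, scale=1.0)     # candidate shifted density
    w = q / p                                      # importance weights
    w = w / w.sum()                                # self-normalize for stability
    return float(np.sum(w * per_sample_loss))

# Worst case over a small, bounded family of candidate shifts.
candidate_shifts = [-0.5, 0.0, 0.5]
worst_case = max(shifted_risk(m) for m in candidate_shifts)
print(f"worst-case importance-weighted risk: {worst_case:.4f}")
```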
This list is automatically generated from the titles and abstracts of the papers in this site.