When is invariance useful in an Out-of-Distribution Generalization problem?
- URL: http://arxiv.org/abs/2008.01883v4
- Date: Thu, 25 Nov 2021 08:04:44 GMT
- Title: When is invariance useful in an Out-of-Distribution Generalization problem?
- Authors: Masanori Koyama and Shoichiro Yamaguchi
- Abstract summary: The goal of the Out-of-Distribution (OOD) generalization problem is to train a predictor that generalizes on all environments.
Popular approaches in this field use the hypothesis that such a predictor shall be an \textit{invariant predictor} that captures the mechanism that remains constant across environments.
This paper presents a new set of theoretical conditions necessary for an invariant predictor to achieve the OOD optimality.
- Score: 19.696505968699206
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of the Out-of-Distribution (OOD) generalization problem is to
train a predictor that generalizes on all environments. Popular approaches in this
field use the hypothesis that such a predictor shall be an \textit{invariant
predictor} that captures the mechanism that remains constant across
environments. While these approaches have been experimentally successful in
various case studies, there is still much room for the theoretical validation
of this hypothesis. This paper presents a new set of theoretical conditions
necessary for an invariant predictor to achieve the OOD optimality. Our theory
not only applies to non-linear cases, but also generalizes the necessary
condition used in \citet{rojas2018invariant}. We also derive Inter Gradient
Alignment algorithm from our theory and demonstrate its competitiveness on
MNIST-derived benchmark datasets as well as on two of the three
\textit{Invariance Unit Tests} proposed by \citet{aubinlinear}.
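As a rough sketch of what inter-gradient alignment can look like in practice, the snippet below penalizes the variance of per-environment loss gradients around their mean; it assumes PyTorch, a list of per-environment scalar losses, and an illustrative penalty form that may differ in detail from the paper's exact objective.

```python
import torch

def iga_objective(env_losses, params, lam=1.0):
    """Average risk plus a penalty on the variance of per-environment
    gradients, so every environment pulls the parameters the same way."""
    grads = [torch.autograd.grad(loss, params, create_graph=True)
             for loss in env_losses]
    flat = [torch.cat([g.reshape(-1) for g in gs]) for gs in grads]
    mean_grad = torch.stack(flat).mean(dim=0)
    # Trace of the empirical variance of the environment gradients.
    penalty = sum((f - mean_grad).pow(2).sum() for f in flat) / len(flat)
    return torch.stack(env_losses).mean() + lam * penalty
```

A single optimizer step on this objective then reduces both the average risk and the misalignment between environments.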
Related papers
- Generalized Laplace Approximation [23.185126261153236]
We introduce a unified theoretical framework to attribute Bayesian inconsistency to model misspecification and inadequate priors.
We propose the generalized Laplace approximation, which involves a simple adjustment to the Hessian matrix of the regularized loss function.
We assess the performance and properties of the generalized Laplace approximation on state-of-the-art neural networks and real-world datasets.
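For orientation, here is a minimal sketch of the standard Laplace approximation that this work generalizes; the paper's specific Hessian adjustment is not reproduced, and the function names are illustrative assumptions.

```python
import torch

def laplace_posterior(reg_loss_fn, theta_map):
    """Standard Laplace approximation: a Gaussian centred at the MAP
    estimate whose covariance is the inverse Hessian of the regularized
    loss. The generalized variant adjusts this Hessian."""
    hessian = torch.autograd.functional.hessian(reg_loss_fn, theta_map)
    cov = torch.linalg.inv(hessian)  # assumes a positive-definite Hessian
    return torch.distributions.MultivariateNormal(theta_map, cov)
```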
arXiv Detail & Related papers (2024-05-22T11:11:42Z) - Source-Free Unsupervised Domain Adaptation with Hypothesis Consolidation of Prediction Rationale [53.152460508207184]
Source-Free Unsupervised Domain Adaptation (SFUDA) is a challenging task where a model needs to be adapted to a new domain without access to target domain labels or source domain data.
This paper proposes a novel approach that considers multiple prediction hypotheses for each sample and investigates the rationale behind each hypothesis.
To achieve optimal performance, we propose a three-step adaptation process: model pre-adaptation, hypothesis consolidation, and semi-supervised learning.
arXiv Detail & Related papers (2024-02-02T05:53:22Z) - Variational Bayesian Neural Networks via Resolution of Singularities [1.2183405753834562]
We advocate for the importance of singular learning theory (SLT) as it pertains to the theory and practice of variational inference in Bayesian neural networks (BNNs).
We lay to rest some of the confusion surrounding discrepancies between downstream predictive performance, measured via, e.g., the test log predictive density, and the variational objective.
We use the SLT-corrected form for singular posterior distributions to inform the design of the variational family itself.
arXiv Detail & Related papers (2023-02-13T00:32:49Z) - Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
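The local Lipschitz regularity these bounds depend on can be probed empirically; the crude Monte-Carlo estimator below, assuming a scalar-output PyTorch model, is illustrative and not the estimator used in the paper.

```python
import torch

def local_lipschitz_estimate(model, x, n_samples=32, radius=0.1):
    """Largest input-gradient norm over random points in a small ball
    around x, a rough proxy for the local Lipschitz constant."""
    worst = 0.0
    for _ in range(n_samples):
        xp = (x + radius * torch.randn_like(x)).requires_grad_(True)
        (grad,) = torch.autograd.grad(model(xp).sum(), xp)
        worst = max(worst, grad.norm().item())
    return worst
```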
arXiv Detail & Related papers (2022-11-02T16:39:42Z) - On the Importance of Gradient Norm in PAC-Bayesian Bounds [92.82627080794491]
We propose a new generalization bound that exploits the contractivity of the log-Sobolev inequalities.
We empirically analyze the effect of this new loss-gradient norm term on different neural architectures.
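To make the role of such a term concrete, a minimal sketch of a training loss augmented with its own gradient norm follows; the bound itself is not implemented, and the penalty weight is an arbitrary assumption.

```python
import torch

def loss_plus_grad_norm(loss, params, beta=0.01):
    """Augment a scalar training loss with the squared norm of its
    gradient, the kind of loss-gradient norm term the bound highlights."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    grad_norm_sq = sum(g.pow(2).sum() for g in grads)
    return loss + beta * grad_norm_sq
```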
arXiv Detail & Related papers (2022-10-12T12:49:20Z) - Exponential Tail Local Rademacher Complexity Risk Bounds Without the Bernstein Condition [30.401770841788718]
The local Rademacher complexity framework is one of the most successful general-purpose toolboxes for establishing sharp excess risk bounds.
We derive exponential-tail excess risk bounds without the Bernstein condition, including for problems where optimal performance is only achievable via improper prediction.
Our results apply to improper prediction regimes not covered by the toolbox.
arXiv Detail & Related papers (2022-02-23T12:27:53Z) - Robust Linear Predictions: Analyses of Uniform Concentration, Fast Rates and Model Misspecification [16.0817847880416]
We offer a unified framework that includes a broad variety of linear prediction problems on a Hilbert space.
We show that for misspecification level $\epsilon$, these estimators achieve an error rate of $O\left(\max\left\{|\mathcal{O}|^{1/2} n^{-1/2},\, |\mathcal{I}|^{1/2} n^{-1}\right\} + \epsilon\right)$, matching the best-known rates in the literature.
arXiv Detail & Related papers (2022-01-06T08:51:08Z) - A Theoretical Analysis on Independence-driven Importance Weighting for Covariate-shift Generalization [44.88645911638269]
Independence-driven importance weighting algorithms in the stable learning literature have shown empirical effectiveness.
In this paper, we theoretically prove the effectiveness of such algorithms by explaining them as feature selection processes.
We prove that, under ideal conditions, independence-driven importance weighting algorithms can identify the minimal stable variable set.
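As a toy instance of the underlying idea, the sketch below learns sample weights that push pairwise weighted feature covariances toward zero; actual stable-learning methods use stronger independence measures, so every detail here is a simplifying assumption.

```python
import torch

def decorrelating_weights(X, steps=500, lr=0.05):
    """Learn sample weights under which features are pairwise uncorrelated,
    a simplified stand-in for independence-driven importance weighting."""
    n, _ = X.shape
    logits = torch.zeros(n, requires_grad=True)
    opt = torch.optim.Adam([logits], lr=lr)
    for _ in range(steps):
        w = torch.softmax(logits, dim=0)        # weights sum to one
        Xc = X - (w[:, None] * X).sum(dim=0)    # weighted centring
        cov = (w[:, None] * Xc).T @ Xc          # weighted covariance matrix
        loss = (cov - torch.diag(torch.diag(cov))).pow(2).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (torch.softmax(logits, dim=0) * n).detach()  # mean weight of one
```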
arXiv Detail & Related papers (2021-11-03T17:18:49Z) - Predicting Unreliable Predictions by Shattering a Neural Network [145.3823991041987]
Piecewise linear neural networks can be split into subfunctions.
Subfunctions have their own activation pattern, domain, and empirical error.
Empirical error for the full network can be written as an expectation over subfunctions.
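A minimal sketch of the subfunction idea, assuming a ReLU MLP given as a list of torch.nn.Linear layers: two inputs with the same pattern are handled by the same linear subfunction.

```python
import torch

def activation_pattern(linear_layers, x):
    """Binary ReLU activation pattern identifying which linear
    subfunction of a piecewise-linear network handles the input x."""
    bits = []
    h = x
    for layer in linear_layers:
        h = layer(h)
        bits.append((h > 0).flatten())
        h = torch.relu(h)
    return torch.cat(bits)
```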
arXiv Detail & Related papers (2021-06-15T18:34:41Z) - The Risks of Invariant Risk Minimization [52.7137956951533]
Invariant Risk Minimization (IRM) is an objective based on the idea of learning deep, invariant features of data.
We present the first analysis of classification under the IRM objective, as well as recently proposed alternatives, under a fairly natural and general model.
We show that IRM can fail catastrophically unless the test data are sufficiently similar to the training distribution; this is precisely the issue that it was intended to solve.
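For reference, the widely used IRMv1 penalty from Arjovsky et al., the kind of objective this analysis concerns, can be sketched as below; the surrounding training loop and per-environment batching are omitted.

```python
import torch

def irmv1_penalty(loss_fn, logits, y):
    """IRMv1 penalty: squared gradient of the environment risk with
    respect to a dummy classifier scale fixed at w = 1."""
    w = torch.tensor(1.0, requires_grad=True)
    (grad,) = torch.autograd.grad(loss_fn(logits * w, y), [w], create_graph=True)
    return grad.pow(2)
```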
arXiv Detail & Related papers (2020-10-12T14:54:32Z) - A One-step Approach to Covariate Shift Adaptation [82.01909503235385]
A default assumption in many machine learning scenarios is that the training and test samples are drawn from the same probability distribution.
We propose a novel one-step approach that jointly learns the predictive model and the associated weights in one optimization.
arXiv Detail & Related papers (2020-07-08T11:35:47Z)