Winning Prize Comes from Losing Tickets: Improve Invariant Learning by Exploring Variant Parameters for Out-of-Distribution Generalization
- URL: http://arxiv.org/abs/2310.16391v1
- Date: Wed, 25 Oct 2023 06:10:57 GMT
- Title: Winning Prize Comes from Losing Tickets: Improve Invariant Learning by Exploring Variant Parameters for Out-of-Distribution Generalization
- Authors: Zhuo Huang, Muyang Li, Li Shen, Jun Yu, Chen Gong, Bo Han, Tongliang Liu
- Abstract summary: Out-of-Distribution (OOD) Generalization aims to learn robust models that generalize well to various environments without fitting to distribution-specific features.
Recent studies based on the Lottery Ticket Hypothesis (LTH) address this problem by minimizing the learning target to find a subset of parameters that are critical to the task.
We propose Exploring Variant parameters for Invariant Learning (EVIL), which also leverages distribution knowledge to find the parameters that are sensitive to distribution shift.
- Score: 76.27711056914168
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Out-of-Distribution (OOD) Generalization aims to learn robust models that
generalize well to various environments without fitting to
distribution-specific features. Recent studies based on the Lottery Ticket
Hypothesis (LTH) address this problem by minimizing the learning target to find
a subset of parameters that are critical to the task. However, in OOD problems,
such solutions are suboptimal as the learning task contains severe distribution
noises, which can mislead the optimization process. Therefore, apart from
finding the task-related parameters (i.e., invariant parameters), we propose
Exploring Variant parameters for Invariant Learning (EVIL) which also leverages
the distribution knowledge to find the parameters that are sensitive to
distribution shift (i.e., variant parameters). Once the variant parameters are
left out of invariant learning, a robust subnetwork that is resistant to
distribution shift can be found. Additionally, the parameters that are
relatively stable across distributions can be considered invariant ones to
improve invariant learning. By fully exploring both variant and invariant
parameters, our EVIL can effectively identify a robust subnetwork to improve
OOD generalization. In extensive experiments on the integrated testbed
DomainBed, EVIL effectively and efficiently enhances many popular methods, such
as ERM, IRM, and SAM.
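As a rough illustration of the idea in the abstract, the variance of per-environment gradients can serve as a proxy for how "variant" a parameter is. The NumPy sketch below is a minimal illustration under that assumption; the function name, the variance criterion, and the `keep_ratio` parameter are illustrative choices, not the paper's exact procedure.

```python
import numpy as np

def variant_mask(env_grads, keep_ratio=0.8):
    """Sketch of variance-based variant-parameter detection.

    env_grads: array of shape (n_envs, n_params) holding per-environment
    gradients of the loss w.r.t. each parameter. Parameters whose gradients
    vary most across environments are treated as variant (sensitive to
    distribution shift) and masked out; the remaining low-variance
    parameters form the candidate invariant subnetwork.
    """
    variance = np.var(env_grads, axis=0)       # sensitivity across environments
    k = int(keep_ratio * env_grads.shape[1])   # number of parameters to keep
    keep = np.argsort(variance)[:k]            # lowest variance = most invariant
    mask = np.zeros(env_grads.shape[1], dtype=bool)
    mask[keep] = True
    return mask
```

In a training loop, such a mask could restrict updates to the candidate invariant parameters while the high-variance ones are left out of invariant learning.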
Related papers
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level Optimizations (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization problem, where the inner loss function becomes a smooth distribution and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z)
- Diagnosing and Rectifying Fake OOD Invariance: A Restructured Causal Approach [51.012396632595554]
Invariant representation learning (IRL) encourages the prediction from invariant causal features to labels de-confounded from the environments.
Recent theoretical results verified that some causal features recovered by IRLs merely appear domain-invariant in the training environments but fail in unseen domains.
We develop an approach based on conditional mutual information with respect to RS-SCM, then rigorously rectify the spurious and fake invariant effects.
arXiv Detail & Related papers (2023-12-15T12:58:05Z)
- Probabilistic Invariant Learning with Randomized Linear Classifiers [24.485477981244593]
We show how to leverage randomness and design models that are both expressive and invariant but use less resources.
Inspired by randomized algorithms, we propose a class of binary classification models called Randomized Linear Classifiers (RLCs).
arXiv Detail & Related papers (2023-08-08T17:18:04Z)
- Smoothing the Edges: Smooth Optimization for Sparse Regularization using Hadamard Overparametrization [10.009748368458409]
We present a framework for smooth optimization of explicitly regularized objectives for (structured) sparsity.
Our method enables fully differentiable approximation-free optimization and is thus compatible with the ubiquitous gradient descent paradigm in deep learning.
arXiv Detail & Related papers (2023-07-07T13:06:12Z)
- Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments [48.96971760679639]
We study variance-dependent regret bounds for Markov decision processes (MDPs).
We propose two new environment norms to characterize the fine-grained variance properties of the environment.
For model-based methods, we design a variant of the MVP algorithm.
In particular, this bound is simultaneously minimax optimal for both stochastic and deterministic MDPs.
arXiv Detail & Related papers (2023-01-31T06:54:06Z)
- Learning Optimal Features via Partial Invariance [18.552839725370383]
Invariant Risk Minimization (IRM) is a popular framework that aims to learn robust models from multiple environments.
We show that IRM can over-constrain the predictor; to remedy this, we propose a relaxation via partial invariance.
Several experiments, conducted both in linear settings as well as with deep neural networks on tasks over both language and image data, allow us to verify our conclusions.
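For context on the IRM framework this entry relaxes, the widely used IRMv1 formulation adds a per-environment penalty measuring how far a dummy scalar classifier w = 1 sits from optimality on top of the learned features. Below is a minimal sketch with squared loss and a linear featurizer; this is an illustrative simplification under stated assumptions, not the exact objective of the cited papers.

```python
import numpy as np

def irm_penalty(phi, x, y):
    """IRMv1-style penalty for one environment (squared loss, linear model).

    phi: feature weights of a linear predictor; a dummy scalar classifier
    w = 1 is placed on top, and we penalize the squared gradient of the
    environment risk R(w) = mean((w * (x @ phi) - y)^2) evaluated at w = 1.
    A zero penalty means the predictor is already optimal for this
    environment without rescaling, i.e. the invariance condition holds.
    """
    preds = x @ phi                               # predictions with w = 1
    grad_w = np.mean(2.0 * (preds - y) * preds)   # dR/dw at w = 1
    return grad_w ** 2
```

Summing this penalty over training environments (weighted against the average risk) is what can over-constrain the predictor, which is the behavior the partial-invariance relaxation above targets.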
arXiv Detail & Related papers (2023-01-28T02:48:14Z)
- Sufficient Invariant Learning for Distribution Shift [16.838595294610105]
We argue that learning sufficient invariant features from the training set is crucial for the distribution shift case.
ASGDRO learns sufficient invariant features by seeking common flat minima across all groups or domains.
arXiv Detail & Related papers (2022-10-24T18:34:24Z)
- Equivariance Discovery by Learned Parameter-Sharing [153.41877129746223]
We study how to discover interpretable equivariances from data.
Specifically, we formulate this discovery process as an optimization problem over a model's parameter-sharing schemes.
Also, we theoretically analyze the method for Gaussian data and provide a bound on the mean squared gap between the studied discovery scheme and the oracle scheme.
arXiv Detail & Related papers (2022-04-07T17:59:19Z)
- Improving the Sample-Complexity of Deep Classification Networks with Invariant Integration [77.99182201815763]
Leveraging prior knowledge on intraclass variance due to transformations is a powerful method to improve the sample complexity of deep neural networks.
We propose a novel monomial selection algorithm based on pruning methods to allow an application to more complex problems.
We demonstrate the improved sample complexity on the Rotated-MNIST, SVHN and CIFAR-10 datasets.
arXiv Detail & Related papers (2022-02-08T16:16:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all listed content) and is not responsible for any consequences of its use.