Can Subnetwork Structure be the Key to Out-of-Distribution Generalization?
- URL: http://arxiv.org/abs/2106.02890v1
- Date: Sat, 5 Jun 2021 13:19:27 GMT
- Title: Can Subnetwork Structure be the Key to Out-of-Distribution Generalization?
- Authors: Dinghuai Zhang, Kartik Ahuja, Yilun Xu, Yisen Wang, Aaron Courville
- Abstract summary: In this paper, we use a functional modular probing method to analyze deep model structures in the OOD setting.
We demonstrate that even in biased models (which focus on spurious correlations) there still exist unbiased functional subnetworks.
- Score: 21.037720934987487
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Can models with particular structure avoid being biased towards spurious
correlations in out-of-distribution (OOD) generalization? Peters et al. (2016)
provide a positive answer for linear cases. In this paper, we use a functional
modular probing method to analyze deep model structures in the OOD setting. We
demonstrate that even in biased models (which focus on spurious correlations),
there still exist unbiased functional subnetworks. Furthermore, we articulate
and demonstrate the functional lottery ticket hypothesis: a full network contains
a subnetwork that can achieve better OOD performance. We then propose Modular
Risk Minimization to solve the subnetwork selection problem. Our algorithm
learns the subnetwork structure from a given dataset and can be combined with
any other OOD regularization method. Experiments on various OOD generalization
tasks corroborate the effectiveness of our method.
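The abstract does not spell out how Modular Risk Minimization parameterizes or selects a subnetwork. As a rough illustration of the general idea of probing a frozen model with a learned mask, here is a minimal pure-Python sketch. It assumes a single frozen linear layer, one sigmoid-relaxed mask entry per weight, and a mean-mask sparsity penalty; the function names, the toy "biased" model, the probe data, and the hyperparameters are all hypothetical, and this is not the authors' algorithm.

```python
import math

# Minimal sketch (hypothetical names, toy setup): probe a FROZEN linear model with a
# learnable soft mask over its weights. Not the authors' Modular Risk Minimization.


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def bce(p: float, y: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy for one prediction p and label y in {0, 1}."""
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))


def masked_loss(w, mask, data):
    """Mean BCE of the linear model whose effective weights are w * mask."""
    total = 0.0
    for x, y in data:
        z = sum(xj * wj * mj for xj, wj, mj in zip(x, w, mask))
        total += bce(sigmoid(z), y)
    return total / len(data)


def learn_mask(w, data, lam=0.01, lr=0.5, steps=200):
    """Fit mask logits s (mask m = sigmoid(s)) by gradient descent on
    mean BCE + lam * mean(m); the weights w themselves are never updated."""
    d = len(w)
    s = [0.0] * d  # sigmoid(0) = 0.5: every weight starts half-on
    for _ in range(steps):
        m = [sigmoid(sj) for sj in s]
        grad = [0.0] * d
        for x, y in data:
            z = sum(xj * wj * mj for xj, wj, mj in zip(x, w, m))
            dz = sigmoid(z) - y  # derivative of BCE w.r.t. the logit z
            for j in range(d):
                # chain rule: dz/dm_j = x_j * w_j, dm_j/ds_j = m_j * (1 - m_j)
                grad[j] += dz * x[j] * w[j] * m[j] * (1 - m[j])
        for j in range(d):
            # averaged data term plus the gradient of the sparsity penalty
            g = grad[j] / len(data) + lam / d * m[j] * (1 - m[j])
            s[j] -= lr * g
    return [sigmoid(sj) for sj in s]


# Toy "biased" model: equal weight on an invariant feature (index 0) and a
# spurious one (index 1). In this hypothetical probe set the spurious feature
# is flipped relative to the label.
w = [2.0, 2.0]
probe = [([1.0, -1.0], 1), ([-1.0, 1.0], 0)] * 10
soft = learn_mask(w, probe)
hard = [1 if mj > 0.5 else 0 for mj in soft]  # threshold to a binary subnetwork
print(hard, masked_loss(w, [1.0, 1.0], probe), masked_loss(w, hard, probe))
```

On this toy probe set the full model is at chance level (its two weights cancel), while thresholding the learned mask keeps only the invariant weight and lowers the loss. This mirrors, in miniature, the claim that a biased network can still contain a better-behaved subnetwork. The sigmoid relaxation is used here only because it makes the mask differentiable.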
Related papers
- Towards out-of-distribution generalization in large-scale astronomical surveys: robust networks learn similar representations [3.653721769378018]
We use Centered Kernel Alignment (CKA), a similarity measure for neural network representations, to examine the relationship between representation similarity and performance.
We find that when models are robust to a distribution shift, they produce substantially different representations across their layers on OOD data.
We discuss the potential of representation similarity for guiding model design and training strategy, and for mitigating the OOD problem by incorporating CKA as an inductive bias during training.
(2023-11-29T19:00:05Z)
- Mitigating Simplicity Bias in Deep Learning for Improved OOD Generalization and Robustness [5.976013616522926]
We propose a framework that encourages the model to use a more diverse set of features to make predictions.
We first train a simple model, and then regularize the conditional mutual information with respect to it to obtain the final model.
We demonstrate the effectiveness of this framework in various problem settings and real-world applications.
(2023-10-09T21:19:39Z)
- Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations [111.88727295707454]
This paper reexamines the research on out-of-distribution (OOD) robustness in the field of NLP.
We propose a benchmark construction protocol that ensures clear differentiation and challenging distribution shifts.
We conduct experiments on pre-trained language models for analysis and evaluation of OOD robustness.
(2023-06-07T17:47:03Z)
- Exploring Optimal Substructure for Out-of-distribution Generalization via Feature-targeted Model Pruning [23.938392334438582]
We propose a novel Spurious Feature-targeted model Pruning framework, dubbed SFP, to automatically explore invariant substructures.
SFP can significantly outperform both structure-based and non-structure-based state-of-the-art OOD generalization methods, with accuracy improvements of up to 4.72% and 23.35%, respectively.
(2022-12-19T13:51:06Z)
- A Win-win Deal: Towards Sparse and Robust Pre-trained Language Models [53.87983344862402]
Large-scale pre-trained language models (PLMs) are inefficient in terms of memory footprint and computation.
PLMs tend to rely on the dataset bias and struggle to generalize to out-of-distribution (OOD) data.
Recent studies show that dense PLMs can be replaced with sparse subnetworks without hurting performance.
(2022-10-11T07:26:34Z)
- Learning an Invertible Output Mapping Can Mitigate Simplicity Bias in Neural Networks [66.76034024335833]
We find that diverse/complex features are indeed learned by the backbone, and that their brittleness is due to the linear classification head relying primarily on the simplest features.
We propose Feature Reconstruction Regularizer (FRR) to ensure that the learned features can be reconstructed back from the logits.
We demonstrate up to 15% gains in OOD accuracy on the recently introduced semi-synthetic datasets with extreme distribution shifts.
(2022-10-04T04:01:15Z)
- The interplay between ranking and communities in networks [0.0]
We present a generative model based on an interplay between community and hierarchical structures.
It assumes that each node has a preference in the interaction mechanism and nodes with the same preference are more likely to interact.
We demonstrate our method on synthetic and real-world data and compare performance with two standard approaches for community detection and ranking extraction.
(2021-12-23T16:10:28Z)
- Predicting Unreliable Predictions by Shattering a Neural Network [145.3823991041987]
Piecewise linear neural networks can be split into subfunctions.
Subfunctions have their own activation pattern, domain, and empirical error.
Empirical error for the full network can be written as an expectation over subfunctions.
(2021-06-15T18:34:41Z)
- Towards a Theoretical Framework of Out-of-Distribution Generalization [28.490842160921805]
Generalization to out-of-distribution (OOD) data, or domain generalization, is one of the central problems in modern machine learning.
In this work, we take the first step towards rigorous and quantitative definitions of what OOD is and of what it means for an OOD problem to be learnable.
(2021-06-08T16:32:23Z)
- Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions with Superior OOD Generalization [93.8373619657239]
Neural networks trained with SGD were recently shown to rely preferentially on linearly predictive features.
This simplicity bias can explain their lack of robustness out of distribution (OOD).
We demonstrate that the simplicity bias can be mitigated and OOD generalization improved.
(2021-05-12T12:12:24Z)
- Deep Archimedean Copulas [98.96141706464425]
ACNet is a novel differentiable neural network architecture that enforces the structural properties required of Archimedean copulas.
We show that ACNet is able to both approximate common Archimedean Copulas and generate new copulas which may provide better fits to data.
(2020-12-05T22:58:37Z)
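The first related entry compares representations with Centered Kernel Alignment (CKA). Its summary does not say which kernel is used, so this sketch assumes the linear variant. For column-centered activation matrices X and Y computed on the same n inputs (rows are examples), linear CKA is ||Y^T X||_F^2 / (||X^T X||_F ||Y^T Y||_F). The helper names and toy activations are hypothetical, and plain lists are used so the example needs only the standard library.

```python
import math

# Minimal sketch of LINEAR CKA between two representations of the same n inputs.
# Each representation is an n x p matrix stored as a list of rows.


def center(rows):
    """Subtract each column's mean from that column."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - mu for v, mu in zip(r, means)] for r in rows]


def cross_fro_sq(a, b):
    """Squared Frobenius norm of a^T b (a is n x p, b is n x q)."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(u * v for u, v in zip(col_a, col_b))
            total += dot * dot
    return total


def linear_cka(x, y):
    """||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on column-centered X and Y."""
    xc, yc = center(x), center(y)
    num = cross_fro_sq(xc, yc)
    den = math.sqrt(cross_fro_sq(xc, xc) * cross_fro_sq(yc, yc))
    return num / den


# Hypothetical activations of two layers on the same four inputs.
x = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0], [2.0, 2.0]]
# y is x rotated by 90 degrees and scaled by 3, so linear CKA should be 1.
y = [[-3.0 * b, 3.0 * a] for a, b in x]
z = [[1.0], [0.0], [0.0], [1.0]]
print(linear_cka(x, y), linear_cka(x, z))
```

Because linear CKA is invariant to orthogonal transformations and isotropic scaling of either representation, the rotated-and-scaled copy scores 1 up to floating-point error. This invariance is what lets CKA compare layers whose individual units have no fixed correspondence.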