Exploring Optimal Substructure for Out-of-distribution Generalization
via Feature-targeted Model Pruning
- URL: http://arxiv.org/abs/2212.09458v1
- Date: Mon, 19 Dec 2022 13:51:06 GMT
- Title: Exploring Optimal Substructure for Out-of-distribution Generalization
via Feature-targeted Model Pruning
- Authors: Yingchun Wang, Jingcai Guo, Song Guo, Weizhan Zhang, Jie Zhang
- Abstract summary: We propose a novel Spurious Feature-targeted model Pruning framework, dubbed SFP, to automatically explore invariant substructures.
SFP can significantly outperform both structure-based and non-structure-based OOD generalization SOTAs, with accuracy improvements of up to 4.72% and 23.35%, respectively.
- Score: 23.938392334438582
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent studies show that even highly biased dense networks contain an
unbiased substructure that can achieve better out-of-distribution (OOD)
generalization than the original model. Existing works usually search for the
invariant subnetwork using modular risk minimization (MRM) with out-domain
data. Such a paradigm may bring about two potential weaknesses: 1) Unfairness,
due to insufficient observation of out-domain data during training; and 2)
Sub-optimal OOD generalization, due to feature-untargeted model pruning over
the whole data distribution. In this paper, we propose a novel Spurious
Feature-targeted model Pruning framework, dubbed SFP, to automatically explore
invariant substructures without incurring the above weaknesses. Specifically,
SFP identifies in-distribution (ID) features during training using our
theoretically verified task loss, upon which SFP performs ID-targeted model
pruning that removes branches with strong dependencies on ID features.
Notably, by attenuating the projections of spurious features into the model
space, SFP pushes the model toward learning invariant features and pulls it
away from environmental (spurious) features, yielding optimal OOD
generalization. Moreover, we conduct a detailed theoretical analysis that
provides a rationality guarantee and a proof framework for obtaining OOD
structures via model sparsity, and, for the first time, reveals how a highly
biased data distribution affects the model's OOD generalization. Extensive
experiments on various OOD datasets show that SFP significantly outperforms
both structure-based and non-structure-based OOD generalization SOTAs, with
accuracy improvements of up to 4.72% and 23.35%, respectively.
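To make the mechanism concrete, below is a minimal, hypothetical sketch of feature-targeted channel pruning in the spirit of SFP; it is not the authors' implementation. The low-loss heuristic for flagging samples dominated by spurious ID features, the per-channel activation score, and the prune_ratio parameter are all illustrative assumptions (the paper instead derives its criterion from a theoretically verified task loss).

```python
import torch
import torch.nn as nn

@torch.no_grad()
def channel_spurious_scores(model, prune_layer, loader):
    """Score each output channel of `prune_layer` by its mean activation on
    samples the model already fits with low task loss. Assumed heuristic:
    in a biased training set, well-fit samples are dominated by spurious
    in-distribution features, so channels that respond strongly to them
    are pruning candidates."""
    acts = []
    handle = prune_layer.register_forward_hook(
        lambda mod, inp, out: acts.append(out.mean(dim=(2, 3))))  # NCHW -> (N, C)
    loss_fn = nn.CrossEntropyLoss(reduction="none")
    scores, n_easy = 0.0, 0
    for x, y in loader:
        acts.clear()
        logits = model(x)
        losses = loss_fn(logits, y)                 # per-sample task loss
        easy = (losses < losses.median()).float()   # the "well-fit" half
        scores = scores + (acts[0] * easy[:, None]).sum(dim=0)
        n_easy += int(easy.sum())
    handle.remove()
    return scores / max(n_easy, 1)

def spurious_channel_mask(scores, prune_ratio=0.2):
    """Zero out the channels with the highest spurious scores; the mask can
    be multiplied into the layer's output (or folded into its weights)."""
    k = int(prune_ratio * scores.numel())
    mask = torch.ones_like(scores)
    mask[torch.topk(scores, k).indices] = 0.0
    return mask
```

Under these assumptions, the scores would be recomputed periodically during training and the mask reapplied, progressively attenuating the pathways that carry spurious features while the remaining subnetwork learns the invariant ones.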
Related papers
- DeCaf: A Causal Decoupling Framework for OOD Generalization on Node Classification [14.96980804513399]
Graph Neural Networks (GNNs) are susceptible to distribution shifts, creating vulnerability and security issues in critical domains.
Existing methods that target learning an invariant (feature, structure)-label mapping often depend on oversimplified assumptions about the data-generation process.
We introduce a more realistic graph data-generation model using Structural Causal Models (SCMs) and propose a causal decoupling framework, DeCaf, that independently learns unbiased feature-label and structure-label mappings.
arXiv Detail & Related papers (2024-10-27T00:22:18Z)
- MITA: Bridging the Gap between Model and Data for Test-time Adaptation [68.62509948690698]
Test-Time Adaptation (TTA) has emerged as a promising paradigm for enhancing the generalizability of models.
We propose Meet-In-The-Middle based MITA, which introduces energy-based optimization to encourage mutual adaptation of the model and data from opposing directions.
arXiv Detail & Related papers (2024-10-12T07:02:33Z)
- On the Benefits of Over-parameterization for Out-of-Distribution Generalization [28.961538657831788]
We investigate the performance of a machine learning model in terms of out-of-distribution (OOD) loss under benign overfitting conditions.
We show that further increasing the model's parameterization can significantly reduce the OOD loss.
These insights explain the empirical phenomenon of enhanced OOD generalization through model ensembles.
arXiv Detail & Related papers (2024-03-26T11:01:53Z)
- SFP: Spurious Feature-targeted Pruning for Out-of-Distribution Generalization [38.37530720506389]
We propose a novel Spurious Feature-targeted model Pruning framework, dubbed SFP, to automatically explore invariant substructures.
SFP can significantly outperform both structure-based and non-structure-based OOD generalization SOTAs, with accuracy improvements of up to 4.72% and 23.35%, respectively.
arXiv Detail & Related papers (2023-05-19T11:46:36Z)
- Energy-based Out-of-Distribution Detection for Graph Neural Networks [76.0242218180483]
We propose a simple, powerful, and efficient OOD detection model for GNN-based learning on graphs, which we call GNNSafe.
GNNSafe achieves up to 17.0% AUROC improvement over state-of-the-art methods and can serve as a simple yet strong baseline in this under-developed area (a minimal sketch of the underlying energy score follows this list).
arXiv Detail & Related papers (2023-02-06T16:38:43Z)
- SimSCOOD: Systematic Analysis of Out-of-Distribution Generalization in Fine-tuned Source Code Models [58.78043959556283]
We study the behaviors of models under different fine-tuning methodologies, including full fine-tuning and Low-Rank Adaptation (LoRA) fine-tuning (a minimal LoRA sketch also follows this list).
Our analysis uncovers that LoRA fine-tuning consistently exhibits significantly better OOD generalization than full fine-tuning across various scenarios.
arXiv Detail & Related papers (2022-10-10T16:07:24Z)
- General Greedy De-bias Learning [163.65789778416172]
We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model, analogous to gradient descent in functional space.
GGD can learn a more robust base model both with task-specific biased models built on prior knowledge and with a self-ensemble biased model requiring no prior knowledge.
arXiv Detail & Related papers (2021-12-20T14:47:32Z)
- On the Out-of-distribution Generalization of Probabilistic Image Modelling [6.908460960191626]
We show that, in the case of image models, OOD ability is dominated by local features.
This motivates our proposal of a Local Autoregressive model that exclusively models local image features to improve OOD performance.
arXiv Detail & Related papers (2021-09-04T17:00:37Z)
- Generalization Properties of Optimal Transport GANs with Latent Distribution Learning [52.25145141639159]
We study how the interplay between the latent distribution and the complexity of the pushforward map affects performance.
Motivated by our analysis, we advocate learning the latent distribution as well as the pushforward map within the GAN paradigm.
arXiv Detail & Related papers (2020-07-29T07:31:33Z)
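For the GNNSafe entry above, here is a minimal sketch of the energy-based OOD score such detectors build on, E(x) = -T * logsumexp(f(x)/T) over the classifier logits f(x); GNNSafe additionally propagates these scores over the graph structure, which this sketch omits. The temperature T and the decision threshold are illustrative assumptions.

```python
import torch

def energy_score(logits: torch.Tensor, T: float = 1.0) -> torch.Tensor:
    """Energy E(x) = -T * logsumexp(f(x)/T); higher energy indicates a
    likelier out-of-distribution input."""
    return -T * torch.logsumexp(logits / T, dim=-1)

# Usage: flag inputs whose energy exceeds a validation-tuned threshold.
logits = torch.randn(5, 10)           # e.g., logits for 5 nodes, 10 classes
is_ood = energy_score(logits) > -2.0  # the threshold here is a placeholder
```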
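And for the SimSCOOD entry, a minimal sketch of the Low-Rank Adaptation (LoRA) parameterization it studies: the pre-trained weight is frozen and a trainable low-rank product is added, so the layer computes base(x) + (alpha/r) * B A x. The rank r, scaling alpha, and initialization follow the original LoRA paper; this is an illustration, not SimSCOOD's code.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen nn.Linear with a trainable low-rank update B @ A."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pre-trained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the trainable low-rank correction.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```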
This list is automatically generated from the titles and abstracts of the papers on this site.