SFP: Spurious Feature-targeted Pruning for Out-of-Distribution
Generalization
- URL: http://arxiv.org/abs/2305.11615v2
- Date: Fri, 2 Jun 2023 04:29:43 GMT
- Title: SFP: Spurious Feature-targeted Pruning for Out-of-Distribution
Generalization
- Authors: Yingchun Wang, Jingcai Guo, Yi Liu, Song Guo, Weizhan Zhang, Xiangyong
Cao, Qinghua Zheng
- Abstract summary: We propose a novel Spurious Feature-targeted model Pruning framework, dubbed SFP, to automatically explore invariant substructures.
SFP can significantly outperform both structure-based and non-structure-based OOD generalization SOTAs, with accuracy improvement up to 4.72% and 23.35%, respectively.
- Score: 38.37530720506389
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model substructure learning aims to find an invariant network substructure
that can achieve better out-of-distribution (OOD) generalization than the original
full structure. Existing works usually search for the invariant substructure using
modular risk minimization (MRM) with fully exposed out-domain data, which may
bring two drawbacks: 1) unfairness, due to the dependence on full exposure of
out-domain data; and 2) sub-optimal OOD generalization, due to feature-untargeted
pruning applied uniformly over the whole data distribution. Based on the idea that
in-distribution (ID) data with spurious features may have a lower empirical risk,
in this paper we propose a novel Spurious Feature-targeted model Pruning framework,
dubbed SFP, to automatically explore invariant substructures while avoiding the
above drawbacks. Specifically, SFP identifies spurious features within ID instances
during training using our theoretically verified task loss; based on this, SFP
attenuates the corresponding feature projections in model space to achieve the
so-called spurious feature-targeted pruning. This is done by removing network
branches with strong dependencies on the identified spurious features, so SFP
pushes model learning toward invariant features and away from spurious ones,
achieving better OOD generalization. Moreover, we conduct a detailed theoretical
analysis that provides a rationale and a proof framework for OOD substructures via
model sparsity and, for the first time, reveals how a highly biased data
distribution affects the model's OOD generalization. Experiments on various OOD
datasets show that SFP significantly outperforms both structure-based and
non-structure-based OOD generalization SOTAs, with accuracy improvements of up to
4.72% and 23.35%, respectively.
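As we read the abstract, the mechanism is: flag ID samples whose task loss is unusually low (a sign the model is leaning on spurious features), score model components by how strongly they respond to those samples, and prune the highest-scoring ones. Below is a minimal PyTorch sketch of that reading; the quantile threshold, per-channel scoring, and single-layer pruning are illustrative assumptions, not the authors' exact procedure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def spurious_channel_scores(model, layer, loader, loss_quantile=0.2, device="cpu"):
    """Score each channel of `layer` by how strongly it activates on
    low-loss (presumably spurious-feature-driven) ID samples.

    `layer` is assumed to be a Conv2d whose output we hook; the quantile
    threshold stands in for SFP's theoretically derived loss criterion."""
    acts = {}
    handle = layer.register_forward_hook(
        lambda m, i, o: acts.__setitem__("out", o.detach()))
    scores = None
    model.eval()
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            loss = F.cross_entropy(model(x), y, reduction="none")
            low_loss = loss <= torch.quantile(loss, loss_quantile)
            if low_loss.sum() == 0:
                continue
            a = acts["out"][low_loss]                 # (n, C, H, W) activations
            per_channel = a.abs().mean(dim=(0, 2, 3))  # strength per channel
            scores = per_channel if scores is None else scores + per_channel
    handle.remove()
    return scores

def prune_top_channels(layer, scores, ratio=0.2):
    """Zero out the channels most tied to the flagged (spurious) samples."""
    k = max(1, int(ratio * scores.numel()))
    idx = torch.topk(scores, k).indices
    with torch.no_grad():
        layer.weight[idx] = 0.0
        if layer.bias is not None:
            layer.bias[idx] = 0.0
```

In this reading, pruning is driven only by ID data, which is what lets the method avoid the "full exposure of out-domain data" dependence criticized above.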
Related papers
- On the Benefits of Over-parameterization for Out-of-Distribution Generalization [28.961538657831788]
We investigate the performance of a machine learning model in terms of Out-of-Distribution (OOD) loss under benign overfitting conditions.
We show that further increasing the model's parameterization can significantly reduce the OOD loss.
These insights explain the empirical phenomenon of enhanced OOD generalization through model ensembles.
arXiv Detail & Related papers (2024-03-26T11:01:53Z) - Exploring Optimal Substructure for Out-of-distribution Generalization
via Feature-targeted Model Pruning [23.938392334438582]
We propose a novel Spurious Feature-targeted model Pruning framework, dubbed SFP, to automatically explore invariant substructures.
SFP can significantly outperform both structure-based and non-structure-based OOD generalization SOTAs, with accuracy improvement up to 4.72% and 23.35%, respectively.
arXiv Detail & Related papers (2022-12-19T13:51:06Z) - SimSCOOD: Systematic Analysis of Out-of-Distribution Generalization in
Fine-tuned Source Code Models [58.78043959556283]
We study model behavior under different fine-tuning methodologies, including full fine-tuning and Low-Rank Adaptation (LoRA).
Our analysis uncovers that LoRA fine-tuning consistently exhibits significantly better OOD generalization performance than full fine-tuning across various scenarios.
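For context on what LoRA fine-tuning means here, a minimal self-contained low-rank-adapter sketch in PyTorch (illustrative only; this is neither the SimSCOOD authors' setup nor the peft library's API):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update (alpha/r) * B @ A."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():     # pretrained weights stay frozen
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no-op at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Usage: wrap, say, an attention projection and fine-tune only A and B.
proj = nn.Linear(512, 512)
lora_proj = LoRALinear(proj, r=8)
trainable = [p for p in lora_proj.parameters() if p.requires_grad]  # only A and B
```

Because only the small A and B matrices are updated, LoRA constrains how far the fine-tuned model can drift from the pretrained one, which is one intuition for its better OOD behavior reported here.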
arXiv Detail & Related papers (2022-10-10T16:07:24Z) - Disentangled Federated Learning for Tackling Attributes Skew via
Invariant Aggregation and Diversity Transferring [104.19414150171472]
Attribute skew pushes current federated learning (FL) frameworks away from consistent optimization directions among the clients.
We propose disentangled federated learning (DFL) to disentangle the domain-specific and cross-invariant attributes into two complementary branches.
Experiments verify that DFL facilitates FL with higher performance, better interpretability, and faster convergence rate, compared with SOTA FL methods.
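A minimal sketch of the two-branch idea, where only the cross-client-invariant branch is averaged during aggregation while the domain-specific branch stays local; the branch layout and plain FedAvg step are illustrative assumptions rather than DFL's exact design:

```python
import copy
import torch
import torch.nn as nn

class DisentangledClientModel(nn.Module):
    """Two complementary encoders: one for cross-client-invariant attributes,
    one for client-specific (skewed) attributes."""
    def __init__(self, in_dim=64, dim=32, n_classes=10):
        super().__init__()
        self.invariant = nn.Sequential(nn.Linear(in_dim, dim), nn.ReLU())
        self.specific = nn.Sequential(nn.Linear(in_dim, dim), nn.ReLU())
        self.head = nn.Linear(2 * dim, n_classes)

    def forward(self, x):
        return self.head(torch.cat([self.invariant(x), self.specific(x)], dim=-1))

def aggregate_invariant(clients):
    """FedAvg-style averaging applied only to the invariant branch."""
    avg = copy.deepcopy(clients[0].invariant.state_dict())
    for key in avg:
        avg[key] = torch.stack(
            [c.invariant.state_dict()[key].float() for c in clients]).mean(dim=0)
    for c in clients:
        c.invariant.load_state_dict(avg)   # the specific branch stays local
```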
arXiv Detail & Related papers (2022-06-14T13:12:12Z) - General Greedy De-bias Learning [163.65789778416172]
We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model like gradient descent in functional space.
GGD can learn a more robust base model under the settings of both task-specific biased models with prior knowledge and self-ensemble biased model without prior knowledge.
arXiv Detail & Related papers (2021-12-20T14:47:32Z) - Explaining a Series of Models by Propagating Local Feature Attributions [9.66840768820136]
Pipelines involving several machine learning models improve performance in many domains but are difficult to understand.
We introduce a framework to propagate local feature attributions through complex pipelines of models based on a connection to the Shapley value.
Our framework enables us to draw higher-level conclusions based on groups of gene expression features for Alzheimer's and breast cancer histologic grade prediction.
arXiv Detail & Related papers (2021-04-30T22:20:58Z) - FSPN: A New Class of Probabilistic Graphical Model [37.80683263600885]
We introduce factorize sum split product networks (FSPNs), a new class of probabilistic graphical models (PGMs).
FSPNs are designed to overcome the drawbacks of existing PGMs in terms of estimation accuracy and inference efficiency.
We present efficient probability inference and structure learning algorithms for FSPNs, along with a theoretical analysis and extensive evaluation evidence.
arXiv Detail & Related papers (2020-11-18T01:11:55Z) - Posterior Differential Regularization with f-divergence for Improving
Model Robustness [95.05725916287376]
We focus on methods that regularize the model posterior difference between clean and noisy inputs.
We generalize the posterior differential regularization to the family of $f$-divergences.
Our experiments show that regularizing the posterior differential with an $f$-divergence noticeably improves model robustness.
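A minimal sketch of this kind of regularizer, using KL as one member of the $f$-divergence family between the posteriors on clean and noised inputs; the Gaussian noise model and weighting are illustrative assumptions, not the paper's exact recipe:

```python
import torch
import torch.nn.functional as F

def posterior_differential_loss(model, x, y, noise_std=0.1, lam=1.0):
    """Task loss plus a KL penalty between the model's posteriors
    on clean and perturbed inputs."""
    logits_clean = model(x)
    logits_noisy = model(x + noise_std * torch.randn_like(x))
    task = F.cross_entropy(logits_clean, y)
    # KL(p_clean || p_noisy): kl_div expects log-probs as input, probs as target
    kl = F.kl_div(F.log_softmax(logits_noisy, dim=-1),
                  F.softmax(logits_clean, dim=-1),
                  reduction="batchmean")
    return task + lam * kl
```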
arXiv Detail & Related papers (2020-10-23T19:58:01Z) - Out-of-distribution Generalization via Partial Feature Decorrelation [72.96261704851683]
We present a novel Partial Feature Decorrelation Learning (PFDL) algorithm, which jointly optimizes a feature decomposition network and the target image classification model.
The experiments on real-world datasets demonstrate that our method can improve the backbone model's accuracy on OOD image classification datasets.
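One simple way to express the decorrelation objective is a penalty on cross-correlations between two feature groups; in the sketch below the split into groups is assumed given, whereas PFDL learns the decomposition network jointly with the classifier:

```python
import torch

def cross_correlation_penalty(feats_a, feats_b, eps=1e-6):
    """Penalize correlations between two groups of features.

    feats_a, feats_b: (batch, d_a) and (batch, d_b) activations, e.g. a
    'target-related' and an 'irrelevant' partition of the representation."""
    a = (feats_a - feats_a.mean(0)) / (feats_a.std(0) + eps)
    b = (feats_b - feats_b.mean(0)) / (feats_b.std(0) + eps)
    corr = a.T @ b / a.shape[0]        # (d_a, d_b) cross-correlation matrix
    return (corr ** 2).mean()          # drive cross-correlations toward zero

# Used as: total_loss = task_loss + lam * cross_correlation_penalty(f_rel, f_irr)
```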
arXiv Detail & Related papers (2020-07-30T05:48:48Z) - DessiLBI: Exploring Structural Sparsity of Deep Networks via
Differential Inclusion Paths [45.947140164621096]
We propose a new approach based on differential inclusions of inverse scale spaces.
We show that DessiLBI unveils "winning tickets" in early epochs.
arXiv Detail & Related papers (2020-07-04T04:40:16Z)