Learning Diverse Features in Vision Transformers for Improved
Generalization
- URL: http://arxiv.org/abs/2308.16274v1
- Date: Wed, 30 Aug 2023 19:04:34 GMT
- Title: Learning Diverse Features in Vision Transformers for Improved
Generalization
- Authors: Armand Mihai Nicolicioiu, Andrei Liviu Nicolicioiu, Bogdan Alexe,
Damien Teney
- Abstract summary: We show that vision transformers (ViTs) tend to extract robust and spurious features with distinct attention heads.
As a result of this modularity, their performance under distribution shifts can be significantly improved at test time.
- We propose a method to further enhance the diversity and complementarity of the learned features by encouraging orthogonality of the attention heads' input gradients.
- Score: 15.905065768434403
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning models often rely only on a small set of features even when
there is a rich set of predictive signals in the training data. This makes
models brittle and sensitive to distribution shifts. In this work, we first
examine vision transformers (ViTs) and find that they tend to extract robust
and spurious features with distinct attention heads. As a result of this
modularity, their performance under distribution shifts can be significantly
improved at test time by pruning heads corresponding to spurious features,
which we demonstrate using an "oracle selection" on validation data. Second, we
propose a method to further enhance the diversity and complementarity of the
learned features by encouraging orthogonality of the attention heads' input
gradients. We observe improved out-of-distribution performance on diagnostic
benchmarks (MNIST-CIFAR, Waterbirds) as a consequence of the enhanced diversity
of features and the pruning of undesirable heads.
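To make the test-time head-pruning idea concrete, here is a minimal PyTorch sketch of "oracle selection": each attention head can be zeroed out via a binary mask, and the mask that maximizes validation accuracy is kept. The MaskedMHSA module, the shared mask across layers, and the greedy search over candidate masks are illustrative assumptions, not the authors' exact implementation.
```python
# Minimal sketch (not the authors' code): test-time pruning of ViT attention heads
# selected by "oracle" accuracy on a validation set.
import torch
import torch.nn as nn

class MaskedMHSA(nn.Module):
    """Multi-head self-attention whose heads can be individually zeroed at test time."""
    def __init__(self, dim, num_heads):
        super().__init__()
        self.num_heads, self.head_dim = num_heads, dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        self.register_buffer("head_mask", torch.ones(num_heads))  # 1 = keep, 0 = prune

    def forward(self, x):
        B, N, D = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)                  # each (B, H, N, head_dim)
        attn = (q @ k.transpose(-2, -1) * self.head_dim ** -0.5).softmax(dim=-1)
        out = (attn @ v) * self.head_mask.view(1, -1, 1, 1)   # zero out pruned heads
        return self.proj(out.transpose(1, 2).reshape(B, N, D))

@torch.no_grad()
def oracle_select_mask(model, candidate_masks, val_loader, device="cuda"):
    """Keep the head mask with the highest validation accuracy ("oracle selection").
    For simplicity the same mask is applied to every MaskedMHSA layer."""
    best_mask, best_acc = None, -1.0
    for mask in candidate_masks:              # e.g. all masks dropping one head at a time
        for m in model.modules():
            if isinstance(m, MaskedMHSA):
                m.head_mask.copy_(mask.to(device))
        correct = total = 0
        for x, y in val_loader:
            preds = model(x.to(device)).argmax(dim=-1)
            correct += (preds == y.to(device)).sum().item()
            total += y.numel()
        if correct / total > best_acc:
            best_acc, best_mask = correct / total, mask.clone()
    return best_mask, best_acc
```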
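The second ingredient, orthogonality of the heads' input gradients, could be instantiated along the following lines. The per-head scalar (here, the squared norm of each head's output) and the squared-cosine penalty are assumptions for illustration, and `get_head_outputs` stands in for whatever hook extracts per-head outputs from the model; the penalty would typically be added to the usual cross-entropy loss with a weighting coefficient.
```python
# Illustrative sketch of an input-gradient orthogonality penalty between attention heads.
# `get_head_outputs(model, x)` is a hypothetical hook returning a (num_heads, B, ...) tensor
# of per-head outputs; the per-head scalar and squared-cosine penalty are assumptions.
import torch
import torch.nn.functional as F

def head_gradient_orthogonality(model, x, get_head_outputs):
    x = x.clone().requires_grad_(True)
    head_outs = get_head_outputs(model, x)                 # (H, B, ...)
    grads = []
    for h in range(head_outs.shape[0]):
        scalar = head_outs[h].pow(2).sum()                 # one scalar per head
        (g,) = torch.autograd.grad(scalar, x, create_graph=True, retain_graph=True)
        grads.append(g.flatten(start_dim=1))               # (B, D)
    g = F.normalize(torch.stack(grads, dim=1), dim=-1)     # (B, H, D), unit-norm gradients
    cos = torch.einsum("bhd,bkd->bhk", g, g)               # pairwise cosine similarities
    H = cos.shape[-1]
    off_diag = cos.pow(2).sum(dim=(-2, -1)) - H            # drop the diagonal (all ones)
    return off_diag.mean() / (H * (H - 1))

# Usage (sketch): total_loss = ce_loss + lam * head_gradient_orthogonality(model, images, hook)
```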
Related papers
- CAVE: Classifying Abnormalities in Video Capsule Endoscopy [0.1937002985471497]
In this study, we explore an ensemble-based approach to improve classification accuracy in complex image datasets.
We leverage the unique feature-extraction capabilities of each model to enhance the overall accuracy.
Experimental evaluations demonstrate that the ensemble achieves higher accuracy and robustness across challenging and imbalanced classes.
arXiv Detail & Related papers (2024-10-26T17:25:08Z) - Exploring Stronger Transformer Representation Learning for Occluded Person Re-Identification [2.552131151698595]
We propose SSSC-TransReID, a novel transformer-based person re-identification framework that combines self-supervision and supervision.
We designed a self-supervised contrastive learning branch, which can enhance the feature representation for person re-identification without negative samples or additional pre-training.
Our proposed model consistently obtains superior Re-ID performance and outperforms state-of-the-art ReID methods by large margins in mean average precision (mAP) and Rank-1 accuracy.
arXiv Detail & Related papers (2024-10-21T03:17:25Z) - Feature Augmentation for Self-supervised Contrastive Learning: A Closer Look [28.350278251132078]
We propose a unified framework to conduct data augmentation in the feature space, known as feature augmentation.
This strategy is domain-agnostic: it augments the original features with similar ones and thus improves data diversity.
arXiv Detail & Related papers (2024-10-16T09:25:11Z) - Localized Gaussians as Self-Attention Weights for Point Clouds Correspondence [92.07601770031236]
We investigate semantically meaningful patterns in the attention heads of an encoder-only Transformer architecture.
We find that fixing the attention weights not only accelerates the training process but also enhances the stability of the optimization.
arXiv Detail & Related papers (2024-09-20T07:41:47Z) - Unsupervised Generative Feature Transformation via Graph Contrastive Pre-training and Multi-objective Fine-tuning [28.673952870674146]
We develop a measurement-pretrain-finetune paradigm for Unsupervised Feature Transformation Learning.
For unsupervised feature set utility measurement, we propose a feature value consistency preservation perspective.
For generative transformation finetuning, we regard a feature set as a feature cross sequence and feature transformation as sequential generation.
arXiv Detail & Related papers (2024-05-27T06:50:00Z) - Mitigating Shortcut Learning with Diffusion Counterfactuals and Diverse Ensembles [95.49699178874683]
We propose DiffDiv, an ensemble diversification framework exploiting Diffusion Probabilistic Models (DPMs).
We show that DPMs can generate images with novel feature combinations, even when trained on samples displaying correlated input features.
We show that DPM-guided diversification is sufficient to remove dependence on shortcut cues, without a need for additional supervised signals.
arXiv Detail & Related papers (2023-11-23T15:47:33Z) - MT-SLVR: Multi-Task Self-Supervised Learning for Transformation
In(Variant) Representations [2.94944680995069]
We propose a multi-task self-supervised framework (MT-SLVR) that learns both variant and invariant features in a parameter-efficient manner.
We evaluate our approach on few-shot classification tasks drawn from a variety of audio domains and demonstrate improved classification performance.
arXiv Detail & Related papers (2023-05-29T09:10:50Z) - Agree to Disagree: Diversity through Disagreement for Better
Transferability [54.308327969778155]
We propose D-BAT (Diversity-By-disAgreement Training), which enforces agreement among the models on the training data but disagreement on out-of-distribution data.
We show how D-BAT naturally emerges from the notion of generalized discrepancy.
arXiv Detail & Related papers (2022-02-09T12:03:02Z) - Revisiting Consistency Regularization for Semi-Supervised Learning [80.28461584135967]
We propose an improved consistency regularization framework by a simple yet effective technique, FeatDistLoss.
Experimental results show that our model defines a new state of the art for various datasets and settings.
arXiv Detail & Related papers (2021-12-10T20:46:13Z) - Vision Transformers are Robust Learners [65.91359312429147]
We study the robustness of the Vision Transformer (ViT) against common corruptions and perturbations, distribution shifts, and natural adversarial examples.
We present analyses that provide both quantitative and qualitative indications to explain why ViTs are indeed more robust learners.
arXiv Detail & Related papers (2021-05-17T02:39:22Z) - Learning What Makes a Difference from Counterfactual Examples and
Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.