Elastic Weight Consolidation Improves the Robustness of Self-Supervised
Learning Methods under Transfer
- URL: http://arxiv.org/abs/2210.16365v1
- Date: Fri, 28 Oct 2022 19:00:25 GMT
- Title: Elastic Weight Consolidation Improves the Robustness of Self-Supervised
Learning Methods under Transfer
- Authors: Andrius Ovsianas, Jason Ramapuram, Dan Busbridge, Eeshan Gunesh
Dhekane, Russ Webb
- Abstract summary: Self-supervised representation learning (SSL) methods provide an effective label-free initial condition for fine-tuning downstream tasks.
We re-interpret SSL fine-tuning under the lens of Bayesian continual learning and consider regularization through the Elastic Weight Consolidation (EWC) framework.
We demonstrate that self-regularization against an initial SSL backbone improves worst sub-group performance in Waterbirds by 5% and Celeb-A by 2%.
- Score: 4.2141621237414615
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised representation learning (SSL) methods provide an effective
label-free initial condition for fine-tuning downstream tasks. However, in
numerous realistic scenarios, the downstream task might be biased with respect
to the target label distribution. This in turn moves the learned fine-tuned
model posterior away from the initial (label) bias-free self-supervised model
posterior. In this work, we re-interpret SSL fine-tuning under the lens of
Bayesian continual learning and consider regularization through the Elastic
Weight Consolidation (EWC) framework. We demonstrate that self-regularization
against an initial SSL backbone improves worst sub-group performance in
Waterbirds by 5% and Celeb-A by 2% when using the ViT-B/16 architecture.
Furthermore, to help simplify the use of EWC with SSL, we pre-compute and
publicly release the Fisher Information Matrix (FIM), evaluated with 10,000
ImageNet-1K variates on large modern SSL architectures including
ViT-B/16 and ResNet50 trained with DINO.
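As a minimal sketch of how the EWC self-regularization described above could look, the snippet below estimates a diagonal FIM from an SSL backbone and then uses it to anchor fine-tuning to the initial SSL weights. This is an illustration under stated assumptions (a PyTorch classifier, a standard image dataloader, a diagonal Fisher approximation, and a hand-picked penalty weight `lam`), not the authors' released implementation.

```python
# Rough sketch, assuming a PyTorch classifier built on an SSL (e.g. DINO) backbone.
import torch
import torch.nn.functional as F


def diagonal_fim(model, loader, num_samples=10_000):
    """Diagonal empirical Fisher: average squared gradient of the model's own
    log-likelihood. Exact per-sample estimate with batch size 1; larger batches
    give a cheaper approximation."""
    fim = {n: torch.zeros_like(p) for n, p in model.named_parameters() if p.requires_grad}
    model.eval()
    seen, steps = 0, 0
    for x, _ in loader:  # dataset labels are ignored; y is sampled from the model itself
        log_probs = F.log_softmax(model(x), dim=-1)
        y = torch.distributions.Categorical(logits=log_probs).sample()
        model.zero_grad()
        F.nll_loss(log_probs, y).backward()
        for n, p in model.named_parameters():
            if n in fim and p.grad is not None:
                fim[n] += p.grad.detach() ** 2
        seen, steps = seen + x.shape[0], steps + 1
        if seen >= num_samples:
            break
    return {n: f / max(steps, 1) for n, f in fim.items()}


def ewc_regularized_loss(model, batch, ssl_params, fim, lam=1.0):
    """Task cross-entropy plus (lam / 2) * sum_i F_i * (theta_i - theta_i^SSL)^2,
    i.e. self-regularization against the initial SSL backbone."""
    x, y = batch
    loss = F.cross_entropy(model(x), y)
    penalty = 0.0
    for n, p in model.named_parameters():
        if n in fim:
            penalty = penalty + (fim[n] * (p - ssl_params[n]) ** 2).sum()
    return loss + 0.5 * lam * penalty
```

Here `ssl_params` would be a frozen copy of the backbone taken before fine-tuning, e.g. `{n: p.detach().clone() for n, p in model.named_parameters()}`; the FIMs released with the paper (for DINO ViT-B/16 and ResNet50) could replace the `diagonal_fim` step.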
Related papers
- Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain the stability of the zero-shot generalization of VLMs; the method is dubbed OrthSR.
For the first time, we revisit CLIP and CoOp with our method to effectively improve the models in the few-shot image classification scenario.
arXiv Detail & Related papers (2024-07-11T10:35:53Z) - Co-training for Low Resource Scientific Natural Language Inference [65.37685198688538]
We propose a novel co-training method that assigns weights based on the training dynamics of the classifiers to the distantly supervised labels.
By assigning importance weights instead of filtering out examples based on an arbitrary threshold on the predicted confidence, we maximize the usage of automatically labeled data.
The proposed method obtains an improvement of 1.5% in Macro F1 over the distant supervision baseline, and substantial improvements over several other strong SSL baselines.
arXiv Detail & Related papers (2024-06-20T18:35:47Z) - Towards Continual Learning Desiderata via HSIC-Bottleneck
Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy even under the strict setting of a zero exemplar buffer and a model only 1.02x the size of the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z) - Stable Distillation: Regularizing Continued Pre-training for
Low-Resource Automatic Speech Recognition [54.9235160379917]
Stable Distillation is a simple and novel approach for SSL-based continued pre-training.
It boosts ASR performance in the target domain where both labeled and unlabeled data are limited.
arXiv Detail & Related papers (2023-12-20T06:02:12Z) - Semi-Supervised Class-Agnostic Motion Prediction with Pseudo Label
Regeneration and BEVMix [59.55173022987071]
We study the potential of semi-supervised learning for class-agnostic motion prediction.
Our framework adopts a consistency-based self-training paradigm, enabling the model to learn from unlabeled data.
Our method exhibits performance comparable to weakly supervised and some fully supervised methods.
arXiv Detail & Related papers (2023-12-13T09:32:50Z) - Progressive Feature Adjustment for Semi-supervised Learning from
Pretrained Models [39.42802115580677]
Semi-supervised learning (SSL) can leverage both labeled and unlabeled data to build a predictive model.
Recent literature suggests that naively applying state-of-the-art SSL with a pretrained model fails to unleash the full potential of training data.
We propose to use pseudo-labels from the unlabelled data to update the feature extractor in a way that is less sensitive to incorrect labels.
arXiv Detail & Related papers (2023-09-09T01:57:14Z) - In-Domain Self-Supervised Learning Improves Remote Sensing Image Scene
Classification [5.323049242720532]
Self-supervised learning has emerged as a promising approach for remote sensing image classification.
We present a study of different self-supervised pre-training strategies and evaluate their effect across 14 downstream datasets.
arXiv Detail & Related papers (2023-07-04T10:57:52Z) - Efficient Gaussian Process Model on Class-Imbalanced Datasets for
Generalized Zero-Shot Learning [37.00463358780726]
We propose a Neural Network model that learns a latent feature embedding and a Gaussian Process (GP) regression model that predicts latent feature prototypes of unseen classes.
Our model is trained efficiently with a simple training strategy that mitigates the impact of class-imbalanced training data.
arXiv Detail & Related papers (2022-10-11T04:57:20Z) - Improving Self-Supervised Learning by Characterizing Idealized
Representations [155.1457170539049]
We prove necessary and sufficient conditions for any task invariant to given data augmentations.
For contrastive learning, our framework prescribes simple but significant improvements to previous methods.
For non-contrastive learning, we use our framework to derive a simple and novel objective.
arXiv Detail & Related papers (2022-09-13T18:01:03Z) - Revisiting Pretraining for Semi-Supervised Learning in the Low-Label
Regime [15.863530936691157]
Semi-supervised learning (SSL) addresses the lack of labeled data by exploiting large unlabeled data through pseudolabeling.
Recent studies combined finetuning (FT) from pretrained weights with SSL to mitigate the challenges and claimed superior results in the low-label regime.
arXiv Detail & Related papers (2022-05-06T03:53:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.