BiSSL: Enhancing the Alignment Between Self-Supervised Pretraining and Downstream Fine-Tuning via Bilevel Optimization
- URL: http://arxiv.org/abs/2410.02387v4
- Date: Wed, 21 May 2025 13:32:08 GMT
- Title: BiSSL: Enhancing the Alignment Between Self-Supervised Pretraining and Downstream Fine-Tuning via Bilevel Optimization
- Authors: Gustav Wagner Zakarias, Lars Kai Hansen, Zheng-Hua Tan
- Abstract summary: BiSSL is a novel bilevel training framework that enhances the alignment of self-supervised pretrained models with downstream tasks prior to fine-tuning. We propose a general training algorithm for BiSSL that is compatible with a broad range of pretext and downstream tasks. Our proposed framework significantly improves accuracy on the vast majority of 12 downstream image classification datasets, as well as on object detection.
- Score: 12.749627564482282
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Models initialized from self-supervised pretraining may suffer from poor alignment with downstream tasks, reducing the extent to which subsequent fine-tuning can adapt pretrained features toward downstream objectives. To mitigate this, we introduce BiSSL, a novel bilevel training framework that enhances the alignment of self-supervised pretrained models with downstream tasks prior to fine-tuning. BiSSL acts as an intermediate training stage conducted after conventional self-supervised pretraining and is tasked with solving a bilevel optimization problem that incorporates the pretext and downstream training objectives in its lower- and upper-level objectives, respectively. This approach explicitly models the interdependence between the pretraining and fine-tuning stages within the conventional self-supervised learning pipeline, facilitating enhanced information sharing between them that ultimately leads to a model initialization better aligned with the downstream task. We propose a general training algorithm for BiSSL that is compatible with a broad range of pretext and downstream tasks. Using SimCLR and Bootstrap Your Own Latent to pretrain ResNet-50 backbones on the ImageNet dataset, we demonstrate that our proposed framework significantly improves accuracy on the vast majority of 12 downstream image classification datasets, as well as on object detection. Exploratory analyses alongside investigative experiments further provide compelling evidence that BiSSL enhances downstream alignment.
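For intuition, the bilevel structure described in the abstract can be sketched as follows; the notation and the specific coupling term are illustrative assumptions, not the paper's exact formulation:

    \min_{\theta_U} \ \mathcal{L}_{\mathrm{down}}\bigl(\theta_U, \theta_L^{*}(\theta_U); \mathcal{D}_{\mathrm{down}}\bigr)
    \quad \text{s.t.} \quad
    \theta_L^{*}(\theta_U) \in \arg\min_{\theta_L} \ \mathcal{L}_{\mathrm{pretext}}\bigl(\theta_L; \mathcal{D}_{\mathrm{pretext}}\bigr) + \lambda\, r(\theta_L, \theta_U),

where \mathcal{L}_{\mathrm{down}} and \mathcal{L}_{\mathrm{pretext}} are the downstream (upper-level) and pretext (lower-level) training objectives, \mathcal{D}_{\mathrm{down}} and \mathcal{D}_{\mathrm{pretext}} the corresponding datasets, and r an assumed coupling term (e.g. a proximity penalty) through which the two levels share information.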
Related papers
- Unlabeled Data or Pre-trained Model: Rethinking Semi-Supervised Learning and Pretrain-Finetuning [47.18766077898836]
Semi-supervised learning (SSL) alleviates the cost of the data labeling process by exploiting unlabeled data. The pretrain-finetuning paradigm has garnered significant attention in recent years. We propose Few-shot SSL, a framework that enables fair comparison between these two paradigms.
arXiv Detail & Related papers (2025-05-19T16:29:20Z) - Uni-Sign: Toward Unified Sign Language Understanding at Scale [90.76641997060513]
We propose a unified pre-training framework that eliminates the gap between pre-training and downstream SLU tasks.
Uni-Sign achieves state-of-the-art performance across multiple downstream SLU tasks.
arXiv Detail & Related papers (2025-01-25T11:51:23Z) - Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain the stability of the zero-shot generalization of VLMs; the resulting method is dubbed OrthSR.
For the first time, we revisit CLIP and CoOp with our method to effectively improve the model in the few-shot image classification scenario.
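As a rough illustration of orthogonal fine-tuning in general (this is not the OrthSR implementation, whose details are not given above), a pretrained linear layer can be adapted by learning an orthogonal rotation of its frozen weights, e.g. via a Cayley-transform parametrization:

    import torch
    import torch.nn as nn

    class OrthogonalFinetuneLinear(nn.Module):
        """Rotate frozen pretrained weights with a learned orthogonal matrix (sketch)."""
        def __init__(self, pretrained_linear: nn.Linear):
            super().__init__()
            d_out = pretrained_linear.out_features
            # Frozen pretrained weight and bias.
            self.weight = nn.Parameter(pretrained_linear.weight.detach().clone(), requires_grad=False)
            self.bias = (nn.Parameter(pretrained_linear.bias.detach().clone(), requires_grad=False)
                         if pretrained_linear.bias is not None else None)
            # Skew-symmetric parameter; its Cayley transform is orthogonal.
            self.skew = nn.Parameter(torch.zeros(d_out, d_out))

        def forward(self, x):
            a = self.skew - self.skew.T                     # enforce skew-symmetry
            eye = torch.eye(a.size(0), device=a.device)
            rot = torch.linalg.solve(eye + a, eye - a)      # Cayley transform: (I + A)^{-1}(I - A)
            w = rot @ self.weight                           # rotated pretrained weights
            return nn.functional.linear(x, w, self.bias)

Only the skew-symmetric parameter is trained, so the adapted weights stay an orthogonal rotation of the pretrained ones, which is the property such methods rely on for preserving pretrained knowledge.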
arXiv Detail & Related papers (2024-07-11T10:35:53Z) - Self-supervised visual learning in the low-data regime: a comparative evaluation [38.34785825702943]
Self-Supervised Learning (SSL) is a valuable and robust training methodology for contemporary Deep Neural Networks (DNNs). It allows efficient representation learning from massive amounts of unlabeled training data. However, it is not always feasible to collect and/or utilize very large pretraining datasets.
arXiv Detail & Related papers (2024-04-26T07:23:14Z) - BECLR: Batch Enhanced Contrastive Few-Shot Learning [1.450405446885067]
Unsupervised few-shot learning aspires to bridge this gap by discarding the reliance on annotations at training time.
We propose a novel Dynamic Clustered mEmory (DyCE) module to promote a highly separable latent representation space.
We then tackle the somewhat overlooked yet critical issue of sample bias at the few-shot inference stage.
arXiv Detail & Related papers (2024-02-04T10:52:43Z) - Joint Unsupervised and Supervised Training for Automatic Speech Recognition via Bilevel Optimization [73.98386682604122]
We present a novel bilevel optimization-based approach to training acoustic models for automatic speech recognition (ASR), which we term bi-level joint unsupervised and supervised training (BL-JUST).
BL-JUST employs lower- and upper-level optimization with an unsupervised loss and a supervised loss, respectively, leveraging recent advances in penalty-based bilevel optimization to solve this challenging ASR problem with affordable complexity and rigorous convergence guarantees.
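In penalty-based bilevel optimization, the lower-level optimality constraint is typically replaced by a penalty on its optimality gap; schematically (with assumed notation, not the exact BL-JUST objective):

    \min_{\theta} \ \mathcal{L}_{\mathrm{sup}}(\theta) + \lambda \Bigl( \mathcal{L}_{\mathrm{unsup}}(\theta) - \min_{\theta'} \mathcal{L}_{\mathrm{unsup}}(\theta') \Bigr),

where \mathcal{L}_{\mathrm{sup}} and \mathcal{L}_{\mathrm{unsup}} are the supervised (upper-level) and unsupervised (lower-level) losses and \lambda > 0 is a penalty weight; as \lambda grows, solutions of this single-level problem approach those of the original bilevel problem.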
arXiv Detail & Related papers (2024-01-13T05:01:47Z) - How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression? [92.90857135952231]
Transformers pretrained on diverse tasks exhibit remarkable in-context learning (ICL) capabilities.
We study ICL in one of its simplest setups: pretraining a linearly parameterized single-layer linear attention model for linear regression.
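Concretely, this setting is usually formalized by drawing a fresh regression task per prompt; the following is a generic sketch with assumed notation rather than the paper's exact setup:

    w \sim \mathcal{P}_w, \qquad x_i \sim \mathcal{P}_x, \qquad y_i = \langle w, x_i \rangle + \varepsilon_i, \qquad i = 1, \dots, N,

with the model receiving the prompt (x_1, y_1, \dots, x_N, y_N, x_{\mathrm{query}}) and being pretrained over many such tasks to minimize \mathbb{E}\bigl[(\hat{y}_{\mathrm{query}} - \langle w, x_{\mathrm{query}} \rangle)^2\bigr], where \hat{y}_{\mathrm{query}} is the output of the single-layer linear attention model.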
arXiv Detail & Related papers (2023-10-12T15:01:43Z) - Progressive Feature Adjustment for Semi-supervised Learning from Pretrained Models [39.42802115580677]
Semi-supervised learning (SSL) can leverage both labeled and unlabeled data to build a predictive model.
Recent literature suggests that naively applying state-of-the-art SSL with a pretrained model fails to unleash the full potential of training data.
We propose to use pseudo-labels from the unlabeled data to update the feature extractor in a way that is less sensitive to incorrect labels.
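As a minimal, generic illustration of updating a feature extractor with confident pseudo-labels (an assumed PyTorch sketch, not the paper's progressive feature adjustment procedure):

    import torch
    import torch.nn.functional as F

    def pseudo_label_step(feature_extractor, classifier, unlabeled_x, optimizer, threshold=0.95):
        """One update using only confident pseudo-labels from an unlabeled batch."""
        with torch.no_grad():
            probs = F.softmax(classifier(feature_extractor(unlabeled_x)), dim=-1)
            conf, pseudo_y = probs.max(dim=-1)
            mask = (conf >= threshold).float()        # keep only confident predictions
        logits = classifier(feature_extractor(unlabeled_x))
        per_sample = F.cross_entropy(logits, pseudo_y, reduction="none")
        loss = (per_sample * mask).sum() / mask.sum().clamp(min=1.0)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

The confidence threshold is one common way to reduce sensitivity to incorrect pseudo-labels; the paper's actual mechanism may differ.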
arXiv Detail & Related papers (2023-09-09T01:57:14Z) - In-Domain Self-Supervised Learning Improves Remote Sensing Image Scene Classification [5.323049242720532]
Self-supervised learning has emerged as a promising approach for remote sensing image classification.
We present a study of different self-supervised pre-training strategies and evaluate their effect across 14 downstream datasets.
arXiv Detail & Related papers (2023-07-04T10:57:52Z) - Understanding and Improving the Role of Projection Head in Self-Supervised Learning [77.59320917894043]
Self-supervised learning (SSL) aims to produce useful feature representations without access to human-labeled data annotations.
Current contrastive learning approaches append a parametrized projection head to the end of some backbone network to optimize the InfoNCE objective.
This raises a fundamental question: Why is a learnable projection head required if we are to discard it after training?
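For reference, the projection head g sits on top of the backbone f, and the InfoNCE objective is computed on its outputs (standard notation, roughly as in SimCLR):

    z_i = g(f(x_i)), \qquad
    \ell_{i,j} = -\log \frac{\exp\bigl(\mathrm{sim}(z_i, z_j)/\tau\bigr)}{\sum_{k \neq i} \exp\bigl(\mathrm{sim}(z_i, z_k)/\tau\bigr)},

where x_i and x_j are two augmented views of the same image, \mathrm{sim} denotes cosine similarity, and \tau is a temperature; after pretraining, g is typically discarded and only f is kept, which is precisely the practice the question above refers to.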
arXiv Detail & Related papers (2022-12-22T05:42:54Z) - FUSSL: Fuzzy Uncertain Self Supervised Learning [8.31483061185317]
Self-supervised learning (SSL) has become a very successful technique to harness the power of unlabeled data, with no annotation effort.
In this paper, for the first time, we recognize the fundamental limits of SSL that stem from the use of a single supervisory signal.
We propose a robust and general standard hierarchical learning/training protocol for any SSL baseline.
arXiv Detail & Related papers (2022-10-28T01:06:10Z) - Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
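A minimal sketch of such a self-distillation regularizer (with assumed loss terms; the paper's exact formulation may differ): keep a frozen copy of the already-pretrained model as a teacher and penalize deviation of the further-pretrained student's representations from it.

    import copy
    import torch
    import torch.nn.functional as F

    def make_self_distilled_step(model):
        """Build a further pre-training step regularized toward a frozen copy of `model`."""
        teacher = copy.deepcopy(model).eval()           # frozen snapshot of the pretrained model
        for p in teacher.parameters():
            p.requires_grad_(False)

        def step(batch, pretext_loss_fn, optimizer, alpha=1.0):
            loss = pretext_loss_fn(model, batch)        # usual pretext objective on target-domain data
            with torch.no_grad():
                target = teacher(batch)                 # teacher representations
            loss = loss + alpha * F.mse_loss(model(batch), target)  # self-distillation regularizer
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            return loss.item()

        return step

Here `model`, `batch`, and `pretext_loss_fn` are placeholders for the pretrained network, a batch of target-domain data, and the pretext objective, respectively.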
arXiv Detail & Related papers (2022-09-30T02:25:12Z) - Improving Self-Supervised Learning by Characterizing Idealized Representations [155.1457170539049]
We prove necessary and sufficient conditions for representations to be optimal on any task invariant to the given data augmentations.
For contrastive learning, our framework prescribes simple but significant improvements to previous methods.
For non-contrastive learning, we use our framework to derive a simple and novel objective.
arXiv Detail & Related papers (2022-09-13T18:01:03Z) - On the Importance of Hyperparameters and Data Augmentation for Self-Supervised Learning [32.53142486214591]
Self-Supervised Learning (SSL) has become a very active area of Deep Learning research where it is heavily used as a pre-training method for classification and other tasks.
Here, we show that, indeed, the choice of hyperparameters and data augmentation strategies can have a dramatic impact on performance.
We introduce a new automated data augmentation algorithm, GroupAugment, which considers groups of augmentations and optimizes the sampling across groups.
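As a purely schematic illustration of group-wise augmentation sampling (the actual GroupAugment algorithm is not described above, so the group names and weights below are assumptions): first sample an augmentation group according to per-group probabilities, then sample an augmentation within that group.

    import random

    # Hypothetical augmentation groups and sampling weights; the real algorithm
    # optimizes how sampling is distributed across such groups.
    GROUPS = {
        "geometric": ["random_crop", "horizontal_flip", "rotation"],
        "color":     ["color_jitter", "grayscale", "solarize"],
        "blur":      ["gaussian_blur"],
    }
    GROUP_WEIGHTS = {"geometric": 0.5, "color": 0.3, "blur": 0.2}

    def sample_augmentation():
        """Two-stage sampling: first a group, then an augmentation within it."""
        names, weights = zip(*GROUP_WEIGHTS.items())
        group = random.choices(names, weights=weights, k=1)[0]
        return random.choice(GROUPS[group])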
arXiv Detail & Related papers (2022-07-16T08:31:11Z) - Task-Customized Self-Supervised Pre-training with Scalable Dynamic Routing [76.78772372631623]
A common practice for self-supervised pre-training is to use as much data as possible.
For a specific downstream task, however, involving irrelevant data in pre-training may degenerate the downstream performance.
It is burdensome and infeasible to use different downstream-task-customized datasets in pre-training for different tasks.
arXiv Detail & Related papers (2022-05-26T10:49:43Z) - Unified Instance and Knowledge Alignment Pretraining for Aspect-based Sentiment Analysis [96.53859361560505]
Aspect-based Sentiment Analysis (ABSA) aims to determine the sentiment polarity towards an aspect.
There always exists severe domain shift between the pretraining and downstream ABSA datasets.
We introduce a unified alignment pretraining framework into the vanilla pretrain-finetune pipeline.
arXiv Detail & Related papers (2021-10-26T04:03:45Z) - CSS-LM: A Contrastive Framework for Semi-supervised Fine-tuning of Pre-trained Language Models [59.49705076369856]
We introduce a novel framework to improve the fine-tuning phase of pre-trained language models (PLMs).
We retrieve positive and negative instances from large-scale unlabeled corpora according to their domain-level and class-level semantic relatedness to a task.
We then perform contrastive semi-supervised learning on both the retrieved unlabeled and original labeled instances to help PLMs capture crucial task-related semantic features.
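A rough sketch of this retrieve-then-contrast idea (the encoder, retrieval rule, and loss below are illustrative assumptions, not the CSS-LM specifics):

    import torch
    import torch.nn.functional as F

    def retrieve_instances(encode, labeled_texts, unlabeled_texts, k=8):
        """Rank unlabeled texts by cosine similarity to the labeled task data:
        nearest neighbors serve as positives, most distant ones as negatives."""
        with torch.no_grad():
            task_emb = F.normalize(encode(labeled_texts).mean(dim=0, keepdim=True), dim=-1)
            cand_emb = F.normalize(encode(unlabeled_texts), dim=-1)
            sims = (cand_emb @ task_emb.T).squeeze(-1)
        order = sims.argsort(descending=True)
        return order[:k], order[-k:]                  # indices of positives, negatives

    def contrastive_loss(anchor, positives, negatives, tau=0.1):
        """InfoNCE-style loss pulling the anchor toward retrieved positives."""
        pos = torch.exp(F.cosine_similarity(anchor, positives) / tau)
        neg = torch.exp(F.cosine_similarity(anchor, negatives) / tau)
        return -torch.log(pos.sum() / (pos.sum() + neg.sum()))

Here `encode` stands for any sentence encoder returning embedding tensors, and `anchor` is a (1, d) embedding of a labeled instance.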
arXiv Detail & Related papers (2021-02-07T09:27:26Z)