Bi-tuning of Pre-trained Representations
- URL: http://arxiv.org/abs/2011.06182v1
- Date: Thu, 12 Nov 2020 03:32:25 GMT
- Title: Bi-tuning of Pre-trained Representations
- Authors: Jincheng Zhong, Ximei Wang, Zhi Kou, Jianmin Wang, Mingsheng Long
- Abstract summary: Bi-tuning is a general learning framework to fine-tune both supervised and unsupervised pre-trained representations to downstream tasks.
Bi-tuning generalizes the vanilla fine-tuning by integrating two heads upon the backbone of pre-trained representations.
Bi-tuning achieves state-of-the-art results for fine-tuning tasks of both supervised and unsupervised pre-trained models by large margins.
- Score: 79.58542780707441
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It is common within the deep learning community to first pre-train a deep
neural network from a large-scale dataset and then fine-tune the pre-trained
model to a specific downstream task. Recently, both supervised and unsupervised
pre-training approaches to learning representations have achieved remarkable
advances, which exploit the discriminative knowledge of labels and the
intrinsic structure of data, respectively. It follows natural intuition that
both discriminative knowledge and intrinsic structure of the downstream task
can be useful for fine-tuning; however, existing fine-tuning methods mainly
leverage the former and discard the latter. A question arises: How to fully
explore the intrinsic structure of data for boosting fine-tuning? In this
paper, we propose Bi-tuning, a general learning framework to fine-tune both
supervised and unsupervised pre-trained representations to downstream tasks.
Bi-tuning generalizes the vanilla fine-tuning by integrating two heads upon the
backbone of pre-trained representations: a classifier head with an improved
contrastive cross-entropy loss to better leverage the label information in an
instance-contrast way, and a projector head with a newly-designed categorical
contrastive learning loss to fully exploit the intrinsic structure of data in a
category-consistent way. Comprehensive experiments confirm that Bi-tuning
achieves state-of-the-art results for fine-tuning tasks of both supervised and
unsupervised pre-trained models by large margins (e.g., a 10.7% absolute rise in
accuracy on CUB in the low-data regime).
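To make the two-head design concrete, here is a minimal PyTorch-style sketch (not the authors' released code) of a pre-trained backbone fitted with a classifier head and a projector head and trained with a combined objective. The paper's specific contrastive cross-entropy (CCE) and categorical contrastive learning (CCL) losses are replaced by plain cross-entropy and a generic in-batch supervised contrastive term; the backbone choice, projection dimension, temperature, and loss weight are illustrative assumptions.

```python
# Illustrative sketch of the two-head Bi-tuning layout described in the abstract.
# Stand-in losses are used; the paper defines its own CCE/CCL formulations.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50


class BiTuningSketch(nn.Module):
    """Pre-trained backbone with a classifier head and a projector head."""

    def __init__(self, num_classes: int, proj_dim: int = 128):
        super().__init__()
        backbone = resnet50(weights="IMAGENET1K_V1")  # any supervised or unsupervised pre-trained backbone
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()                   # expose the pooled representation
        self.backbone = backbone
        self.classifier = nn.Linear(feat_dim, num_classes)   # classifier head
        self.projector = nn.Sequential(                       # projector head
            nn.Linear(feat_dim, feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, proj_dim),
        )

    def forward(self, x):
        h = self.backbone(x)
        logits = self.classifier(h)
        z = F.normalize(self.projector(h), dim=1)     # unit-norm embeddings for the contrastive term
        return logits, z


def in_batch_supcon(z, labels, tau=0.07):
    """Generic supervised contrastive loss over the batch (stand-in for the paper's losses)."""
    sim = z @ z.t() / tau
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask
    log_prob = sim - torch.logsumexp(sim.masked_fill(self_mask, float("-inf")), dim=1, keepdim=True)
    pos_count = pos_mask.sum(1).clamp(min=1)
    return -(log_prob * pos_mask).sum(1).div(pos_count).mean()


def bi_tuning_objective(logits, z, labels, lam=1.0):
    # cross-entropy on the classifier head + contrastive term on the projector head
    return F.cross_entropy(logits, labels) + lam * in_batch_supcon(z, labels)
```

The classifier head consumes the label information while the projector head shapes the representation so that same-class samples cluster together, mirroring the division of labour described in the abstract.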
Related papers
- HG-Adapter: Improving Pre-Trained Heterogeneous Graph Neural Networks with Dual Adapters [53.97380482341493]
"pre-train, prompt-tuning" has demonstrated impressive performance for tuning pre-trained heterogeneous graph neural networks (HGNNs)
We propose a unified framework that combines two new adapters with potential labeled data extension to improve the generalization of pre-trained HGNN models.
arXiv Detail & Related papers (2024-11-02T06:43:54Z) - On the Generalization Ability of Unsupervised Pretraining [53.06175754026037]
Recent advances in unsupervised learning have shown that unsupervised pre-training, followed by fine-tuning, can improve model generalization.
This paper introduces a novel theoretical framework that illuminates the critical factor influencing the transferability of knowledge acquired during unsupervised pre-training to the subsequent fine-tuning phase.
Our results contribute to a better understanding of the unsupervised pre-training and fine-tuning paradigm, and can shed light on the design of more effective pre-training algorithms.
arXiv Detail & Related papers (2024-03-11T16:23:42Z) - LEVI: Generalizable Fine-tuning via Layer-wise Ensemble of Different Views [28.081794908107604]
Fine-tuning is used to leverage the power of pre-trained foundation models in new downstream tasks.
Recent studies have observed challenges in the generalization of fine-tuned models to unseen distributions.
We propose LEVI, a novel generalizable fine-tuning method in which the pre-trained model is adaptively ensembled layer-wise with a small task-specific model.
arXiv Detail & Related papers (2024-02-07T08:16:40Z) - Bi-Drop: Enhancing Fine-tuning Generalization via Synchronous sub-net
Estimation and Optimization [58.90989478049686]
Bi-Drop is a fine-tuning strategy that selectively updates model parameters using gradients from various sub-nets.
Experiments on the GLUE benchmark demonstrate that Bi-Drop consistently outperforms previous fine-tuning methods.
arXiv Detail & Related papers (2023-05-24T06:09:26Z) - TWINS: A Fine-Tuning Framework for Improved Transferability of
Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, the Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z) - Overwriting Pretrained Bias with Finetuning Data [36.050345384273655]
We investigate bias conceptualized both as spurious correlations between the target task and a sensitive attribute and as underrepresentation of a particular group in the dataset.
We find that (1) models finetuned on top of pretrained models can indeed inherit their biases, but (2) this bias can be corrected through relatively minor interventions to the finetuning dataset.
Our findings imply that careful curation of the finetuning dataset is important for reducing biases on a downstream task, and doing so can even compensate for bias in the pretrained model.
arXiv Detail & Related papers (2023-03-10T19:10:58Z) - Relieving Long-tailed Instance Segmentation via Pairwise Class Balance [85.53585498649252]
Long-tailed instance segmentation is a challenging task due to the extreme imbalance of training samples among classes.
It causes severe biases of the head classes (with majority samples) against the tailed ones.
We propose a novel Pairwise Class Balance (PCB) method, built upon a confusion matrix which is updated during training to accumulate the ongoing prediction preferences.
arXiv Detail & Related papers (2022-01-08T07:48:36Z)