Revisiting the Transferability of Supervised Pretraining: an MLP Perspective
- URL: http://arxiv.org/abs/2112.00496v1
- Date: Wed, 1 Dec 2021 13:47:30 GMT
- Title: Revisiting the Transferability of Supervised Pretraining: an MLP Perspective
- Authors: Yizhou Wang, Shixiang Tang, Feng Zhu, Lei Bai, Rui Zhao, Donglian Qi, Wanli Ouyang
- Abstract summary: Recent progress on unsupervised pretraining methods shows transfer performance superior to that of their supervised counterparts.
This paper sheds new light on understanding the transferability gap between unsupervised and supervised pretraining from a multilayer perceptron (MLP) perspective.
We reveal that the MLP projector is also a key factor behind the better transferability of unsupervised pretraining methods over supervised ones.
- Score: 78.51258076624046
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The pretrain-finetune paradigm is a classical pipeline in visual learning.
Recent progress on unsupervised pretraining methods shows transfer performance
superior to that of their supervised counterparts. This paper revisits this
phenomenon and sheds new light on understanding the transferability gap between
unsupervised and supervised pretraining from a multilayer perceptron (MLP)
perspective. While previous works focus on the effectiveness of the MLP projector
in unsupervised image classification, where pretraining and evaluation are
conducted on the same dataset, we reveal that the MLP projector is also a key
factor behind the better transferability of unsupervised pretraining over
supervised pretraining. Based on this observation, we attempt to close
the transferability gap between supervised and unsupervised pretraining by
adding an MLP projector before the classifier in supervised pretraining. Our
analysis indicates that the MLP projector can help retain intra-class variation
of visual features, decrease the feature distribution distance between
pretraining and evaluation datasets, and reduce feature redundancy. Extensive
experiments on public benchmarks demonstrate that the added MLP projector
significantly boosts the transferability of supervised pretraining, e.g.,
+7.2% top-1 accuracy on the concept generalization task, +5.8% top-1 accuracy
for linear evaluation on 12-domain classification tasks, and +0.8% AP on the
COCO object detection task, making supervised pretraining comparable to or even
better than unsupervised pretraining. Code will be released upon acceptance.
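The recipe described above is a small architectural change: during supervised pretraining, an MLP projector is inserted between the backbone and the classification head, and downstream tasks reuse the backbone features in front of the projector. The snippet below is a minimal PyTorch sketch of that setup, not the authors' released code; the ResNet-50 backbone, the two-layer projector with BatchNorm/ReLU, and the 256-d projection size are illustrative assumptions.

```python
# Minimal sketch (assumed details, not the official implementation):
# supervised pretraining with an MLP projector between backbone and classifier.
import torch
import torch.nn as nn
import torchvision.models as models


class SupervisedPretrainWithProjector(nn.Module):
    def __init__(self, num_classes: int = 1000, proj_dim: int = 256):
        super().__init__()
        backbone = models.resnet50()                # any backbone works in principle
        feat_dim = backbone.fc.in_features          # 2048 for ResNet-50
        backbone.fc = nn.Identity()                 # expose pooled backbone features
        self.backbone = backbone
        # MLP projector inserted before the classifier (depth/width are assumptions)
        self.projector = nn.Sequential(
            nn.Linear(feat_dim, feat_dim),
            nn.BatchNorm1d(feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, proj_dim),
        )
        self.classifier = nn.Linear(proj_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(x)     # these features are what gets transferred
        z = self.projector(feats)    # projector absorbs pretraining-specific bias
        return self.classifier(z)    # trained with standard cross-entropy


# Pretrain with cross-entropy as usual, then transfer model.backbone
# (the features before the projector) to downstream tasks.
model = SupervisedPretrainWithProjector()
logits = model(torch.randn(4, 3, 224, 224))
loss = nn.functional.cross_entropy(logits, torch.randint(0, 1000, (4,)))
```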
Related papers
- Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z)
- Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training [59.923672191632065]
We propose a new self-supervised pre-training approach, named Masked and Permuted Vision Transformer (MaPeT).
MaPeT employs autoregressive and permuted predictions to capture intra-patch dependencies.
Our results demonstrate that MaPeT achieves competitive performance on ImageNet.
arXiv Detail & Related papers (2023-06-12T18:12:19Z)
- Prior Knowledge-Guided Attention in Self-Supervised Vision Transformers [79.60022233109397]
We present spatial prior attention (SPAN), a framework that takes advantage of consistent spatial and semantic structure in unlabeled image datasets.
SPAN operates by regularizing attention masks from separate transformer heads to follow various priors over semantic regions.
We find that the resulting attention masks are more interpretable than those derived from domain-agnostic pretraining.
arXiv Detail & Related papers (2022-09-07T02:30:36Z)
- Self-Supervision Can Be a Good Few-Shot Learner [42.06243069679068]
We propose an effective unsupervised few-shot learning method, learning representations with self-supervision.
Specifically, we maximize the mutual information (MI) of instances and their representations with a low-bias MI estimator.
We show that self-supervised pre-training can outperform supervised pre-training under the appropriate conditions.
arXiv Detail & Related papers (2022-07-19T10:23:40Z)
- SLIP: Self-supervision meets Language-Image Pre-training [79.53764315471543]
We study whether self-supervised learning can aid in the use of language supervision for visual representation learning.
We introduce SLIP, a multi-task learning framework for combining self-supervised learning and CLIP pre-training.
We find that SLIP enjoys the best of both worlds: better performance than either self-supervision or language supervision alone.
arXiv Detail & Related papers (2021-12-23T18:07:13Z)
- Rethinking supervised pre-training for better downstream transferring [46.09030708111374]
We propose a new supervised pre-training method based on Leave-One-Out K-Nearest-Neighbor, or LOOK.
It alleviates overfitting to the upstream task by only requiring each image to share its class label with most of its k nearest neighbors.
We developed an efficient implementation of the proposed method that scales well to large datasets.
arXiv Detail & Related papers (2021-10-12T13:57:38Z)
- Improve Unsupervised Pretraining for Few-label Transfer [80.58625921631506]
In this paper, we find that this conclusion (that unsupervised pretraining transfers better than supervised pretraining) may not hold when the target dataset has very few labeled samples for finetuning.
We propose a new progressive few-label transfer algorithm for real applications.
arXiv Detail & Related papers (2021-07-26T17:59:56Z)
- Supervision Accelerates Pre-training in Contrastive Semi-Supervised Learning of Visual Representations [12.755943669814236]
We propose a semi-supervised loss, SuNCEt, that aims to distinguish examples of different classes in addition to self-supervised instance-wise pretext tasks.
On ImageNet, we find that SuNCEt can be used to match the semi-supervised learning accuracy of previous contrastive approaches.
Our main insight is that leveraging even a small amount of labeled data during pre-training, and not only during fine-tuning, provides an important signal (a hedged sketch of such a combined objective follows this list).
arXiv Detail & Related papers (2020-06-18T18:44:13Z)
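The SuNCEt entry above combines a self-supervised instance-discrimination loss with a supervised contrastive term over the labeled subset. The sketch below illustrates that general idea in PyTorch; it is an interpretation of the one-sentence summary, not the paper's implementation, and the function names, temperature, and weighting are assumptions.

```python
# Hedged sketch of a SuNCEt-style objective: self-supervised InfoNCE between two
# augmented views plus a supervised contrastive term on labeled samples.
import torch
import torch.nn.functional as F


def instance_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Instance-discrimination loss: matching views are positives (diagonal)."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                              # (N, N) similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)


def supervised_nce(z: torch.Tensor, labels: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Contrastive term on labeled embeddings: same-class pairs are positives."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau
    self_mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))         # drop self-similarity
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(~pos, 0.0)              # keep positive pairs only
    return -(log_prob.sum(1) / pos.sum(1).clamp(min=1)).mean()


def suncet_style_loss(z1, z2, z_labeled, labels, weight: float = 1.0) -> torch.Tensor:
    """Total objective: self-supervised term plus weighted supervised term."""
    return instance_nce(z1, z2) + weight * supervised_nce(z_labeled, labels)


# Toy usage with random embeddings; in practice z1/z2 come from two augmented
# views of unlabeled images and z_labeled/labels from the labeled subset.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
z_labeled, labels = torch.randn(8, 128), torch.randint(0, 4, (8,))
loss = suncet_style_loss(z1, z2, z_labeled, labels)
```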