Transfer Learning or Self-supervised Learning? A Tale of Two Pretraining Paradigms
- URL: http://arxiv.org/abs/2007.04234v1
- Date: Fri, 19 Jun 2020 05:21:00 GMT
- Title: Transfer Learning or Self-supervised Learning? A Tale of Two Pretraining Paradigms
- Authors: Xingyi Yang, Xuehai He, Yuxiao Liang, Yue Yang, Shanghang Zhang,
Pengtao Xie
- Abstract summary: Self-supervised learning (SSL) has demonstrated promising results on a wide range of applications.
It is not yet clearly understood which properties of data and tasks make one approach outperform the other.
- Score: 36.04356511882304
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pretraining has become a standard technique in computer vision and natural
language processing, and it usually improves performance substantially.
Previously, the dominant pretraining method was transfer learning (TL), which
uses labeled data to learn a good representation network. Recently, a new
pretraining approach, self-supervised learning (SSL), has demonstrated
promising results on a wide range of applications. SSL does not require
annotated labels; it is conducted purely on the input data by solving auxiliary
tasks defined on the input examples. Currently reported results show that SSL
outperforms TL in certain applications, while TL outperforms SSL in others. It
is not yet clearly understood which properties of data and tasks make one
approach outperform the other. Without an informed guideline, ML researchers
have to try both methods empirically to find out which one works better, which
is usually time-consuming. In this work, we aim to address this problem. We
perform a comprehensive comparative study of SSL and TL, examining which one
works better under different properties of data and tasks, including the domain
difference between source and target tasks, the amount of pretraining data,
class imbalance in the source data, and the usage of target data for additional
pretraining. The insights distilled from our comparative studies can help ML
researchers decide which method to use based on the properties of their
applications.
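To make the contrast concrete, here is a minimal, hypothetical PyTorch-style sketch (not the authors' code): TL pretrains an encoder with a supervised objective on labeled source data, whereas SSL pretrains the same encoder by solving an auxiliary task built only from the inputs (rotation prediction is used here merely as one example of such a task). In both cases the pretrained encoder is afterwards fine-tuned on the target task.

```python
# Hypothetical sketch contrasting the two pretraining paradigms; assumes
# standard PyTorch and a toy encoder. Not the implementation from the paper.
import torch
import torch.nn as nn


class SmallEncoder(nn.Module):
    """Toy convolutional encoder standing in for a ResNet-style backbone."""

    def __init__(self, out_dim=128):
        super().__init__()
        self.out_dim = out_dim
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, out_dim),
        )

    def forward(self, x):
        return self.net(x)


def transfer_learning_pretrain(encoder, source_loader, num_source_classes, epochs=1):
    """TL: supervised pretraining on a labeled source dataset."""
    head = nn.Linear(encoder.out_dim, num_source_classes)
    opt = torch.optim.SGD(list(encoder.parameters()) + list(head.parameters()), lr=0.1)
    for _ in range(epochs):
        for images, labels in source_loader:              # source labels are required
            loss = nn.functional.cross_entropy(head(encoder(images)), labels)
            opt.zero_grad(); loss.backward(); opt.step()
    return encoder


def self_supervised_pretrain(encoder, unlabeled_loader, epochs=1):
    """SSL: auxiliary task defined on the inputs alone (rotation prediction here)."""
    head = nn.Linear(encoder.out_dim, 4)                   # classify one of 4 rotations
    opt = torch.optim.SGD(list(encoder.parameters()) + list(head.parameters()), lr=0.1)
    for _ in range(epochs):
        for images, _ in unlabeled_loader:                 # any class labels are ignored
            k = torch.randint(0, 4, (images.size(0),))     # pseudo-labels from the data itself
            rotated = torch.stack([torch.rot90(img, int(r), dims=(1, 2))
                                   for img, r in zip(images, k)])
            loss = nn.functional.cross_entropy(head(encoder(rotated)), k)
            opt.zero_grad(); loss.backward(); opt.step()
    return encoder


def finetune(encoder, target_loader, num_target_classes, epochs=1):
    """Shared final step for both paradigms: fine-tune on the labeled target task."""
    head = nn.Linear(encoder.out_dim, num_target_classes)
    opt = torch.optim.SGD(list(encoder.parameters()) + list(head.parameters()), lr=0.01)
    for _ in range(epochs):
        for images, labels in target_loader:
            loss = nn.functional.cross_entropy(head(encoder(images)), labels)
            opt.zero_grad(); loss.backward(); opt.step()
    return encoder, head
```

The comparative study varies the conditions around this shared skeleton, e.g. how far the source data's domain is from the target task, how much pretraining data is available, and how imbalanced the source labels are.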
Related papers
- A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification [51.35500308126506]
Self-supervised learning (SSL) is a machine learning approach where the data itself provides supervision, eliminating the need for external labels.
We study how classification-based evaluation protocols for SSL correlate and how well they predict downstream performance on different dataset types.
arXiv Detail & Related papers (2024-07-16T23:17:36Z)
- Self-supervised visual learning in the low-data regime: a comparative evaluation [40.27083924454058]
Self-Supervised Learning (SSL) is a robust training methodology for contemporary Deep Neural Networks (DNNs).
This work introduces a taxonomy of modern visual SSL methods, accompanied by detailed explanations and insights regarding the main categories of approaches.
For domain-specific downstream tasks, in-domain low-data SSL pretraining outperforms the common approach of large-scale pretraining.
arXiv Detail & Related papers (2024-04-26T07:23:14Z)
- On Pretraining Data Diversity for Self-Supervised Learning [57.91495006862553]
We explore the impact of training with more diverse datasets on the performance of self-supervised learning (SSL) under a fixed computational budget.
Our findings consistently demonstrate that increasing pretraining data diversity enhances SSL performance, albeit only when the distribution distance to the downstream data is minimal.
arXiv Detail & Related papers (2024-03-20T17:59:58Z)
- LLMaAA: Making Large Language Models as Active Annotators [32.57011151031332]
We propose LLMaAA, which takes large language models as annotators and puts them into an active learning loop to determine what to annotate efficiently; a minimal sketch of such a loop appears after this list.
We conduct experiments and analysis on two classic NLP tasks, named entity recognition and relation extraction.
With LLMaAA, task-specific models trained from LLM-generated labels can outperform the teacher within only hundreds of annotated examples.
arXiv Detail & Related papers (2023-10-30T14:54:15Z)
- Revisit Few-shot Intent Classification with PLMs: Direct Fine-tuning vs. Continual Pre-training [20.98770732015944]
Few-shot intent detection involves training a deep learning model to classify utterances based on their underlying intents using only a small amount of labeled data.
We show that continual pre-training may not be essential, since the overfitting problem of PLMs on this task may not be as serious as expected.
To maximize the utilization of the limited available data, we propose a context augmentation method and leverage sequential self-distillation to boost performance.
arXiv Detail & Related papers (2023-06-08T15:26:52Z)
- Task-Customized Self-Supervised Pre-training with Scalable Dynamic Routing [76.78772372631623]
A common practice for self-supervised pre-training is to use as much data as possible.
For a specific downstream task, however, involving irrelevant data in pre-training may degrade the downstream performance.
It is burdensome and infeasible to use different downstream-task-customized datasets in pre-training for different tasks.
arXiv Detail & Related papers (2022-05-26T10:49:43Z)
- DATA: Domain-Aware and Task-Aware Pre-training [94.62676913928831]
We present DATA, a simple yet effective NAS approach specialized for self-supervised learning (SSL).
Our method achieves promising results across a wide range of computation costs on downstream tasks, including image classification, object detection and semantic segmentation.
arXiv Detail & Related papers (2022-03-17T02:38:49Z)
- On the Transferability of Pre-trained Language Models: A Study from Artificial Datasets [74.11825654535895]
Pre-training language models (LMs) on large-scale unlabeled text data makes it much easier for the model to achieve exceptional downstream performance.
We study which specific traits of the pre-training data, other than the semantics, make a pre-trained LM superior to its counterparts trained from scratch on downstream tasks.
arXiv Detail & Related papers (2021-09-08T10:39:57Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To keep the enlarged dataset manageable, we further propose a dataset distillation strategy that compresses the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
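As a concrete illustration of the LLMaAA-style setup from the related papers above, here is a hypothetical sketch of an LLM-as-annotator active learning loop; every name in it (active_llm_annotation, llm_annotate, train_model, uncertainty) is a placeholder, not the paper's API. The loop repeatedly asks the LLM to label the unlabeled examples the current task model is least confident about, then retrains the small task-specific model on all LLM-generated labels collected so far.

```python
# Hypothetical sketch of an LLM-as-annotator active learning loop in the
# spirit of LLMaAA; all names below are placeholders, not the paper's API.
from typing import Any, Callable, List, Tuple


def active_llm_annotation(
    unlabeled_pool: List[str],
    llm_annotate: Callable[[List[str]], List[Any]],        # e.g. wraps an LLM prompt for NER/RE labels
    train_model: Callable[[List[Tuple[str, Any]]], Any],   # trains a small task-specific model
    uncertainty: Callable[[Any, str], float],              # higher = model less confident
    rounds: int = 5,
    batch_per_round: int = 50,
):
    """Iteratively ask the LLM to label the examples the current model is least sure about."""
    labeled: List[Tuple[str, Any]] = []
    pool = list(unlabeled_pool)
    model = None
    for _ in range(rounds):
        if model is None:
            # Cold start: pick an arbitrary first batch.
            batch = pool[:batch_per_round]
        else:
            # Acquisition step: rank remaining examples by model uncertainty.
            pool.sort(key=lambda x: uncertainty(model, x), reverse=True)
            batch = pool[:batch_per_round]
        pool = pool[len(batch):]
        labels = llm_annotate(batch)          # the LLM acts as the (noisy) annotator
        labeled.extend(zip(batch, labels))
        model = train_model(labeled)          # retrain on all LLM-generated labels so far
        if not pool:
            break
    return model, labeled
```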
This list is automatically generated from the titles and abstracts of the papers on this site.