Can Pretext-Based Self-Supervised Learning Be Boosted by Downstream
Data? A Theoretical Analysis
- URL: http://arxiv.org/abs/2103.03568v1
- Date: Fri, 5 Mar 2021 09:53:10 GMT
- Title: Can Pretext-Based Self-Supervised Learning Be Boosted by Downstream
Data? A Theoretical Analysis
- Authors: Jiaye Teng, Weiran Huang
- Abstract summary: Pretext-based self-supervised learning aims to learn the semantic representation via a handcrafted pretext task over unlabeled data.
Lee et al. (2020) prove that pretext-based self-supervised learning can effectively reduce the sample complexity of downstream tasks under Conditional Independence (CI).
We explore the idea of applying a learnable function to the input to make the CI condition hold.
- Score: 12.188482172898656
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pretext-based self-supervised learning aims to learn the semantic
representation via a handcrafted pretext task over unlabeled data and then use
the learned representation for downstream prediction tasks.
\citet{lee2020predicting} prove that pretext-based self-supervised learning can
effectively reduce the sample complexity of downstream tasks under Conditional
Independence (CI) between the components of the pretext task conditional on the
downstream label. However, the CI condition rarely holds in practice, and the
downstream sample complexity will get much worse if the CI condition does not
hold. In this paper, we explore the idea of applying a learnable function to
the input to make the CI condition hold. In particular, we first rigorously
formulate the criteria that the function needs to satisfy. We then design an
ingenious loss function for learning such a function and prove that the
function minimizing the proposed loss satisfies the above criteria. We
theoretically study the number of labeled data required, and give a model-free
lower bound showing that taking limited downstream data will hurt the
performance of self-supervised learning. Furthermore, we take the model
structure into account and give a model-dependent lower bound, which gets
higher when the model capacity gets larger. Moreover, we conduct several
numerical experiments to verify our theoretical results.
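To make the setting concrete, here is a minimal, self-contained sketch of the two-stage pipeline the analysis concerns: a reconstruction-style pretext task that predicts one input component X2 from another component X1 on unlabeled data, followed by a linear probe trained on a small labeled set. The synthetic data, network sizes, and sample sizes are illustrative assumptions; the sketch does not implement the authors' learnable input function or their proposed loss.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic data: a binary label Y generates two input components X1 and X2
# with independent noise, so conditional independence (CI) given Y holds here.
n, d1, d2 = 2000, 10, 10
Y = torch.randint(0, 2, (n, 1)).float()
X1 = Y @ torch.ones(1, d1) + 0.5 * torch.randn(n, d1)
X2 = Y @ torch.ones(1, d2) + 0.5 * torch.randn(n, d2)

# Stage 1 (pretext, no labels used): learn a representation psi by predicting X2 from X1.
psi = nn.Sequential(nn.Linear(d1, 32), nn.ReLU(), nn.Linear(32, d2))
opt = torch.optim.Adam(psi.parameters(), lr=1e-2)
for _ in range(300):
    opt.zero_grad()
    nn.functional.mse_loss(psi(X1), X2).backward()
    opt.step()

# Stage 2 (downstream): linear probe on the frozen representation,
# using only a small number of labeled examples.
m = 50  # the limited-labeled-data regime the lower bounds concern
Z = psi(X1[:m]).detach()
probe = nn.Linear(d2, 1)
opt2 = torch.optim.Adam(probe.parameters(), lr=1e-2)
for _ in range(300):
    opt2.zero_grad()
    nn.functional.binary_cross_entropy_with_logits(probe(Z), Y[:m]).backward()
    opt2.step()

with torch.no_grad():
    acc = ((probe(psi(X1)) > 0) == Y.bool()).float().mean().item()
print(f"downstream accuracy with {m} labels: {acc:.3f}")
```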
Related papers
- Unsupervised Transfer Learning via Adversarial Contrastive Training [3.227277661633986]
We propose a novel unsupervised transfer learning approach using adversarial contrastive training (ACT).
Our experimental results demonstrate outstanding classification accuracy with both fine-tuned linear probe and K-NN protocol across various datasets.
arXiv Detail & Related papers (2024-08-16T05:11:52Z)
- Learning Latent Graph Structures and their Uncertainty [63.95971478893842]
Graph Neural Networks (GNNs) use relational information as an inductive bias to enhance the model's accuracy.
As task-relevant relations might be unknown, graph structure learning approaches have been proposed to learn them while solving the downstream prediction task.
arXiv Detail & Related papers (2024-05-30T10:49:22Z)
- Querying Easily Flip-flopped Samples for Deep Active Learning [63.62397322172216]
Active learning is a machine learning paradigm that aims to improve the performance of a model by strategically selecting and querying unlabeled data.
One effective selection strategy is to base it on the model's predictive uncertainty, which can be interpreted as a measure of how informative a sample is.
This paper proposes the least disagree metric (LDM), defined as the smallest probability of disagreement with the predicted label.
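As a rough illustration of the "easily flip-flopped" intuition behind LDM, the sketch below scores each unlabeled point by how often its predicted label flips under small random perturbations of a toy linear classifier's weights, and queries the most easily flipped points. The perturbation scheme, the flip_rate helper, and the query budget are assumptions for illustration, not the estimator proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear classifier w on 2-D data (assumed to have been trained elsewhere).
w = np.array([1.0, -0.5])
X_unlabeled = rng.normal(size=(200, 2))

def flip_rate(x, w, sigma=0.05, n_draws=500):
    """Fraction of small random weight perturbations that flip the predicted label."""
    base = np.sign(x @ w)
    W = w + sigma * rng.normal(size=(n_draws, w.size))
    return np.mean(np.sign(W @ x) != base)

scores = np.array([flip_rate(x, w) for x in X_unlabeled])
# Points near the decision boundary flip most readily; query those first.
query_idx = np.argsort(-scores)[:10]
print("indices to query:", query_idx)
```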
arXiv Detail & Related papers (2024-01-18T08:12:23Z)
- Task-Robust Pre-Training for Worst-Case Downstream Adaptation [62.05108162160981]
Pre-training has achieved remarkable success when transferred to downstream tasks.
This paper considers pre-training a model that guarantees a uniformly good performance over the downstream tasks.
arXiv Detail & Related papers (2023-06-21T07:43:23Z)
- Temporal Output Discrepancy for Loss Estimation-based Active Learning [65.93767110342502]
We present a novel deep active learning approach that queries the oracle for annotation when an unlabeled sample is believed to incur a high loss.
Our approach outperforms state-of-the-art active learning methods on image classification and semantic segmentation tasks.
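A minimal sketch of the loss-estimation idea, assuming the discrepancy is measured between the outputs of two model snapshots saved at different training steps and used as a proxy for each sample's loss; the stand-in snapshot models, data, and budget below are placeholders, not the paper's exact estimator.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two snapshots of the same model, assumed to be saved at different training steps
# (randomly initialized here purely as stand-ins).
model_t = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
model_t_plus = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))

X_unlabeled = torch.randn(1000, 20)

with torch.no_grad():
    # Output discrepancy between the two snapshots, used as a loss proxy.
    disc = (model_t_plus(X_unlabeled) - model_t(X_unlabeled)).norm(dim=1)

# Send the samples with the largest estimated loss to the oracle for annotation.
budget = 32
query_idx = disc.topk(budget).indices
print(query_idx[:10])
```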
arXiv Detail & Related papers (2022-12-20T19:29:37Z)
- Task-Agnostic Robust Representation Learning [31.818269301504564]
We study the problem of robust representation learning with unlabeled data in a task-agnostic manner.
We derive an upper bound on the adversarial loss of a prediction model on any downstream task, using its loss on the clean data and a robustness regularizer.
Our method achieves preferable adversarial performance compared to relevant baselines.
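The following sketch trains a representation with a clean objective (here, a simple reconstruction loss) plus a regularizer that penalizes how much the representation moves under small input perturbations, mirroring the "clean loss + robustness regularizer" structure of the bound. The encoder, the random sign perturbation, and the weights eps and lam are illustrative assumptions, not the paper's formulation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
enc = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 16))
dec = nn.Linear(16, 20)  # simple reconstruction head serving as the clean objective
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

X = torch.randn(512, 20)  # unlabeled data
eps, lam = 0.1, 1.0       # perturbation radius and regularizer weight

for step in range(200):
    opt.zero_grad()
    z = enc(X)
    clean_loss = nn.functional.mse_loss(dec(z), X)              # loss on clean data
    delta = eps * torch.sign(torch.randn_like(X))                # crude random perturbation (illustrative)
    robustness = (enc(X + delta) - z).pow(2).sum(dim=1).mean()   # penalize representation shift
    loss = clean_loss + lam * robustness
    loss.backward()
    opt.step()
```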
arXiv Detail & Related papers (2022-03-15T02:05:11Z)
- On the Transferability of Pre-trained Language Models: A Study from Artificial Datasets [74.11825654535895]
Pre-training language models (LMs) on large-scale unlabeled text data makes the model much easier to achieve exceptional downstream performance.
We study what specific traits in the pre-training data, other than the semantics, make a pre-trained LM superior to its counterparts trained from scratch on downstream tasks.
arXiv Detail & Related papers (2021-09-08T10:39:57Z)
- Identifying Wrongly Predicted Samples: A Method for Active Learning [6.976600214375139]
We propose a simple sample selection criterion that moves beyond uncertainty.
We show state-of-the-art results and better rates at identifying wrongly predicted samples.
arXiv Detail & Related papers (2020-10-14T09:00:42Z)
- Predicting What You Already Know Helps: Provable Self-Supervised Learning [60.27658820909876]
Self-supervised representation learning solves auxiliary prediction tasks (known as pretext tasks) without requiring labeled data.
We show a mechanism exploiting the statistical connections between certain reconstruction-based pretext tasks that guarantees learning a good representation.
We prove that the linear layer yields a small approximation error even for complex ground-truth function classes.
arXiv Detail & Related papers (2020-08-03T17:56:13Z)
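Since the present paper builds directly on this result, a small numerical check of the underlying intuition may help: in a toy model where X1 and X2 are conditionally independent given Y, the pretext-optimal representation psi(x1) = E[X2 | X1 = x1] makes E[Y | X1] an exactly linear function of psi. The Gaussian toy model and its parameters below are assumptions chosen for illustration, not the paper's general setting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model where X1 and X2 are conditionally independent given Y:
# X1 = a*Y + noise, X2 = b*Y + noise, with independent Gaussian noise.
n, a, b, s = 5000, 2.0, 3.0, 1.0
Y = rng.integers(0, 2, n)
X1 = a * Y + s * rng.normal(size=n)
X2 = b * Y + s * rng.normal(size=n)

# Under CI, psi(x1) = E[X2 | X1 = x1] = b * P(Y = 1 | X1 = x1),
# so E[Y | X1] is a linear function of psi (slope 1/b, intercept 0).
def posterior(x1):
    like1 = np.exp(-0.5 * ((x1 - a) / s) ** 2)
    like0 = np.exp(-0.5 * (x1 / s) ** 2)
    return like1 / (like0 + like1)  # equal class priors cancel

psi = b * posterior(X1)  # idealized pretext representation

# A single linear layer on psi recovers E[Y | X1].
coef = np.polyfit(psi, posterior(X1), 1)
print("fitted linear layer (slope, intercept):", coef, " vs. expected:", (1 / b, 0.0))
```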