Task-adaptive Pre-training and Self-training are Complementary for
Natural Language Understanding
- URL: http://arxiv.org/abs/2109.06466v1
- Date: Tue, 14 Sep 2021 06:24:28 GMT
- Title: Task-adaptive Pre-training and Self-training are Complementary for
Natural Language Understanding
- Authors: Shiyang Li, Semih Yavuz, Wenhu Chen, Xifeng Yan
- Abstract summary: Task-adaptive pre-training (TAPT) and Self-training (ST) have emerged as the major semi-supervised approaches to improve natural language understanding.
We show that TAPT and ST can be complementary with a simple protocol by following the TAPT -> Finetuning -> Self-training (TFS) process.
- Score: 27.459759446031192
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Task-adaptive pre-training (TAPT) and Self-training (ST) have emerged as the
major semi-supervised approaches to improve natural language understanding
(NLU) tasks with massive amounts of unlabeled data. However, it is unclear
whether they learn similar representations or whether they can be effectively combined.
In this paper, we show that TAPT and ST can be complementary with a simple
protocol that follows the TAPT -> Finetuning -> Self-training (TFS) process.
Experimental results show that the TFS protocol can effectively utilize unlabeled
data to achieve strong combined gains consistently across six datasets covering
sentiment classification, paraphrase identification, natural language
inference, named entity recognition and dialogue slot classification. We
investigate various semi-supervised settings and consistently show that gains
from TAPT and ST can be strongly additive by following the TFS procedure. We hope
that TFS can serve as an important semi-supervised baseline for future NLP
studies.
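For concreteness, below is a minimal Python sketch of the TFS process as the abstract describes it: task-adaptive pre-training on the task's unlabeled text, then supervised fine-tuning, then self-training on pseudo-labeled data. The stage callables (tapt, finetune, pseudo_label) and the st_rounds parameter are hypothetical placeholders introduced for illustration, not the authors' released implementation.
```python
# Minimal sketch of the TFS (TAPT -> Finetuning -> Self-training) protocol.
# The three stage functions are passed in as callables and are hypothetical
# placeholders; this is not the authors' implementation.

from typing import Callable, List, Tuple

Example = Tuple[str, int]  # (text, label)


def tfs_protocol(
    base_model,
    labeled: List[Example],
    unlabeled: List[str],
    tapt: Callable,          # continues pre-training (e.g. MLM) on unlabeled task text
    finetune: Callable,      # supervised fine-tuning on (text, label) pairs
    pseudo_label: Callable,  # labels unlabeled text with the current model
    st_rounds: int = 1,
):
    # Stage 1: Task-adaptive pre-training (TAPT) on the task's unlabeled text.
    model = tapt(base_model, unlabeled)

    # Stage 2: Fine-tuning on the gold labeled examples.
    model = finetune(model, labeled)

    # Stage 3: Self-training (ST): pseudo-label the unlabeled text with the
    # current model and retrain on gold plus pseudo-labeled data.
    for _ in range(st_rounds):
        pseudo: List[Example] = pseudo_label(model, unlabeled)
        model = finetune(model, labeled + pseudo)

    return model
```
The key point of the protocol is the ordering: TAPT adapts the encoder's representations to the task domain before any labels are used, and self-training then exploits the same unlabeled pool a second time through pseudo-labels, which is why the two gains can add up.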
Related papers
- From Question to Exploration: Test-Time Adaptation in Semantic Segmentation? [21.27237423511349]
Test-time adaptation (TTA) aims to adapt a model, initially trained on training data, to test data with potential distribution shifts.
We investigate the applicability of existing classic TTA strategies in semantic segmentation.
arXiv Detail & Related papers (2023-10-09T01:59:49Z) - Rethinking Semi-supervised Learning with Language Models [33.70349754359132]
Semi-supervised learning (SSL) is a popular setting aiming to effectively utilize unlabelled data to improve model performance.
There are two popular approaches to make use of unlabelled data: Self-training (ST) and Task-adaptive pre-training (TAPT)
arXiv Detail & Related papers (2023-05-22T13:07:35Z) - When do you need Chain-of-Thought Prompting for ChatGPT? [87.45382888430643]
Chain-of-Thought (CoT) prompting can effectively elicit complex multi-step reasoning from Large Language Models (LLMs).
It is not clear whether CoT is still effective on more recent instruction finetuned (IFT) LLMs such as ChatGPT.
arXiv Detail & Related papers (2023-04-06T17:47:29Z) - Revisiting Realistic Test-Time Training: Sequential Inference and
Adaptation by Anchored Clustering Regularized Self-Training [37.75537703971045]
We develop a test-time anchored clustering (TTAC) approach to enable stronger test-time feature learning.
Self-training (ST) has demonstrated great success in learning from unlabeled data.
TTAC++ consistently outperforms the state-of-the-art methods on five TTT datasets.
arXiv Detail & Related papers (2023-03-20T04:30:18Z) - M-Tuning: Prompt Tuning with Mitigated Label Bias in Open-Set Scenarios [103.6153593636399]
We propose a vision-language prompt tuning method with mitigated label bias (M-Tuning)
It introduces open words from the WordNet to extend the range of words forming the prompt texts from only closed-set label words to more, and thus prompts are tuned in a simulated open-set scenario.
Our method achieves the best performance on datasets with various scales, and extensive ablation studies also validate its effectiveness.
arXiv Detail & Related papers (2023-03-09T09:05:47Z) - Towards Simple and Efficient Task-Adaptive Pre-training for Text
Classification [0.7874708385247353]
We study the impact of training only the embedding layer on the model's performance during TAPT and task-specific finetuning.
We show that training only the BERT embedding layer during TAPT is sufficient to adapt to the vocabulary of the target domain and achieve comparable performance.
arXiv Detail & Related papers (2022-09-26T18:29:12Z) - Learning to Win Lottery Tickets in BERT Transfer via Task-agnostic Mask
Training [55.43088293183165]
Recent studies show that pre-trained language models (PLMs) like BERT contain matching subnetworks that have similar transfer learning performance as the original PLM.
In this paper, we find that the BERT subnetworks have even more potential than these studies have shown.
We train binary masks over model weights on the pre-training tasks, with the aim of preserving the universal transferability of the subnetwork.
arXiv Detail & Related papers (2022-04-24T08:42:47Z) - Revisiting Self-Training for Few-Shot Learning of Language Model [61.173976954360334]
Unlabeled data carry rich task-relevant information, and they have proven useful for few-shot learning of language models.
In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM.
arXiv Detail & Related papers (2021-10-04T08:51:36Z) - On the Transferability of Pre-trained Language Models: A Study from
Artificial Datasets [74.11825654535895]
Pre-training language models (LMs) on large-scale unlabeled text data makes it much easier for the model to achieve exceptional downstream performance.
We study what specific traits in the pre-training data, other than the semantics, make a pre-trained LM superior to its counterparts trained from scratch on downstream tasks.
arXiv Detail & Related papers (2021-09-08T10:39:57Z) - Self-training Improves Pre-training for Natural Language Understanding [63.78927366363178]
We study self-training as another way to leverage unlabeled data through semi-supervised learning.
We introduce SentAugment, a data augmentation method which computes task-specific query embeddings from labeled data.
Our approach leads to scalable and effective self-training with improvements of up to 2.6% on standard text classification benchmarks.
arXiv Detail & Related papers (2020-10-05T17:52:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.