Towards Simple and Efficient Task-Adaptive Pre-training for Text Classification
- URL: http://arxiv.org/abs/2209.12943v1
- Date: Mon, 26 Sep 2022 18:29:12 GMT
- Title: Towards Simple and Efficient Task-Adaptive Pre-training for Text Classification
- Authors: Arnav Ladkat, Aamir Miyajiwala, Samiksha Jagadale, Rekha Kulkarni, Raviraj Joshi
- Abstract summary: We study the impact of training only the embedding layer on the model's performance during TAPT and task-specific finetuning.
We show that training only the BERT embedding layer during TAPT is sufficient to adapt to the vocabulary of the target domain and achieve comparable performance.
- Score: 0.7874708385247353
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language models are pre-trained on large corpora of generic data such as
BookCorpus, Common Crawl, and Wikipedia, which is essential for the model to
understand the linguistic characteristics of the language. Recent studies suggest
using Domain-Adaptive Pre-training (DAPT) and Task-Adaptive Pre-training (TAPT)
as an intermediate step before the final finetuning task. This step helps cover
the target domain vocabulary and improves the model performance on the
downstream task. In this work, we study the impact of training only the
embedding layer on the model's performance during TAPT and task-specific
finetuning. Based on our study, we propose a simple approach to make the
intermediate step of TAPT for BERT-based models more efficient by performing
selective pre-training of BERT layers. We show that training only the BERT
embedding layer during TAPT is sufficient to adapt to the vocabulary of the
target domain and achieve comparable performance. Our approach is
computationally efficient, with 78% fewer parameters trained during TAPT. The
proposed embedding layer finetuning approach can also be an efficient domain
adaptation technique.
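The 78% figure can be sanity-checked with a parameter count. The sketch below is a minimal illustration, assuming the standard BERT-base configuration (vocabulary 30,522, hidden size 768, 12 layers, feed-forward size 3,072, 512 positions, 2 token types) and treating the embedding block (word, position, and token-type embeddings plus their LayerNorm) as the only trainable group. The parameter names are modeled on Hugging Face `BertModel` naming for readability, and `bert_base_shapes` and `trainable_mask` are hypothetical helpers; the paper's exact accounting (for example, whether the MLM head is counted) is not given here, so treat the result as an approximation.

```python
from math import prod

# bert-base-uncased hyperparameters (public config values; assumption for this sketch)
VOCAB, HIDDEN, LAYERS, FFN, MAX_POS, TYPES = 30522, 768, 12, 3072, 512, 2


def bert_base_shapes():
    """Return {parameter_name: shape} for a BERT-base encoder.

    Names mimic Hugging Face BertModel style (illustrative only); the MLM
    prediction head used during TAPT is deliberately left out.
    """
    shapes = {
        "embeddings.word_embeddings.weight": (VOCAB, HIDDEN),
        "embeddings.position_embeddings.weight": (MAX_POS, HIDDEN),
        "embeddings.token_type_embeddings.weight": (TYPES, HIDDEN),
        "embeddings.LayerNorm.weight": (HIDDEN,),
        "embeddings.LayerNorm.bias": (HIDDEN,),
    }
    for i in range(LAYERS):
        p = f"encoder.layer.{i}."
        # Self-attention projections (query, key, value)
        for proj in ("query", "key", "value"):
            shapes[p + f"attention.self.{proj}.weight"] = (HIDDEN, HIDDEN)
            shapes[p + f"attention.self.{proj}.bias"] = (HIDDEN,)
        # Attention output projection and its LayerNorm
        shapes[p + "attention.output.dense.weight"] = (HIDDEN, HIDDEN)
        shapes[p + "attention.output.dense.bias"] = (HIDDEN,)
        shapes[p + "attention.output.LayerNorm.weight"] = (HIDDEN,)
        shapes[p + "attention.output.LayerNorm.bias"] = (HIDDEN,)
        # Feed-forward block and its LayerNorm
        shapes[p + "intermediate.dense.weight"] = (FFN, HIDDEN)
        shapes[p + "intermediate.dense.bias"] = (FFN,)
        shapes[p + "output.dense.weight"] = (HIDDEN, FFN)
        shapes[p + "output.dense.bias"] = (HIDDEN,)
        shapes[p + "output.LayerNorm.weight"] = (HIDDEN,)
        shapes[p + "output.LayerNorm.bias"] = (HIDDEN,)
    shapes["pooler.dense.weight"] = (HIDDEN, HIDDEN)
    shapes["pooler.dense.bias"] = (HIDDEN,)
    return shapes


def trainable_mask(names, prefix="embeddings."):
    """Selective TAPT: mark only parameters under `prefix` as trainable."""
    return {name: name.startswith(prefix) for name in names}


shapes = bert_base_shapes()
mask = trainable_mask(shapes)  # iterating a dict yields its keys
total = sum(prod(s) for s in shapes.values())
trainable = sum(prod(s) for name, s in shapes.items() if mask[name])
print(f"total={total:,} trainable={trainable:,} "
      f"reduction={1 - trainable / total:.1%}")
```

Running it prints `total=109,482,240 trainable=23,837,184 reduction=78.2%`, in line with the 78% reported in the abstract. In a real PyTorch model, the same mask would typically be applied by iterating over `model.named_parameters()` and setting each tensor's `requires_grad` flag, so that the optimizer only updates the embedding block.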
Related papers
- Task-Oriented Pre-Training for Drivable Area Detection [5.57325257338134]
We propose a task-oriented pre-training method that begins by generating redundant segmentation proposals.
We then introduce a Specific Category Enhancement Fine-tuning (SCEF) strategy for fine-tuning the Contrastive Language-Image Pre-training (CLIP) model.
This approach can generate large amounts of coarse training data for pre-training models, which are further fine-tuned using manually annotated data.
Date: 2024-09-30T10:25:47Z
- An Efficient Active Learning Pipeline for Legal Text Classification [2.462514989381979]
We propose a pipeline for effectively using active learning with pre-trained language models in the legal domain.
We use knowledge distillation to guide the model's embeddings to a semantically meaningful space.
Our experiments on the Contract-NLI (adapted to classification) and LEDGAR benchmarks show that our approach outperforms standard active learning (AL) strategies.
Date: 2022-11-15T13:07:02Z
- Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
Date: 2022-09-30T02:25:12Z
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm in which a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
Date: 2022-05-11T17:10:41Z
- Back-Translated Task Adaptive Pretraining: Improving Accuracy and Robustness on Text Classification [5.420446976940825]
We propose a back-translated task-adaptive pretraining (BT-TAPT) method that increases the amount of task-specific data for LM re-pretraining.
The experimental results show that the proposed BT-TAPT yields improved classification accuracy on both low- and high-resource data and better robustness to noise than the conventional adaptive pretraining method.
Date: 2021-07-22T06:27:35Z
- A Tailored Pre-Training Model for Task-Oriented Dialog Generation [60.05269529832447]
We propose a Pre-trained Role Alternating Language model (PRAL) for task-oriented conversational systems.
We introduce a task-oriented dialog pretraining dataset by cleaning 13 existing data sets.
The results show that PRAL performs better than or on par with state-of-the-art methods.
Date: 2020-04-24T09:25:45Z
- Don't Stop Pretraining: Adapt Language Models to Domains and Tasks [81.99843216550306]
We present a study across four domains (biomedical and computer science publications, news, and reviews) and eight classification tasks.
A second phase of pretraining in-domain (domain-adaptive pretraining) leads to performance gains.
Adapting to the task's unlabeled data (task-adaptive pretraining) improves performance even after domain-adaptive pretraining.
Date: 2020-04-23T04:21:19Z
- Train No Evil: Selective Masking for Task-Guided Pre-Training [97.03615486457065]
We propose a three-stage framework that adds a task-guided pre-training stage with selective masking between general pre-training and fine-tuning.
We show that our method can achieve comparable or even better performance at less than 50% of the cost.
Date: 2020-04-21T03:14:22Z
- Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm that directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
Date: 2020-04-12T09:05:47Z
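Task-adaptive pre-training, as described in the Don't Stop Pretraining entry, continues language-model training on the task's unlabeled text before fine-tuning. The sketch below shows how one such sentence can be turned into a masked-language-model (MLM) training example, assuming the standard BERT masking rule (15% of non-special positions selected; of those, 80% replaced by [MASK], 10% by a random token, 10% left unchanged) and bert-base-uncased special-token ids. `mask_tokens` and the example token ids are hypothetical; a selective-masking scheme such as the one in Train No Evil would change which positions get selected.

```python
import random

MASK_ID = 103        # [MASK] id in the bert-base-uncased vocabulary (assumption)
VOCAB_SIZE = 30522   # bert-base-uncased vocabulary size (assumption)
IGNORE = -100        # label value skipped by the loss (common convention)


def mask_tokens(ids, special_ids=frozenset({0, 101, 102}), p=0.15, rng=None):
    """BERT-style MLM corruption of one sequence of token ids.

    Each non-special position is selected with probability p. A selected
    position becomes [MASK] 80% of the time, a random token 10% of the
    time, and stays unchanged 10% of the time. Labels hold the original id
    at selected positions and IGNORE everywhere else.
    """
    if rng is None:
        rng = random.Random()
    inputs, labels = list(ids), [IGNORE] * len(ids)
    for i, tok in enumerate(ids):
        # Never corrupt [PAD]/[CLS]/[SEP]; skip unselected positions.
        if tok in special_ids or rng.random() >= p:
            continue
        labels[i] = tok
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK_ID
        elif r < 0.9:
            inputs[i] = rng.randrange(VOCAB_SIZE)
        # else: keep the original token as input
    return inputs, labels


# Hypothetical unlabeled task sentence, already tokenized to ids.
ids = [101, 2023, 3185, 2001, 2307, 1998, 1996, 3772, 2001, 3835, 102]
inputs, labels = mask_tokens(ids, rng=random.Random(0))
print(inputs)
print(labels)
```

Passing a seeded `random.Random` keeps the corruption reproducible. In practice this step is usually delegated to a library data collator rather than written by hand.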