Using Selective Masking as a Bridge between Pre-training and Fine-tuning
- URL: http://arxiv.org/abs/2211.13815v1
- Date: Thu, 24 Nov 2022 22:25:27 GMT
- Title: Using Selective Masking as a Bridge between Pre-training and Fine-tuning
- Authors: Tanish Lad, Himanshu Maheshwari, Shreyas Kottukkal, Radhika Mamidi
- Abstract summary: We propose a way to tailor a pre-trained BERT model for the downstream task via task-specific masking.
We show that the selective masking strategy outperforms random masking, indicating its effectiveness.
- Score: 5.677685109155077
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pre-training a language model and then fine-tuning it for downstream tasks
has demonstrated state-of-the-art results for various NLP tasks. Pre-training
is usually independent of the downstream task, and previous works have shown
that this pre-training alone might not be sufficient to capture the
task-specific nuances. We propose a way to tailor a pre-trained BERT model for
the downstream task via task-specific masking before the standard supervised
fine-tuning. For this, a word list is first collected specific to the task. For
example, if the task is sentiment classification, we collect a small sample of
words representing both positive and negative sentiments. Next, a word's
importance for the task, called the word's task score, is measured using the
word list. Each word is then assigned a probability of masking based on its
task score. We experiment with different masking functions that assign the
probability of masking based on the word's task score. The BERT model is
further trained on the MLM objective, where masking is done using the above
strategy. Following this, standard supervised fine-tuning is done for different
downstream tasks. Results on these tasks show that the selective masking
strategy outperforms random masking, indicating its effectiveness.
Related papers
- Difference-Masking: Choosing What to Mask in Continued Pretraining [56.76782116221438]
We introduce Difference-Masking, a masking strategy that automatically chooses what to mask during continued pretraining.
We find that Difference-Masking outperforms baselines on continued pretraining settings across four diverse language-only and multimodal video tasks.
arXiv Detail & Related papers (2023-05-23T23:31:02Z)
- Task Residual for Tuning Vision-Language Models [69.22958802711017]
We propose a new efficient tuning approach for vision-language models (VLMs) named Task Residual Tuning (TaskRes).
TaskRes explicitly decouples the prior knowledge of the pre-trained models and new knowledge regarding a target task.
The proposed TaskRes is simple yet effective and significantly outperforms previous methods on 11 benchmark datasets.
arXiv Detail & Related papers (2022-11-18T15:09:03Z)
- Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation [86.26522210882699]
We propose UniVL, a unified multimodal pre-training approach for both vision-language understanding and generation.
The proposed UniVL is capable of handling both understanding tasks and generative tasks.
Our experiments show that there is a trade-off between understanding tasks and generation tasks while using the same model.
arXiv Detail & Related papers (2021-12-10T14:59:06Z)
- Zero-Shot Information Extraction as a Unified Text-to-Triple Translation [56.01830747416606]
We cast a suite of information extraction tasks into a text-to-triple translation framework.
We formalize the task as a translation between task-specific input text and output triples.
We study the zero-shot performance of this framework on open information extraction.
arXiv Detail & Related papers (2021-09-23T06:54:19Z)
- Frustratingly Simple Pretraining Alternatives to Masked Language Modeling [10.732163031244651]
Masked language modeling (MLM) is widely used in natural language processing for learning text representations.
In this paper, we explore five simple pretraining objectives based on token-level classification tasks as replacements of MLM.
arXiv Detail & Related papers (2021-09-04T08:52:37Z)
- Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little [74.49773960145681]
A possible explanation for the impressive performance of masked language model (MLM) pre-training is that such models have learned to represent the syntactic structures prevalent in NLP pipelines.
In this paper, we propose a different explanation: MLMs succeed on downstream tasks almost entirely due to their ability to model higher-order word co-occurrence statistics.
Our results show that purely distributional information largely explains the success of pre-training, and underscore the importance of curating challenging evaluation datasets that require deeper linguistic knowledge.
arXiv Detail & Related papers (2021-04-14T06:30:36Z)
- Hierarchical Multitask Learning Approach for BERT [0.36525095710982913]
BERT learns embeddings by solving two tasks: masked language modeling (masked LM) and next sentence prediction (NSP).
We adopt hierarchical multitask learning approaches for BERT pre-training.
Our results show that imposing a task hierarchy in pre-training improves the performance of embeddings.
arXiv Detail & Related papers (2020-10-17T09:23:04Z)
- Self-Supervised Meta-Learning for Few-Shot Natural Language Classification Tasks [40.97125791174191]
We propose a self-supervised approach to generate a large, rich, meta-learning task distribution from unlabeled text.
We show that this meta-training leads to better few-shot generalization than language-model pre-training followed by finetuning.
arXiv Detail & Related papers (2020-09-17T17:53:59Z)
- Train No Evil: Selective Masking for Task-Guided Pre-Training [97.03615486457065]
We propose a three-stage framework by adding a task-guided pre-training stage with selective masking between general pre-training and fine-tuning.
We show that our method can achieve comparable or even better performance with less than 50% of the cost.
arXiv Detail & Related papers (2020-04-21T03:14:22Z)