Neural Mask Generator: Learning to Generate Adaptive Word Maskings for
Language Model Adaptation
- URL: http://arxiv.org/abs/2010.02705v1
- Date: Tue, 6 Oct 2020 13:27:01 GMT
- Title: Neural Mask Generator: Learning to Generate Adaptive Word Maskings for
Language Model Adaptation
- Authors: Minki Kang, Moonsu Han, Sung Ju Hwang
- Abstract summary: We propose a method to automatically generate domain- and task-adaptive maskings of the given text for self-supervised pre-training.
We present a novel reinforcement learning-based framework which learns the masking policy.
We validate our Neural Mask Generator (NMG) on several question answering and text classification datasets.
- Score: 63.195935452646815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a method to automatically generate domain- and task-adaptive
maskings of the given text for self-supervised pre-training, such that we can
effectively adapt the language model to a particular target task (e.g. question
answering). Specifically, we present a novel reinforcement learning-based
framework which learns the masking policy, such that using the generated masks
for further pre-training of the target language model helps improve task
performance on unseen texts. We use off-policy actor-critic with entropy
regularization and experience replay for reinforcement learning, and propose a
Transformer-based policy network that can consider the relative importance of
words in a given text. We validate our Neural Mask Generator (NMG) on several
question answering and text classification datasets using BERT and DistilBERT
as the language models, on which it outperforms rule-based masking strategies,
by automatically learning optimal adaptive maskings.
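To make the approach concrete, below is a minimal, hypothetical sketch of a Transformer-based masking policy trained with a policy-gradient update and entropy regularization. The paper itself uses an off-policy actor-critic with experience replay; the class and function names, the reward definition, and all hyperparameters here are illustrative assumptions rather than the authors' implementation.

```python
# Hypothetical sketch: a Transformer scores each token, a Bernoulli policy
# samples which positions to mask, and a policy-gradient step (with entropy
# regularization) updates the scorer from a downstream-task reward.
import torch
import torch.nn as nn

class MaskingPolicy(nn.Module):
    """Assigns a masking logit to every token in the input sequence."""
    def __init__(self, vocab_size, d_model=128, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.score = nn.Linear(d_model, 1)

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids))   # (batch, seq_len, d_model)
        return self.score(h).squeeze(-1)          # (batch, seq_len) logits

def policy_gradient_step(policy, optimizer, token_ids, reward, ent_coef=0.01):
    """One simplified REINFORCE-style update. `reward` stands in for the
    improvement in downstream performance after further pre-training the LM
    with the sampled masks; its computation is task-specific and omitted."""
    logits = policy(token_ids)
    dist = torch.distributions.Bernoulli(logits=logits)
    mask = dist.sample()                           # 1.0 = mask this position
    log_prob = dist.log_prob(mask).sum(dim=-1)     # per-sequence log-prob
    entropy = dist.entropy().mean()
    loss = -(reward * log_prob).mean() - ent_coef * entropy
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return mask.bool()

# Usage sketch: one update on a random batch of token ids.
policy = MaskingPolicy(vocab_size=30522)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
token_ids = torch.randint(0, 30522, (2, 16))
mask = policy_gradient_step(policy, optimizer, token_ids, reward=0.1)
```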
Related papers
- Language Model Adaptation to Specialized Domains through Selective
Masking based on Genre and Topical Characteristics [4.9639158834745745]
We introduce an innovative masking approach leveraging genre and topicality information to tailor language models to specialized domains.
Our method incorporates a ranking process that prioritizes words based on their significance, subsequently guiding the masking procedure.
Experiments conducted using continual pre-training within the legal domain have underscored the efficacy of our approach on the LegalGLUE benchmark in the English language.
arXiv Detail & Related papers (2024-02-19T10:43:27Z) - Investigating Masking-based Data Generation in Language Models [0.0]
A feature of BERT and models with similar architecture is the objective of masked language modeling.
Data augmentation is a data-driven technique widely used in machine learning.
Recent studies have utilized masked language models to generate artificially augmented data for downstream NLP tasks.
arXiv Detail & Related papers (2023-06-16T16:48:27Z) - Self-Evolution Learning for Discriminative Language Model Pretraining [103.57103957631067]
Self-Evolution learning (SE) is a simple and effective token masking and learning method.
SE focuses on learning the informative yet under-explored tokens and adaptively regularizes the training by introducing a novel Token-specific Label Smoothing approach.
arXiv Detail & Related papers (2023-05-24T16:00:54Z) - InforMask: Unsupervised Informative Masking for Language Model
Pretraining [13.177839395411858]
We propose a new unsupervised masking strategy for training masked language models.
InforMask exploits Pointwise Mutual Information (PMI) to select the most informative tokens to mask (a small PMI-based sketch of this idea appears after this list).
arXiv Detail & Related papers (2022-10-21T07:10:56Z) - Effective Unsupervised Domain Adaptation with Adversarially Trained
Language Models [54.569004548170824]
We show that careful masking strategies can bridge the knowledge gap of masked language models.
We propose an effective training strategy that adversarially masks out those tokens which are harder to reconstruct by the underlying masked language model.
arXiv Detail & Related papers (2020-10-05T01:49:47Z) - Masking as an Efficient Alternative to Finetuning for Pretrained
Language Models [49.64561153284428]
We learn selective binary masks for pretrained weights in lieu of modifying them through finetuning (a minimal sketch of this idea also appears after this list).
In intrinsic evaluations, we show that representations computed by masked language models encode information necessary for solving downstream tasks.
arXiv Detail & Related papers (2020-04-26T15:03:47Z) - PALM: Pre-training an Autoencoding&Autoregressive Language Model for
Context-conditioned Generation [92.7366819044397]
Self-supervised pre-training has emerged as a powerful technique for natural language understanding and generation.
This work presents PALM with a novel scheme that jointly pre-trains an autoencoding and autoregressive language model on a large unlabeled corpus.
An extensive set of experiments show that PALM achieves new state-of-the-art results on a variety of language generation benchmarks.
arXiv Detail & Related papers (2020-04-14T06:25:36Z) - UniLMv2: Pseudo-Masked Language Models for Unified Language Model
Pre-Training [152.63467944568094]
We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks.
Our experiments show that the unified language models pre-trained using PMLM achieve new state-of-the-art results on a wide range of natural language understanding and generation tasks.
arXiv Detail & Related papers (2020-02-28T15:28:49Z)
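As a concrete illustration of the PMI-based token selection referenced in the InforMask entry above, here is a small hypothetical sketch. The toy corpus, the simplified PMI estimate (which shares one normalizer for unigram and pair probabilities), and helper names such as `informative_mask` are assumptions for illustration, not the authors' code.

```python
# Hypothetical sketch of PMI-based informative masking: tokens with high
# average PMI against the rest of their sentence are masked first.
import math
from collections import Counter
from itertools import combinations

def build_counts(corpus):
    """Unigram and unordered sentence-level co-occurrence counts."""
    uni, pair = Counter(), Counter()
    for sent in corpus:
        uni.update(sent)
        pair.update(frozenset(p) for p in combinations(set(sent), 2))
    return uni, pair, sum(uni.values())

def pmi(w1, w2, uni, pair, total):
    """Simplified PMI estimate sharing one normalizer for both terms."""
    joint = pair[frozenset((w1, w2))]
    if joint == 0:
        return 0.0
    return math.log(joint * total / (uni[w1] * uni[w2]))

def informative_mask(sent, uni, pair, total, ratio=0.15):
    """Positions of the most informative tokens (highest average PMI)."""
    scores = []
    for i, w in enumerate(sent):
        others = [v for j, v in enumerate(sent) if j != i]
        avg = sum(pmi(w, v, uni, pair, total) for v in others) / max(len(others), 1)
        scores.append((avg, i))
    k = max(1, int(ratio * len(sent)))
    return sorted(i for _, i in sorted(scores, reverse=True)[:k])

# Toy usage: a content word is selected for masking, not "the".
corpus = [["the", "patient", "reported", "severe", "chest", "pain"],
          ["the", "court", "dismissed", "the", "appeal"]]
uni, pair, total = build_counts(corpus)
print(informative_mask(corpus[0], uni, pair, total))
```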
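The "masking as an alternative to finetuning" entry above can likewise be sketched as learning a binary mask over frozen pretrained weights via a straight-through estimator. The `MaskedLinear` wrapper, the threshold at zero, and the score initialization are hypothetical choices, not the paper's implementation.

```python
# Hypothetical sketch: keep the pretrained weights frozen and learn only a
# binary mask over them, using a straight-through gradient estimator.
import torch
import torch.nn as nn

class BinaryMaskSTE(torch.autograd.Function):
    """Thresholds real-valued scores to {0,1} with a straight-through gradient."""
    @staticmethod
    def forward(ctx, scores):
        return (scores > 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: pass gradients to the scores unchanged.
        return grad_output

class MaskedLinear(nn.Module):
    """Wraps a pretrained nn.Linear; only the mask scores are trainable."""
    def __init__(self, pretrained: nn.Linear):
        super().__init__()
        self.weight = nn.Parameter(pretrained.weight.detach(), requires_grad=False)
        self.bias = nn.Parameter(pretrained.bias.detach(), requires_grad=False)
        # Positive scores keep a weight, negative scores prune it.
        self.scores = nn.Parameter(torch.full_like(self.weight, 0.01))

    def forward(self, x):
        mask = BinaryMaskSTE.apply(self.scores)
        return nn.functional.linear(x, self.weight * mask, self.bias)

# Usage: wrap a pretrained layer and optimize only the mask scores.
layer = MaskedLinear(nn.Linear(768, 768))
optimizer = torch.optim.Adam([layer.scores], lr=1e-3)
out = layer(torch.randn(2, 768))
```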
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.