UniLMv2: Pseudo-Masked Language Models for Unified Language Model
Pre-Training
- URL: http://arxiv.org/abs/2002.12804v1
- Date: Fri, 28 Feb 2020 15:28:49 GMT
- Title: UniLMv2: Pseudo-Masked Language Models for Unified Language Model
Pre-Training
- Authors: Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu
Wang, Songhao Piao, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon
- Abstract summary: We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks.
Our experiments show that the unified language models pre-trained using PMLM achieve new state-of-the-art results on a wide range of natural language understanding and generation tasks.
- Score: 152.63467944568094
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose to pre-train a unified language model for both autoencoding and
partially autoregressive language modeling tasks using a novel training
procedure, referred to as a pseudo-masked language model (PMLM). Given an input
text with masked tokens, we rely on conventional masks to learn inter-relations
between corrupted tokens and context via autoencoding, and pseudo masks to
learn intra-relations between masked spans via partially autoregressive
modeling. With well-designed position embeddings and self-attention masks, the
context encodings are reused to avoid redundant computation. Moreover,
conventional masks used for autoencoding provide global masking information, so
that all the position embeddings are accessible in partially autoregressive
language modeling. In addition, the two tasks pre-train a unified language
model as a bidirectional encoder and a sequence-to-sequence decoder,
respectively. Our experiments show that the unified language models pre-trained
using PMLM achieve new state-of-the-art results on a wide range of natural
language understanding and generation tasks across several widely used
benchmarks.
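To make the construction above more concrete, the following is a minimal Python sketch, not the authors' implementation, of how a pseudo-masked training example might be assembled: masked positions keep a conventional [M] token for autoencoding, and each masked span is appended again as pseudo masks [P] followed by its original tokens, all reusing the original position ids. The special token names, the span format, the span-level factorization order, and the simplified self-attention rules are assumptions for illustration; the exact attention constraints follow the paper.

```python
from typing import List, Tuple


def build_pmlm_input(tokens: List[str], spans: List[Tuple[int, int]]):
    """Assemble one pseudo-masked example: (input_tokens, position_ids, groups)."""
    input_tokens, position_ids = [], []

    # 1) Corrupted original sequence: masked positions become conventional [M].
    masked_positions = {i for s, e in spans for i in range(s, e)}
    for i, tok in enumerate(tokens):
        input_tokens.append("[M]" if i in masked_positions else tok)
        position_ids.append(i)

    # 2) For each masked span, append pseudo masks [P] and then the original
    #    tokens, both reusing the *original* position ids.
    groups = []  # (kind, span_index, index_in_sequence) bookkeeping
    for k, (s, e) in enumerate(spans):
        for i in range(s, e):
            groups.append(("P", k, len(input_tokens)))
            input_tokens.append("[P]")
            position_ids.append(i)
        for i in range(s, e):
            groups.append(("orig", k, len(input_tokens)))
            input_tokens.append(tokens[i])
            position_ids.append(i)
    return input_tokens, position_ids, groups


def attention_mask(n_original: int, groups, n_total: int):
    """Simplified self-attention mask: allow[q][k] is True if q may attend to k.

    Every token sees the corrupted original sequence (global masking info);
    appended tokens of span k additionally see the appended original tokens of
    earlier spans (the partially autoregressive factorization) and the tokens
    of their own block. Finer-grained rules from the paper are omitted.
    """
    allow = [[False] * n_total for _ in range(n_total)]
    for q in range(n_total):
        for k in range(n_original):
            allow[q][k] = True
    kind_span = {idx: (kind, span) for kind, span, idx in groups}
    for q in range(n_original, n_total):
        q_kind, q_span = kind_span[q]
        for k in range(n_original, n_total):
            k_kind, k_span = kind_span[k]
            if (k_span < q_span and k_kind == "orig") or (k_span == q_span and k_kind == q_kind):
                allow[q][k] = True
    return allow


toks = "the quick brown fox jumps over the lazy dog".split()
inp, pos, grp = build_pmlm_input(toks, spans=[(1, 3), (6, 7)])
mask = attention_mask(len(toks), grp, len(inp))
print(list(zip(inp, pos)))
```

The point the sketch tries to capture is that the appended blocks reuse the original position ids, so the encodings of the unmasked context can be computed once and shared between the autoencoding and partially autoregressive tasks.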
Related papers
- Language Model Adaptation to Specialized Domains through Selective
Masking based on Genre and Topical Characteristics [4.9639158834745745]
We introduce an innovative masking approach leveraging genre and topicality information to tailor language models to specialized domains.
Our method incorporates a ranking process that prioritizes words based on their significance, subsequently guiding the masking procedure.
Experiments conducted using continual pre-training within the legal domain have underscored the efficacy of our approach on the LegalGLUE benchmark in the English language.
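As a rough illustration of ranking-guided selective masking, the sketch below scores words with a simple TF-IDF stand-in and masks the highest-ranked ones first. The paper's actual ranking uses genre and topical characteristics, which are not reproduced here; the scoring function, the 15% budget, and the toy legal-domain corpus are assumptions.

```python
import math
from collections import Counter


def significance_scores(documents):
    """Corpus-level TF-IDF scores, standing in for the paper's word ranking."""
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))
    scores = Counter()
    for doc in documents:
        tf = Counter(doc)
        for word, count in tf.items():
            idf = math.log(len(documents) / doc_freq[word])
            scores[word] = max(scores[word], count * idf)
    return scores


def selective_mask(tokens, scores, mask_rate=0.15, mask_token="[MASK]"):
    """Mask the highest-ranked (most domain-significant) tokens first."""
    budget = max(1, round(mask_rate * len(tokens)))
    ranked = sorted(range(len(tokens)), key=lambda i: scores[tokens[i]], reverse=True)
    to_mask = set(ranked[:budget])
    return [mask_token if i in to_mask else t for i, t in enumerate(tokens)]


corpus = [
    "the court granted the motion to dismiss the claim".split(),
    "the parties signed the agreement before the hearing".split(),
]
print(selective_mask(corpus[0], significance_scores(corpus)))
```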
arXiv Detail & Related papers (2024-02-19T10:43:27Z)
- FLIP: Fine-grained Alignment between ID-based Models and Pretrained Language Models for CTR Prediction [49.510163437116645]
Click-through rate (CTR) prediction serves as a core function module in personalized online services.
Traditional ID-based models for CTR prediction take one-hot encoded ID features of the tabular modality as input.
Pretrained Language Models (PLMs) have given rise to another paradigm, which takes sentences of the textual modality as input.
We propose to conduct Fine-grained feature-level ALignment between ID-based Models and Pretrained Language Models (FLIP) for CTR prediction.
arXiv Detail & Related papers (2023-10-30T11:25:03Z)
- Word Order Matters when you Increase Masking [70.29624135819884]
We study the effect of removing position encodings on the pre-training objective itself, to test whether models can reconstruct position information from co-occurrences alone.
We find that the necessity of position information increases with the amount of masking, and that masked language models without position encodings are not able to reconstruct this information on the task.
arXiv Detail & Related papers (2022-11-08T18:14:04Z)
- Improving the Lexical Ability of Pretrained Language Models for Unsupervised Neural Machine Translation [127.81351683335143]
Cross-lingual pretraining requires models to align the lexical- and high-level representations of the two languages.
Previous research has shown that cross-lingual pretraining often falls short because these representations are not sufficiently aligned.
In this paper, we enhance the bilingual masked language model pretraining with lexical-level information by using type-level cross-lingual subword embeddings.
arXiv Detail & Related papers (2021-03-18T21:17:58Z)
- Neural Mask Generator: Learning to Generate Adaptive Word Maskings for Language Model Adaptation [63.195935452646815]
We propose a method to automatically generate domain- and task-adaptive maskings of the given text for self-supervised pre-training.
We present a novel reinforcement learning-based framework which learns the masking policy.
We validate our Neural Mask Generator (NMG) on several question answering and text classification datasets.
arXiv Detail & Related papers (2020-10-06T13:27:01Z)
- Probabilistically Masked Language Model Capable of Autoregressive Generation in Arbitrary Word Order [32.71489048856101]
Masked language models and autoregressive language models are two types of language models.
We propose a probabilistic masking scheme for the masked language model, which we call the probabilistically masked language model (PMLM).
We prove that u-PMLM is equivalent to an autoregressive permutated language model.
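A minimal sketch of the masking scheme this entry describes: the masking ratio itself is drawn from a prior distribution, uniform over [0, 1] in the case of u-PMLM (as the name suggests), and that fraction of positions is then masked at random. Note the acronym clash: this PMLM (probabilistically masked) is distinct from the pseudo-masked language model of the main paper. The function below is an illustrative assumption, not the authors' code.

```python
import random


def u_pmlm_mask(tokens, mask_token="[MASK]", rng=random):
    """Mask a uniformly sampled fraction of positions (the 'u' in u-PMLM)."""
    ratio = rng.random()                           # masking ratio ~ Uniform(0, 1)
    n_mask = round(ratio * len(tokens))
    masked = set(rng.sample(range(len(tokens)), n_mask))
    return [mask_token if i in masked else t for i, t in enumerate(tokens)], masked


random.seed(0)
print(u_pmlm_mask("a b c d e f g h".split()))
```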
arXiv Detail & Related papers (2020-04-24T07:38:19Z)
- Semi-Autoregressive Training Improves Mask-Predict Decoding [119.8412758943192]
We introduce a new training method for conditional masked language models, SMART, which mimics the semi-autoregressive behavior of mask-predict.
Models trained with SMART produce higher-quality translations when using mask-predict decoding, effectively closing the remaining performance gap with fully autoregressive models.
arXiv Detail & Related papers (2020-01-23T19:56:35Z)
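For context on what SMART targets, below is a rough sketch of the mask-predict decoding loop: all target tokens are predicted in parallel, then the lowest-confidence positions are repeatedly re-masked and re-predicted under a linearly decaying schedule. The `model` callable, its return format, and the toy example are stand-ins rather than any specific library API.

```python
from typing import Callable, List, Tuple

MASK = "[MASK]"


def mask_predict(model: Callable[[List[str]], Tuple[List[str], List[float]]],
                 target_len: int, iterations: int = 4):
    """Iterative parallel decoding: fill every mask, then re-mask the least confident."""
    tokens = [MASK] * target_len
    confidences = [0.0] * target_len
    for t in range(iterations):
        preds, confs = model(tokens)               # predict all masked slots in parallel
        for i in range(target_len):
            if tokens[i] == MASK:
                tokens[i], confidences[i] = preds[i], confs[i]
        n_remask = target_len * (iterations - t - 1) // iterations  # linear decay
        if n_remask == 0:
            break
        for i in sorted(range(target_len), key=lambda i: confidences[i])[:n_remask]:
            tokens[i] = MASK
    return tokens


# Toy stand-in "model": deterministic predictions with made-up confidences.
def toy_model(tokens):
    return ([f"tok{i}" for i in range(len(tokens))],
            [0.5 + 0.05 * i for i in range(len(tokens))])


print(mask_predict(toy_model, target_len=5))
```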
This list is automatically generated from the titles and abstracts of the papers on this site.