Probabilistically Masked Language Model Capable of Autoregressive
Generation in Arbitrary Word Order
- URL: http://arxiv.org/abs/2004.11579v1
- Date: Fri, 24 Apr 2020 07:38:19 GMT
- Title: Probabilistically Masked Language Model Capable of Autoregressive
Generation in Arbitrary Word Order
- Authors: Yi Liao, Xin Jiang, Qun Liu
- Abstract summary: Masked language models and autoregressive language models are two types of language models.
We propose a probabilistic masking scheme for the masked language model, which we call the probabilistically masked language model (PMLM).
We implement a specific PMLM with a uniform prior distribution on the masking ratio, named u-PMLM, and prove that u-PMLM is equivalent to an autoregressive permutated language model.
- Score: 32.71489048856101
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Masked language models and autoregressive language models are two types of
language models. While pretrained masked language models such as BERT dominate
natural language understanding (NLU) tasks, autoregressive language
models such as GPT are especially capable in natural language generation (NLG).
In this paper, we propose a probabilistic masking scheme for the masked
language model, which we call probabilistically masked language model (PMLM).
We implement a specific PMLM with a uniform prior distribution on the masking
ratio named u-PMLM. We prove that u-PMLM is equivalent to an autoregressive
permutated language model. One main advantage of the model is that it supports
text generation in arbitrary order with surprisingly good quality, which could
potentially enable new applications over traditional unidirectional generation.
Besides, the pretrained u-PMLM also outperforms BERT on a set of downstream NLU
tasks.
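As a rough illustration only (not the authors' code), the sketch below shows the two ideas the abstract describes: training-time masking where the masking ratio itself is drawn from a uniform prior, and generation in an arbitrary, user-chosen order by filling one masked slot at a time. An off-the-shelf BERT MLM stands in for a pretrained u-PMLM, and the helper names, greedy argmax decoding, and prompt are my own assumptions.

```python
# Illustrative sketch of u-PMLM-style probabilistic masking and
# arbitrary-order generation. BERT is only a stand-in for a model
# actually pretrained with a uniform masking-ratio prior.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def pmlm_mask(input_ids: torch.Tensor, mask_id: int) -> torch.Tensor:
    """Training-time masking: draw the masking ratio from Uniform(0, 1)
    (the 'u' in u-PMLM), then mask each token with that probability."""
    ids = input_ids.clone()
    ratio = torch.rand(()).item()            # r ~ Uniform(0, 1)
    mask = torch.rand(ids.shape) < ratio     # each token masked w.p. r
    ids[mask] = mask_id
    return ids

@torch.no_grad()
def generate_arbitrary_order(prompt: str, n_new: int, order: list[int]) -> str:
    """Fill n_new [MASK] slots appended to the prompt, committing one
    token per step in the user-supplied order instead of left to right."""
    enc = tok(prompt + " " + " ".join([tok.mask_token] * n_new),
              return_tensors="pt")
    ids = enc["input_ids"]
    slots = (ids[0] == tok.mask_token_id).nonzero().flatten().tolist()
    for k in order:                          # e.g. order = [2, 0, 1]
        pos = slots[k]
        logits = mlm(input_ids=ids).logits[0, pos]
        ids[0, pos] = int(logits.argmax())   # greedy choice for simplicity
    return tok.decode(ids[0], skip_special_tokens=True)

print(generate_arbitrary_order("The weather today is", 3, [2, 0, 1]))
```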
Related papers
- FiLM: Fill-in Language Models for Any-Order Generation [71.42044325886194]
Fill-in Language Model (FiLM) is a new language modeling approach that allows for flexible generation at any position without adhering to a specific generation order.
During inference, FiLM can seamlessly insert missing phrases, sentences, or paragraphs.
FiLM outperforms existing infilling methods that rely on left-to-right language models trained on rearranged text segments.
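For intuition only, a minimal any-position infilling loop with a generic masked LM is sketched below; FiLM trains its own model and handles variable-length spans, so this merely illustrates the fill-in interface, and the model choice, span length, and helper name are assumptions.

```python
# Generic fill-in sketch: place [MASK] slots between a left and right
# context and fill them leftmost-first with an off-the-shelf MLM.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def infill(left: str, right: str, n_tokens: int = 5) -> str:
    text = left + " ".join([tok.mask_token] * n_tokens) + right
    ids = tok(text, return_tensors="pt")["input_ids"]
    while (ids[0] == tok.mask_token_id).any():
        pos = (ids[0] == tok.mask_token_id).nonzero()[0].item()  # leftmost gap
        logits = mlm(input_ids=ids).logits[0, pos]
        ids[0, pos] = int(logits.argmax())
    return tok.decode(ids[0], skip_special_tokens=True)

print(infill("The meeting was moved because ", ", so we rescheduled it."))
```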
arXiv Detail & Related papers (2023-10-15T19:37:39Z)
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)
- Deriving Language Models from Masked Language Models [12.628196757545979]
Masked language models (MLMs) do not explicitly define a distribution over language.
Recent work has implicitly treated them as such for the purposes of generation and scoring.
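The scoring heuristic this entry alludes to is commonly implemented as a pseudo-log-likelihood: mask each position in turn and sum the log-probability of the true token. A minimal sketch follows; the model choice and function name are assumptions, and this treats the MLM as if it defined a sentence distribution, which it does not strictly do.

```python
# Pseudo-log-likelihood scoring with a masked LM (illustrative only).
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def pseudo_log_likelihood(sentence: str) -> float:
    ids = tok(sentence, return_tensors="pt")["input_ids"]
    total = 0.0
    for pos in range(1, ids.shape[1] - 1):        # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[0, pos] = tok.mask_token_id        # hide one token at a time
        log_probs = mlm(input_ids=masked).logits[0, pos].log_softmax(-1)
        total += log_probs[ids[0, pos]].item()    # log-prob of the true token
    return total

print(pseudo_log_likelihood("the cat sat on the mat"))
```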
arXiv Detail & Related papers (2023-05-24T18:42:45Z)
- An Overview on Language Models: Recent Developments and Outlook [32.528770408502396]
Conventional language models (CLMs) aim to predict the probability of linguistic sequences in a causal manner.
Pre-trained language models (PLMs) cover broader concepts and can be used in both causal sequential modeling and fine-tuning for downstream applications.
arXiv Detail & Related papers (2023-03-10T07:55:00Z)
- Modeling Sequential Sentence Relation to Improve Cross-lingual Dense Retrieval [87.11836738011007]
We propose a multilingual language model called the masked sentence model (MSM).
MSM consists of a sentence encoder to generate the sentence representations, and a document encoder applied to a sequence of sentence vectors from a document.
To train the model, we propose a masked sentence prediction task, which masks and predicts the sentence vector via a hierarchical contrastive loss with sampled negatives.
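The contrastive part of that objective can be pictured with a short InfoNCE-style loss: the document encoder's output at a masked sentence slot should match the true sentence vector against sampled negatives. The sketch below is only that generic picture; the hierarchical weighting and the actual encoders of MSM are omitted, and all tensor shapes and names are assumptions.

```python
# Contrastive "masked sentence prediction" loss sketch (InfoNCE-style).
import torch
import torch.nn.functional as F

def masked_sentence_contrastive_loss(pred: torch.Tensor,       # (B, d) predicted vectors
                                     positive: torch.Tensor,   # (B, d) true sentence vectors
                                     negatives: torch.Tensor,  # (B, K, d) sampled negatives
                                     temperature: float = 0.05) -> torch.Tensor:
    pred = F.normalize(pred, dim=-1)
    cands = torch.cat([positive.unsqueeze(1), negatives], dim=1)    # (B, 1+K, d)
    cands = F.normalize(cands, dim=-1)
    logits = torch.einsum("bd,bkd->bk", pred, cands) / temperature  # cosine similarities
    labels = torch.zeros(pred.size(0), dtype=torch.long)            # positive sits at index 0
    return F.cross_entropy(logits, labels)

# toy usage with random vectors
B, K, d = 4, 7, 128
loss = masked_sentence_contrastive_loss(torch.randn(B, d), torch.randn(B, d), torch.randn(B, K, d))
print(float(loss))
```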
arXiv Detail & Related papers (2023-02-03T09:54:27Z)
- Cross-Lingual Text Classification with Multilingual Distillation and Zero-Shot-Aware Training [21.934439663979663]
We propose a multi-branch multilingual language model (MBLM) built on multilingual pre-trained language models (MPLMs).
The method is based on transferring knowledge from high-performance monolingual models within a teacher-student framework.
Results on two cross-lingual classification tasks show that, with only the task's supervised data used, our method improves both the supervised and zero-shot performance of MPLMs.
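The teacher-student transfer described here is typically a soft-target distillation loss; a minimal sketch under that assumption is shown below, with the MBLM branching and concrete models left out, and the temperature and mixing weight chosen arbitrarily.

```python
# Knowledge-distillation loss sketch: the student matches the teacher's
# softened class distribution while also fitting the gold labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    # soft targets from the teacher, compared with KL divergence
    soft = F.kl_div(F.log_softmax(student_logits / temperature, dim=-1),
                    F.softmax(teacher_logits / temperature, dim=-1),
                    reduction="batchmean") * temperature ** 2
    # standard supervised loss on the gold labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# toy usage
logits_s, logits_t = torch.randn(8, 3), torch.randn(8, 3)
print(float(distillation_loss(logits_s, logits_t, torch.randint(0, 3, (8,)))))
```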
arXiv Detail & Related papers (2022-02-28T09:51:32Z)
- UNKs Everywhere: Adapting Multilingual Language Models to New Scripts [103.79021395138423]
Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks.
Due to their limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages.
We propose novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts.
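For orientation, the basic adaptation step such methods build on is extending the vocabulary and embedding matrix before continued training on target-language text; the sketch below shows only that generic step with standard Transformers calls, not the paper's specific data-efficient initialization tricks, and the example tokens are placeholders.

```python
# Generic vocabulary-extension sketch for adapting an MLM to an unseen script.
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

new_tokens = ["ʤ", "ʧ", "ŋ"]             # placeholders standing in for unseen-script characters
added = tok.add_tokens(new_tokens)        # extend the tokenizer vocabulary
model.resize_token_embeddings(len(tok))   # grow the embedding matrix (new rows randomly initialized)

print(f"added {added} tokens; vocab size is now {len(tok)}")
# ...continue masked-language-model training on target-language text from here.
```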
arXiv Detail & Related papers (2020-12-31T11:37:28Z)
- Explicitly Modeling Syntax in Language Models with Incremental Parsing and a Dynamic Oracle [88.65264818967489]
We propose a new syntax-aware language model: Syntactic Ordered Memory (SOM).
The model explicitly models syntactic structure with an incremental parser while maintaining the conditional probability setting of a standard (left-to-right) language model.
Experiments show that SOM can achieve strong results in language modeling, incremental parsing and syntactic generalization tests.
arXiv Detail & Related papers (2020-10-21T17:39:15Z)
- UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training [152.63467944568094]
We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks.
Our experiments show that the unified language models pre-trained using PMLM achieve new state-of-the-art results on a wide range of natural language understanding and generation tasks.
arXiv Detail & Related papers (2020-02-28T15:28:49Z)