Deriving Language Models from Masked Language Models
- URL: http://arxiv.org/abs/2305.15501v1
- Date: Wed, 24 May 2023 18:42:45 GMT
- Title: Deriving Language Models from Masked Language Models
- Authors: Lucas Torroba Hennigen, Yoon Kim
- Abstract summary: Masked language models (MLM) do not explicitly define a distribution over language.
Recent work has implicitly treated them as such for the purposes of generation and scoring.
- Score: 12.628196757545979
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Masked language models (MLM) do not explicitly define a distribution over
language, i.e., they are not language models per se. However, recent work has
implicitly treated them as such for the purposes of generation and scoring.
This paper studies methods for deriving explicit joint distributions from MLMs,
focusing on distributions over two tokens, which makes it possible to calculate
exact distributional properties. We find that an approach based on identifying
joints whose conditionals are closest to those of the MLM works well and
outperforms existing Markov random field-based approaches. We further find that
this derived model's conditionals can even occasionally outperform the original
MLM's conditionals.
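
As a concrete illustration of the "closest conditionals" idea from the abstract, the sketch below fits a two-token joint distribution so that its conditionals match a given pair of conditionals under KL divergence. It is only a minimal, generic sketch: the toy vocabulary size, the randomly generated stand-ins for the MLM's conditionals, and the choice of gradient descent on joint logits are assumptions made for illustration, not the authors' exact construction.

```python
# Toy sketch: fit a joint q(x1, x2) whose conditionals are close to two given
# conditionals (stand-ins for an MLM's distributions at two masked positions).
import torch

V = 8                                   # toy vocabulary size (assumption)
torch.manual_seed(0)

# Stand-ins for MLM conditionals:
#   p1[j, :] ~ p(x1 = . | x2 = j, context),  p2[i, :] ~ p(x2 = . | x1 = i, context)
p1 = torch.softmax(torch.randn(V, V), dim=-1)
p2 = torch.softmax(torch.randn(V, V), dim=-1)

# Parametrize the joint q(x1, x2) by unconstrained logits over the V*V table.
logits = torch.zeros(V, V, requires_grad=True)
opt = torch.optim.Adam([logits], lr=0.1)

def kl(p, q, eps=1e-12):
    """Row-wise KL(p || q), summed over rows."""
    return (p * ((p + eps).log() - (q + eps).log())).sum()

for step in range(500):
    q = torch.softmax(logits.reshape(-1), dim=0).reshape(V, V)   # joint q(x1, x2)
    q_x2_given_x1 = q / q.sum(dim=1, keepdim=True)               # rows indexed by x1
    q_x1_given_x2 = (q / q.sum(dim=0, keepdim=True)).T           # rows indexed by x2
    loss = kl(p1, q_x1_given_x2) + kl(p2, q_x2_given_x1)
    opt.zero_grad()
    loss.backward()
    opt.step()

print("final conditional mismatch (summed KL):", loss.item())
```

Because the joint here is just a V-by-V table, its marginals, conditionals, and other distributional properties can be computed exactly, which is what makes the two-token setting studied in the paper tractable.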
Related papers
- DALD: Improving Logits-based Detector without Logits from Black-box LLMs [56.234109491884126]
Large Language Models (LLMs) have revolutionized text generation, producing outputs that closely mimic human writing.
We present Distribution-Aligned LLMs Detection (DALD), an innovative framework that redefines the state-of-the-art performance in black-box text detection.
DALD is designed to align the surrogate model's distribution with that of unknown target LLMs, ensuring enhanced detection capability and resilience against rapid model iterations.
arXiv Detail & Related papers (2024-06-07T19:38:05Z) - Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z) - Which Syntactic Capabilities Are Statistically Learned by Masked Language Models for Code? [51.29970742152668]
We highlight that relying on accuracy-based measurements may lead to an overestimation of models' capabilities.
To address these issues, we introduce a technique called SyntaxEval for evaluating the syntactic capabilities of masked language models of code.
arXiv Detail & Related papers (2024-01-03T02:44:02Z) - FiLM: Fill-in Language Models for Any-Order Generation [71.42044325886194]
Fill-in Language Model (FiLM) is a new language modeling approach that allows for flexible generation at any position without adhering to a specific generation order.
During inference, FiLM can seamlessly insert missing phrases, sentences, or paragraphs.
FiLM outperforms existing infilling methods that rely on left-to-right language models trained on rearranged text segments.
arXiv Detail & Related papers (2023-10-15T19:37:39Z) - Inconsistencies in Masked Language Models [20.320583166619528]
Masked language models (MLMs) can provide distributions over tokens in the masked positions of a sequence.
However, the distributions corresponding to different masking patterns can exhibit considerable inconsistencies.
We propose an inference-time strategy for MLMs called Ensemble of Conditionals.
arXiv Detail & Related papers (2022-12-30T22:53:25Z) - Exposing the Implicit Energy Networks behind Masked Language Models via Metropolis--Hastings [57.133639209759615]
We interpret MLMs as energy-based sequence models and propose two energy parametrizations derivable from the trained MLMs.
We develop a tractable sampling scheme based on the Metropolis-Hastings Monte Carlo algorithm (a toy sketch of such a sampler appears after this list).
We validate the effectiveness of the proposed parametrizations by exploring the quality of samples drawn from these energy-based models.
arXiv Detail & Related papers (2021-06-04T22:04:30Z) - Universal Sentence Representation Learning with Conditional Masked Language Model [7.334766841801749]
We present Conditional Masked Language Modeling (CMLM) to effectively learn sentence representations.
Our English CMLM model achieves state-of-the-art performance on SentEval.
As a fully unsupervised learning method, CMLM can be conveniently extended to a broad range of languages and domains.
arXiv Detail & Related papers (2020-12-28T18:06:37Z) - Encoder-Decoder Models Can Benefit from Pre-trained Masked Language Models in Grammatical Error Correction [54.569707226277735]
Previous methods have potential drawbacks when applied to an EncDec model.
Our proposed method fine-tunes a pre-trained MLM on a GEC corpus and then uses the output of the fine-tuned MLM as additional features in the GEC model.
The best-performing model achieves state-of-the-art performance on the BEA-2019 and CoNLL-2014 benchmarks.
arXiv Detail & Related papers (2020-05-03T04:49:31Z) - Probabilistically Masked Language Model Capable of Autoregressive Generation in Arbitrary Word Order [32.71489048856101]
Masked language models and autoregressive language models are two major types of language models.
We propose a probabilistic masking scheme for the masked language model, which we call the probabilistically masked language model (PMLM).
We prove that u-PMLM is equivalent to an autoregressive permutated language model.
arXiv Detail & Related papers (2020-04-24T07:38:19Z)
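
The Metropolis-Hastings entry above treats an MLM as an implicit energy-based sequence model. The sketch below is one plausible, hedged realization of that idea: it assumes a HuggingFace bert-base-uncased checkpoint, uses the negative pseudo-log-likelihood as the energy (an assumption for illustration, not necessarily either parametrization studied in that paper), and proposes single-token edits drawn from the MLM's own conditional at a masked position.

```python
# Toy Metropolis-Hastings sampler over fixed-length token sequences, with an
# MLM supplying both the (assumed) energy and the proposal distribution.
import math
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL = "bert-base-uncased"             # assumed checkpoint for the example
tok = AutoTokenizer.from_pretrained(MODEL)
mlm = AutoModelForMaskedLM.from_pretrained(MODEL).eval()

@torch.no_grad()
def conditional(seq, pos):
    """MLM log-distribution over the token at `pos`, with that position masked."""
    masked = seq.clone()
    masked[0, pos] = tok.mask_token_id
    return torch.log_softmax(mlm(masked).logits[0, pos], dim=-1)

@torch.no_grad()
def energy(seq):
    """Assumed energy: negative pseudo-log-likelihood (mask each position in turn)."""
    total = 0.0
    for pos in range(1, seq.shape[1] - 1):          # skip [CLS] / [SEP]
        total += conditional(seq, pos)[seq[0, pos]].item()
    return -total                                   # O(length) MLM calls per evaluation

ids = tok("the cat sat on the mat", return_tensors="pt").input_ids
e = energy(ids)

for step in range(50):
    pos = torch.randint(1, ids.shape[1] - 1, (1,)).item()
    logp = conditional(ids, pos)                    # proposal: resample this position
    new_tok = torch.multinomial(logp.exp(), 1).item()
    proposal = ids.clone()
    proposal[0, pos] = new_tok
    e_new = energy(proposal)
    # MH log acceptance ratio, with the correction for the asymmetric MLM proposal.
    log_alpha = (e - e_new) + (logp[ids[0, pos]] - logp[new_tok]).item()
    if math.log(torch.rand(1).item() + 1e-12) < log_alpha:
        ids, e = proposal, e_new

print(tok.decode(ids[0], skip_special_tokens=True))
```

Because the proposal is asymmetric, the acceptance ratio includes the correction term log p(old token) - log p(new token); this correction is what distinguishes a proper Metropolis-Hastings sampler from the heuristic of simply resampling masked positions.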