Deriving Language Models from Masked Language Models
- URL: http://arxiv.org/abs/2305.15501v1
- Date: Wed, 24 May 2023 18:42:45 GMT
- Title: Deriving Language Models from Masked Language Models
- Authors: Lucas Torroba Hennigen, Yoon Kim
- Abstract summary: Masked language models (MLM) do not explicitly define a distribution over language.
Recent work has implicitly treated them as such for the purposes of generation and scoring.
- Score: 12.628196757545979
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Masked language models (MLM) do not explicitly define a distribution over
language, i.e., they are not language models per se. However, recent work has
implicitly treated them as such for the purposes of generation and scoring.
This paper studies methods for deriving explicit joint distributions from MLMs,
focusing on distributions over two tokens, which makes it possible to calculate
exact distributional properties. We find that an approach based on identifying
joints whose conditionals are closest to those of the MLM works well and
outperforms existing Markov random field-based approaches. We further find that
this derived model's conditionals can even occasionally outperform the original
MLM's conditionals.
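
As a concrete illustration of the "closest conditionals" idea from the abstract, the sketch below fits a two-token joint distribution so that its conditionals match a given pair of conditionals under KL divergence. It is only a minimal, generic sketch: the toy vocabulary size, the randomly generated stand-ins for the MLM's conditionals, and the choice of gradient descent on joint logits are assumptions made for illustration, not the authors' exact construction.

```python
# Toy sketch: fit a joint q(x1, x2) whose conditionals are close to two given
# conditionals (stand-ins for an MLM's distributions at two masked positions).
import torch

V = 8                                   # toy vocabulary size (assumption)
torch.manual_seed(0)

# Stand-ins for MLM conditionals:
#   p1[j, :] ~ p(x1 = . | x2 = j, context),  p2[i, :] ~ p(x2 = . | x1 = i, context)
p1 = torch.softmax(torch.randn(V, V), dim=-1)
p2 = torch.softmax(torch.randn(V, V), dim=-1)

# Parametrize the joint q(x1, x2) by unconstrained logits over the V*V table.
logits = torch.zeros(V, V, requires_grad=True)
opt = torch.optim.Adam([logits], lr=0.1)

def kl(p, q, eps=1e-12):
    """Row-wise KL(p || q), summed over rows."""
    return (p * ((p + eps).log() - (q + eps).log())).sum()

for step in range(500):
    q = torch.softmax(logits.reshape(-1), dim=0).reshape(V, V)   # joint q(x1, x2)
    q_x2_given_x1 = q / q.sum(dim=1, keepdim=True)               # rows indexed by x1
    q_x1_given_x2 = (q / q.sum(dim=0, keepdim=True)).T           # rows indexed by x2
    loss = kl(p1, q_x1_given_x2) + kl(p2, q_x2_given_x1)
    opt.zero_grad()
    loss.backward()
    opt.step()

print("final conditional mismatch (summed KL):", loss.item())
```

Because the joint here is just a V-by-V table, its marginals, conditionals, and other distributional properties can be computed exactly, which is what makes the two-token setting studied in the paper tractable.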
Related papers
- DALD: Improving Logits-based Detector without Logits from Black-box LLMs [56.234109491884126]
Large Language Models (LLMs) have revolutionized text generation, producing outputs that closely mimic human writing.
We present Distribution-Aligned LLMs Detection (DALD), an innovative framework that redefines the state-of-the-art performance in black-box text detection.
DALD is designed to align the surrogate model's distribution with that of unknown target LLMs, ensuring enhanced detection capability and resilience against rapid model iterations.
arXiv Detail & Related papers (2024-06-07T19:38:05Z) - Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z) - Which Syntactic Capabilities Are Statistically Learned by Masked Language Models for Code? [51.29970742152668]
We highlight that relying on accuracy-based measurements may lead to an overestimation of models' capabilities.
To address these issues, we introduce a technique called SyntaxEval for evaluating the syntactic capabilities of masked language models of code.
arXiv Detail & Related papers (2024-01-03T02:44:02Z) - FiLM: Fill-in Language Models for Any-Order Generation [71.42044325886194]
Fill-in Language Model (FiLM) is a new language modeling approach that allows for flexible generation at any position without adhering to a specific generation order.
During inference, FiLM can seamlessly insert missing phrases, sentences, or paragraphs.
FiLM outperforms existing infilling methods that rely on left-to-right language models trained on rearranged text segments.
arXiv Detail & Related papers (2023-10-15T19:37:39Z) - Inconsistencies in Masked Language Models [20.320583166619528]
Masked language models (MLMs) can provide distributions over tokens in the masked positions of a sequence.
However, the distributions corresponding to different masking patterns can exhibit considerable inconsistencies.
We propose an inference-time strategy for MLMs called Ensemble of Conditionals.
arXiv Detail & Related papers (2022-12-30T22:53:25Z) - Exposing the Implicit Energy Networks behind Masked Language Models via Metropolis--Hastings [57.133639209759615]
We interpret MLMs as energy-based sequence models and propose two energy parametrizations derivable from the trained MLMs.
We develop a tractable sampling scheme based on the Metropolis-Hastings Monte Carlo algorithm (a toy sketch of such a sampler appears after this list).
We validate the effectiveness of the proposed parametrizations by exploring the quality of samples drawn from these energy-based models.
arXiv Detail & Related papers (2021-06-04T22:04:30Z) - Universal Sentence Representation Learning with Conditional Masked Language Model [7.334766841801749]
We present Conditional Masked Language Modeling (CMLM) to effectively learn sentence representations.
Our English CMLM model achieves state-of-the-art performance on SentEval.
As a fully unsupervised learning method, CMLM can be conveniently extended to a broad range of languages and domains.
arXiv Detail & Related papers (2020-12-28T18:06:37Z) - Encoder-Decoder Models Can Benefit from Pre-trained Masked Language Models in Grammatical Error Correction [54.569707226277735]
Previous methods have potential drawbacks when applied to an EncDec model.
Our proposed method fine-tunes a pre-trained MLM on a GEC corpus and then uses the output of the fine-tuned MLM as additional features in the GEC model.
The best-performing model achieves state-of-the-art performance on the BEA-2019 and CoNLL-2014 benchmarks.
arXiv Detail & Related papers (2020-05-03T04:49:31Z) - Probabilistically Masked Language Model Capable of Autoregressive Generation in Arbitrary Word Order [32.71489048856101]
Masked language models and autoregressive language models are two major types of language models.
We propose a probabilistic masking scheme for the masked language model, which we call the probabilistically masked language model (PMLM).
We prove that u-PMLM is equivalent to an autoregressive permutated language model.
arXiv Detail & Related papers (2020-04-24T07:38:19Z)
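
The Metropolis-Hastings entry above treats an MLM as an implicit energy-based sequence model. The sketch below is one plausible, hedged realization of that idea: it assumes a HuggingFace bert-base-uncased checkpoint, uses the negative pseudo-log-likelihood as the energy (an assumption for illustration, not necessarily either parametrization studied in that paper), and proposes single-token edits drawn from the MLM's own conditional at a masked position.

```python
# Toy Metropolis-Hastings sampler over fixed-length token sequences, with an
# MLM supplying both the (assumed) energy and the proposal distribution.
import math
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL = "bert-base-uncased"             # assumed checkpoint for the example
tok = AutoTokenizer.from_pretrained(MODEL)
mlm = AutoModelForMaskedLM.from_pretrained(MODEL).eval()

@torch.no_grad()
def conditional(seq, pos):
    """MLM log-distribution over the token at `pos`, with that position masked."""
    masked = seq.clone()
    masked[0, pos] = tok.mask_token_id
    return torch.log_softmax(mlm(masked).logits[0, pos], dim=-1)

@torch.no_grad()
def energy(seq):
    """Assumed energy: negative pseudo-log-likelihood (mask each position in turn)."""
    total = 0.0
    for pos in range(1, seq.shape[1] - 1):          # skip [CLS] / [SEP]
        total += conditional(seq, pos)[seq[0, pos]].item()
    return -total                                   # O(length) MLM calls per evaluation

ids = tok("the cat sat on the mat", return_tensors="pt").input_ids
e = energy(ids)

for step in range(50):
    pos = torch.randint(1, ids.shape[1] - 1, (1,)).item()
    logp = conditional(ids, pos)                    # proposal: resample this position
    new_tok = torch.multinomial(logp.exp(), 1).item()
    proposal = ids.clone()
    proposal[0, pos] = new_tok
    e_new = energy(proposal)
    # MH log acceptance ratio, with the correction for the asymmetric MLM proposal.
    log_alpha = (e - e_new) + (logp[ids[0, pos]] - logp[new_tok]).item()
    if math.log(torch.rand(1).item() + 1e-12) < log_alpha:
        ids, e = proposal, e_new

print(tok.decode(ids[0], skip_special_tokens=True))
```

Because the proposal is asymmetric, the acceptance ratio includes the correction term log p(old token) - log p(new token); this correction is what distinguishes a proper Metropolis-Hastings sampler from the heuristic of simply resampling masked positions.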