Nonparametric Masked Language Modeling
- URL: http://arxiv.org/abs/2212.01349v2
- Date: Thu, 25 May 2023 23:18:35 GMT
- Title: Nonparametric Masked Language Modeling
- Authors: Sewon Min, Weijia Shi, Mike Lewis, Xilun Chen, Wen-tau Yih, Hannaneh
Hajishirzi, Luke Zettlemoyer
- Abstract summary: Existing language models (LMs) predict tokens with a softmax over a finite vocabulary.
We introduce NPM, the first nonparametric masked language model that replaces this softmax with a nonparametric distribution over every phrase in a reference corpus.
NPM can be efficiently trained with a contrastive objective and an in-batch approximation to full corpus retrieval.
- Score: 113.71921977520864
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing language models (LMs) predict tokens with a softmax over a finite
vocabulary, which can make it difficult to predict rare tokens or phrases. We
introduce NPM, the first nonparametric masked language model that replaces this
softmax with a nonparametric distribution over every phrase in a reference
corpus. NPM fills in the [MASK] solely from retrieving a token from a text
corpus. We show that NPM can be efficiently trained with a contrastive
objective and an in-batch approximation to full corpus retrieval. Zero-shot
evaluation on 16 tasks including classification, fact probing and question
answering demonstrates that NPM outperforms significantly larger parametric
models, either with or without a retrieve-and-generate approach. It is
particularly better at dealing with rare patterns (word senses or facts) and
predicting rare or nearly unseen words (e.g., non-Latin script). We release the
model and code at github.com/facebookresearch/NPM.
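To make the retrieval idea above concrete, here is a minimal sketch of nonparametric [MASK] filling: instead of a softmax over a fixed vocabulary, the masked position is scored against every encoded token in a reference corpus and the nearest one is returned. The encoder is omitted, and all names, shapes, and toy data are illustrative assumptions, not NPM's actual interface.

```python
# Minimal sketch of nonparametric [MASK] filling (illustrative only; the
# encoder, shapes, and names are assumptions, not NPM's actual interface).
import numpy as np

def fill_mask_nonparametric(mask_vec, corpus_vecs, corpus_tokens):
    """Pick the corpus token whose embedding is closest (cosine) to the
    representation of the [MASK] position, instead of scoring a fixed
    output vocabulary with a softmax."""
    q = mask_vec / np.linalg.norm(mask_vec)
    c = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    return corpus_tokens[int(np.argmax(c @ q))]

# Toy usage: random vectors stand in for real contextual embeddings.
rng = np.random.default_rng(0)
corpus_tokens = ["Seattle", "Tokyo", "thrombocytopenia"]  # retrieval candidates
corpus_vecs = rng.normal(size=(3, 16))                    # precomputed corpus index
mask_vec = corpus_vecs[2] + 0.01 * rng.normal(size=16)    # nearly identical to entry 2
print(fill_mask_nonparametric(mask_vec, corpus_vecs, corpus_tokens))  # -> "thrombocytopenia"
```

In practice the corpus index is far too large for exact search, so retrieval would use an approximate nearest-neighbor index, and training would rely on the paper's contrastive objective with an in-batch approximation rather than scoring the full corpus.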
Related papers
- Parallel Corpus Augmentation using Masked Language Models [2.3020018305241337]
We use a multilingual masked language model to mask and predict alternative words in context.
We use sentence embeddings to check and select sentence pairs that are likely to be translations of each other.
arXiv Detail & Related papers (2024-10-04T07:15:07Z)
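As a rough illustration of the mask-and-predict step in the Parallel Corpus Augmentation entry above, the sketch below uses a generic Hugging Face fill-mask pipeline to propose in-context alternatives for one word. The model name is an assumption, and the paper's sentence-embedding filtering step is only indicated in a comment.

```python
# Rough sketch of masking one word and predicting alternatives with a
# multilingual MLM. The model choice is an assumption for illustration.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="xlm-roberta-base")

sentence = "The committee approved the proposal yesterday."
target = "approved"
masked = sentence.replace(target, fill_mask.tokenizer.mask_token, 1)

# Top alternative words for the masked slot, keeping the original context.
for candidate in fill_mask(masked, top_k=5):
    print(candidate["token_str"], round(candidate["score"], 3))

# A full augmentation pipeline would then substitute candidates into both
# sides of a parallel pair and keep only pairs whose sentence embeddings
# remain close (e.g., via a sentence-transformers model).
```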
- Understanding and Mitigating Tokenization Bias in Language Models [6.418593476658017]
State-of-the-art language models are autoregressive and operate on subword units known as tokens.
We show that popular encoding schemes induce a sampling bias that cannot be mitigated with more training or data.
We propose a novel algorithm to obtain unbiased estimates from any language model trained on tokenized data.
arXiv Detail & Related papers (2024-06-24T17:38:02Z)
- Contextual Distortion Reveals Constituency: Masked Language Models are Implicit Parsers [7.558415495951758]
We propose a novel method for extracting parse trees from masked language models (LMs).
Our method computes a score for each span based on the distortion of contextual representations resulting from linguistic perturbations.
Our method consistently outperforms previous state-of-the-art methods on English with masked LMs, and also demonstrates superior performance in a multilingual setting.
arXiv Detail & Related papers (2023-06-01T13:10:48Z)
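To make the span-scoring idea in the Contextual Distortion entry above concrete, here is a hedged sketch: perturb a candidate span (here, simply masking it), re-encode the sentence, and measure how much the representations of the surrounding tokens move. The `encode` function is a random placeholder for a real masked-LM encoder, and the specific perturbation and distance are illustrative assumptions.

```python
# Hedged sketch of scoring a span by the distortion its perturbation causes
# to surrounding contextual representations. `encode` is a placeholder for a
# real masked-LM encoder (tokens -> one vector per token); the masking
# perturbation and Euclidean distance are illustrative choices.
import numpy as np

def encode(tokens):
    # Placeholder: a real implementation would run a masked LM and return
    # its hidden states, shape (len(tokens), hidden_dim).
    rng = np.random.default_rng(abs(hash(tuple(tokens))) % (2**32))
    return rng.normal(size=(len(tokens), 8))

def span_distortion(tokens, i, j, mask_token="[MASK]"):
    """Distortion outside span [i, j): mean distance between the original and
    perturbed representations of all tokens not in the span."""
    perturbed = tokens[:i] + [mask_token] * (j - i) + tokens[j:]
    orig, pert = encode(tokens), encode(perturbed)
    outside = [k for k in range(len(tokens)) if k < i or k >= j]
    return float(np.mean(np.linalg.norm(orig[outside] - pert[outside], axis=1)))

tokens = "the old man the boats".split()
print(span_distortion(tokens, 1, 3))  # score for the span "old man"
```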
- Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment [49.45399359826453]
Cross-lingual language models are typically pretrained with language modeling on multilingual text or parallel sentences.
We introduce denoising word alignment as a new cross-lingual pre-training task.
Experimental results show that our method improves cross-lingual transferability on various datasets.
arXiv Detail & Related papers (2021-06-11T13:36:01Z)
- Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
arXiv Detail & Related papers (2021-06-10T10:18:23Z)
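The reordering objective in the unsupervised pretraining entry above can be sketched as a simple noising function that locally shuffles words, so the input still resembles a full sentence rather than one with [MASK] gaps. The window size and shuffling scheme are arbitrary illustrative choices; the context-based replacement variant would use an MLM as in the fill-mask sketch earlier.

```python
# Illustrative local word-reordering noise: the input stays a full sentence
# (no [MASK] gaps), but word order is perturbed within small windows.
# Window size and shuffling scheme are assumptions for illustration.
import random

def locally_reorder(words, window=3, seed=0):
    rng = random.Random(seed)
    noised = []
    for start in range(0, len(words), window):
        chunk = words[start:start + window]
        rng.shuffle(chunk)
        noised.extend(chunk)
    return noised

sentence = "the quick brown fox jumps over the lazy dog".split()
print(" ".join(locally_reorder(sentence)))
```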
- Fixed-MAML for Few Shot Classification in Multilingual Speech Emotion Recognition [0.0]
We analyze the feasibility of applying few-shot learning to the speech emotion recognition (SER) task.
We propose a modification to the Model-Agnostic Meta-Learning (MAML) algorithm to solve this problem and call the new model F-MAML.
This modification performs better than the original MAML and achieves superior results on the EmoFilm dataset.
arXiv Detail & Related papers (2021-01-05T05:51:50Z)
- Pre-training Multilingual Neural Machine Translation by Leveraging Alignment Information [72.2412707779571]
mRASP is an approach to pre-train a universal multilingual neural machine translation model.
We carry out experiments on 42 translation directions across diverse settings, including low-, medium-, and rich-resource pairs, as well as transfer to exotic language pairs.
arXiv Detail & Related papers (2020-10-07T03:57:54Z)
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators [108.3381301768299]
Masked language modeling (MLM) pre-training methods such as BERT corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens.
We propose a more sample-efficient pre-training task called replaced token detection.
arXiv Detail & Related papers (2020-03-23T21:17:42Z)
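For the replaced-token-detection task in the ELECTRA entry above, the sketch below builds the discriminator's training signal: some positions are corrupted and each token gets a label saying whether it was replaced. The corruption rate and the random substitution source are assumptions; ELECTRA samples replacements from a small trained masked-LM generator.

```python
# Sketch of building replaced-token-detection training data. A real ELECTRA
# setup samples replacements from a small masked-LM generator; here random
# vocabulary words stand in so the labeling logic is visible.
# (ELECTRA labels a position "original" if the sampled substitute happens to
# equal the original token; that corner case is omitted here.)
import random

def make_rtd_example(tokens, vocab, replace_prob=0.15, seed=0):
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < replace_prob:
            corrupted.append(rng.choice(vocab))  # generator's substitute
            labels.append(1)                     # 1 = replaced
        else:
            corrupted.append(tok)
            labels.append(0)                     # 0 = original
    return corrupted, labels

tokens = "the chef cooked the meal".split()
vocab = ["ate", "painted", "dog", "quickly", "table"]
corrupted, labels = make_rtd_example(tokens, vocab)
print(list(zip(corrupted, labels)))  # the discriminator predicts each label
```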
- Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages [112.65994041398481]
We propose a Bayesian generative model for the space of neural parameters.
We infer the posteriors over such latent variables based on data from seen task-language combinations.
Our model yields comparable or better results than state-of-the-art, zero-shot cross-lingual transfer methods.
arXiv Detail & Related papers (2020-01-30T16:58:56Z)