Suffix Retrieval-Augmented Language Modeling
- URL: http://arxiv.org/abs/2211.03053v1
- Date: Sun, 6 Nov 2022 07:53:19 GMT
- Title: Suffix Retrieval-Augmented Language Modeling
- Authors: Zecheng Wang and Yik-Cheung Tam
- Abstract summary: Causal language modeling (LM) uses word history to predict the next word.
BERT, on the other hand, makes use of bi-directional word information in a sentence to predict words at masked positions.
We propose a novel model that simulates a bi-directional contextual effect in an autoregressive manner.
- Score: 1.8710230264817358
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Causal language modeling (LM) uses word history to predict the next word.
BERT, on the other hand, makes use of bi-directional word information in a
sentence to predict words at masked positions. While BERT is effective in
sequence encoding, it is non-causal by nature and is not designed for sequence
generation. In this paper, we propose a novel language model, SUffix
REtrieval-Augmented LM (SUREALM), that simulates a bi-directional contextual
effect in an autoregressive manner. SUREALM employs an embedding retriever to
search for training sentences in a data store that share similar word history
during sequence generation. In particular, the suffix portions of the retrieved
sentences mimic the "future" context. We evaluated our proposed model on the
DSTC9 spoken dialogue corpus and showed promising word perplexity reduction on
the validation and test sets compared to competitive baselines.
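The retrieval step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract only says that an embedding retriever finds training sentences with a similar word history and uses their suffixes as "future" context. Here a toy bag-of-words vector with cosine similarity stands in for the learned embedding retriever, and the names `embed`, `build_datastore`, and `retrieve_suffixes` are hypothetical.

```python
import math
from collections import Counter


def embed(tokens):
    """Toy bag-of-words vector; an assumed stand-in for a learned encoder."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_datastore(sentences):
    """Index every (prefix embedding, suffix tokens) split of each sentence."""
    store = []
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(1, len(tokens)):
            store.append((embed(tokens[:i]), tokens[i:]))
    return store


def retrieve_suffixes(history, store, k=2):
    """Return the suffixes whose prefixes are most similar to the history."""
    query = embed(history.split())
    ranked = sorted(store, key=lambda entry: cosine(query, entry[0]), reverse=True)
    return [suffix for _, suffix in ranked[:k]]


# Hypothetical training sentences, for illustration only.
train = [
    "i would like to book a table",
    "can you recommend a cheap hotel",
]
store = build_datastore(train)
top = retrieve_suffixes("i would like to", store, k=1)
print(top)
```

In a model along the lines the abstract describes, the retrieved suffixes would be encoded and fed to the autoregressive LM as additional context when predicting the next word; how that conditioning is done is not specified in the abstract and is left out of this sketch.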
Related papers
- Modeling Sequential Sentence Relation to Improve Cross-lingual Dense Retrieval [87.11836738011007]
We propose a multilingual language model called masked sentence model (MSM).
MSM consists of a sentence encoder to generate the sentence representations, and a document encoder applied to a sequence of sentence vectors from a document.
To train the model, we propose a masked sentence prediction task, which masks and predicts the sentence vector via a hierarchical contrastive loss with sampled negatives.
arXiv Detail & Related papers (2023-02-03T09:54:27Z)
- Always Keep your Target in Mind: Studying Semantics and Improving Performance of Neural Lexical Substitution [124.99894592871385]
We present a large-scale comparative study of lexical substitution methods employing both old and most recent language models.
We show that already competitive results achieved by SOTA LMs/MLMs can be further substantially improved if information about the target word is injected properly.
arXiv Detail & Related papers (2022-06-07T16:16:19Z)
- Word Order Does Matter (And Shuffled Language Models Know It) [9.990431777927421]
Recent studies have shown that language models pretrained and/or fine-tuned on randomly permuted sentences exhibit competitive performance on GLUE.
We investigate what position embeddings learned from shuffled text encode, showing that these models retain information pertaining to the original, naturalistic word order.
arXiv Detail & Related papers (2022-03-21T14:10:15Z)
- Dict-BERT: Enhancing Language Model Pre-training with Dictionary [42.0998323292348]
Pre-trained language models (PLMs) aim to learn universal language representations by conducting self-supervised training tasks on large-scale corpora.
In this work, we focus on enhancing language model pre-training by leveraging definitions of rare words in dictionaries.
We propose two novel self-supervised pre-training tasks on word and sentence-level alignment between input text sequence and rare word definitions.
arXiv Detail & Related papers (2021-10-13T04:29:14Z)
- Towards Document-Level Paraphrase Generation with Sentence Rewriting and Reordering [88.08581016329398]
We propose CoRPG (Coherence Relationship guided Paraphrase Generation) for document-level paraphrase generation.
We use graph GRU to encode the coherence relationship graph and get the coherence-aware representation for each sentence.
Our model can generate document paraphrases with more diversity and better semantic preservation.
arXiv Detail & Related papers (2021-09-15T05:53:40Z)
- Using BERT Encoding and Sentence-Level Language Model for Sentence Ordering [0.9134244356393667]
We propose an algorithm for sentence ordering in a corpus of short stories.
Our proposed method uses a language model based on Universal Transformers (UT) that captures sentences' dependencies by employing an attention mechanism.
The proposed model includes three components: Sentence, Language Model, and Sentence Arrangement with Brute Force Search.
arXiv Detail & Related papers (2021-08-24T23:03:36Z)
- Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment [49.45399359826453]
Cross-lingual language models are typically pretrained with language modeling on multilingual text or parallel sentences.
We introduce denoising word alignment as a new cross-lingual pre-training task.
Experimental results show that our method improves cross-lingual transferability on various datasets.
arXiv Detail & Related papers (2021-06-11T13:36:01Z)
- CharBERT: Character-aware Pre-trained Language Model [36.9333890698306]
We propose a character-aware pre-trained language model named CharBERT.
We first construct the contextual word embedding for each token from the sequential character representations.
We then fuse the representations of characters and the subword representations by a novel heterogeneous interaction module.
arXiv Detail & Related papers (2020-11-03T07:13:06Z)
- Improving Text Generation with Student-Forcing Optimal Transport [122.11881937642401]
We propose using optimal transport (OT) to match the sequences generated in training and testing modes.
An extension is also proposed to improve the OT learning, based on the structural and contextual information of the text sequences.
The effectiveness of the proposed method is validated on machine translation, text summarization, and text generation tasks.
arXiv Detail & Related papers (2020-10-12T19:42:25Z)
- PALM: Pre-training an Autoencoding&Autoregressive Language Model for Context-conditioned Generation [92.7366819044397]
Self-supervised pre-training has emerged as a powerful technique for natural language understanding and generation.
This work presents PALM with a novel scheme that jointly pre-trains an autoencoding and autoregressive language model on a large unlabeled corpus.
An extensive set of experiments shows that PALM achieves new state-of-the-art results on a variety of language generation benchmarks.
arXiv Detail & Related papers (2020-04-14T06:25:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.