Learning a Word-Level Language Model with Sentence-Level Noise
Contrastive Estimation for Contextual Sentence Probability Estimation
- URL: http://arxiv.org/abs/2103.07875v1
- Date: Sun, 14 Mar 2021 09:17:37 GMT
- Title: Learning a Word-Level Language Model with Sentence-Level Noise
Contrastive Estimation for Contextual Sentence Probability Estimation
- Authors: Heewoong Park, Sukhyun Cho, Jonghun Park
- Abstract summary: Inferring the probability distribution of sentences or word sequences is a key process in natural language processing.
While word-level language models (LMs) have been widely adopted for computing the joint probabilities of word sequences, they have difficulty capturing a context long enough for sentence probability estimation (SPE).
Recent studies introduced training methods using sentence-level noise-contrastive estimation (NCE) with recurrent neural networks (RNNs).
We apply our method to a simple word-level RNN LM to focus on the effect of the sentence-level NCE training rather than on the network architecture.
- Score: 3.1040192682787415
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inferring the probability distribution of sentences or word sequences is a
key process in natural language processing. While word-level language models
(LMs) have been widely adopted for computing the joint probabilities of word
sequences, they have difficulty in capturing a context long enough for sentence
probability estimation (SPE). To overcome this, recent studies introduced
training methods using sentence-level noise-contrastive estimation (NCE) with
recurrent neural networks (RNNs). In this work, we attempt to extend it for
contextual SPE, which aims to estimate a conditional sentence probability given
a previous text. The proposed NCE samples negative sentences independently of a
previous text so that the trained model gives higher probabilities to the
sentences that are more consistent with the context. We apply
our method to a simple word-level RNN LM to focus on the effect of the
sentence-level NCE training rather than on the network architecture. The
quality of estimation was evaluated against multiple-choice cloze-style
questions including both human and automatically generated questions. The
experimental results show that the proposed method improved the SPE quality for
the word-level RNN LM.
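To make the objective concrete, here is a minimal PyTorch-style sketch of sentence-level NCE for contextual SPE. This is an illustration under assumed interfaces, not the authors' released implementation: `lm` stands for any word-level RNN LM mapping (token ids, hidden state) to (per-step logits, new hidden state), and the noise sentences together with their log-probabilities under the noise distribution are assumed to come from a sampler that never sees the context.
```python
import math
import torch
import torch.nn.functional as F

def sentence_log_prob(lm, hidden, sent):
    # log p(sentence | context): sum of word-level log-probs under the
    # RNN LM, with `hidden` carrying the state after reading the context.
    inputs, targets = sent[:-1], sent[1:]
    logits, _ = lm(inputs.unsqueeze(0), hidden)        # (1, T, vocab); assumed interface
    logp = F.log_softmax(logits.squeeze(0), dim=-1)    # (T, vocab)
    return logp.gather(1, targets.unsqueeze(1)).sum()

def contextual_sentence_nce_loss(lm, hidden, pos_sent, pos_noise_logp,
                                 neg_sents, neg_noise_logps):
    # Binary NCE over whole sentences: one true continuation of the
    # context versus k noise sentences sampled independently of it,
    # with delta(s) = log p_model(s | context) - log(k * p_noise(s)).
    k = len(neg_sents)
    pos_delta = sentence_log_prob(lm, hidden, pos_sent) - pos_noise_logp - math.log(k)
    loss = -F.logsigmoid(pos_delta)                    # true sentence classified as data
    for sent, noise_logp in zip(neg_sents, neg_noise_logps):
        delta = sentence_log_prob(lm, hidden, sent) - noise_logp - math.log(k)
        loss = loss - F.logsigmoid(-delta)             # log(1 - sigmoid(x)) = logsigmoid(-x)
    return loss
```
Because the negatives are drawn without looking at the previous text, the model can only separate them from the true sentence by exploiting the context, which is what the cloze-style evaluation then probes: ranking the candidate answers by their conditional sentence log-probability and picking the argmax.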
Related papers
- Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method [108.56493934296687]
We introduce a divergence-based calibration method, inspired by the divergence-from-randomness concept, to calibrate token probabilities for pretraining data detection.
We have developed a Chinese-language benchmark, PatentMIA, to assess the performance of detection approaches for LLMs on Chinese text.
arXiv Detail & Related papers (2024-09-23T07:55:35Z)
- A Simple Contrastive Learning Objective for Alleviating Neural Text Degeneration [56.64703901898937]
We propose a new contrastive token learning objective that inherits the advantages of cross-entropy and unlikelihood training.
Comprehensive experiments on language modeling and open-domain dialogue generation tasks show that the proposed contrastive token objective yields less repetitive texts.
arXiv Detail & Related papers (2022-05-05T08:50:50Z)
- Detecting Textual Adversarial Examples Based on Distributional Characteristics of Data Representations [11.93653349589025]
Adversarial examples are constructed by adding small non-random perturbations to correctly classified inputs.
Approaches to adversarial attacks in natural language tasks have proliferated over the last five years, using character-level, word-level, or phrase-level perturbations.
We propose two new reactive methods for NLP that fill the gap in detecting such attacks.
Adapted LID and MDRE obtain state-of-the-art results on character-level, word-level, and phrase-level attacks on the IMDB dataset.
arXiv Detail & Related papers (2022-04-29T02:32:02Z)
- Evaluating Distributional Distortion in Neural Language Modeling [81.83408583979745]
A heavy tail of rare events accounts for a significant amount of the total probability mass of distributions in language.
Standard language modeling metrics such as perplexity quantify the performance of language models (LMs) only in aggregate.
We develop a controlled evaluation scheme which uses generative models trained on natural data as artificial languages.
arXiv Detail & Related papers (2022-03-24T01:09:46Z)
- Probing BERT's priors with serial reproduction chains [8.250374560598493]
We use serial reproduction chains to probe BERT's priors.
This yields a unique and consistent estimator of the ground-truth joint distribution.
We compare the lexical and syntactic statistics of sentences from the resulting prior distribution against those of the ground-truth corpus distribution.
arXiv Detail & Related papers (2022-02-24T17:42:28Z)
- Self-Normalized Importance Sampling for Neural Language Modeling [97.96857871187052]
In this work, we propose self-normalized importance sampling. Compared to our previous work, the training criteria considered here are self-normalized, so no further correction step is needed.
We show that our proposed self-normalized importance sampling is competitive in both research-oriented and production-oriented automatic speech recognition tasks.
arXiv Detail & Related papers (2021-11-11T16:57:53Z)
- A New Sentence Ordering Method Using BERT Pretrained Model [2.1793134762413433]
We propose a method for sentence ordering that requires no training phase and, consequently, no large corpus for learning.
Our proposed method outperformed other baselines on ROCStories, a corpus of 5-sentence human-made stories.
Among other advantages of this method are its interpretability and its lack of reliance on linguistic knowledge.
arXiv Detail & Related papers (2021-08-26T18:47:15Z)
- $k$-Neighbor Based Curriculum Sampling for Sequence Prediction [22.631763991832862]
Multi-step ahead prediction in language models is challenging due to the discrepancy between training and test time processes.
We propose Nearest-Neighbor Replacement Sampling, a curriculum learning-based method that gradually changes an initially deterministic teacher policy.
We evaluate on two language modelling benchmarks and find that the proposed method further improves performance when used in conjunction with scheduled sampling.
arXiv Detail & Related papers (2021-01-22T20:07:29Z)
- Narrative Incoherence Detection [76.43894977558811]
We propose the task of narrative incoherence detection as a new arena for inter-sentential semantic understanding.
Given a multi-sentence narrative, the task is to decide whether there exist any semantic discrepancies in the narrative flow.
arXiv Detail & Related papers (2020-12-21T07:18:08Z)
- An Investigation of Language Model Interpretability via Sentence Editing [5.492504126672887]
We re-purpose a sentence editing dataset as a testbed for the interpretability of pre-trained language models (PLMs).
This enables us to conduct a systematic investigation on an array of questions regarding PLMs' interpretability.
The investigation generates new insights, for example, contrary to the common understanding, we find that attention weights correlate well with human rationales.
arXiv Detail & Related papers (2020-11-28T00:46:43Z)
- Toward Better Storylines with Sentence-Level Language Models [54.91921545103256]
We propose a sentence-level language model which selects the next sentence in a story from a finite set of fluent alternatives (a candidate-ranking sketch follows this list).
We demonstrate the effectiveness of our approach with state-of-the-art accuracy on the unsupervised Story Cloze task.
arXiv Detail & Related papers (2020-05-11T16:54:19Z)
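As referenced in the last entry above, selecting the next sentence from a finite set of alternatives reduces to scoring each candidate against the preceding context, the same protocol the multiple-choice cloze evaluation uses. Below is a minimal sketch under the same assumed `lm`/`hidden` interface as the NCE example earlier; all names are illustrative, not any paper's exact setup.
```python
import torch
import torch.nn.functional as F

def sentence_score(lm, hidden, sent):
    # Sum of word-level log-probs of `sent` given the context state
    # (same assumed RNN LM interface as in the NCE sketch above).
    logits, _ = lm(sent[:-1].unsqueeze(0), hidden)
    logp = F.log_softmax(logits.squeeze(0), dim=-1)
    return logp.gather(1, sent[1:].unsqueeze(1)).sum()

def rank_candidates(lm, hidden, candidates):
    # Score each candidate continuation and return indices best-first.
    scores = torch.stack([sentence_score(lm, hidden, c) for c in candidates])
    return torch.argsort(scores, descending=True)

# Hypothetical usage: best = rank_candidates(lm, context_hidden, candidate_ids)[0]
```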