Self-Normalized Importance Sampling for Neural Language Modeling
- URL: http://arxiv.org/abs/2111.06310v1
- Date: Thu, 11 Nov 2021 16:57:53 GMT
- Title: Self-Normalized Importance Sampling for Neural Language Modeling
- Authors: Zijian Yang, Yingbo Gao, Alexander Gerstenberger, Jintao Jiang, Ralf
Schl\"uter, Hermann Ney
- Abstract summary: In this work, we propose self-normalized importance sampling. Compared to our previous work, the criteria considered in this work are self-normalized and there is no need to further conduct a correction step.
We show that our proposed self-normalized importance sampling is competitive in both research-oriented and production-oriented automatic speech recognition tasks.
- Score: 97.96857871187052
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To mitigate the problem of having to traverse over the full vocabulary in the
softmax normalization of a neural language model, sampling-based training
criteria are proposed and investigated in the context of large vocabulary
word-based neural language models. These training criteria typically enjoy the
benefit of faster training and testing, at a cost of slightly degraded
performance in terms of perplexity and almost no visible drop in word error
rate. While noise contrastive estimation is one of the most popular choices,
recently we show that other sampling-based criteria can also perform well, as
long as an extra correction step is done, where the intended class posterior
probability is recovered from the raw model outputs. In this work, we propose
self-normalized importance sampling. Compared to our previous work, the
criteria considered in this work are self-normalized and there is no need to
further conduct a correction step. Compared to noise contrastive estimation,
our method is directly comparable in terms of complexity in application.
Through self-normalized language model training as well as lattice rescoring
experiments, we show that our proposed self-normalized importance sampling is
competitive in both research-oriented and production-oriented automatic speech
recognition tasks.
Related papers
- Semi-supervised Learning For Robust Speech Evaluation [30.593420641501968]
Speech evaluation measures a learners oral proficiency using automatic models.
This paper proposes to address such challenges by exploiting semi-supervised pre-training and objective regularization.
An anchor model is trained using pseudo labels to predict the correctness of pronunciation.
arXiv Detail & Related papers (2024-09-23T02:11:24Z) - HyPoradise: An Open Baseline for Generative Speech Recognition with
Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
LLMs with reasonable prompt and its generative capability can even correct those tokens that are missing in N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z) - Improving Pre-trained Language Model Fine-tuning with Noise Stability
Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR)
Specifically, we propose to inject the standard Gaussian noise and regularize hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
arXiv Detail & Related papers (2022-06-12T04:42:49Z) - Uncertainty Estimation for Language Reward Models [5.33024001730262]
Language models can learn a range of capabilities from unsupervised training on text corpora.
It is often easier for humans to choose between options than to provide labeled data, and prior work has achieved state-of-the-art performance by training a reward model from such preference comparisons.
We seek to address these problems via uncertainty estimation, which can improve sample efficiency and robustness using active learning and risk-averse reinforcement learning.
arXiv Detail & Related papers (2022-03-14T20:13:21Z) - On Sampling-Based Training Criteria for Neural Language Modeling [97.35284042981675]
We consider Monte Carlo sampling, importance sampling, a novel method we call compensated partial summation, and noise contrastive estimation.
We show that all these sampling methods can perform equally well, as long as we correct for the intended class posterior probabilities.
Experimental results in language modeling and automatic speech recognition on Switchboard and LibriSpeech support our claim.
arXiv Detail & Related papers (2021-04-21T12:55:52Z) - Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models
via Continual Learning [74.25168207651376]
Fine-tuning pre-trained language models to downstream cross-lingual tasks has shown promising results.
We leverage continual learning to preserve the cross-lingual ability of the pre-trained model when we fine-tune it to downstream tasks.
Our methods achieve better performance than other fine-tuning baselines on the zero-shot cross-lingual part-of-speech tagging and named entity recognition tasks.
arXiv Detail & Related papers (2020-04-29T14:07:18Z) - Pre-training Is (Almost) All You Need: An Application to Commonsense
Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.