Efficient Sequence Training of Attention Models using Approximative
Recombination
- URL: http://arxiv.org/abs/2110.09245v1
- Date: Mon, 18 Oct 2021 12:47:53 GMT
- Title: Efficient Sequence Training of Attention Models using Approximative
Recombination
- Authors: Nils-Philipp Wynands and Wilfried Michel and Jan Rosendahl and Ralf
Schlüter and Hermann Ney
- Abstract summary: Sequence discriminative training is a great tool to improve the performance of an automatic speech recognition system.
It does, however, necessitate a sum over all possible word sequences, which is intractable to compute in practice.
This work proposes to perform (approximative) recombinations of hypotheses during beam search, if they share a common local history.
- Score: 44.501712281337205
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequence discriminative training is a great tool to improve the performance
of an automatic speech recognition system. It does, however, necessitate a sum
over all possible word sequences, which is intractable to compute in practice.
Current state-of-the-art systems with unlimited label context circumvent this
problem by limiting the summation to an n-best list of relevant competing
hypotheses obtained from beam search.
This work proposes to perform (approximative) recombinations of hypotheses
during beam search, if they share a common local history. The error that is
incurred by the approximation is analyzed and it is shown that using this
technique the effective beam size can be increased by several orders of
magnitude without significantly increasing the computational requirements.
Lastly, it is shown that this technique can be used to effectively perform
sequence discriminative training for attention-based encoder-decoder acoustic
models on the LibriSpeech task.
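To make the idea concrete, below is a minimal, hypothetical Python sketch of beam search with approximative recombination, not the authors' implementation. It assumes a generic decoder interface step_log_probs(prefix) returning log-probabilities of the next label; the names step_log_probs, eos, beam_size, context_len, and max_len are illustrative assumptions. Hypotheses whose last context_len labels agree are merged by accumulating their scores with log-sum-exp, while only the best-scoring prefix is kept as the representative.

```python
import math
from collections import defaultdict


def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a < b:
        a, b = b, a
    return a + math.log1p(math.exp(b - a))


def beam_search_with_recombination(step_log_probs, eos, beam_size=8,
                                   context_len=2, max_len=50):
    """Beam search in which hypotheses sharing the same local history
    (their last `context_len` labels) are approximately recombined:
    their probability mass is merged via log-sum-exp and only one
    representative label sequence per local history is kept."""
    beams = [((), 0.0)]          # (label_sequence, accumulated log score)
    finished = []

    for _ in range(max_len):
        # 1) Expand every active hypothesis with every possible next label.
        expansions = []
        for prefix, score in beams:
            for label, lp in step_log_probs(prefix).items():
                expansions.append((prefix + (label,), score + lp))

        # 2) Approximative recombination: bucket expansions by their
        #    local history and merge each bucket's probability mass.
        buckets = defaultdict(list)
        for prefix, score in expansions:
            buckets[prefix[-context_len:]].append((prefix, score))

        recombined = []
        for hyps in buckets.values():
            best_prefix, _ = max(hyps, key=lambda h: h[1])   # representative
            merged = hyps[0][1]
            for _, s in hyps[1:]:
                merged = log_add(merged, s)                   # summed mass
            recombined.append((best_prefix, merged))

        # 3) Ordinary beam pruning on the recombined set.
        recombined.sort(key=lambda h: h[1], reverse=True)
        beams = []
        for prefix, score in recombined[:beam_size]:
            (finished if prefix[-1] == eos else beams).append((prefix, score))
        if not beams:
            break

    return sorted(finished + beams, key=lambda h: h[1], reverse=True)
```

Because the merged hypotheses differ in their earlier (global) history, assigning the summed mass to a single representative prefix is only approximate; this is the error the paper analyzes, traded against an effective beam that covers many more label sequences at essentially unchanged cost.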
Related papers
- Bisimulation Learning [55.859538562698496]
We compute finite bisimulations of state transition systems with large, possibly infinite state space.
Our technique yields faster verification results than alternative state-of-the-art tools in practice.
arXiv Detail & Related papers (2024-05-24T17:11:27Z)
- A Fixed-Point Approach to Unified Prompt-Based Counting [51.20608895374113]
This paper aims to establish a comprehensive prompt-based counting framework capable of generating density maps for objects indicated by various prompt types, such as box, point, and text.
Our model excels in prominent class-agnostic datasets and exhibits superior performance in cross-dataset adaptation tasks.
arXiv Detail & Related papers (2024-03-15T12:05:44Z)
- Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions.
We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training.
As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
arXiv Detail & Related papers (2023-10-06T16:36:08Z)
- Improved Beam Search for Hallucination Mitigation in Abstractive Summarization [1.2328446298523066]
In this paper, we investigate the use of the Natural Language Inference (NLI) entailment metric to detect and prevent hallucinations in summary generation.
We propose an NLI-assisted beam re-ranking mechanism by computing entailment probability scores between the input context and summarization model-generated beams.
Our proposed algorithm significantly outperforms vanilla beam decoding on XSum and CNN/DM datasets.
arXiv Detail & Related papers (2022-12-06T02:33:47Z)
- A New Sentence Ordering Method Using BERT Pretrained Model [2.1793134762413433]
We propose a method for sentence ordering which does not need a training phase and consequently a large corpus for learning.
Our proposed method outperformed other baselines on ROCStories, a corpus of 5-sentence human-made stories.
Among other advantages of this method are its interpretability and needlessness to linguistic knowledge.
arXiv Detail & Related papers (2021-08-26T18:47:15Z)
- Automatic Vocabulary and Graph Verification for Accurate Loop Closure Detection [21.862978912891677]
Bag-of-Words (BoW) builds a visual vocabulary to associate features and then detect loops.
We propose a natural convergence criterion based on the comparison between the radii of nodes and the drifts of feature descriptors.
We present a novel topological graph verification method for validating candidate loops.
arXiv Detail & Related papers (2021-07-30T13:19:33Z)
- Scalable Optimal Classifiers for Adversarial Settings under Uncertainty [10.90668635921398]
We consider the problem of finding optimal classifiers in an adversarial setting where the class-1 data is generated by an attacker whose objective is not known to the defender.
We show that this low-dimensional characterization enables to develop a training method to compute provably approximately optimal classifiers in a scalable manner.
arXiv Detail & Related papers (2021-06-28T13:33:53Z)
- Out-of-Scope Intent Detection with Self-Supervision and Discriminative Training [20.242645823965145]
Out-of-scope intent detection is of practical importance in task-oriented dialogue systems.
We propose a method to train an out-of-scope intent classifier in a fully end-to-end manner by simulating the test scenario in training.
We evaluate our method extensively on four benchmark dialogue datasets and observe significant improvements over state-of-the-art approaches.
arXiv Detail & Related papers (2021-06-16T08:17:18Z)
- Active Learning for Sequence Tagging with Deep Pre-trained Models and Bayesian Uncertainty Estimates [52.164757178369804]
Recent advances in transfer learning for natural language processing in conjunction with active learning open the possibility to significantly reduce the necessary annotation budget.
We conduct an empirical study of various Bayesian uncertainty estimation methods and Monte Carlo dropout options for deep pre-trained models in the active learning framework.
We also demonstrate that to acquire instances during active learning, a full-size Transformer can be substituted with a distilled version, which yields better computational performance.
arXiv Detail & Related papers (2021-01-20T13:59:25Z)
- Consistency of a Recurrent Language Model With Respect to Incomplete Decoding [67.54760086239514]
We study the issue of receiving infinite-length sequences from a recurrent language model.
We propose two remedies which address inconsistency: consistent variants of top-k and nucleus sampling, and a self-terminating recurrent language model.
arXiv Detail & Related papers (2020-02-06T19:56:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.