Efficient Sequence Training of Attention Models using Approximative
Recombination
- URL: http://arxiv.org/abs/2110.09245v1
- Date: Mon, 18 Oct 2021 12:47:53 GMT
- Title: Efficient Sequence Training of Attention Models using Approximative
Recombination
- Authors: Nils-Philipp Wynands and Wilfried Michel and Jan Rosendahl and Ralf
Schlüter and Hermann Ney
- Abstract summary: Sequence discriminative training is a great tool to improve the performance of an automatic speech recognition system.
It does, however, necessitate a sum over all possible word sequences, which is intractable to compute in practice.
This work proposes to perform (approximative) recombinations of hypotheses during beam search, if they share a common local history.
- Score: 44.501712281337205
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequence discriminative training is a great tool to improve the performance
of an automatic speech recognition system. It does, however, necessitate a sum
over all possible word sequences, which is intractable to compute in practice.
Current state-of-the-art systems with unlimited label context circumvent this
problem by limiting the summation to an n-best list of relevant competing
hypotheses obtained from beam search.
This work proposes to perform (approximative) recombinations of hypotheses
during beam search, if they share a common local history. The error that is
incurred by the approximation is analyzed and it is shown that using this
technique the effective beam size can be increased by several orders of
magnitude without significantly increasing the computational requirements.
Lastly, it is shown that this technique can be used to effectively perform
sequence discriminative training for attention-based encoder-decoder acoustic
models on the LibriSpeech task.
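To make the idea concrete, below is a minimal, hypothetical Python sketch of beam search with approximative recombination, not the authors' implementation. It assumes a generic decoder interface step_log_probs(prefix) returning log-probabilities of the next label; the names step_log_probs, eos, beam_size, context_len, and max_len are illustrative assumptions. Hypotheses whose last context_len labels agree are merged by accumulating their scores with log-sum-exp, while only the best-scoring prefix is kept as the representative.

```python
import math
from collections import defaultdict


def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a < b:
        a, b = b, a
    return a + math.log1p(math.exp(b - a))


def beam_search_with_recombination(step_log_probs, eos, beam_size=8,
                                   context_len=2, max_len=50):
    """Beam search in which hypotheses sharing the same local history
    (their last `context_len` labels) are approximately recombined:
    their probability mass is merged via log-sum-exp and only one
    representative label sequence per local history is kept."""
    beams = [((), 0.0)]          # (label_sequence, accumulated log score)
    finished = []

    for _ in range(max_len):
        # 1) Expand every active hypothesis with every possible next label.
        expansions = []
        for prefix, score in beams:
            for label, lp in step_log_probs(prefix).items():
                expansions.append((prefix + (label,), score + lp))

        # 2) Approximative recombination: bucket expansions by their
        #    local history and merge each bucket's probability mass.
        buckets = defaultdict(list)
        for prefix, score in expansions:
            buckets[prefix[-context_len:]].append((prefix, score))

        recombined = []
        for hyps in buckets.values():
            best_prefix, _ = max(hyps, key=lambda h: h[1])   # representative
            merged = hyps[0][1]
            for _, s in hyps[1:]:
                merged = log_add(merged, s)                   # summed mass
            recombined.append((best_prefix, merged))

        # 3) Ordinary beam pruning on the recombined set.
        recombined.sort(key=lambda h: h[1], reverse=True)
        beams = []
        for prefix, score in recombined[:beam_size]:
            (finished if prefix[-1] == eos else beams).append((prefix, score))
        if not beams:
            break

    return sorted(finished + beams, key=lambda h: h[1], reverse=True)
```

Because the merged hypotheses differ in their earlier (global) history, assigning the summed mass to a single representative prefix is only approximate; this is the error the paper analyzes, traded against an effective beam that covers many more label sequences at essentially unchanged cost.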
Related papers
- Bisimulation Learning [55.859538562698496]
We compute finite bisimulations of state transition systems with large, possibly infinite state space.
Our technique yields faster verification results than alternative state-of-the-art tools in practice.
arXiv Detail & Related papers (2024-05-24T17:11:27Z)
- A Fixed-Point Approach to Unified Prompt-Based Counting [51.20608895374113]
This paper aims to establish a comprehensive prompt-based counting framework capable of generating density maps for objects indicated by various prompt types, such as box, point, and text.
Our model excels in prominent class-agnostic datasets and exhibits superior performance in cross-dataset adaptation tasks.
arXiv Detail & Related papers (2024-03-15T12:05:44Z)
- Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions.
We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training.
As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
arXiv Detail & Related papers (2023-10-06T16:36:08Z)
- Improved Beam Search for Hallucination Mitigation in Abstractive Summarization [1.2328446298523066]
In this paper, we investigate the use of the Natural Language Inference (NLI) entailment metric to detect and prevent hallucinations in summary generation.
We propose an NLI-assisted beam re-ranking mechanism by computing entailment probability scores between the input context and summarization model-generated beams.
Our proposed algorithm significantly outperforms vanilla beam decoding on XSum and CNN/DM datasets.
arXiv Detail & Related papers (2022-12-06T02:33:47Z)
- A New Sentence Ordering Method Using BERT Pretrained Model [2.1793134762413433]
We propose a method for sentence ordering which does not need a training phase and consequently a large corpus for learning.
Our proposed method outperformed other baselines on ROCStories, a corpus of 5-sentence human-made stories.
Among other advantages of this method are its interpretability and needlessness to linguistic knowledge.
arXiv Detail & Related papers (2021-08-26T18:47:15Z)
- Automatic Vocabulary and Graph Verification for Accurate Loop Closure Detection [21.862978912891677]
Bag-of-Words (BoW) builds a visual vocabulary to associate features and then detect loops.
We propose a natural convergence criterion based on the comparison between the radii of nodes and the drifts of feature descriptors.
We present a novel topological graph verification method for validating candidate loops.
arXiv Detail & Related papers (2021-07-30T13:19:33Z)
- Scalable Optimal Classifiers for Adversarial Settings under Uncertainty [10.90668635921398]
We consider the problem of finding optimal classifiers in an adversarial setting where the class-1 data is generated by an attacker whose objective is not known to the defender.
We show that this low-dimensional characterization enables to develop a training method to compute provably approximately optimal classifiers in a scalable manner.
arXiv Detail & Related papers (2021-06-28T13:33:53Z)
- Out-of-Scope Intent Detection with Self-Supervision and Discriminative Training [20.242645823965145]
Out-of-scope intent detection is of practical importance in task-oriented dialogue systems.
We propose a method to train an out-of-scope intent classifier in a fully end-to-end manner by simulating the test scenario in training.
We evaluate our method extensively on four benchmark dialogue datasets and observe significant improvements over state-of-the-art approaches.
arXiv Detail & Related papers (2021-06-16T08:17:18Z)
- Active Learning for Sequence Tagging with Deep Pre-trained Models and Bayesian Uncertainty Estimates [52.164757178369804]
Recent advances in transfer learning for natural language processing in conjunction with active learning open the possibility to significantly reduce the necessary annotation budget.
We conduct an empirical study of various Bayesian uncertainty estimation methods and Monte Carlo dropout options for deep pre-trained models in the active learning framework.
We also demonstrate that to acquire instances during active learning, a full-size Transformer can be substituted with a distilled version, which yields better computational performance.
arXiv Detail & Related papers (2021-01-20T13:59:25Z)
- Consistency of a Recurrent Language Model With Respect to Incomplete Decoding [67.54760086239514]
We study the issue of receiving infinite-length sequences from a recurrent language model.
We propose two remedies which address inconsistency: consistent variants of top-k and nucleus sampling, and a self-terminating recurrent language model.
arXiv Detail & Related papers (2020-02-06T19:56:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.