Bayes risk CTC: Controllable CTC alignment in Sequence-to-Sequence tasks
- URL: http://arxiv.org/abs/2210.07499v1
- Date: Fri, 14 Oct 2022 03:55:36 GMT
- Title: Bayes risk CTC: Controllable CTC alignment in Sequence-to-Sequence tasks
- Authors: Jinchuan Tian, Brian Yan, Jianwei Yu, Chao Weng, Dong Yu, Shinji
Watanabe
- Abstract summary: Bayes risk CTC (BRCTC) is proposed to enforce the desired characteristics of the predicted alignment.
By using BRCTC with another preference for early emissions, we obtain an improved performance-latency trade-off for online models.
- Score: 63.189632935619535
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Sequence-to-Sequence (seq2seq) tasks transcribe the input sequence to a
target sequence. The Connectionist Temporal Classification (CTC) criterion is
widely used in multiple seq2seq tasks. Besides predicting the target sequence,
a by-product of CTC is the predicted alignment: the most probable input-length
path, which specifies a hard aligning relationship between the input and target
units. Since the CTC formulation treats all potential aligning sequences
(called paths) equally, which path turns out to be most probable, and thus
becomes the predicted alignment, is always uncertain. In addition, the
alignment predicted by vanilla CTC is usually observed to drift from its
reference and rarely provides practical functionality. The motivation of this
work is therefore to make the CTC alignment prediction controllable and thereby
equip CTC with extra functionality. The Bayes risk CTC (BRCTC) criterion is proposed in this
work, in which a customizable Bayes risk function is adopted to enforce the
desired characteristics of the predicted alignment. With this risk function,
BRCTC forms a general framework for imposing customizable preferences over the
paths so as to concentrate the posterior on a particular subset of them. In
applications, we explore one preference that yields models with down-sampling
ability and reduced inference cost. By using
BRCTC with another preference for early emissions, we obtain an improved
performance-latency trade-off for online models. Experimentally, the proposed
BRCTC reduces the inference cost of offline models by up to 47% without
performance degradation, and cuts the overall latency of online systems down
to a previously unattained level.
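
In essence, BRCTC re-weights the CTC path posterior with a risk function so that probability mass concentrates on paths with a desired property (for example, early emission). The toy sketch below illustrates that idea by brute-force path enumeration over a tiny example; the specific risk function, variable names, and the enumeration itself are illustrative assumptions rather than the paper's implementation (CTC-style criteria are computed with a forward-backward dynamic program in practice).

```python
import itertools
import math

import numpy as np

# Toy setup: vocabulary {0: blank, 1: 'a', 2: 'b'}, target sequence "ab", T = 4 frames.
BLANK = 0
TARGET = [1, 2]
T, V = 4, 3

# Toy frame-level posteriors p(k | t): softmax over random logits.
rng = np.random.default_rng(0)
logits = rng.normal(size=(T, V))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

def collapse(path):
    """CTC collapse rule: merge consecutive repeats, then drop blanks."""
    out, prev = [], None
    for k in path:
        if k != prev and k != BLANK:
            out.append(k)
        prev = k
    return out

def path_prob(path):
    """Probability of one path under the conditionally independent frame posteriors."""
    return math.prod(probs[t, k] for t, k in enumerate(path))

# Enumerate every path that collapses to the target. Feasible only at toy scale;
# real CTC/BRCTC training uses a forward-backward dynamic program instead.
paths = [p for p in itertools.product(range(V), repeat=T) if collapse(p) == TARGET]

# Vanilla CTC: every path contributes equally to the summed posterior.
ctc_loss = -math.log(sum(path_prob(p) for p in paths))

# BRCTC-style objective (sketch): weight each path by a customizable risk/reward.
# This assumed weight rewards early emission of the last non-blank token, mirroring
# the low-latency preference mentioned in the abstract; the exact functional form
# in the paper may differ.
def risk_weight(path, beta=1.0):
    last_emit = max(t for t, k in enumerate(path) if k != BLANK)
    return math.exp(-beta * last_emit / T)

brctc_loss = -math.log(sum(risk_weight(p) * path_prob(p) for p in paths))

print(f"{len(paths)} valid paths for target {TARGET}")
print(f"vanilla CTC loss : {ctc_loss:.4f}")
print(f"BRCTC-style loss : {brctc_loss:.4f}  (posterior concentrated on early-emitting paths)")
```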
Related papers
- Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter [57.64003871384959]
This work presents a new approach to fast context-biasing with a CTC-based Word Spotter.
The proposed method matches CTC log-probabilities against a compact context graph to detect potential context-biasing candidates.
The results demonstrate a significant acceleration of the context-biasing recognition with a simultaneous improvement in F-score and WER.
arXiv Detail & Related papers (2024-06-11T09:37:52Z) - Cross-Validation Conformal Risk Control [40.2365781482563]
Conformal risk control (CRC) is a recently proposed technique that applies post-hoc to a conventional point predictor to provide calibration guarantees.
In this paper, a novel CRC method is introduced that is based on cross-validation rather than on a held-out validation set, as in the original CRC.
CV-CRC is proved to offer theoretical guarantees on the average risk of the set predictor.
arXiv Detail & Related papers (2024-01-22T14:26:02Z) - Forking Uncertainties: Reliable Prediction and Model Predictive Control
with Sequence Models via Conformal Risk Control [40.918012779935246]
We introduce a novel post-hoc calibration procedure that operates on the predictions produced by any pre-designed probabilistic forecaster to yield reliable error bars.
Unlike the state of the art, PTS-CRC can satisfy reliability definitions beyond coverage.
We experimentally validate the performance of PTS-CRC prediction and control by studying a number of use cases in the context of wireless networking.
arXiv Detail & Related papers (2023-10-16T11:35:41Z) - Align With Purpose: Optimize Desired Properties in CTC Models with a
General Plug-and-Play Framework [8.228892600588765]
Connectionist Temporal Classification (CTC) is a widely used criterion for training sequence-to-sequence (seq2seq) models.
We propose Align With Purpose, a general Plug-and-Play framework for enhancing a desired property in models trained with the CTC criterion.
We apply our framework in the domain of Automatic Speech Recognition (ASR) and show its generality in terms of property selection, architectural choice, and scale of training dataset.
arXiv Detail & Related papers (2023-07-04T13:34:47Z) - CTC Alignments Improve Autoregressive Translation [145.90587287444976]
We argue that CTC does in fact make sense for translation if applied in a joint CTC/attention framework.
Our proposed joint CTC/attention models outperform pure-attention baselines across six benchmark translation tasks (a sketch of the interpolated joint objective appears after this list).
arXiv Detail & Related papers (2022-10-11T07:13:50Z) - Actor-Critic based Improper Reinforcement Learning [61.430513757337486]
We consider an improper reinforcement learning setting where a learner is given $M$ base controllers for an unknown Markov decision process.
We propose two algorithms: (1) a Policy Gradient-based approach; and (2) an algorithm that can switch between a simple Actor-Critic scheme and a Natural Actor-Critic scheme.
arXiv Detail & Related papers (2022-07-19T05:55:02Z) - Alignment Knowledge Distillation for Online Streaming Attention-based
Speech Recognition [46.69852287267763]
This article describes an efficient training method for online streaming attention-based encoder-decoder (AED) automatic speech recognition (ASR) systems.
The proposed method significantly reduces recognition errors and emission latency simultaneously.
The best MoChA system shows performance comparable to that of the RNN-transducer (RNN-T).
arXiv Detail & Related papers (2021-02-28T08:17:38Z) - CTC-synchronous Training for Monotonic Attention Model [43.0382262234792]
Backward probabilities cannot be leveraged in the alignment process during training due to the left-to-right dependency in the decoder.
We propose CTC-synchronous training (CTC-ST), in which MoChA uses CTC alignments to learn optimal monotonic alignments.
The entire model is jointly optimized so that the expected boundaries from MoChA are synchronized with the alignments.
arXiv Detail & Related papers (2020-05-10T16:48:23Z)
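
The joint CTC/attention entry above ("CTC Alignments Improve Autoregressive Translation") interpolates a CTC loss with an attention-decoder cross-entropy loss. As a rough illustration of what such an interpolated objective looks like, here is a minimal PyTorch sketch; the function name, tensor shapes, and the 0.3 weight are assumptions for illustration and are not taken from that paper's code.

```python
import torch
import torch.nn.functional as F

def joint_ctc_attention_loss(ctc_log_probs, ctc_targets, input_lengths, target_lengths,
                             att_logits, att_targets, ctc_weight=0.3):
    """Interpolate a CTC loss over encoder frames with an attention-decoder
    cross-entropy loss, in the spirit of hybrid CTC/attention training."""
    # ctc_log_probs: (T, N, V) log-softmax outputs of the encoder-side CTC head.
    ctc = F.ctc_loss(ctc_log_probs, ctc_targets, input_lengths, target_lengths,
                     blank=0, zero_infinity=True)
    # att_logits: (N, U, V) decoder logits; att_targets: (N, U) gold token ids.
    att = F.cross_entropy(att_logits.transpose(1, 2), att_targets)
    return ctc_weight * ctc + (1.0 - ctc_weight) * att

# Toy usage with random tensors (shapes and ctc_weight are illustrative only).
T, N, U, V = 50, 2, 7, 100
loss = joint_ctc_attention_loss(
    ctc_log_probs=torch.randn(T, N, V).log_softmax(-1),
    ctc_targets=torch.randint(1, V, (N, U)),      # CTC targets exclude the blank id 0
    input_lengths=torch.full((N,), T),
    target_lengths=torch.full((N,), U),
    att_logits=torch.randn(N, U, V),
    att_targets=torch.randint(0, V, (N, U)),
)
print(loss.item())
```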