Better Intermediates Improve CTC Inference
- URL: http://arxiv.org/abs/2204.00176v1
- Date: Fri, 1 Apr 2022 02:51:23 GMT
- Title: Better Intermediates Improve CTC Inference
- Authors: Tatsuya Komatsu, Yusuke Fujita, Jaesong Lee, Lukas Lee, Shinji
Watanabe, Yusuke Kida
- Abstract summary: The paper first formulates self-conditioned CTC as a probabilistic model with an intermediate prediction as a latent representation.
The paper then proposes two new conditioning methods based on this formulation.
Experiments on the LibriSpeech dataset show maximum relative performance improvements of 3%/12% on the test-clean/test-other sets compared to the original self-conditioned CTC.
- Score: 37.68950144012098
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a method for improved CTC inference with searched
intermediates and multi-pass conditioning. The paper first formulates
self-conditioned CTC as a probabilistic model with an intermediate prediction
as a latent representation and provides a tractable conditioning framework. We
then propose two new conditioning methods based on this formulation: (1)
searched intermediate conditioning, which refines intermediate predictions with
beam search, and (2) multi-pass conditioning, which uses the predictions from a
previous inference pass to condition the next pass. These approaches enable
better conditioning than the original self-conditioned CTC during inference and
improve the final performance. Experiments on the LibriSpeech dataset show
maximum relative performance improvements of 3%/12% on the test-clean/test-other
sets compared to the original self-conditioned CTC.
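As a rough, hedged illustration of the two proposed conditioning schemes, the sketch below shows how an encoder stack could be conditioned on a refined intermediate posterior, either sharpened by an external beam-search pass (searched intermediate conditioning) or replaced by the posterior from a previous full inference pass (multi-pass conditioning). This is a minimal PyTorch-style sketch under assumed module names (`SelfConditionedCTCSketch`, `cond_proj`, `refine`), not the authors' implementation.

```python
import torch.nn as nn

class SelfConditionedCTCSketch(nn.Module):
    """Minimal sketch of self-conditioned CTC with two inference-time
    conditioning options; module names are assumptions, not the paper's code."""

    def __init__(self, num_layers=12, d_model=256, vocab_size=100, cond_layers=(6,)):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
             for _ in range(num_layers)]
        )
        self.ctc_head = nn.Linear(d_model, vocab_size)   # shared CTC projection
        self.cond_proj = nn.Linear(vocab_size, d_model)  # posterior -> feature space
        self.cond_layers = set(cond_layers)

    def forward(self, x, refine=None, prev_posterior=None):
        """x: (B, T, d_model) acoustic features.
        refine: optional callable sharpening a (B, T, V) intermediate posterior,
                e.g. a beam-search pass (searched intermediate conditioning).
        prev_posterior: (B, T, V) posterior from a previous full inference pass
                (multi-pass conditioning)."""
        for i, layer in enumerate(self.layers, start=1):
            x = layer(x)
            if i in self.cond_layers:
                inter = (prev_posterior if prev_posterior is not None
                         else self.ctc_head(x).softmax(dim=-1))
                if refine is not None and prev_posterior is None:
                    inter = refine(inter)         # intermediate refined by beam search
                x = x + self.cond_proj(inter)     # condition the following layers
        return self.ctc_head(x).log_softmax(dim=-1)  # final CTC log-probabilities
```

For multi-pass conditioning, one would first run the model once and then feed the resulting posterior back, e.g. `first = model(x).exp(); second = model(x, prev_posterior=first)` (assumed usage).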
Related papers
- Conformal Risk Minimization with Variance Reduction [37.74931189657469]
Conformal prediction (CP) is a distribution-free framework for achieving probabilistic guarantees on black-box models.
Recent research efforts have focused on optimizing CP efficiency during training.
We formalize this concept as the problem of conformal risk minimization.
arXiv Detail & Related papers (2024-11-03T21:48:15Z)
- Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter [57.64003871384959]
This work presents a new approach to fast context-biasing with a CTC-based Word Spotter.
The proposed method matches CTC log-probabilities against a compact context graph to detect potential context-biasing candidates.
The results demonstrate a significant acceleration of the context-biasing recognition with a simultaneous improvement in F-score and WER.
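The exact word-spotter algorithm is not reproduced here; as a loose, assumed illustration of matching CTC log-probabilities against a small context list, the toy sketch below scans a greedy CTC hypothesis for biasing phrases and scores candidates by their accumulated frame log-probabilities. It is not the paper's CTC-based Word Spotter or its context-graph representation.

```python
import numpy as np

def spot_context_candidates(log_probs, context_phrases, blank_id=0):
    """Toy context-spotting sketch (assumed simplification, not the paper's method).

    log_probs: (T, V) frame-level CTC log-probabilities (numpy array).
    context_phrases: list of token-id tuples for biasing phrases.
    Returns (phrase, start_frame, score) for phrases found in the greedy path."""
    best = log_probs.argmax(axis=-1)                      # greedy per-frame tokens
    # Collapse repeats and blanks, remembering the frame of each kept token.
    tokens, frames = [], []
    prev = blank_id
    for t, tok in enumerate(best):
        if tok != blank_id and tok != prev:
            tokens.append(int(tok))
            frames.append(t)
        prev = tok
    # Naive substring matching of each context phrase against the token stream.
    candidates = []
    for phrase in context_phrases:
        n = len(phrase)
        for start in range(len(tokens) - n + 1):
            if tuple(tokens[start:start + n]) == tuple(phrase):
                score = sum(log_probs[frames[start + k], phrase[k]] for k in range(n))
                candidates.append((phrase, frames[start], float(score)))
    return candidates
```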
arXiv Detail & Related papers (2024-06-11T09:37:52Z)
- Hyperparameters in Continual Learning: A Reality Check [53.30082523545212]
Continual learning (CL) aims to train a model on a sequence of tasks while balancing the trade-off between plasticity (learning new tasks) and stability (retaining prior knowledge).
The commonly adopted evaluation protocol for CL algorithms selects the best hyperparameters in a given scenario and then evaluates the algorithms in the same scenario.
This protocol has significant shortcomings: it overestimates the CL capacity of algorithms and relies on unrealistic hyperparameter tuning.
We argue that the evaluation of CL algorithms should focus on assessing the generalizability of their CL capacity to unseen scenarios.
arXiv Detail & Related papers (2024-03-14T03:13:01Z)
- CTC-based Non-autoregressive Speech Translation [51.37920141751813]
We investigate the potential of connectionist temporal classification for non-autoregressive speech translation.
We develop a model consisting of two encoders that are guided by CTC to predict the source and target texts.
Experiments on the MuST-C benchmarks show that our NAST model achieves an average BLEU score of 29.5 with a speed-up of 5.67$\times$.
arXiv Detail & Related papers (2023-05-27T03:54:09Z)
- Pre-training for Speech Translation: CTC Meets Optimal Transport [29.807861658249923]
We show that the connectionist temporal classification (CTC) loss can reduce the modality gap by design.
We propose a novel pre-training method combining CTC and optimal transport to further reduce this gap.
Our method pre-trains a Siamese-like model composed of two encoders, one for acoustic inputs and the other for textual inputs, such that they produce representations that are close to each other in the Wasserstein space.
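As a hedged sketch of the general idea (a Siamese pair of encoders pulled together by an optimal-transport distance alongside CTC), the snippet below computes an entropy-regularized Sinkhorn distance between acoustic and textual representations and adds it to a CTC loss. The Sinkhorn routine, uniform marginals, and loss weighting are illustrative assumptions, not the paper's recipe.

```python
import torch

def sinkhorn_distance(x, y, eps=0.1, iters=50):
    """Entropy-regularized OT distance between two point clouds
    x: (n, d), y: (m, d); standard log-domain Sinkhorn, uniform marginals."""
    cost = torch.cdist(x, y, p=2)                 # (n, m) pairwise cost
    n, m = cost.shape
    mu = torch.full((n,), 1.0 / n)
    nu = torch.full((m,), 1.0 / m)
    u = torch.zeros(n)
    v = torch.zeros(m)
    for _ in range(iters):                        # Sinkhorn iterations in log space
        u = eps * (torch.log(mu) - torch.logsumexp((-cost + v[None, :]) / eps, dim=1))
        v = eps * (torch.log(nu) - torch.logsumexp((-cost + u[:, None]) / eps, dim=0))
    pi = torch.exp((-cost + u[:, None] + v[None, :]) / eps)   # transport plan
    return torch.sum(pi * cost)

def pretrain_loss(speech_repr, text_repr, ctc_loss, ot_weight=1.0):
    """Combine a CTC loss with an OT alignment term between the speech-encoder
    and text-encoder outputs of a single utterance (weighting is an assumption)."""
    return ctc_loss + ot_weight * sinkhorn_distance(speech_repr, text_repr)
```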
arXiv Detail & Related papers (2023-01-27T14:03:09Z)
- Improving CTC-based ASR Models with Gated Interlayer Collaboration [9.930655347717932]
We present a Gated Interlayer Collaboration mechanism which introduces contextual information into the models.
We train the model with intermediate CTC losses calculated by the interlayer outputs of the model, in which the probability distributions of the intermediate layers naturally serve as soft label sequences.
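As a loose, assumed sketch of the gating idea (an intermediate CTC distribution re-embedded as contextual features and gated into the acoustic features), the snippet below is illustrative only; the gate formulation and names are assumptions, not the paper's exact mechanism.

```python
import torch
import torch.nn as nn

class GatedInterlayerFusionSketch(nn.Module):
    """Illustrative sketch: embed an intermediate CTC posterior as contextual
    features and gate them into the acoustic features (names are assumptions)."""

    def __init__(self, d_model, vocab_size):
        super().__init__()
        self.embed = nn.Linear(vocab_size, d_model)     # soft-label -> embedding
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, acoustic, inter_posterior):
        """acoustic: (B, T, d_model); inter_posterior: (B, T, V) softmax of an
        intermediate CTC head, used as a soft label sequence."""
        context = self.embed(inter_posterior)                          # (B, T, d_model)
        g = torch.sigmoid(self.gate(torch.cat([acoustic, context], dim=-1)))
        return g * acoustic + (1.0 - g) * context                      # gated mixture
```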
arXiv Detail & Related papers (2022-05-25T03:21:27Z)
- InterAug: Augmenting Noisy Intermediate Predictions for CTC-based ASR [17.967459632339374]
We propose InterAug: a novel training method for CTC-based ASR using augmented intermediate representations for conditioning.
The proposed method exploits the conditioning framework of self-conditioned CTC to train robust models by conditioning with "noisy" intermediate predictions.
In experiments using augmentations that simulate deletion, insertion, and substitution errors, we confirmed that the trained model acquires robustness to each error type.
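A minimal sketch of the kind of augmentation described above, simulating deletion, insertion, and substitution errors on an intermediate (greedy) token sequence before it is used for conditioning; the probabilities and token-sampling scheme are assumptions, not the paper's configuration.

```python
import random

def augment_intermediate(tokens, vocab_size, p_del=0.05, p_ins=0.05, p_sub=0.05, seed=None):
    """Simulate deletion/insertion/substitution errors on an intermediate
    token sequence (a sketch of the training-time augmentation idea)."""
    rng = random.Random(seed)
    noisy = []
    for tok in tokens:
        r = rng.random()
        if r < p_del:
            continue                                  # deletion: drop the token
        if r < p_del + p_sub:
            noisy.append(rng.randrange(vocab_size))   # substitution: random token
        else:
            noisy.append(tok)
        if rng.random() < p_ins:
            noisy.append(rng.randrange(vocab_size))   # insertion: spurious token
    return noisy
```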
arXiv Detail & Related papers (2022-04-01T02:51:21Z)
- Adaptive Affinity Loss and Erroneous Pseudo-Label Refinement for Weakly Supervised Semantic Segmentation [48.294903659573585]
In this paper, we propose to embed affinity learning of multi-stage approaches in a single-stage model.
A deep neural network is used to deliver comprehensive semantic information in the training phase.
Experiments are conducted on the PASCAL VOC 2012 dataset to evaluate the effectiveness of our proposed approach.
arXiv Detail & Related papers (2021-08-03T07:48:33Z)
- Relaxing the Conditional Independence Assumption of CTC-based ASR by Conditioning on Intermediate Predictions [14.376418789524783]
We train a CTC-based ASR model with auxiliary CTC losses in intermediate layers in addition to the original CTC loss in the last layer.
Our method is easy to implement and retains the merits of CTC-based ASR: a simple model architecture and fast decoding speed.
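A short sketch of training with auxiliary CTC losses on intermediate-layer outputs in addition to the final-layer CTC loss; the layer choice and the 0.5 weighting are assumptions for illustration, not the paper's settings.

```python
import torch.nn as nn

def intermediate_ctc_loss(layer_outputs, ctc_head, targets, in_lens, tgt_lens,
                          inter_layers=(6,), inter_weight=0.5, blank_id=0):
    """layer_outputs: list of (B, T, d_model) tensors, one per encoder layer
    (1-indexed in inter_layers). ctc_head: shared nn.Linear(d_model, vocab_size).
    targets/in_lens/tgt_lens follow nn.CTCLoss conventions.
    Returns a weighted sum of the final and intermediate CTC losses."""
    ctc = nn.CTCLoss(blank=blank_id, zero_infinity=True)

    def loss_at(h):
        logp = ctc_head(h).log_softmax(dim=-1).transpose(0, 1)  # (T, B, V) for CTCLoss
        return ctc(logp, targets, in_lens, tgt_lens)

    final = loss_at(layer_outputs[-1])
    inter = sum(loss_at(layer_outputs[i - 1]) for i in inter_layers) / len(inter_layers)
    return (1.0 - inter_weight) * final + inter_weight * inter
```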
arXiv Detail & Related papers (2021-04-06T18:00:03Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.