CTC Variations Through New WFST Topologies
- URL: http://arxiv.org/abs/2110.03098v1
- Date: Wed, 6 Oct 2021 23:00:15 GMT
- Title: CTC Variations Through New WFST Topologies
- Authors: Aleksandr Laptev, Somshubra Majumdar, Boris Ginsburg
- Abstract summary: This paper presents novel Weighted Finite-State Transducer (WFST) topologies to implement Connectionist Temporal Classification (CTC)-like algorithms for automatic speech recognition.
Three new CTC variants are proposed: (1) the "compact-CTC", in which direct transitions between units are replaced with <epsilon> back-off transitions; (2) the "minimal-CTC", that only adds <blank> self-loops when used in WFST-composition; and (3) "selfless-CTC", that disallows self-loops for non-blank units.
- Score: 79.94035631317395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents novel Weighted Finite-State Transducer (WFST) topologies
to implement Connectionist Temporal Classification (CTC)-like algorithms for
automatic speech recognition. Three new CTC variants are proposed: (1) the
"compact-CTC", in which direct transitions between units are replaced with
<epsilon> back-off transitions; (2) the "minimal-CTC", that only adds <blank>
self-loops when used in WFST-composition; and (3) "selfless-CTC", that
disallows self-loops for non-blank units. The new CTC variants have several
benefits, such as reducing decoding graph size and GPU memory required for
training while keeping model accuracy.
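To make the topology differences concrete, here is a minimal sketch in plain Python that enumerates the arcs of the standard CTC topology, the selfless variant, and a compact-style variant, inferred from the abstract alone rather than the authors' implementation. The label conventions (input 0 for <blank>, output 0 for <epsilon>, input -1 for a pure <epsilon> arc that consumes no frame) are assumptions that differ across WFST libraries; minimal-CTC is not sketched because its defining property, adding <blank> self-loops only during WFST composition, lives in the composition routine rather than in a static arc list.

```python
# Hedged sketch of CTC WFST topologies, inferred from the abstract only.
# An arc is (src_state, dst_state, input_label, output_label).
# Assumed conventions (real libraries differ): input 0 = <blank>,
# output 0 = <epsilon>, input -1 = a pure <epsilon> arc consuming no frame.
# State 0 is the start/blank state; state i (1..V) models non-blank unit i.
# All states are treated as final.

BLANK, EPS = 0, -1

def ctc_topo(num_units: int, selfless: bool = False):
    """Standard CTC topology; selfless=True drops non-blank self-loops
    (the "selfless-CTC" variant)."""
    arcs = [(0, 0, BLANK, EPS)]                 # <blank> self-loop
    for i in range(1, num_units + 1):
        arcs.append((0, i, i, i))               # enter unit i, emit i
        arcs.append((i, 0, BLANK, EPS))         # return via <blank>
        if not selfless:
            arcs.append((i, i, i, EPS))         # frame repetitions of i
        for j in range(1, num_units + 1):
            if j != i:
                arcs.append((i, j, j, j))       # direct unit-to-unit arc
    return arcs

def compact_ctc_topo(num_units: int):
    """Compact-style topology: the O(V^2) direct unit-to-unit arcs are
    replaced with one <epsilon> back-off arc per unit, so the graph grows
    linearly with the vocabulary."""
    arcs = [(0, 0, BLANK, EPS)]
    for i in range(1, num_units + 1):
        arcs.append((0, i, i, i))               # enter unit i, emit i
        arcs.append((i, i, i, EPS))             # frame repetitions of i
        arcs.append((i, 0, EPS, EPS))           # <epsilon> back-off
    return arcs

if __name__ == "__main__":
    v = 100
    print("standard:", len(ctc_topo(v)), "arcs")                 # O(V^2)
    print("selfless:", len(ctc_topo(v, selfless=True)), "arcs")  # O(V^2)
    print("compact: ", len(compact_ctc_topo(v)), "arcs")         # O(V)
```

Counting the arcs for even a modest vocabulary shows why the compact variant shrinks the decoding graph: the quadratic unit-to-unit transitions dominate the standard topology.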
Related papers
- CR-CTC: Consistency regularization on CTC for improved speech recognition [18.996929774821822]
Connectionist Temporal Classification (CTC) is a widely used method for automatic speech recognition (ASR).
However, it often falls short in recognition performance compared to transducer models or systems combining CTC with an attention-based encoder-decoder (CTC/AED).
We propose the Consistency-Regularized CTC (CR-CTC), which enforces consistency between two CTC distributions obtained from different augmented views of the input speech mel-spectrogram.
arXiv Detail & Related papers (2024-10-07T14:56:07Z)
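A hedged PyTorch sketch of the core CR-CTC idea described above: CTC losses on two augmented views plus a symmetric KL consistency term between the frame-level posteriors. The names, the weight `alpha`, and the assumption that both views keep the same time length (e.g. masking-style augmentation) are illustrative, not the paper's exact recipe.

```python
import torch.nn.functional as F

def cr_ctc_loss(model, view_a, view_b, targets, in_lens, tgt_lens, alpha=0.2):
    """Illustrative CR-CTC-style loss: `model` maps a batch of features to
    (T, B, V) log-probabilities; both augmented views share the same time
    length so the per-frame KL term is well-defined."""
    log_p_a = model(view_a)   # (T, B, V) log-probabilities
    log_p_b = model(view_b)

    ctc_a = F.ctc_loss(log_p_a, targets, in_lens, tgt_lens, blank=0)
    ctc_b = F.ctc_loss(log_p_b, targets, in_lens, tgt_lens, blank=0)

    # Symmetric KL between the per-frame distributions of the two views.
    kl_ab = F.kl_div(log_p_a, log_p_b, log_target=True, reduction="batchmean")
    kl_ba = F.kl_div(log_p_b, log_p_a, log_target=True, reduction="batchmean")

    return ctc_a + ctc_b + 0.5 * alpha * (kl_ab + kl_ba)
```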
- Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter [57.64003871384959]
This work presents a new approach to fast context-biasing with a CTC-based Word Spotter.
The proposed method matches CTC log-probabilities against a compact context graph to detect potential context-biasing candidates.
The results demonstrate a significant acceleration of the context-biasing recognition with a simultaneous improvement in F-score and WER.
arXiv Detail & Related papers (2024-06-11T09:37:52Z)
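As a rough illustration of the matching step, here is a toy CTC keyword spotter: a Viterbi scan of one phrase against frame-level log-probabilities, allowed to start at any frame. The paper instead scores hypotheses against a shared compact context graph and applies its own score normalization and thresholds; this per-phrase DP is only a sketch of the underlying idea.

```python
import torch

def spot_phrase(log_probs, phrase, blank=0):
    """Best Viterbi score of `phrase` (a non-empty list of token ids) over
    blank-interleaved CTC states, starting at any frame of the (T, V)
    log-probability matrix. Illustrative only."""
    T = log_probs.size(0)
    ext = [blank]
    for p in phrase:                      # blank-interleaved label sequence
        ext += [p, blank]
    S, NEG = len(ext), float("-inf")
    dp = [NEG] * S
    best = NEG
    for t in range(T):
        frame = log_probs[t]
        new = [NEG] * S
        for s in range(S):
            score = dp[s]                                   # stay
            if s >= 1:
                score = max(score, dp[s - 1])               # advance
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                score = max(score, dp[s - 2])               # skip a blank
            if s <= 1:
                score = max(score, 0.0)   # a match may start at any frame
            new[s] = score + frame[ext[s]].item()
        dp = new
        best = max(best, dp[S - 1], dp[S - 2])
    return best  # compare against a tuned threshold to report a hit
```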
- Pre-training for Speech Translation: CTC Meets Optimal Transport [29.807861658249923]
We show that the connectionist temporal classification (CTC) loss can reduce the modality gap by design.
We propose a novel pre-training method combining CTC and optimal transport to further reduce this gap.
Our method pre-trains a Siamese-like model composed of two encoders, one for acoustic inputs and the other for textual inputs, such that they produce representations that are close to each other in the Wasserstein space.
arXiv Detail & Related papers (2023-01-27T14:03:09Z)
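A hedged sketch of such a combined objective: a CTC loss on the acoustic branch plus an entropic-regularized optimal-transport (Sinkhorn) term pulling the two encoders' outputs together. Tensor shapes, names, and the weight `lam` are assumptions, not the paper's API.

```python
import torch
import torch.nn.functional as F

def sinkhorn_ot(cost, eps=0.1, iters=50):
    """Entropic-regularized OT (Sinkhorn) with uniform marginals; returns an
    approximate Wasserstein distance for an (n, m) pairwise cost matrix."""
    n, m = cost.shape
    K = torch.exp(-cost / eps)
    a = torch.full((n,), 1.0 / n, device=cost.device)
    b = torch.full((m,), 1.0 / m, device=cost.device)
    v = torch.ones(m, device=cost.device)
    for _ in range(iters):
        u = a / (K @ v)
        v = b / (K.t() @ u)
    plan = u[:, None] * K * v[None, :]
    return (plan * cost).sum()

def ctc_plus_ot_loss(log_probs, targets, in_lens, tgt_lens,
                     speech_repr, text_repr, lam=1.0):
    """Illustrative joint pre-training objective. speech_repr: (T_a, d)
    acoustic encoder states; text_repr: (T_t, d) text encoder states."""
    ctc = F.ctc_loss(log_probs, targets, in_lens, tgt_lens, blank=0)
    cost = torch.cdist(speech_repr, text_repr)  # pairwise L2 costs
    return ctc + lam * sinkhorn_ot(cost)
```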
- CTC Alignments Improve Autoregressive Translation [145.90587287444976]
We argue that CTC does in fact make sense for translation if applied in a joint CTC/attention framework.
Our proposed joint CTC/attention models outperform pure-attention baselines across six benchmark translation tasks.
arXiv Detail & Related papers (2022-10-11T07:13:50Z)
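A minimal sketch of a joint CTC/attention objective in the common hybrid formulation: a CTC loss on the encoder interpolated with a cross-entropy loss on the autoregressive decoder. The 0.3 weight and the -1 padding convention are illustrative defaults, not the paper's exact settings.

```python
import torch.nn.functional as F

def joint_ctc_attention_loss(ctc_log_probs, att_logits, targets, in_lens,
                             tgt_lens, ctc_weight=0.3):
    """ctc_log_probs: (T, B, V) encoder log-probs; att_logits: (B, U, V)
    decoder logits; targets: (B, U) token ids padded with -1."""
    ctc = F.ctc_loss(ctc_log_probs, targets.clamp(min=0), in_lens, tgt_lens,
                     blank=0)  # padding beyond tgt_lens is never read
    att = F.cross_entropy(att_logits.transpose(1, 2), targets,
                          ignore_index=-1)
    return ctc_weight * ctc + (1.0 - ctc_weight) * att
```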
- Layer Pruning on Demand with Intermediate CTC [50.509073206630994]
We present a training and pruning method for ASR based on connectionist temporal classification (CTC).
We show that a Transformer-CTC model can be pruned to various depths on demand, improving the real-time factor from 0.005 to 0.002 on GPU.
arXiv Detail & Related papers (2021-06-17T02:40:18Z)
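A hedged sketch of the on-demand pruning idea: CTC heads attached at intermediate depths during training, so the stack can be cut at any instrumented depth at inference. The class, tap spacing, and shapes are illustrative assumptions, not the paper's architecture.

```python
import torch.nn as nn

class PrunableCTCEncoder(nn.Module):
    """Encoder stack with intermediate CTC heads; depth can be chosen at
    inference time. Illustrative only."""

    def __init__(self, layers, d_model, vocab_size, tap_every=4):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        taps = [i for i in range(len(layers)) if (i + 1) % tap_every == 0]
        self.heads = nn.ModuleDict(
            {str(i): nn.Linear(d_model, vocab_size) for i in taps})

    def forward(self, x, depth=None):
        """Run at most `depth` layers and return CTC log-probs from the
        deepest head at or below that depth. During training, a CTC loss
        would be attached to every head so all depths stay usable."""
        depth = len(self.layers) if depth is None else depth
        log_probs = None
        for i, layer in enumerate(self.layers[:depth]):
            x = layer(x)
            if str(i) in self.heads:
                log_probs = self.heads[str(i)](x).log_softmax(dim=-1)
        return log_probs
```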
- Intermediate Loss Regularization for CTC-based Speech Recognition [58.33721897180646]
We present a simple and efficient auxiliary loss function for automatic speech recognition (ASR) based on the connectionist temporal classification (CTC) objective.
We evaluate the proposed method on various corpora, achieving a 9.9% word error rate (WER) on the WSJ corpus and a 5.2% character error rate (CER) on the AISHELL-1 corpus.
arXiv Detail & Related papers (2021-02-05T15:01:03Z)
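The auxiliary loss amounts to interpolating the final CTC loss with a CTC loss attached to an intermediate encoder layer. A hedged sketch; the weight `w=0.3` is a common choice in this line of work, not necessarily the paper's value.

```python
import torch.nn.functional as F

def intermediate_ctc_loss(final_log_probs, inter_log_probs, targets,
                          in_lens, tgt_lens, w=0.3):
    """Both log-prob tensors are (T, B, V): one from the final encoder
    layer, one from an intermediate layer's auxiliary projection."""
    final = F.ctc_loss(final_log_probs, targets, in_lens, tgt_lens, blank=0)
    inter = F.ctc_loss(inter_log_probs, targets, in_lens, tgt_lens, blank=0)
    return (1.0 - w) * final + w * inter
```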
- Improved Mask-CTC for Non-Autoregressive End-to-End ASR [49.192579824582694]
We build on Mask-CTC, a recently proposed end-to-end ASR system based on mask-predict with connectionist temporal classification (CTC).
First, we enhance the network architecture by adopting the recently proposed Conformer.
Next, we propose new training and decoding methods by introducing an auxiliary objective to predict the length of a partial target sequence.
arXiv Detail & Related papers (2020-10-26T01:22:35Z)
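A hedged sketch of Mask-CTC-style inference: take the greedy CTC output, mask low-confidence tokens, and let a conditional masked LM re-predict them. `refine_fn` is a hypothetical callable returning (tokens, confidences); blank id 0, `threshold`, and `n_iters` are assumed values.

```python
import torch
import torch.nn.functional as F

def mask_ctc_decode(ctc_log_probs, refine_fn, mask_id, threshold=0.99,
                    n_iters=2):
    """ctc_log_probs: (T, V) frame-level log-probs. Illustrative only."""
    probs, path = ctc_log_probs.exp().max(dim=-1)      # (T,) greedy path
    prev = F.pad(path[:-1], (1, 0), value=0)
    keep = (path != 0) & (path != prev)                # drop blanks/repeats
    tokens, confs = path[keep], probs[keep]
    for _ in range(n_iters):
        masked = torch.where(confs < threshold,
                             torch.full_like(tokens, mask_id), tokens)
        tokens, confs = refine_fn(masked)              # fill masked slots
    return tokens
```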
- CTC-synchronous Training for Monotonic Attention Model [43.0382262234792]
In monotonic chunkwise attention (MoChA), backward probabilities cannot be leveraged in the alignment process during training due to the left-to-right dependency in the decoder.
We propose CTC-synchronous training (CTC-ST), in which MoChA uses CTC alignments to learn optimal monotonic alignments.
The entire model is jointly optimized so that the expected boundaries from MoChA are synchronized with the alignments.
arXiv Detail & Related papers (2020-05-10T16:48:23Z)
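A hedged sketch of the synchronization idea: penalize the distance between MoChA's expected token boundary frames and boundary frames taken from a forced CTC alignment, optimized jointly with the usual attention loss. Shapes, names, and the weight `lam` are assumptions; the paper's exact formulation may differ.

```python
def ctc_sync_loss(mocha_boundaries, ctc_boundaries):
    """Both inputs are (B, U) float frame indices: MoChA's expected token
    boundaries and boundaries extracted from a forced CTC alignment."""
    return (mocha_boundaries - ctc_boundaries).abs().mean()

def ctc_st_objective(att_loss, sync_term, lam=1.0):
    """Joint objective: the attention loss plus the synchronization term
    (lambda is an assumed weight)."""
    return att_loss + lam * sync_term
```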
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.