Blank Collapse: Compressing CTC emission for the faster decoding
- URL: http://arxiv.org/abs/2210.17017v2
- Date: Tue, 27 Jun 2023 00:39:38 GMT
- Title: Blank Collapse: Compressing CTC emission for the faster decoding
- Authors: Minkyu Jung, Ohhyeok Kwon, Seunghyun Seo, Soonshin Seo
- Abstract summary: We propose a method to reduce the amount of calculation resulting in faster beam search decoding speed.
With this method, we can get up to 78% faster decoding speed than ordinary beam search decoding.
- Score: 0.30108936184913293
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Connectionist Temporal Classification (CTC) model is a very efficient method
for modeling sequences, especially speech data. To use a CTC model for
Automatic Speech Recognition (ASR), beam search decoding with an external
language model such as an n-gram LM is necessary to obtain reasonable results.
In this paper we analyze the blank label in CTC beam search in depth and
propose a very simple method to reduce the amount of computation, resulting in
faster beam search decoding. With this method, we achieve up to 78% faster
decoding than ordinary beam search with a very small loss of accuracy on the
LibriSpeech datasets. We prove this method is effective not only practically,
by experiments, but also theoretically, by mathematical reasoning. We also
observe that this reduction is more pronounced when the accuracy of the model
is higher.
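The abstract's core idea, collapsing runs of frames where the CTC blank posterior is near-certain before running beam search, can be sketched in a few lines. This is a minimal illustrative reading, not the paper's exact algorithm: the function name, the threshold value, and the rule of keeping the boundary frames of each confident-blank run are all assumptions.

```python
import numpy as np

def blank_collapse(log_probs: np.ndarray, blank_id: int = 0,
                   threshold: float = 0.999) -> np.ndarray:
    """Drop interior frames of runs where the blank probability
    exceeds `threshold`, keeping one frame at each end of a run so
    the beam search still sees a blank between label emissions.

    log_probs: (T, V) per-frame log posteriors from a CTC model.
    """
    blank_prob = np.exp(log_probs[:, blank_id])
    confident = blank_prob >= threshold
    keep = np.ones(len(log_probs), dtype=bool)
    for t in range(len(log_probs)):
        if confident[t]:
            prev_blank = t > 0 and confident[t - 1]
            next_blank = t + 1 < len(log_probs) and confident[t + 1]
            # drop only frames strictly inside a confident-blank run
            if prev_blank and next_blank:
                keep[t] = False
    return log_probs[keep]
```

Since long stretches of speech (silence, pauses) decode to confident blanks, the compressed emission can be much shorter than the original, which is where the reported speedup would come from.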
Related papers
- NGPU-LM: GPU-Accelerated N-Gram Language Model for Context-Biasing in Greedy ASR Decoding [54.88765757043535]
This work rethinks data structures for statistical n-gram language models to enable fast and parallel operations for GPU-optimized inference. The approach, named NGPU-LM, introduces customizable greedy decoding for all major ASR model types with less than 7% computational overhead. It can eliminate more than 50% of the accuracy gap between greedy and beam search in out-of-domain scenarios while avoiding the significant slowdown caused by beam search.
arXiv Detail & Related papers (2025-05-28T20:43:10Z) - Fast correlated decoding of transversal logical algorithms [67.01652927671279]
Quantum error correction (QEC) is required for large-scale computation, but incurs a significant resource overhead. Recent advances have shown that by jointly decoding logical qubits in algorithms composed of logical gates, the number of syndrome extraction rounds can be reduced. Here, the problem of decoding circuits is reframed by directly decoding relevant logical operator products as they propagate through the circuit.
arXiv Detail & Related papers (2025-05-19T18:00:00Z) - Let the Code LLM Edit Itself When You Edit the Code [50.46536185784169]
The authors propose Positional Integrity Encoding (PIE).
Results demonstrate that PIE reduces computational overhead by over 85% compared to the standard full recomputation approach.
arXiv Detail & Related papers (2024-07-03T14:34:03Z) - Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter [57.64003871384959]
This work presents a new approach to fast context-biasing with CTC-based Word Spotter.
The proposed method matches CTC log-probabilities against a compact context graph to detect potential context-biasing candidates.
The results demonstrate a significant acceleration of the context-biasing recognition with a simultaneous improvement in F-score and WER.
arXiv Detail & Related papers (2024-06-11T09:37:52Z) - GPU-Accelerated WFST Beam Search Decoder for CTC-based Speech Recognition [1.2680687621338012]
Connectionist Temporal Classification (CTC) models deliver state-of-the-art accuracy in automated speech recognition (ASR) pipelines.
We introduce a GPU-accelerated Weighted Finite State Transducer (WFST) beam decoder compatible with current CTC models.
It increases pipeline throughput and decreases latency, supports streaming inference, and also supports advanced features like utterance-specific word boosting via on-the-fly composition.
arXiv Detail & Related papers (2023-11-08T19:57:10Z) - A Token-Wise Beam Search Algorithm for RNN-T [3.682821163882332]
We present a decoding beam search algorithm that batches the joint network calls across a segment of time steps.
In addition, aggregating emission probabilities over a segment may be seen as a better approximation to finding the most likely model output.
arXiv Detail & Related papers (2023-02-28T07:20:49Z) - Rapid Person Re-Identification via Sub-space Consistency Regularization [51.76876061721556]
Person Re-Identification (ReID) matches pedestrians across disjoint cameras.
Existing ReID methods adopting real-value feature descriptors have achieved high accuracy, but they are low in efficiency due to the slow Euclidean distance computation.
We propose a novel Sub-space Consistency Regularization (SCR) algorithm that can speed up the ReID procedure by 0.25 times.
arXiv Detail & Related papers (2022-07-13T02:44:05Z) - Adding Connectionist Temporal Summarization into Conformer to Improve Its Decoder Efficiency For Speech Recognition [22.61761934996406]
We propose a novel connectionist temporal summarization (CTS) method that reduces the number of frames required for the attention decoder.
With a beamwidth of 4, the LibriSpeech's decoding budget can be reduced by up to 20%.
The word error rate (WER) is reduced by 6% relative at the beam width of 1 and by 3% relative at the beam width of 4.
arXiv Detail & Related papers (2022-04-08T07:24:00Z) - Cascaded Fast and Slow Models for Efficient Semantic Code Search [46.53530668938728]
We propose an efficient and accurate semantic code search framework with cascaded fast and slow models.
The proposed cascaded approach is not only efficient and scalable, but also achieves state-of-the-art results.
arXiv Detail & Related papers (2021-10-15T02:23:35Z) - Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates [59.678108707409606]
We propose Fast-MD, a fast MD model that generates HI by non-autoregressive decoding based on connectionist temporal classification (CTC) outputs followed by an ASR decoder.
Fast-MD achieved about 2x and 4x faster decoding speed than that of the naïve MD model on GPU and CPU with comparable translation quality.
arXiv Detail & Related papers (2021-09-27T05:21:30Z) - FSR: Accelerating the Inference Process of Transducer-Based Models by Applying Fast-Skip Regularization [72.9385528828306]
A typical transducer model decodes the output sequence conditioned on the current acoustic state.
The number of blank tokens in the prediction results accounts for nearly 90% of all tokens.
We propose a method named fast-skip regularization, which tries to align the blank position predicted by a transducer with that predicted by a CTC model.
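The blank-dominance statistic quoted above (blank tokens accounting for nearly 90% of predictions) is easy to measure on any CTC-style emission. A minimal sketch, assuming greedy argmax decoding over a `(T, V)` log-posterior matrix; the function name and emission format are illustrative assumptions, not part of the cited paper:

```python
import numpy as np

def blank_ratio(log_probs: np.ndarray, blank_id: int = 0) -> float:
    """Fraction of frames whose greedy (argmax) prediction is blank."""
    return float(np.mean(log_probs.argmax(axis=-1) == blank_id))
```

A high ratio from this kind of measurement is what motivates blank-skipping methods: most frames contribute no label and can be aligned or skipped cheaply.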
arXiv Detail & Related papers (2021-04-07T03:15:10Z) - Intermediate Loss Regularization for CTC-based Speech Recognition [58.33721897180646]
We present a simple and efficient auxiliary loss function for automatic speech recognition (ASR) based on the connectionist temporal classification (CTC) objective.
We evaluate the proposed method on various corpora, reaching a word error rate (WER) of 9.9% on the WSJ corpus and a character error rate (CER) of 5.2% on the AISHELL-1 corpus.
arXiv Detail & Related papers (2021-02-05T15:01:03Z) - End-to-end Sinkhorn Autoencoder with Noise Generator [10.008055997630304]
We propose a novel end-to-end sinkhorn autoencoder with noise generator for efficient data collection simulation.
Our method outperforms competing approaches on a challenging dataset of simulation data from the Zero Degree Calorimeters of the ALICE experiment at the LHC.
arXiv Detail & Related papers (2020-06-11T18:04:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.