Fast and parallel decoding for transducer
- URL: http://arxiv.org/abs/2211.00484v1
- Date: Mon, 31 Oct 2022 07:46:10 GMT
- Title: Fast and parallel decoding for transducer
- Authors: Wei Kang, Liyong Guo, Fangjun Kuang, Long Lin, Mingshuang Luo, Zengwei
Yao, Xiaoyu Yang, Piotr Żelasko, Daniel Povey
- Abstract summary: We introduce a constrained version of transducer loss to learn strictly monotonic alignments between the sequences.
We also improve the standard greedy search and beam search algorithms by limiting the number of symbols that can be emitted per time step.
- Score: 25.510837666148024
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The transducer architecture is becoming increasingly popular in the field of
speech recognition, because it is naturally streaming as well as high in
accuracy. One of the drawbacks of transducer is that it is difficult to decode
in a fast and parallel way due to an unconstrained number of symbols that can
be emitted per time step. In this work, we introduce a constrained version of
transducer loss to learn strictly monotonic alignments between the sequences;
we also improve the standard greedy search and beam search algorithms by
limiting the number of symbols that can be emitted per time step in transducer
decoding, making it more efficient to decode in parallel with batches.
Furthermore, we propose a finite state automaton-based (FSA) parallel beam
search algorithm that can run with graphs on GPU efficiently. The experiment
results show that we achieve slight word error rate (WER) improvement as well
as significant speedup in decoding. Our work is open-sourced and publicly
available at https://github.com/k2-fsa/icefall.
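To make the idea of limiting symbols per time step concrete, here is a minimal Python sketch of greedy search that emits at most one symbol per frame. It is an illustration under stated assumptions, not the icefall implementation: the limit of one symbol, the blank id `0`, the function `greedy_search_one_symbol_per_frame`, and the `toy_joiner` scoring rule are all hypothetical names and choices made for this example.

```python
from typing import Callable, List, Sequence

BLANK = 0  # assumed blank id for this sketch

# A joiner maps (encoder frame, previous non-blank token) to per-token scores.
Joiner = Callable[[Sequence[float], int], List[float]]


def greedy_search_one_symbol_per_frame(
    encoder_out: List[List[Sequence[float]]],
    joiner: Joiner,
) -> List[List[int]]:
    """Greedy search where each utterance emits at most one symbol per frame.

    encoder_out[b][t] is the encoder frame of utterance b at time t.
    """
    hyps: List[List[int]] = [[] for _ in encoder_out]
    max_len = max((len(frames) for frames in encoder_out), default=0)
    for t in range(max_len):
        # Every active utterance consumes exactly one frame per step, so the
        # batch advances in lockstep and needs no per-utterance inner loop.
        for b, frames in enumerate(encoder_out):
            if t >= len(frames):
                continue  # this utterance has already ended
            prev = hyps[b][-1] if hyps[b] else BLANK
            logits = joiner(frames[t], prev)
            token = max(range(len(logits)), key=logits.__getitem__)
            if token != BLANK:
                hyps[b].append(token)
            # Blank or not, move on to frame t + 1.
    return hyps


def toy_joiner(frame: Sequence[float], prev_token: int) -> List[float]:
    """Hypothetical stand-in for a real joiner: penalize repeating a token."""
    logits = list(frame)
    if prev_token != BLANK:
        logits[prev_token] -= 1.0
    return logits


batch = [
    [[0.1, 0.9, 0.0], [0.2, 0.9, 0.3], [0.9, 0.0, 0.1]],  # 3 frames
    [[0.9, 0.1, 0.0], [0.0, 0.2, 0.8]],                    # 2 frames
]
hyps = greedy_search_one_symbol_per_frame(batch, toy_joiner)
```

In a real system the inner loop over `b` would be a single batched tensor operation. The point of the sketch is only that a fixed per-frame symbol limit removes the data-dependent inner loop that makes unconstrained transducer decoding hard to batch.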
Related papers
- Localized statistics decoding: A parallel decoding algorithm for quantum low-density parity-check codes [3.001631679133604]
We introduce localized statistics decoding for arbitrary quantum low-density parity-check codes.
Our decoder is more amenable to implementation on specialized hardware, positioning it as a promising candidate for decoding real-time syndromes from experiments.
arXiv Detail & Related papers (2024-06-26T18:00:09Z) - Label-Looping: Highly Efficient Decoding for Transducers [19.091932566833265]
This paper introduces a highly efficient greedy decoding algorithm for Transducer-based speech recognition models.
Experiments show that the label-looping algorithm is up to 2.0X faster than conventional batched decoding when using batch size 32.
arXiv Detail & Related papers (2024-06-10T12:34:38Z) - Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely hidden transfer, which decodes multiple successive tokens simultaneously in a single forward pass.
In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
arXiv Detail & Related papers (2024-04-18T09:17:06Z) - Break the Sequential Dependency of LLM Inference Using Lookahead
Decoding [27.87483106859749]
Lookahead decoding is an exact, parallel decoding algorithm for large language models (LLMs).
Our implementation can speed up autoregressive decoding by up to 1.8x on MT-bench and 4x with strong scaling on multiple GPUs in code completion tasks.
arXiv Detail & Related papers (2024-02-03T06:37:50Z) - Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster [61.83949316226113]
FastCoT is a model-agnostic framework based on parallel decoding.
We show that FastCoT saves inference time by nearly 20% with only a negligible performance drop compared to the regular approach.
arXiv Detail & Related papers (2023-11-14T15:56:18Z) - Accelerating Transformer Inference for Translation via Parallel Decoding [2.89306442817912]
Autoregressive decoding limits the efficiency of transformers for Machine Translation (MT).
We present three parallel decoding algorithms and test them on different languages and models.
arXiv Detail & Related papers (2023-05-17T17:57:34Z) - Parallel window decoding enables scalable fault tolerant quantum
computation [2.624902795082451]
We present a methodology that parallelizes the decoding problem and achieves almost arbitrary syndrome processing speed.
Our parallelization requires some classical feedback decisions to be delayed, leading to a slow-down of the logical clock speed.
Using known auto-teleportation gadgets the slow-down can be eliminated altogether in exchange for increased qubit overhead.
arXiv Detail & Related papers (2022-09-18T12:37:57Z) - Rapid Person Re-Identification via Sub-space Consistency Regularization [51.76876061721556]
Person Re-Identification (ReID) matches pedestrians across disjoint cameras.
Existing ReID methods adopting real-value feature descriptors have achieved high accuracy, but they are low in efficiency due to the slow Euclidean distance computation.
We propose a novel Sub-space Consistency Regularization (SCR) algorithm that can speed up the ReID procedure by 0.25 times.
arXiv Detail & Related papers (2022-07-13T02:44:05Z) - Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with
Non-Autoregressive Hidden Intermediates [59.678108707409606]
We propose Fast-MD, a fast MD model that generates HI by non-autoregressive decoding based on connectionist temporal classification (CTC) outputs followed by an ASR decoder.
Fast-MD achieved about 2x and 4x faster decoding speed than that of the naïve MD model on GPU and CPU with comparable translation quality.
arXiv Detail & Related papers (2021-09-27T05:21:30Z) - Instantaneous Grammatical Error Correction with Shallow Aggressive
Decoding [57.08875260900373]
We propose Shallow Aggressive Decoding (SAD) to improve the online inference efficiency of the Transformer for instantaneous Grammatical Error Correction (GEC).
SAD aggressively decodes as many tokens as possible in parallel instead of always decoding only one token in each step to improve computational parallelism.
Experiments in both English and Chinese GEC benchmarks show that aggressive decoding could yield the same predictions but with a significant speedup for online inference.
arXiv Detail & Related papers (2021-06-09T10:30:59Z) - Fast Interleaved Bidirectional Sequence Generation [90.58793284654692]
We introduce a decoder that generates target words from the left-to-right and right-to-left directions simultaneously.
We show that we can easily convert a standard architecture for unidirectional decoding into a bidirectional decoder.
Our interleaved bidirectional decoder (IBDecoder) retains the model simplicity and training efficiency of the standard Transformer.
arXiv Detail & Related papers (2020-10-27T17:38:51Z)