Related papers: Label-Looping: Highly Efficient Decoding for Transducers

Label-Looping: Highly Efficient Decoding for Transducers

URL: http://arxiv.org/abs/2406.06220v2
Date: Mon, 16 Sep 2024 19:04:28 GMT
Title: Label-Looping: Highly Efficient Decoding for Transducers
Authors: Vladimir Bataev, Hainan Xu, Daniel Galvez, Vitaly Lavrukhin, Boris Ginsburg,
Abstract summary: This paper introduces a highly efficient greedy decoding algorithm for Transducer-based speech recognition models. Experiments show that the label-looping algorithm is up to 2.0X faster than conventional batched decoding when using batch size 32.
Score: 19.091932566833265
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper introduces a highly efficient greedy decoding algorithm for Transducer-based speech recognition models. We redesign the standard nested-loop design for RNN-T decoding, swapping loops over frames and labels: the outer loop iterates over labels, while the inner loop iterates over frames searching for the next non-blank symbol. Additionally, we represent partial hypotheses in a special structure using CUDA tensors, supporting parallelized hypotheses manipulations. Experiments show that the label-looping algorithm is up to 2.0X faster than conventional batched decoding when using batch size 32. It can be further combined with other compiler or GPU call-related techniques to achieve even more speedup. Our algorithm is general-purpose and can work with both conventional Transducers and Token-and-Duration Transducers. We open-source our implementation to benefit the research community.

Related papers

Fast correlated decoding of transversal logical algorithms [67.01652927671279]
Quantum error correction (QEC) is required for large-scale computation, but incurs a significant resource overhead.<n>Recent advances have shown that by jointly decoding logical qubits in algorithms composed of logical gates, the number of syndrome extraction rounds can be reduced.<n>Here, we reform the problem of decoding circuits by directly decoding relevant logical operator products as they propagate through the circuit.
arXiv Detail & Related papers (2025-05-19T18:00:00Z)
CrossMPT: Cross-attention Message-Passing Transformer for Error Correcting Codes [14.631435001491514]
We propose a novel Cross-attention Message-Passing Transformer (CrossMPT) We show that CrossMPT significantly outperforms existing neural network-based decoders for various code classes. Notably, CrossMPT achieves this decoding performance improvement, while significantly reducing the memory usage, complexity, inference time, and training time.
arXiv Detail & Related papers (2024-05-02T06:30:52Z)
SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers [76.13755422671822]
This paper investigates the capability of plain Vision Transformers (ViTs) for semantic segmentation using the encoder-decoder framework. We introduce a novel Attention-to-Mask (atm) module to design a lightweight decoder effective for plain ViT. Our decoder outperforms the popular decoder UPerNet using various ViT backbones while consuming only about $5%$ of the computational cost.
arXiv Detail & Related papers (2023-06-09T22:29:56Z)
Fast and parallel decoding for transducer [25.510837666148024]
We introduce a constrained version of transducer loss to learn strictly monotonic alignments between the sequences. We also improve the standard greedy search and beam search algorithms by limiting the number of symbols that can be emitted per time step.
arXiv Detail & Related papers (2022-10-31T07:46:10Z)
Efficient Nearest Neighbor Search for Cross-Encoder Models using Matrix Factorization [60.91600465922932]
We present an approach that avoids the use of a dual-encoder for retrieval, relying solely on the cross-encoder. Our approach provides test-time recall-vs-computational cost trade-offs superior to the current widely-used methods.
arXiv Detail & Related papers (2022-10-23T00:32:04Z)
Nearest neighbor search with compact codes: A decoder perspective [77.60612610421101]
We re-interpret popular methods such as binary hashing or product quantizers as auto-encoders. We design backward-compatible decoders that improve the reconstruction of the vectors from the same codes.
arXiv Detail & Related papers (2021-12-17T15:22:28Z)
Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations [22.40667024030858]
Bi-encoders produce fixed-dimensional sentence representations and are computationally efficient. Cross-encoders can leverage their attention heads to exploit inter-sentence interactions for better performance. Trans-Encoder combines the two learning paradigms into an iterative joint framework to simultaneously learn enhanced bi- and cross-encoders.
arXiv Detail & Related papers (2021-09-27T14:06:47Z)
Fast Interleaved Bidirectional Sequence Generation [90.58793284654692]
We introduce a decoder that generates target words from the left-to-right and right-to-left directions simultaneously. We show that we can easily convert a standard architecture for unidirectional decoding into a bidirectional decoder. Our interleaved bidirectional decoder (IBDecoder) retains the model simplicity and training efficiency of the standard Transformer.
arXiv Detail & Related papers (2020-10-27T17:38:51Z)
Term Revealing: Furthering Quantization at Run Time on Quantized DNNs [9.240133036531402]
We present a novel technique, called Term Revealing (TR), for furthering quantization at run time for improved performance of Deep Neural Networks (DNNs) already quantized with conventional quantization methods. TR operates on power-of-two terms in binary expressions of values. We show an FPGA implementation that can use a small number of control bits to switch between conventional quantization and TR-enabled quantization with a negligible delay.
arXiv Detail & Related papers (2020-07-13T14:03:10Z)
On Sparsifying Encoder Outputs in Sequence-to-Sequence Models [90.58793284654692]
We take Transformer as the testbed and introduce a layer of gates in-between the encoder and the decoder. The gates are regularized using the expected value of the sparsity-inducing L0penalty. We investigate the effects of this sparsification on two machine translation and two summarization tasks.
arXiv Detail & Related papers (2020-04-24T16:57:52Z)
Pruning Neural Belief Propagation Decoders [77.237958592189]
We introduce a method to tailor an overcomplete parity-check matrix to (neural) BP decoding using machine learning. We achieve performance within 0.27 dB and 1.5 dB of the ML performance while reducing the complexity of the decoder.
arXiv Detail & Related papers (2020-01-21T12:05:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.