Lightweight Transducer Based on Frame-Level Criterion
- URL: http://arxiv.org/abs/2409.13698v2
- Date: Fri, 1 Nov 2024 06:08:08 GMT
- Title: Lightweight Transducer Based on Frame-Level Criterion
- Authors: Genshun Wan, Mengzhi Wang, Tingzhi Mao, Hang Chen, Zhongfu Ye
- Abstract summary: We propose a lightweight transducer model based on frame-level criterion, which uses the results of the CTC forced alignment algorithm to determine the label for each frame.
To address the problem of imbalanced classification caused by excessive blanks in the label, we decouple the blank and non-blank probabilities.
Experiments on AISHELL-1 demonstrate that this enables the lightweight transducer to achieve results similar to the standard transducer.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A transducer model trained with a sequence-level criterion requires a large amount of memory because it generates a large probability lattice. We propose a lightweight transducer model based on a frame-level criterion, which uses the results of the CTC forced-alignment algorithm to determine the label for each frame. The encoder output can then be combined with the decoder output at the corresponding time step, rather than combining every element of the encoder output with every element of the decoder output as in the standard transducer. This significantly reduces memory and computation requirements. To address the class imbalance caused by excessive blanks in the labels, we decouple the blank and non-blank probabilities and truncate the gradient of the blank classifier to the main network. Experiments on AISHELL-1 demonstrate that this enables the lightweight transducer to achieve results similar to the transducer. Additionally, by using richer information to predict the blank probability, we achieve results superior to the transducer.
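The two core ideas in the abstract can be sketched in a few lines. This is a minimal illustrative NumPy sketch, not the paper's released code: all names (`frame_level_joint`, `decoupled_probs`, `frame_to_pos`) are hypothetical, and the per-frame alignment indices are assumed to come from a CTC forced aligner. It shows (1) combining each encoder frame only with the decoder state at its aligned position, yielding a (T, D) joint instead of the transducer's (T, U, D) lattice, and (2) decoupling the blank probability from the non-blank distribution.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def frame_level_joint(enc_out, dec_out, frame_to_pos):
    # enc_out: (T, D); dec_out: (U+1, D); frame_to_pos: (T,) indices taken
    # from CTC forced alignment. The result is (T, D) rather than the
    # (T, U+1, D) lattice of a standard transducer, so memory scales with
    # T instead of T*U.
    return enc_out + dec_out[frame_to_pos]

def decoupled_probs(joint, w_blank, w_token):
    # Blank gets its own binary classifier; non-blank tokens share a
    # softmax scaled by (1 - p_blank). During training, the gradient of
    # the blank branch would be truncated before the main network (e.g.
    # tensor.detach() in PyTorch) to mitigate the blank class imbalance.
    p_blank = sigmoid(joint @ w_blank)           # (T, 1)
    p_token = softmax(joint @ w_token)           # (T, V)
    return np.concatenate([p_blank, (1.0 - p_blank) * p_token], axis=-1)
```

By construction each output row sums to one: p_blank + (1 - p_blank) * 1 = 1, so the decoupled head still yields a valid distribution over blank plus the V non-blank tokens.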
Related papers
- Threshold Selection for Iterative Decoding of $(v,w)$-regular Binary Codes [84.0257274213152]
Iterative bit flipping decoders are an efficient choice for sparse $(v,w)$-regular codes.
We propose concrete criteria for threshold determination, backed by a closed form model.
arXiv Detail & Related papers (2025-01-23T17:38:22Z)
- Cluster Decomposition for Improved Erasure Decoding of Quantum LDPC Codes [7.185960422285947]
We introduce a new erasure decoder that applies to arbitrary quantum LDPC codes.
By allowing clusters of unconstrained size, this decoder achieves maximum-likelihood (ML) performance.
For the general quantum LDPC codes we studied, the cluster decoder can be used to estimate the ML performance curve.
arXiv Detail & Related papers (2024-12-11T23:14:23Z)
- The Conformer Encoder May Reverse the Time Dimension [53.9351497436903]
We analyze the initial behavior of the decoder cross-attention mechanism and find that it encourages the Conformer encoder self-attention to reverse the time dimension.
We propose methods and ideas of how this flipping can be avoided and investigate a novel method to obtain label-frame-position alignments.
arXiv Detail & Related papers (2024-10-01T13:39:05Z)
- Label-Looping: Highly Efficient Decoding for Transducers [19.091932566833265]
This paper introduces a highly efficient greedy decoding algorithm for Transducer-based speech recognition models.
Experiments show that the label-looping algorithm is up to 2.0X faster than conventional batched decoding when using batch size 32.
arXiv Detail & Related papers (2024-06-10T12:34:38Z)
- Progressive Token Length Scaling in Transformer Encoders for Efficient Universal Segmentation [67.85309547416155]
A powerful architecture for universal segmentation relies on transformers that encode multi-scale image features and decode object queries into mask predictions.
With efficiency being a high priority for scaling such models, we observed that the state-of-the-art method Mask2Former uses 50% of its compute only on the transformer encoder.
This is due to the retention of a full-length token-level representation of all backbone feature scales at each encoder layer.
arXiv Detail & Related papers (2024-04-23T01:34:20Z)
- Locality-Aware Generalizable Implicit Neural Representation [54.93702310461174]
Generalizable implicit neural representation (INR) enables a single continuous function to represent multiple data instances.
We propose a novel framework for generalizable INR that combines a transformer encoder with a locality-aware INR decoder.
Our framework significantly outperforms previous generalizable INRs and validates the usefulness of the locality-aware latents for downstream tasks.
arXiv Detail & Related papers (2023-10-09T11:26:58Z)
- FSR: Accelerating the Inference Process of Transducer-Based Models by Applying Fast-Skip Regularization [72.9385528828306]
A typical transducer model decodes the output sequence conditioned on the current acoustic state.
The number of blank tokens in the prediction results accounts for nearly 90% of all tokens.
We propose a method named fast-skip regularization, which tries to align the blank position predicted by a transducer with that predicted by a CTC model.
arXiv Detail & Related papers (2021-04-07T03:15:10Z)
- On Sparsifying Encoder Outputs in Sequence-to-Sequence Models [90.58793284654692]
We take Transformer as the testbed and introduce a layer of gates in-between the encoder and the decoder.
The gates are regularized using the expected value of the sparsity-inducing L0 penalty.
We investigate the effects of this sparsification on two machine translation and two summarization tasks.
arXiv Detail & Related papers (2020-04-24T16:57:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the list (including all information) and is not responsible for any consequences of its use.