CIF-T: A Novel CIF-based Transducer Architecture for Automatic Speech Recognition
- URL: http://arxiv.org/abs/2307.14132v3
- Date: Fri, 15 Dec 2023 04:13:45 GMT
- Title: CIF-T: A Novel CIF-based Transducer Architecture for Automatic Speech Recognition
- Authors: Tian-Hao Zhang, Dinghao Zhou, Guiping Zhong, Jiaming Zhou, Baoxiang Li
- Abstract summary: We propose a novel model named CIF-Transducer (CIF-T) which incorporates the Continuous Integrate-and-Fire (CIF) mechanism with the RNN-T model to achieve efficient alignment.
CIF-T achieves state-of-the-art results with lower computational overhead compared to RNN-T models.
- Score: 8.302549684364195
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: RNN-T models are widely used in ASR and rely on the RNN-T loss to achieve
length alignment between the input audio and the target sequence. However, the
implementation complexity and the alignment-based optimization target of the RNN-T
loss lead to computational redundancy and a reduced role for the predictor network,
respectively. In this paper, we propose a novel model named CIF-Transducer (CIF-T),
which incorporates the Continuous Integrate-and-Fire (CIF) mechanism into the RNN-T
model to achieve efficient alignment. In this way, the RNN-T loss is abandoned, which
reduces computation and allows the predictor network to play a more significant role.
We also introduce Funnel-CIF, Context Blocks, a Unified Gating and Bilinear Pooling
joint network, and an auxiliary training strategy to further improve performance.
Experiments on the 178-hour AISHELL-1 and 10000-hour WenetSpeech datasets show that
CIF-T achieves state-of-the-art results with lower computational overhead than RNN-T
models.
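For readers unfamiliar with CIF, the sketch below is a minimal, illustrative NumPy version of the integrate-and-fire step (the threshold, the weight source, and all names are assumptions for illustration, not the paper's exact formulation): per-frame weights are accumulated until they reach 1.0, at which point the weighted encoder frames integrated so far are "fired" as one label-level representation, giving a monotonic audio-to-label alignment without an alignment loss.

```python
import numpy as np

def cif(encoder_frames: np.ndarray, weights: np.ndarray, threshold: float = 1.0):
    """Continuous Integrate-and-Fire (illustrative sketch).

    encoder_frames: (T, D) acoustic encoder outputs.
    weights:        (T,) non-negative per-frame weights (e.g. from a small
                    weight-prediction head); assumed to be given here.
    Returns a list of label-level vectors, one per "firing".
    """
    fired = []                                  # integrated label-level embeddings
    acc_w = 0.0                                 # weight accumulated since last firing
    acc_v = np.zeros(encoder_frames.shape[1])   # weighted sum of frames so far

    for h_t, a_t in zip(encoder_frames, weights):
        if acc_w + a_t < threshold:
            # keep integrating this frame
            acc_w += a_t
            acc_v += a_t * h_t
        else:
            # split the frame weight: part completes the current integration...
            used = threshold - acc_w
            fired.append(acc_v + used * h_t)
            # ...and the remainder starts the next integration
            acc_w = a_t - used
            acc_v = (a_t - used) * h_t
    return fired

# toy usage: 6 frames of 4-dim features, hand-picked weights that fire twice
frames = np.random.randn(6, 4)
alphas = np.array([0.3, 0.4, 0.5, 0.2, 0.3, 0.6])
print(len(cif(frames, alphas)), "label-level vectors fired")
```

In CIF-T, label-level acoustic representations of this kind are what the joint network combines with the predictor output, which is how the lattice-based RNN-T loss can be dropped.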
Related papers
- Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch [72.26822499434446]
Auto-Train-Once (ATO) is an innovative network pruning algorithm designed to automatically reduce the computational and storage costs of DNNs.
We provide a comprehensive convergence analysis as well as extensive experiments, and the results show that our approach achieves state-of-the-art performance across various model architectures.
arXiv Detail & Related papers (2024-03-21T02:33:37Z)
- Powerful and Extensible WFST Framework for RNN-Transducer Losses [71.56212119508551]
This paper presents a framework based on Weighted Finite-State Transducers (WFST) to simplify the development of modifications for the RNN-Transducer (RNN-T) loss.
Existing implementations of RNN-T use CUDA-related code, which is hard to extend and debug.
We introduce two WFST-powered RNN-T implementations: "Compose-Transducer" and "Grid-Transducer".
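For context on what such frameworks compute, the RNN-T loss is a dynamic program over a T-by-(U+1) alignment lattice. The sketch below is a generic log-space forward recursion in NumPy, not the WFST-based implementation from that paper; the shapes, the blank index, and all names are assumptions for illustration.

```python
import numpy as np

def rnnt_log_likelihood(log_probs: np.ndarray, labels: list, blank: int = 0) -> float:
    """Forward algorithm for the standard RNN-T loss (illustrative, not optimized).

    log_probs: (T, U+1, V) joint-network log-probabilities, where T is the
               number of encoder frames, U = len(labels), V the vocabulary size.
    Returns log P(labels | audio), summing over all monotonic alignments.
    """
    T, U1, _ = log_probs.shape
    U = U1 - 1
    assert U == len(labels)

    neg_inf = -np.inf
    alpha = np.full((T, U + 1), neg_inf)
    alpha[0, 0] = 0.0

    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            # arrive by emitting blank from the previous frame (move right in time)
            from_blank = alpha[t - 1, u] + log_probs[t - 1, u, blank] if t > 0 else neg_inf
            # arrive by emitting the next label at the same frame (move up in labels)
            from_label = alpha[t, u - 1] + log_probs[t, u - 1, labels[u - 1]] if u > 0 else neg_inf
            alpha[t, u] = np.logaddexp(from_blank, from_label)

    # every path ends by emitting a final blank from the last frame
    return alpha[T - 1, U] + log_probs[T - 1, U, blank]

# toy usage: T=4 frames, U=2 labels, vocabulary of 5 symbols
lp = np.log(np.random.dirichlet(np.ones(5), size=(4, 3)))
print(rnnt_log_likelihood(lp, labels=[2, 4]))
```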
arXiv Detail & Related papers (2023-03-18T10:36:33Z)
- Training High-Performance Low-Latency Spiking Neural Networks by Differentiation on Spike Representation [70.75043144299168]
Spiking Neural Networks (SNNs) are promising energy-efficient AI models when implemented on neuromorphic hardware.
Training SNNs efficiently is challenging because of their non-differentiability.
We propose the Differentiation on Spike Representation (DSR) method, which achieves high performance.
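To see where the non-differentiability comes from, the sketch below is a generic leaky integrate-and-fire neuron in NumPy (a standard textbook model, not the DSR formulation from that paper); the threshold step has zero gradient almost everywhere, which is the obstacle that methods such as DSR work around.

```python
import numpy as np

def lif_forward(inputs: np.ndarray, tau: float = 2.0, v_th: float = 1.0) -> np.ndarray:
    """Leaky integrate-and-fire neuron over T time steps (generic illustration).

    inputs: (T,) input current per step. Returns a (T,) binary spike train.
    The spike nonlinearity is a Heaviside step: its derivative is 0 almost
    everywhere, so plain backprop through it yields no learning signal.
    """
    v = 0.0
    spikes = np.zeros_like(inputs)
    for t, x in enumerate(inputs):
        v = v + (x - v) / tau          # leaky integration of the membrane potential
        if v >= v_th:                  # non-differentiable threshold (Heaviside)
            spikes[t] = 1.0
            v = 0.0                    # hard reset after a spike
    return spikes

print(lif_forward(np.array([0.6, 0.9, 1.2, 1.8, 0.1])))  # -> [0. 0. 0. 1. 0.]
```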
arXiv Detail & Related papers (2022-05-01T12:44:49Z)
- Comparative Analysis of Interval Reachability for Robust Implicit and Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
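As background, interval bound propagation through a single affine-plus-ReLU layer can be written in a few lines. The sketch below is a generic NumPy illustration of interval arithmetic, not the reachability method from that paper; it propagates elementwise lower/upper bounds on the input to bounds on the output.

```python
import numpy as np

def interval_affine_relu(lo, hi, W, b):
    """Propagate an axis-aligned input box [lo, hi] through ReLU(W x + b).

    Standard interval arithmetic: positive weights map lower bounds to lower
    bounds, negative weights swap them. Returns (lo_out, hi_out) such that
    every x in [lo, hi] has ReLU(W @ x + b) inside [lo_out, hi_out].
    """
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    lo_out = W_pos @ lo + W_neg @ hi + b
    hi_out = W_pos @ hi + W_neg @ lo + b
    return np.maximum(lo_out, 0.0), np.maximum(hi_out, 0.0)

# toy usage: bounds on a 2-d input propagated through a random 3x2 layer
rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 2)), rng.normal(size=3)
x = np.array([0.5, -0.2])
lo, hi = interval_affine_relu(x - 0.1, x + 0.1, W, b)
print(lo, hi)
```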
arXiv Detail & Related papers (2022-04-01T03:31:27Z)
- Sequence Transduction with Graph-based Supervision [96.04967815520193]
We present a new transducer objective function that generalizes the RNN-T loss to accept a graph representation of the labels.
We demonstrate that transducer-based ASR with a CTC-like lattice achieves better results than standard RNN-T.
arXiv Detail & Related papers (2021-11-01T21:51:42Z)
- Alignment Restricted Streaming Recurrent Neural Network Transducer [29.218353627837214]
We propose a modification to the RNN-T loss function and develop Alignment Restricted RNN-T models.
The Ar-RNN-T loss provides refined control to navigate the trade-off between token emission delays and the Word Error Rate (WER).
The Ar-RNN-T models also improve downstream applications such as ASR end-pointing by guaranteeing token emissions within any given latency range.
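A hedged sketch of the core idea: restrict which (time, label) cells of the RNN-T lattice are allowed, based on a reference alignment and left/right buffers. The NumPy mask below is a generic illustration of such a restriction (the buffer semantics, alignment format, and names are assumptions, not that paper's exact definition); cells left False would simply be excluded from the loss summation.

```python
import numpy as np

def alignment_restriction_mask(T: int, ref_frames: list, b_left: int, b_right: int) -> np.ndarray:
    """(T, U+1) mask of allowed RNN-T lattice cells (illustrative sketch).

    ref_frames[k] is the reference emission frame of label k+1, e.g. taken
    from a forced alignment. Cell (t, u) means "u labels emitted by frame t";
    it stays allowed only if label u could already have been emitted
    (t >= ref_frames[u-1] - b_left) and label u+1 is not yet overdue
    (t <= ref_frames[u] + b_right).
    """
    U = len(ref_frames)
    mask = np.zeros((T, U + 1), dtype=bool)
    for t in range(T):
        for u in range(U + 1):
            ok_prev = (u == 0) or (t >= ref_frames[u - 1] - b_left)
            ok_next = (u == U) or (t <= ref_frames[u] + b_right)
            mask[t, u] = ok_prev and ok_next
    return mask

# toy usage: 8 frames, 2 labels aligned to frames 2 and 5, +/-1 frame of slack
print(alignment_restriction_mask(T=8, ref_frames=[2, 5], b_left=1, b_right=1).astype(int))
```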
arXiv Detail & Related papers (2020-11-05T19:38:54Z)
- A Fully Tensorized Recurrent Neural Network [48.50376453324581]
We introduce a "fully tensorized" RNN architecture which jointly encodes the separate weight matrices within each recurrent cell.
This approach reduces model size by several orders of magnitude, while still maintaining similar or better performance compared to standard RNNs.
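A hedged sketch of the general idea: the input-to-hidden and hidden-to-hidden matrices of a recurrent cell are merged into one jointly factorized map, so the cell stores far fewer parameters. The paper uses a tensor-train factorization; the plain low-rank factorization below is a simpler stand-in for illustration, and all names are assumptions.

```python
import numpy as np

def jointly_factorized_rnn_cell(D: int, H: int, rank: int, rng=np.random.default_rng(0)):
    """Vanilla RNN cell whose stacked weights share one low-rank factorization.

    Instead of separate dense W_ih (H x D) and W_hh (H x H), the cell stores
    A (H x rank) and B (rank x (D + H)), i.e. one joint factorized matrix.
    (Simplified stand-in for the tensor-train factorization used in the paper.)
    """
    A = rng.normal(scale=0.1, size=(H, rank))
    B = rng.normal(scale=0.1, size=(rank, D + H))
    b = np.zeros(H)

    def step(x, h):
        z = np.concatenate([x, h])          # joint input: [x_t ; h_{t-1}]
        return np.tanh(A @ (B @ z) + b)     # factorized affine map, then tanh

    dense_params = H * D + H * H
    factored_params = H * rank + rank * (D + H)
    print(f"dense: {dense_params} params, factorized: {factored_params} params")
    return step

# toy usage: one recurrent step with 256-dim input and hidden state, rank 8
step = jointly_factorized_rnn_cell(D=256, H=256, rank=8)
h = step(np.random.randn(256), np.zeros(256))
```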
arXiv Detail & Related papers (2020-10-08T18:24:12Z)
- Effect of Architectures and Training Methods on the Performance of Learned Video Frame Prediction [10.404162481860634]
Experimental results show that the residual FCNN architecture performs the best in terms of peak signal to noise ratio (PSNR) at the expense of higher training and test (inference) computational complexity.
The CRNN can be trained stably and very efficiently using the stateful truncated backpropagation through time procedure.
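For reference, PSNR, the metric quoted above, compares a predicted frame to the ground truth via mean squared error. A minimal sketch, assuming frames scaled to [0, 1]:

```python
import numpy as np

def psnr(pred: np.ndarray, target: np.ndarray, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two frames scaled to [0, peak]."""
    mse = np.mean((pred - target) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# toy usage: a prediction off by uniform noise of amplitude 0.05
target = np.random.rand(64, 64)
pred = np.clip(target + 0.05 * (np.random.rand(64, 64) - 0.5), 0, 1)
print(f"{psnr(pred, target):.1f} dB")
```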
arXiv Detail & Related papers (2020-08-13T20:45:28Z)
- SHARP: An Adaptable, Energy-Efficient Accelerator for Recurrent Neural Network [17.928105470385614]
We propose an intelligent tile-based mechanism that increases the adaptiveness of RNN computation in order to efficiently handle data dependencies.
SHARP achieves 2x, 2.8x, and 82x speedups on average across different RNN models and resource budgets.
arXiv Detail & Related papers (2019-11-04T14:51:27Z)