Powerful and Extensible WFST Framework for RNN-Transducer Losses
- URL: http://arxiv.org/abs/2303.10384v1
- Date: Sat, 18 Mar 2023 10:36:33 GMT
- Title: Powerful and Extensible WFST Framework for RNN-Transducer Losses
- Authors: Aleksandr Laptev, Vladimir Bataev, Igor Gitman, Boris Ginsburg
- Abstract summary: This paper presents a framework based on Weighted Finite-State Transducers (WFST) to simplify the development of modifications for RNN-Transducer (RNN-T) loss.
Existing implementations of RNN-T use CUDA-related code, which is hard to extend and debug.
We introduce two WFST-powered RNN-T implementations: "Compose-Transducer" and "Grid-Transducer".
- Score: 71.56212119508551
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a framework based on Weighted Finite-State Transducers
(WFST) to simplify the development of modifications for RNN-Transducer (RNN-T)
loss. Existing implementations of RNN-T use CUDA-related code, which is hard to
extend and debug. WFSTs are easy to construct and extend, and allow debugging
through visualization. We introduce two WFST-powered RNN-T implementations: (1)
"Compose-Transducer", based on a composition of the WFST graphs from acoustic
and textual schema -- computationally competitive and easy to modify; (2)
"Grid-Transducer", which constructs the lattice directly for further
computations -- most compact, and computationally efficient. We illustrate the
ease of extensibility through introduction of a new W-Transducer loss -- the
adaptation of the Connectionist Temporal Classification with Wild Cards.
W-Transducer (W-RNNT) consistently outperforms the standard RNN-T in a
weakly-supervised data setup with missing parts of transcriptions at the
beginning and end of utterances. All RNN-T losses are implemented with the k2
framework and are available in the NeMo toolkit.
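To make the lattice concrete: the RNN-T loss is a sum over all monotonic alignments in a T x (U+1) grid, where each node (t, u) carries a blank arc to (t+1, u) and a label arc to (t, u+1); Grid-Transducer constructs exactly this grid as a WFST. Below is a minimal, naive PyTorch sketch of the forward recursion over that grid (an illustration of the semantics only, not the paper's k2 implementation; the function name and per-utterance layout are ours):

```python
import torch

def rnnt_loss_grid(log_probs, labels, blank=0):
    """Naive RNN-T loss via a forward recursion over the T x (U+1) lattice.

    log_probs: (T, U+1, V) joint-network log-probabilities for one utterance
    labels:    (U,) target label ids
    Node (t, u) has a blank arc to (t+1, u) and a label arc to (t, u+1).
    """
    T, U_plus_1, _ = log_probs.shape
    U = U_plus_1 - 1
    neg_inf = torch.tensor(float("-inf"))
    alpha = torch.full((T, U + 1), float("-inf"))
    alpha[0, 0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            # Arrive by emitting blank at frame t-1 (move forward in time).
            from_blank = alpha[t - 1, u] + log_probs[t - 1, u, blank] if t > 0 else neg_inf
            # Arrive by emitting label u-1 at frame t (move up in labels).
            from_label = alpha[t, u - 1] + log_probs[t, u - 1, labels[u - 1]] if u > 0 else neg_inf
            alpha[t, u] = torch.logaddexp(from_blank, from_label)
    # A final blank arc from (T-1, U) terminates every alignment.
    return -(alpha[T - 1, U] + log_probs[T - 1, U, blank])
```

In Grid-Transducer this same lattice is built directly as a k2 FSA, so the computation runs as batched, GPU-parallel graph operations rather than a Python loop.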
Related papers
- Fast Graph Sharpness-Aware Minimization for Enhancing and Accelerating Few-Shot Node Classification [53.727688136434345]
Graph Neural Networks (GNNs) have shown superior performance in node classification.
We present Fast Graph Sharpness-Aware Minimization (FGSAM) that integrates the rapid training of Multi-Layer Perceptrons with the superior performance of GNNs.
Our proposed algorithm outperforms the standard SAM with lower computational costs in FSNC tasks.
arXiv Detail & Related papers (2024-10-22T09:33:29Z)
- TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer architecture.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
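The CWT step is the easy part to picture: it lifts a 1-D signal into a scale-by-time 2-D tensor that a convolutional stream can consume like an image. A minimal sketch, assuming the PyWavelets (`pywt`) API and synthetic stand-in data:

```python
import numpy as np
import pywt  # PyWavelets

# Synthetic 1-D behavioral feature signal: 10 s sampled at 30 Hz.
fs = 30.0
t = np.arange(0, 10, 1 / fs)
signal = np.sin(2 * np.pi * 1.5 * t) + 0.3 * np.random.randn(t.size)

# CWT over 64 scales yields a (scale x time) 2-D representation.
scales = np.arange(1, 65)
coeffs, freqs = pywt.cwt(signal, scales, "morl", sampling_period=1 / fs)
print(coeffs.shape)  # (64, 300): one row per scale, one column per sample
```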
arXiv Detail & Related papers (2024-04-15T06:01:48Z)
- CIF-T: A Novel CIF-based Transducer Architecture for Automatic Speech Recognition [8.302549684364195]
We propose a novel model named CIF-Transducer (CIF-T) which incorporates the Continuous Integrate-and-Fire (CIF) mechanism with the RNN-T model to achieve efficient alignment.
CIF-T achieves state-of-the-art results with lower computational overhead compared to RNN-T models.
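The CIF mechanism itself is simple to state: per-frame weights are accumulated until they cross a threshold of 1.0, at which point the weighted sum of frame vectors is "fired" as one label-synchronous vector. A hedged sketch of that integrate-and-fire rule (illustrative only, not the CIF-T code; tail handling is simplified):

```python
import torch

def cif(encoder_out, weights, threshold=1.0):
    """Continuous Integrate-and-Fire over frame-level features.

    encoder_out: (T, D) frame vectors
    weights:     (T,) non-negative per-frame weights (e.g. sigmoid outputs)
    Returns (N, D): one integrated vector per fired label position.
    """
    fired = []
    accum_w = 0.0
    accum_v = torch.zeros(encoder_out.size(1))
    for h, a in zip(encoder_out, weights):
        a = float(a)
        if accum_w + a < threshold:
            accum_w += a
            accum_v = accum_v + a * h
        else:
            # Split this frame's weight at the firing boundary.
            used = threshold - accum_w
            fired.append(accum_v + used * h)
            accum_w = a - used          # remainder starts the next label
            accum_v = (a - used) * h
    # Residual weight below the threshold is dropped in this sketch.
    return torch.stack(fired) if fired else encoder_out.new_zeros(0, encoder_out.size(1))
```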
arXiv Detail & Related papers (2023-07-26T11:59:14Z)
- ConvRNN-T: Convolutional Augmented Recurrent Neural Network Transducers for Streaming Speech Recognition [14.384132377946154]
We introduce a new streaming ASR model, ConvRNN-T, with a novel convolutional context consisting of local and global context encoders.
We show ConvRNN-T outperforms RNN-T, Conformer, and ContextNet on Librispeech and in-house data.
ConvRNN-T's superior accuracy along with its low footprint make it a promising candidate for on-device streaming ASR technologies.
arXiv Detail & Related papers (2022-09-29T15:33:41Z)
- Sequence Transduction with Graph-based Supervision [96.04967815520193]
We present a new transducer objective function that generalizes the RNN-T loss to accept a graph representation of the labels.
We demonstrate that transducer-based ASR with a CTC-like lattice achieves better results than standard RNN-T.
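The "graph representation of the labels" can be made tangible with k2 (the same framework used by the paper above); a minimal sketch, assuming k2's standard Python helpers:

```python
import k2

# A CTC-like label graph for targets [1, 2, 2]: optional blanks (id 0)
# between labels, plus self-loops that absorb repeated frames.
graphs = k2.ctc_graph([[1, 2, 2]])  # FsaVec containing one FSA
print(graphs[0].num_arcs)
# Such FSAs can be rendered for inspection (requires graphviz), which is
# what makes graph-based supervision easy to debug:
# graphs[0].draw("label_graph.svg")
```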
arXiv Detail & Related papers (2021-11-01T21:51:42Z)
- Tied & Reduced RNN-T Decoder [0.0]
We study ways to make the RNN-T decoder (prediction network + joint network) smaller and faster without degradation in recognition performance.
Our prediction network performs a simple weighted averaging of the input embeddings, and shares its embedding matrix weights with the joint network's output layer.
This simple design, when used in conjunction with additional Edit-based Minimum Bayes Risk (EMBR) training, reduces the RNN-T decoder from 23M parameters to just 2M without affecting word-error rate (WER).
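The two ideas fit in a few lines; a sketch with hypothetical class and dimension names (embedding averaging instead of an LSTM, plus tying the embedding matrix to the joint network's output layer, which in this simplified form requires the joint hidden size to equal the embedding size):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TiedReducedDecoder(nn.Module):
    """Prediction net = weighted average of the last `context_size` label
    embeddings; the embedding matrix doubles as the joint output layer."""

    def __init__(self, vocab_size, embed_dim, context_size=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # One learned mixing weight per history position replaces the LSTM.
        self.pos_weights = nn.Parameter(torch.zeros(context_size))

    def predict(self, label_history):             # (B, context_size) ids
        emb = self.embed(label_history)            # (B, N, D)
        w = torch.softmax(self.pos_weights, 0)     # (N,)
        return (emb * w[None, :, None]).sum(1)     # (B, D)

    def joint(self, enc, pred):                    # (B, D) each
        hidden = torch.tanh(enc + pred)            # toy joint combination
        # Weight tying: logits reuse the embedding matrix.
        return F.linear(hidden, self.embed.weight)  # (B, vocab_size)
```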
arXiv Detail & Related papers (2021-09-15T18:19:16Z)
- Alignment Restricted Streaming Recurrent Neural Network Transducer [29.218353627837214]
We propose a modification to the RNN-T loss function and develop Alignment Restricted RNN-T models.
The Ar-RNN-T loss provides refined control to navigate the trade-off between token emission delays and the word error rate (WER).
The Ar-RNN-T models also improve downstream applications such as the ASR End-pointing by guaranteeing token emissions within any given range of latency.
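The restriction can be pictured as a mask over the same T x U alignment grid used by the RNN-T loss: given a reference alignment, each token may only be emitted within a window around its reference frame. A hypothetical sketch of such a mask (the exact Ar-RNN-T formulation is in the paper):

```python
import torch

def emission_mask(T, ref_frames, left=5, right=15):
    """Boolean (T, U) mask: True where token u may be emitted at frame t.

    ref_frames: (U,) reference frame per token (e.g. from forced alignment);
    `left`/`right` bound the allowed emission window, i.e. the latency.
    """
    t = torch.arange(T).unsqueeze(1)                # (T, 1)
    ref = torch.as_tensor(ref_frames).unsqueeze(0)  # (1, U)
    return (t >= ref - left) & (t <= ref + right)

# Label arcs outside the mask get -inf scores, so only low-latency
# alignment paths contribute to the loss.
```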
arXiv Detail & Related papers (2020-11-05T19:38:54Z)
- Neuroevolutionary Transfer Learning of Deep Recurrent Neural Networks through Network-Aware Adaptation [57.46377517266827]
This work introduces network-aware adaptive structure transfer learning (N-ASTL).
N-ASTL utilizes statistical information related to the source network's topology and weight distribution to inform how new input and output neurons are to be integrated into the existing structure.
Results show improvements over the prior state of the art, including the ability to transfer to challenging real-world datasets where transfer was not previously possible.
arXiv Detail & Related papers (2020-06-04T06:07:30Z)
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to regain and guarantee high hardware efficiency.
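The "fine-grained patterns inside coarse-grained structures" idea is easy to sketch: every 3x3 kernel keeps only the entries of one pattern, chosen from a small fixed set, which is what lets the compiler specialize code per pattern. A minimal sketch with a hypothetical 4-pattern set (PatDNN derives its actual patterns from trained models):

```python
import torch

# Hypothetical candidate patterns, each keeping 4 of 9 kernel weights.
PATTERNS = torch.tensor([
    [[1, 1, 0], [1, 1, 0], [0, 0, 0]],
    [[0, 1, 1], [0, 1, 1], [0, 0, 0]],
    [[0, 0, 0], [1, 1, 0], [1, 1, 0]],
    [[0, 0, 0], [0, 1, 1], [0, 1, 1]],
], dtype=torch.float32)

def apply_pattern_pruning(conv_weight):
    """Keep, per 3x3 kernel, the pattern preserving the most magnitude.

    conv_weight: (out_ch, in_ch, 3, 3); returns a pruned copy.
    """
    scores = torch.einsum("oihw,phw->oip", conv_weight.abs(), PATTERNS)
    best = scores.argmax(dim=-1)  # (out_ch, in_ch) pattern index per kernel
    return conv_weight * PATTERNS[best]
```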
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.