Memory-augmented conformer for improved end-to-end long-form ASR
- URL: http://arxiv.org/abs/2309.13029v1
- Date: Fri, 22 Sep 2023 17:44:58 GMT
- Title: Memory-augmented conformer for improved end-to-end long-form ASR
- Authors: Carlos Carvalho and Alberto Abad
- Abstract summary: We propose a memory-augmented neural network between the encoder and decoder of a conformer.
This external memory can improve generalization to longer utterances.
We show that the proposed system outperforms the baseline conformer without memory for long utterances.
- Score: 9.876354589883002
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Conformers have recently been proposed as a promising modelling approach for
automatic speech recognition (ASR), outperforming recurrent neural
network-based approaches and transformers. Nevertheless, in general, the
performance of these end-to-end models, especially attention-based models, is
particularly degraded in the case of long utterances. To address this
limitation, we propose adding a fully-differentiable memory-augmented neural
network between the encoder and decoder of a conformer. This external memory
can improve generalization to longer utterances, since it allows the system
to store and retrieve more information recurrently. Notably, we explore the
neural Turing machine (NTM) that results in our proposed Conformer-NTM model
architecture for ASR. Experimental results using Librispeech train-clean-100
and train-960 sets show that the proposed system outperforms the baseline
conformer without memory for long utterances.
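As a rough illustration of the idea in the abstract, the sketch below places a simplified, fully differentiable NTM-style memory (content-based addressing with erase/add writes) between an encoder and an attention decoder. The nn.TransformerEncoder/nn.TransformerDecoder modules stand in for the conformer encoder and ASR decoder, and all names and sizes are illustrative assumptions, not the Conformer-NTM implementation.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExternalMemory(nn.Module):
    """Simplified NTM-style memory: content-based addressing, erase/add writes."""

    def __init__(self, num_slots: int, slot_dim: int, ctrl_dim: int):
        super().__init__()
        self.num_slots, self.slot_dim = num_slots, slot_dim
        self.key = nn.Linear(ctrl_dim, slot_dim)    # read/write key
        self.erase = nn.Linear(ctrl_dim, slot_dim)  # erase vector
        self.add = nn.Linear(ctrl_dim, slot_dim)    # add vector

    def init_memory(self, batch: int, device) -> torch.Tensor:
        return torch.zeros(batch, self.num_slots, self.slot_dim, device=device)

    def _address(self, memory, ctrl):
        # Cosine-similarity content addressing -> soft attention over memory slots.
        sim = F.cosine_similarity(memory, self.key(ctrl).unsqueeze(1), dim=-1)  # (B, N)
        return torch.softmax(sim, dim=-1)

    def write(self, memory, ctrl):
        w = self._address(memory, ctrl).unsqueeze(-1)       # (B, N, 1)
        e = torch.sigmoid(self.erase(ctrl)).unsqueeze(1)    # (B, 1, D)
        a = self.add(ctrl).unsqueeze(1)                     # (B, 1, D)
        return memory * (1 - w * e) + w * a

    def read(self, memory, ctrl):
        w = self._address(memory, ctrl).unsqueeze(-1)
        return (w * memory).sum(dim=1)                      # (B, D)


class MemoryAugmentedASR(nn.Module):
    def __init__(self, feat_dim=80, d_model=256, vocab=1000, num_slots=128):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        # Placeholder for the conformer encoder (a real system would use conformer blocks).
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=4)
        self.memory = ExternalMemory(num_slots, d_model, d_model)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
        self.embed = nn.Embedding(vocab, d_model)
        self.out = nn.Linear(d_model, vocab)

    def forward(self, feats, tokens):
        enc = self.encoder(self.proj(feats))                # (B, T, d_model)
        mem = self.memory.init_memory(feats.size(0), feats.device)
        reads = []
        for t in range(enc.size(1)):                        # recurrent write/read per frame
            mem = self.memory.write(mem, enc[:, t])
            reads.append(self.memory.read(mem, enc[:, t]))
        enriched = enc + torch.stack(reads, dim=1)          # memory-enriched encoder states
        dec = self.decoder(self.embed(tokens), enriched)    # decoder attends over enriched states
        return self.out(dec)                                # (B, U, vocab) logits


# Example: logits = MemoryAugmentedASR()(torch.randn(2, 200, 80), torch.randint(0, 1000, (2, 30)))
```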
Related papers
- Delayed Memory Unit: Modelling Temporal Dependency Through Delay Gate [17.611912733951662]
Recurrent Neural Networks (RNNs) are renowned for their adeptness in modeling temporal dependencies.
We propose a novel Delayed Memory Unit (DMU) in this paper to enhance the temporal modeling capabilities of vanilla RNNs.
Our proposed DMU demonstrates superior temporal modeling capabilities across a broad range of sequential modeling tasks.
arXiv Detail & Related papers (2023-10-23T14:29:48Z)
- MF-NeRF: Memory Efficient NeRF with Mixed-Feature Hash Table [62.164549651134465]
We propose MF-NeRF, a memory-efficient NeRF framework that employs a Mixed-Feature hash table to improve memory efficiency and reduce training time while maintaining reconstruction quality.
Our experiments with state-of-the-art Instant-NGP, TensoRF, and DVGO indicate that MF-NeRF can achieve the fastest training time on the same GPU hardware with similar or even higher reconstruction quality.
arXiv Detail & Related papers (2023-04-25T05:44:50Z)
- Return of the RNN: Residual Recurrent Networks for Invertible Sentence Embeddings [0.0]
This study presents a novel model for invertible sentence embeddings using a residual recurrent network trained on an unsupervised encoding task.
Rather than the probabilistic outputs common to neural machine translation models, our approach employs a regression-based output layer to reconstruct the input sequence's word vectors.
The model achieves high accuracy and fast training with the Adam optimizer, a significant finding given that RNNs typically require memory units, such as LSTMs, or second-order optimization methods.
arXiv Detail & Related papers (2023-03-23T15:59:06Z)
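A minimal sketch of the regression-style decoding described in the entry above: the decoder reconstructs the input word vectors directly under an MSE loss instead of predicting token probabilities. The GRU encoder/decoder, dimensions, and teacher forcing below are illustrative assumptions; the paper's residual-recurrent details are omitted.
```python
import torch
import torch.nn as nn


class RegressionSeqAutoencoder(nn.Module):
    """Sentence embedding trained by regressing onto the input word vectors."""

    def __init__(self, word_dim: int = 300, hidden_dim: int = 512):
        super().__init__()
        self.encoder = nn.GRU(word_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(word_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, word_dim)   # regression head, not a softmax

    def forward(self, word_vecs):                    # (B, T, word_dim)
        _, h = self.encoder(word_vecs)               # h: (1, B, hidden_dim) sentence embedding
        # Teacher forcing: feed the shifted word vectors and regress onto the originals.
        shifted = torch.cat([torch.zeros_like(word_vecs[:, :1]), word_vecs[:, :-1]], dim=1)
        dec, _ = self.decoder(shifted, h)
        recon = self.out(dec)                        # (B, T, word_dim)
        return ((recon - word_vecs) ** 2).mean()     # MSE reconstruction loss


# Example: loss = RegressionSeqAutoencoder()(torch.randn(4, 20, 300))
```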
- A novel Deep Neural Network architecture for non-linear system identification [78.69776924618505]
We present a novel Deep Neural Network (DNN) architecture for non-linear system identification.
Inspired by fading memory systems, we introduce an inductive bias (on the architecture) and regularization (on the loss function).
This architecture allows for automatic complexity selection based solely on available data.
arXiv Detail & Related papers (2021-06-06T10:06:07Z)
- Short-Term Memory Optimization in Recurrent Neural Networks by Autoencoder-based Initialization [79.42778415729475]
We explore an alternative solution based on explicit memorization using linear autoencoders for sequences.
We show how such pretraining can better support solving hard classification tasks with long sequences.
We show that the proposed approach achieves a much lower reconstruction error for long sequences and a better gradient propagation during the finetuning phase.
arXiv Detail & Related papers (2020-11-05T14:57:16Z)
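A minimal sketch of the "explicit memorization" idea in the entry above, assuming a linear autoencoder for sequences (h_t = A x_t + B h_{t-1}, decoded back to [x_t, h_{t-1}]) whose encoder weights are copied into a vanilla RNN before fine-tuning; the loss and weight-transfer details are illustrative, not the paper's exact procedure.
```python
import torch
import torch.nn as nn


class LinearSeqAutoencoder(nn.Module):
    """Linear autoencoder for sequences: h_t = A x_t + B h_{t-1}, decoded back to [x_t, h_{t-1}]."""

    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.A = nn.Linear(input_dim, hidden_dim, bias=False)   # encodes the current input
        self.B = nn.Linear(hidden_dim, hidden_dim, bias=False)  # encodes the previous state
        self.dec = nn.Linear(hidden_dim, input_dim + hidden_dim, bias=False)

    def forward(self, x):                                       # x: (B, T, input_dim)
        h = x.new_zeros(x.size(0), self.B.in_features)
        loss = 0.0
        for t in range(x.size(1)):
            h_prev, h = h, self.A(x[:, t]) + self.B(h)
            recon = self.dec(h)                                 # reconstruct [x_t, h_{t-1}]
            target = torch.cat([x[:, t], h_prev.detach()], dim=-1)
            loss = loss + ((recon - target) ** 2).mean()
        return loss / x.size(1)


def init_rnn_from_autoencoder(ae: LinearSeqAutoencoder, rnn: nn.RNN) -> None:
    """Copy the pretrained linear encoder into a vanilla RNN's first layer before fine-tuning."""
    with torch.no_grad():
        rnn.weight_ih_l0.copy_(ae.A.weight)   # (hidden_dim, input_dim)
        rnn.weight_hh_l0.copy_(ae.B.weight)   # (hidden_dim, hidden_dim)
```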
We introduce a "fully tensorized" RNN architecture which jointly encodes the separate weight matrices within each recurrent cell.
This approach reduces model size by several orders of magnitude, while still maintaining similar or better performance compared to standard RNNs.
arXiv Detail & Related papers (2020-10-08T18:24:12Z)
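The "jointly encodes the separate weight matrices" idea above can be sketched by treating a recurrent cell's gate matrices as slices of a single low-rank tensor. The CP decomposition and GRU-style gating below are assumptions for illustration, not the paper's exact factorization.
```python
import torch
import torch.nn as nn


class JointlyTensorizedGRUCell(nn.Module):
    """GRU-style cell whose three gate weight matrices are slices of one low-rank CP tensor."""

    def __init__(self, input_dim: int, hidden_dim: int, rank: int = 16):
        super().__init__()
        d = input_dim + hidden_dim
        # CP factors of a (3, hidden_dim, d) weight tensor: one slice per gate
        # (reset, update, candidate), stored jointly instead of as separate dense matrices.
        self.gate_factor = nn.Parameter(0.1 * torch.randn(3, rank))
        self.out_factor = nn.Parameter(0.1 * torch.randn(hidden_dim, rank))
        self.in_factor = nn.Parameter(0.1 * torch.randn(d, rank))
        self.bias = nn.Parameter(torch.zeros(3, hidden_dim))

    def weights(self) -> torch.Tensor:
        # Reconstruct the full (3, hidden_dim, d) tensor from the shared CP factors.
        return torch.einsum('gr,or,ir->goi', self.gate_factor, self.out_factor, self.in_factor)

    def forward(self, x_t, h):
        W = self.weights()
        xh = torch.cat([x_t, h], dim=-1)                               # (B, input+hidden)
        rz = torch.sigmoid(torch.einsum('goi,bi->bgo', W[:2], xh) + self.bias[:2])
        r, z = rz[:, 0], rz[:, 1]                                      # reset / update gates
        n = torch.tanh(torch.einsum('oi,bi->bo', W[2], torch.cat([x_t, r * h], dim=-1))
                       + self.bias[2])                                 # candidate state
        return (1 - z) * n + z * h
```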
- Coupled Oscillatory Recurrent Neural Network (coRNN): An accurate and (gradient) stable architecture for learning long time dependencies [15.2292571922932]
We propose a novel architecture for recurrent neural networks.
Our proposed RNN is based on a time-discretization of a system of second-order ordinary differential equations.
Experiments show that the proposed RNN is comparable in performance to the state of the art on a variety of benchmarks.
arXiv Detail & Related papers (2020-10-02T12:35:04Z)
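The time-discretized second-order ODE behind coRNN can be written compactly as a position/velocity update with damping terms. The cell below follows the commonly cited coRNN formulation; the step size and damping constants are assumed hyperparameters rather than values taken from the summary above.
```python
import torch
import torch.nn as nn


class CoRNNCell(nn.Module):
    """Hidden state as a network of damped, driven oscillators (discretized 2nd-order ODE)."""

    def __init__(self, input_dim: int, hidden_dim: int,
                 dt: float = 0.01, gamma: float = 1.0, epsilon: float = 1.0):
        super().__init__()
        self.dt, self.gamma, self.epsilon = dt, gamma, epsilon
        self.Wy = nn.Linear(hidden_dim, hidden_dim, bias=False)  # coupling on "position" y
        self.Wz = nn.Linear(hidden_dim, hidden_dim, bias=False)  # coupling on "velocity" z
        self.V = nn.Linear(input_dim, hidden_dim)                # input drive

    def forward(self, u_t, y, z):
        z = z + self.dt * (torch.tanh(self.Wy(y) + self.Wz(z) + self.V(u_t))
                           - self.gamma * y - self.epsilon * z)
        y = y + self.dt * z
        return y, z


def run_cornn(cell: CoRNNCell, inputs: torch.Tensor, hidden_dim: int) -> torch.Tensor:
    """Unroll over a (B, T, input_dim) sequence and return the final hidden state."""
    y = inputs.new_zeros(inputs.size(0), hidden_dim)
    z = inputs.new_zeros(inputs.size(0), hidden_dim)
    for t in range(inputs.size(1)):
        y, z = cell(inputs[:, t], y, z)
    return y  # e.g. fed to a linear classifier head
```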
- Neural Architecture Search For LF-MMI Trained Time Delay Neural Networks [61.76338096980383]
A range of neural architecture search (NAS) techniques are used to automatically learn two types of hyper-parameters of state-of-the-art factored time delay neural networks (TDNNs).
These include the DARTS method integrating architecture selection with lattice-free MMI (LF-MMI) TDNN training.
Experiments conducted on a 300-hour Switchboard corpus suggest the auto-configured systems consistently outperform the baseline LF-MMI TDNN systems.
arXiv Detail & Related papers (2020-07-17T08:32:11Z)
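The DARTS relaxation mentioned above replaces the discrete choice among candidate operations with a softmax-weighted mixture whose weights are learned jointly with the network. The generic sketch below (1-D convolutions with different context widths as the candidate set) is an illustrative assumption, not the LF-MMI TDNN search space from the paper.
```python
import torch
import torch.nn as nn


class MixedOp(nn.Module):
    """DARTS-style mixed operation: softmax over architecture logits weights the candidates."""

    def __init__(self, channels: int, kernel_sizes=(1, 3, 5)):
        super().__init__()
        # Candidate ops: 1-D convolutions with different temporal context widths.
        self.ops = nn.ModuleList(
            nn.Conv1d(channels, channels, k, padding=k // 2) for k in kernel_sizes)
        self.alpha = nn.Parameter(torch.zeros(len(kernel_sizes)))  # architecture logits

    def forward(self, x):                                   # x: (B, C, T)
        weights = torch.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))


# After search, the candidate with the largest architecture weight is kept:
# chosen = mixed.ops[int(torch.argmax(mixed.alpha))]
```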
- Memory Augmented Neural Model for Incremental Session-based Recommendation [36.33193124174747]
We show that existing neural recommenders can be used in incremental Session-based Recommendation scenarios.
We propose a general framework called Memory Augmented Neural model (MAN).
MAN augments a base neural recommender with a continuously queried and updated nonparametric memory.
arXiv Detail & Related papers (2020-04-28T19:07:20Z)
- Recognizing Long Grammatical Sequences Using Recurrent Networks Augmented With An External Differentiable Stack [73.48927855855219]
Recurrent neural networks (RNNs) are a widely used deep architecture for sequence modeling, generation, and prediction.
RNNs generalize poorly over very long sequences, which limits their applicability to many important temporal processing and time series forecasting problems.
One way to address these shortcomings is to couple an RNN with an external, differentiable memory structure, such as a stack.
In this paper, we improve the memory-augmented RNN with important architectural and state updating mechanisms.
arXiv Detail & Related papers (2020-04-04T14:19:15Z)
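A minimal sketch of coupling an RNN with an external differentiable stack, as in the entry above: the controller emits soft push/pop/no-op weights and the stack is updated as a convex combination of the three actions. The GRU controller and this specific soft-stack update are common constructions assumed for illustration, not the paper's exact mechanism.
```python
import torch
import torch.nn as nn


class StackAugmentedRNN(nn.Module):
    """GRU controller that reads the stack top and performs soft push/pop/no-op updates."""

    def __init__(self, input_dim: int, hidden_dim: int, stack_dim: int, stack_depth: int = 32):
        super().__init__()
        self.cell = nn.GRUCell(input_dim + stack_dim, hidden_dim)
        self.action = nn.Linear(hidden_dim, 3)          # push / pop / no-op logits
        self.push_val = nn.Linear(hidden_dim, stack_dim)
        self.stack_depth, self.stack_dim = stack_depth, stack_dim

    def step(self, x_t, h, stack):
        top = stack[:, 0]                                          # (B, stack_dim) stack top
        h = self.cell(torch.cat([x_t, top], dim=-1), h)
        a = torch.softmax(self.action(h), dim=-1)                  # (B, 3) soft actions
        push, pop, noop = a[:, 0:1, None], a[:, 1:2, None], a[:, 2:3, None]
        v = torch.tanh(self.push_val(h)).unsqueeze(1)              # (B, 1, stack_dim)
        # Soft push: shift everything down and place v on top.
        pushed = torch.cat([v, stack[:, :-1]], dim=1)
        # Soft pop: shift everything up, pad the bottom with zeros.
        popped = torch.cat([stack[:, 1:], torch.zeros_like(stack[:, :1])], dim=1)
        stack = push * pushed + pop * popped + noop * stack
        return h, stack

    def forward(self, inputs):                                     # (B, T, input_dim)
        B = inputs.size(0)
        h = inputs.new_zeros(B, self.cell.hidden_size)
        stack = inputs.new_zeros(B, self.stack_depth, self.stack_dim)
        for t in range(inputs.size(1)):
            h, stack = self.step(inputs[:, t], h, stack)
        return h  # final controller state, e.g. fed to a classifier
```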
This list is automatically generated from the titles and abstracts of the papers on this site.