On the Effectiveness of Neural Text Generation based Data Augmentation
for Recognition of Morphologically Rich Speech
- URL: http://arxiv.org/abs/2006.05129v1
- Date: Tue, 9 Jun 2020 09:01:04 GMT
- Title: On the Effectiveness of Neural Text Generation based Data Augmentation
for Recognition of Morphologically Rich Speech
- Authors: Balázs Tarján, György Szaszák, Tibor Fegyó, Péter Mihajlik
- Abstract summary: We have significantly improved the online performance of a conversational speech transcription system by transferring knowledge from an RNNLM to the single-pass BNLM with text generation based data augmentation.
We show that by using the RNN-BNLM in the first pass followed by a neural second pass, offline ASR results can be improved significantly further.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Advanced neural network models have penetrated Automatic Speech Recognition
(ASR) in recent years, however, in language modeling many systems still rely on
traditional Back-off N-gram Language Models (BNLM) partly or entirely. The
reason for this is the high cost and complexity of training and using neural
language models, which is mostly feasible only by adding a second decoding pass (rescoring).
In our recent work we have significantly improved the online performance of a
conversational speech transcription system by transferring knowledge from a
Recurrent Neural Network Language Model (RNNLM) to the single pass BNLM with
text generation based data augmentation. In the present paper we analyze the
amount of transferable knowledge and demonstrate that the neural augmented LM
(RNN-BNLM) can capture almost 50% of the knowledge of the RNNLM even while
dropping the second decoding pass, making the system real-time capable. We
also systematically compare word and subword LMs and show that subword-based
neural text augmentation can be especially beneficial in under-resourced
conditions. In addition, we show that by using the RNN-BNLM in the first pass
followed by a neural second pass, offline ASR results can be improved
significantly further.
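A minimal sketch of the text generation based augmentation step is given below, assuming a word-level RNNLM trained with PyTorch; the model and vocabulary interfaces, sampling temperature, and file names are illustrative assumptions rather than the authors' actual setup. The synthetic sentences are mixed with the original transcripts before the back-off n-gram LM is estimated.

```python
import torch

def generate_corpus(rnnlm, vocab, num_sentences=100000, max_len=40,
                    temperature=1.0, device="cpu"):
    """Sample synthetic sentences from a trained word-level RNNLM.
    Assumes `rnnlm(token_ids, hidden)` returns (logits, hidden) and that
    `vocab` provides bos_id, eos_id and an id_to_word list (assumptions)."""
    rnnlm.eval()
    sentences = []
    with torch.no_grad():
        for _ in range(num_sentences):
            tokens, hidden = [vocab.bos_id], None
            for _ in range(max_len):
                inp = torch.tensor([[tokens[-1]]], device=device)
                logits, hidden = rnnlm(inp, hidden)
                probs = torch.softmax(logits[0, -1] / temperature, dim=-1)
                next_id = torch.multinomial(probs, 1).item()
                if next_id == vocab.eos_id:
                    break
                tokens.append(next_id)
            sentences.append(" ".join(vocab.id_to_word[t] for t in tokens[1:]))
    return sentences

# The generated text is appended to the real transcripts; the combined corpus
# is then used to estimate the single-pass back-off n-gram LM (RNN-BNLM).
# synthetic = generate_corpus(rnnlm, vocab)
# with open("augmented_corpus.txt", "w", encoding="utf-8") as f:
#     f.write("\n".join(real_transcripts + synthetic))
```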
Related papers
- Mitigating Data Scarcity for Large Language Models [7.259279261659759]
In recent years, pretrained neural language models (PNLMs) have taken the field of natural language processing by storm.
Data scarcity is commonly found in specialized domains, such as medicine, or in low-resource languages that are underexplored by AI research.
In this dissertation, we focus on mitigating data scarcity using data augmentation and neural ensemble learning techniques.
arXiv Detail & Related papers (2023-02-03T15:17:53Z) - Surrogate Gradient Spiking Neural Networks as Encoders for Large
Vocabulary Continuous Speech Recognition [91.39701446828144]
We show that spiking neural networks can be trained like standard recurrent neural networks using the surrogate gradient method.
They have shown promising results on speech command recognition tasks.
In contrast to their recurrent non-spiking counterparts, they show robustness to exploding gradient problems without the need to use gates.
arXiv Detail & Related papers (2022-12-01T12:36:26Z) - Leveraging Graph-based Cross-modal Information Fusion for Neural Sign
Language Translation [46.825957917649795]
Sign Language (SL), as the mother tongue of the deaf community, is a special visual language that most hearing people cannot understand.
We propose a novel neural SLT model with multi-modal feature fusion based on the dynamic graph.
We are the first to introduce graph neural networks for fusing multi-modal information into neural sign language translation models.
arXiv Detail & Related papers (2022-11-01T15:26:22Z) - Bayesian Neural Network Language Modeling for Speech Recognition [59.681758762712754]
State-of-the-art neural network language models (NNLMs) represented by long short-term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly complex.
In this paper, an overarching full Bayesian learning framework is proposed to account for the underlying uncertainty in LSTM-RNN and Transformer LMs.
arXiv Detail & Related papers (2022-08-28T17:50:19Z) - A Survey on Non-Autoregressive Generation for Neural Machine Translation
and Beyond [145.43029264191543]
Non-autoregressive (NAR) generation was first proposed in neural machine translation (NMT) to speed up inference.
While NAR generation can significantly accelerate machine translation, its inference sacrifices translation accuracy compared to autoregressive (AR) generation.
Many new models and algorithms have been designed/proposed to bridge the accuracy gap between NAR generation and AR generation.
arXiv Detail & Related papers (2022-04-20T07:25:22Z) - Dynamic Gesture Recognition [0.0]
It is possible to use machine learning to classify images and/or videos instead of traditional computer vision algorithms.
The aim of this project is to build a symbiosis between a convolutional neural network (CNN) and a recurrent neural network (RNN).
arXiv Detail & Related papers (2021-09-20T09:45:29Z) - Deep Time Delay Neural Network for Speech Enhancement with Full Data
Learning [60.20150317299749]
This paper proposes a deep time delay neural network (TDNN) for speech enhancement with full data learning.
To make full use of the training data, we propose a full data learning method for speech enhancement.
arXiv Detail & Related papers (2020-11-11T06:32:37Z) - Improved Neural Language Model Fusion for Streaming Recurrent Neural
Network Transducer [28.697119605752643]
Recurrent Neural Network Transducer (RNN-T) has an implicit neural network language model (NNLM) and cannot easily leverage unpaired text data during training.
Previous work has proposed various fusion methods to incorporate external NNLMs into end-to-end ASR to address this weakness.
We propose extensions to these techniques that allow RNN-T to exploit external NNLMs during both training and inference time.
arXiv Detail & Related papers (2020-10-26T20:10:12Z) - Deep Transformer based Data Augmentation with Subword Units for
Morphologically Rich Online ASR [0.0]
Deep Transformer models have proven to be particularly powerful in language modeling tasks for ASR.
Recent studies showed that a considerable part of the knowledge of neural network Language Models (LM) can be transferred to traditional n-grams by using neural text generation based data augmentation.
We show that although data augmentation with Transformer-generated text works well for isolating languages, it causes a vocabulary explosion in a morphologically rich language.
We propose a new method called subword-based neural text augmentation, where we retokenize the generated text into statistically derived subwords (see the sketch after this list).
arXiv Detail & Related papers (2020-07-14T10:22:05Z) - Progressive Tandem Learning for Pattern Recognition with Deep Spiking
Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z) - Incremental Training of a Recurrent Neural Network Exploiting a
Multi-Scale Dynamic Memory [79.42778415729475]
We propose a novel incrementally trained recurrent architecture targeting explicitly multi-scale learning.
We show how to extend the architecture of a simple RNN by separating its hidden state into different modules.
We discuss a training algorithm where new modules are iteratively added to the model to learn progressively longer dependencies.
arXiv Detail & Related papers (2020-06-29T08:35:49Z)
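The subword-based variant of the augmentation (used for morphologically rich languages in the papers above) retokenizes the generated text into statistically derived subword units before the n-gram LM is trained. Below is a minimal sketch using the SentencePiece library; the unigram model type, vocabulary size, and file names are illustrative assumptions.

```python
import sentencepiece as spm

# Derive a statistical subword inventory from the original (real) transcripts.
spm.SentencePieceTrainer.train(
    input="original_transcripts.txt",  # assumed file name
    model_prefix="subword",
    vocab_size=8000,                   # illustrative vocabulary size
    model_type="unigram",
)

sp = spm.SentencePieceProcessor(model_file="subword.model")

# Retokenize the RNNLM-generated text into subwords; the resulting corpus is
# what the subword back-off n-gram LM is estimated on.
with open("generated_text.txt", encoding="utf-8") as src, \
     open("generated_subwords.txt", "w", encoding="utf-8") as dst:
    for line in src:
        dst.write(" ".join(sp.encode(line.strip(), out_type=str)) + "\n")
```

Estimating the n-gram model on the retokenized corpus keeps the vocabulary closed, avoiding the vocabulary explosion that generated word forms would otherwise cause in a morphologically rich language.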