Leveraging Text Data Using Hybrid Transformer-LSTM Based End-to-End ASR
in Transfer Learning
- URL: http://arxiv.org/abs/2005.10407v2
- Date: Thu, 28 May 2020 09:35:42 GMT
- Authors: Zhiping Zeng, Van Tung Pham, Haihua Xu, Yerbolat Khassanov, Eng Siong
Chng, Chongjia Ni and Bin Ma
- Abstract summary: We propose a hybrid Transformer-LSTM based architecture to improve low-resource end-to-end ASR.
We conduct experiments on our in-house Malay corpus which contains limited labeled data and a large amount of extra text.
Overall, our best model outperforms the vanilla Transformer ASR by 11.9% relative WER.
- Score: 37.55706646713447
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we study leveraging extra text data to improve low-resource
end-to-end ASR under cross-lingual transfer learning setting. To this end, we
extend our prior work [1], and propose a hybrid Transformer-LSTM based
architecture. This architecture not only takes advantage of the highly
effective encoding capacity of the Transformer network but also benefits from
extra text data due to the LSTM-based independent language model network. We
conduct experiments on our in-house Malay corpus which contains limited labeled
data and a large amount of extra text. Results show that the proposed
architecture outperforms the previous LSTM-based architecture [1] by 24.2%
relative word error rate (WER) when both are trained using limited labeled
data. Starting from this, we obtain further 25.4% relative WER reduction by
transfer learning from another resource-rich language. Moreover, we obtain
additional 13.6% relative WER reduction by boosting the LSTM decoder of the
transferred model with the extra text data. Overall, our best model outperforms
the vanilla Transformer ASR by 11.9% relative WER. Last but not least, the
proposed hybrid architecture offers much faster inference compared to both LSTM
and Transformer architectures.
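To make the architecture concrete, here is a minimal PyTorch sketch of the hybrid idea described in the abstract: a Transformer encoder over acoustic features combined with an LSTM decoder whose recurrent path sees only previously emitted tokens, so that branch can be boosted with extra text like a language model. All class names, layer sizes, and the fusion scheme below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class HybridTransformerLSTMASR(nn.Module):
    """Illustrative sketch only: a Transformer acoustic encoder plus an LSTM
    decoder whose recurrent state depends solely on previous tokens, so the
    LSTM branch can be (pre-)trained on extra text. Sizes are assumptions."""

    def __init__(self, feat_dim=80, d_model=256, nhead=4, enc_layers=6,
                 vocab_size=5000):
        super().__init__()
        self.input_proj = nn.Linear(feat_dim, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, enc_layers)
        self.embed = nn.Embedding(vocab_size, d_model)
        self.lstm = nn.LSTM(d_model, d_model, num_layers=2, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.out = nn.Linear(2 * d_model, vocab_size)

    def forward(self, feats, tokens):
        # feats: (B, T, feat_dim) acoustic frames; tokens: (B, U) previous labels
        enc = self.encoder(self.input_proj(feats))     # acoustic context
        lm, _ = self.lstm(self.embed(tokens))          # text-only LSTM states
        ctx, _ = self.cross_attn(lm, enc, enc)         # attend to the encoder
        return self.out(torch.cat([lm, ctx], dim=-1))  # per-token logits
```

Because the LSTM branch never sees acoustic input directly, extra text could be used to update it (together with an auxiliary language-model head, not shown), which is one plausible reading of what the abstract calls boosting the LSTM decoder with text data.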
Related papers
- Probing the limit of hydrologic predictability with the Transformer
network [7.326504492614808]
We show that a vanilla Transformer architecture is not competitive against LSTM on the widely benchmarked CAMELS dataset.
A recurrence-free variant of the Transformer obtains mixed results against LSTM, matching its Kling-Gupta efficiency coefficient (KGE) along with other metrics.
While the Transformer results do not surpass the current state of the art, we still learned some valuable lessons.
arXiv Detail & Related papers (2023-06-21T17:06:54Z) - Learning a Fourier Transform for Linear Relative Positional Encodings in Transformers [71.32827362323205]
We propose a new class of linear Transformers called Learner-Transformers (Learners).
They incorporate a wide range of relative positional encoding mechanisms (RPEs).
These include regular RPE techniques applied to sequential data, as well as novel RPEs operating on geometric data embedded in higher-dimensional Euclidean spaces.
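For readers unfamiliar with RPEs, the sketch below shows a generic additive relative positional bias inside self-attention, i.e. the "regular RPE for sequential data" family mentioned above. It is not the Learner-Transformer mechanism proposed in that paper; names and sizes are assumed.

```python
import torch
import torch.nn as nn

class RelPosSelfAttention(nn.Module):
    """Generic additive relative positional bias in self-attention; a teaching
    sketch, not the Learner-Transformer mechanism of the paper above."""

    def __init__(self, d_model=64, max_dist=32):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.rel_bias = nn.Embedding(2 * max_dist + 1, 1)  # one bias per offset
        self.max_dist = max_dist

    def forward(self, x):                                  # x: (B, T, d_model)
        T = x.size(1)
        scores = self.q(x) @ self.k(x).transpose(1, 2) / x.size(-1) ** 0.5
        pos = torch.arange(T, device=x.device)
        rel = (pos[None, :] - pos[:, None]).clamp(-self.max_dist, self.max_dist)
        scores = scores + self.rel_bias(rel + self.max_dist).squeeze(-1)
        return scores.softmax(dim=-1) @ self.v(x)          # weighted values
```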
arXiv Detail & Related papers (2023-02-03T18:57:17Z) - Performance Comparison of Simple Transformer and Res-CNN-BiLSTM for
Cyberbullying Classification [4.2317391919680425]
We present a performance-based comparison between a simple Transformer-based network and a Res-CNN-BiLSTM-based network for the cyberbullying text classification problem.
The results show that the Transformer we trained, with 0.65 million parameters, significantly outperforms the Res-CNN-BiLSTM with 48.82 million parameters, while also training faster and generalizing better.
arXiv Detail & Related papers (2022-06-05T15:46:21Z) - Learning Bounded Context-Free-Grammar via LSTM and the
Transformer: Difference and Explanations [51.77000472945441]
Long Short-Term Memory (LSTM) and Transformers are two popular neural architectures used for natural language processing tasks.
In practice, it is often observed that Transformer models have better representation power than LSTM.
We study such practical differences between LSTM and Transformer and propose an explanation based on their latent space decomposition patterns.
arXiv Detail & Related papers (2021-12-16T19:56:44Z) - Can the Transformer Be Used as a Drop-in Replacement for RNNs in
Text-Generating GANs? [0.0]
We use a well-performing text generative adversarial network (GAN) architecture, the Diversity-Promoting GAN (DPGAN).
We attempted a drop-in replacement of its LSTM layer with a self-attention-based Transformer layer in order to leverage the Transformer's efficiency.
The resulting Self-Attention DPGAN (SADPGAN) was evaluated for performance, quality and diversity of generated text and stability.
arXiv Detail & Related papers (2021-08-26T14:15:36Z) - Improving Deep Learning for HAR with shallow LSTMs [70.94062293989832]
We propose to alter the DeepConvLSTM to employ a one-layer LSTM instead of a two-layer LSTM.
Our results stand in contrast to the belief that one needs at least a two-layer LSTM when dealing with sequential data.
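The reported change reduces to a single configuration choice. Below is a rough PyTorch sketch of a DeepConvLSTM-style HAR model with assumed layer sizes, where the summarized finding corresponds to choosing lstm_layers=1 rather than the usual 2.

```python
import torch
import torch.nn as nn

class ShallowConvLSTM(nn.Module):
    """DeepConvLSTM-style sketch for HAR with assumed sizes; the summarized
    result corresponds to choosing lstm_layers=1 rather than the usual 2."""

    def __init__(self, channels=9, n_classes=6, lstm_layers=1):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(channels, 64, kernel_size=5), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=5), nn.ReLU(),
        )
        self.lstm = nn.LSTM(64, 128, num_layers=lstm_layers, batch_first=True)
        self.head = nn.Linear(128, n_classes)

    def forward(self, x):                  # x: (B, channels, time)
        h = self.conv(x).transpose(1, 2)   # -> (B, time', 64)
        out, _ = self.lstm(h)
        return self.head(out[:, -1])       # classify from the last time step
```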
arXiv Detail & Related papers (2021-08-02T08:14:59Z) - The USYD-JD Speech Translation System for IWSLT 2021 [85.64797317290349]
This paper describes the University of Sydney & JD's joint submission to the IWSLT 2021 low-resource speech translation task.
We trained our models with the officially provided ASR and MT datasets.
To achieve better translation performance, we explored the most recent effective strategies, including back translation, knowledge distillation, multi-feature reranking and transductive finetuning.
arXiv Detail & Related papers (2021-07-24T09:53:34Z) - Transformer based Multilingual document Embedding model [22.346360611417648]
This paper presents a transformer-based sentence/document embedding model, T-LASER, which makes three significant improvements.
First, the BiLSTM layers are replaced by attention-based transformer layers, which are more capable of learning sequential patterns in longer texts.
Second, due to the absence of recurrence, T-LASER enables faster parallel computations in the encoder to generate the text embedding.
arXiv Detail & Related papers (2020-08-19T17:51:30Z) - TRANS-BLSTM: Transformer with Bidirectional LSTM for Language
Understanding [18.526060699574142]
Bidirectional Encoder Representations from Transformers (BERT) has recently achieved state-of-the-art performance on a broad range of NLP tasks.
We propose a new architecture, denoted Transformer with BLSTM (TRANS-BLSTM), which has a BLSTM layer integrated into each transformer block.
We show that TRANS-BLSTM models consistently lead to improvements in accuracy compared to BERT baselines in GLUE and SQuAD 1.1 experiments.
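One plausible reading of that design, sketched in PyTorch with assumed sizes and an assumed residual combination (not the paper's reference implementation):

```python
import torch
import torch.nn as nn

class TransBLSTMBlock(nn.Module):
    """Sketch of the TRANS-BLSTM idea as summarized above: a bidirectional LSTM
    attached to each transformer block. Integration details are assumptions."""

    def __init__(self, d_model=256, nhead=4):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.blstm = nn.LSTM(d_model, d_model // 2, batch_first=True,
                             bidirectional=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):             # x: (B, T, d_model)
        h = self.block(x)             # self-attention + feed-forward path
        r, _ = self.blstm(h)          # (B, T, d_model): 2 directions * d_model//2
        return self.norm(h + r)       # combine the two paths residually
```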
arXiv Detail & Related papers (2020-03-16T03:38:51Z) - Depth-Adaptive Graph Recurrent Network for Text Classification [71.20237659479703]
Sentence-State LSTM (S-LSTM) is a powerful and highly efficient graph recurrent network.
We propose a depth-adaptive mechanism for the S-LSTM, which allows the model to learn how many computational steps to conduct for different words as required.
arXiv Detail & Related papers (2020-02-29T03:09:55Z)