Improving Cross-Lingual Transfer Learning for End-to-End Speech
Recognition with Speech Translation
- URL: http://arxiv.org/abs/2006.05474v2
- Date: Fri, 9 Oct 2020 04:07:38 GMT
- Title: Improving Cross-Lingual Transfer Learning for End-to-End Speech
Recognition with Speech Translation
- Authors: Changhan Wang, Juan Pino, Jiatao Gu
- Abstract summary: We introduce speech-to-text translation as an auxiliary task to incorporate additional knowledge of the target language.
We show that training ST with human translations is not necessary.
Even with pseudo-labels from low-resource MT (200K examples), ST-enhanced transfer brings up to an 8.9% WER reduction relative to direct transfer.
- Score: 63.16500026845157
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transfer learning from high-resource languages is known to be an efficient
way to improve end-to-end automatic speech recognition (ASR) for low-resource
languages. Pre-trained or jointly trained encoder-decoder models, however, do
not share the language modeling (decoder) for the same language, which is
likely to be inefficient for distant target languages. We introduce
speech-to-text translation (ST) as an auxiliary task to incorporate additional
knowledge of the target language and enable transferring from that target
language. Specifically, we first translate high-resource ASR transcripts into the
target low-resource language, with which an ST model is trained. Both ST and
target ASR share the same attention-based encoder-decoder architecture and
vocabulary. The former task then provides a fully pre-trained model for the
latter, bringing up to a 24.6% word error rate (WER) reduction relative to the baseline
(direct transfer from high-resource ASR). We show that training ST with human
translations is not necessary. ST trained with machine translation (MT)
pseudo-labels brings consistent gains. It can even outperform ST trained with human
labels when transferred to target ASR by leveraging only 500K MT examples. Even
with pseudo-labels from low-resource MT (200K examples), ST-enhanced transfer
brings up to an 8.9% WER reduction relative to direct transfer.
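The pipeline described above has two concrete pieces a reader may want to reproduce: generating pseudo-translation targets from high-resource ASR transcripts with an MT model, and comparing systems in terms of relative WER reduction (assuming the reported 24.6% and 8.9% figures are relative reductions, as is common). Below is a minimal Python sketch of both steps, not the authors' implementation: the MarianMT checkpoint is a placeholder assumption (pick one for your language pair), and the ST pre-training and ASR fine-tuning with a shared attention-based encoder-decoder and vocabulary are only indicated in comments.

```python
# Hedged sketch of the pseudo-label generation step described in the abstract.
# The MT checkpoint below is a placeholder assumption, not the paper's setup.
from transformers import MarianMTModel, MarianTokenizer


def make_pseudo_st_targets(transcripts, mt_name="Helsinki-NLP/opus-mt-en-fr"):
    """Translate high-resource ASR transcripts into the target language."""
    tokenizer = MarianTokenizer.from_pretrained(mt_name)
    model = MarianMTModel.from_pretrained(mt_name)
    batch = tokenizer(transcripts, return_tensors="pt", padding=True, truncation=True)
    outputs = model.generate(**batch, max_new_tokens=256)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)


def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction in percent between two systems."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Each translation would be paired with the original audio to form
    # (audio, pseudo-translation) ST training data; the ST model then shares
    # its encoder-decoder and vocabulary with the target ASR model, which is
    # finally fine-tuned on the low-resource ASR data.
    print(make_pseudo_st_targets(["hello world", "transfer learning helps"]))
    print(f"{relative_wer_reduction(30.0, 22.6):.1f}% relative WER reduction")
```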
Related papers
- Self-Augmentation Improves Zero-Shot Cross-Lingual Transfer [92.80671770992572]
Cross-lingual transfer is a central task in multilingual NLP.
Earlier efforts on this task use parallel corpora, bilingual dictionaries, or other annotated alignment data.
We propose a simple yet effective method, SALT, to improve zero-shot cross-lingual transfer.
arXiv Detail & Related papers (2023-09-19T19:30:56Z)
- Learning Cross-lingual Mappings for Data Augmentation to Improve Low-Resource Speech Recognition [31.575930914290762]
Exploiting cross-lingual resources is an effective way to compensate for the data scarcity of low-resource languages.
We extend the concept of learnable cross-lingual mappings for end-to-end speech recognition.
The results show that any source-language ASR model can be used for low-resource target-language recognition.
arXiv Detail & Related papers (2023-06-14T15:24:31Z)
- Strategies for improving low resource speech to text translation relying on pre-trained ASR models [59.90106959717875]
This paper presents techniques and findings for improving the performance of low-resource speech-to-text translation (ST).
We conducted experiments on both simulated and real low-resource setups, on the language pairs English-Portuguese and Tamasheq-French, respectively.
arXiv Detail & Related papers (2023-05-31T21:58:07Z)
- Cross-lingual Knowledge Transfer and Iterative Pseudo-labeling for Low-Resource Speech Recognition with Transducers [6.017182111335404]
Cross-lingual knowledge transfer and iterative pseudo-labeling are two techniques that have been shown to be successful for improving the accuracy of ASR systems.
We show that the Transducer system trained using transcripts produced by the hybrid system achieves an 18% reduction in word error rate.
arXiv Detail & Related papers (2023-05-23T03:50:35Z)
- Back Translation for Speech-to-text Translation Without Transcripts [11.13240570688547]
We develop a back translation algorithm for ST (BT4ST) to synthesize pseudo ST data from monolingual target data.
To ease the challenges posed by short-to-long generation and one-to-many mapping, we introduce self-supervised discrete units.
With our synthetic ST data, we achieve an average boost of 2.3 BLEU on MuST-C En-De, En-Fr, and En-Es datasets.
arXiv Detail & Related papers (2023-05-15T15:12:40Z)
- From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition [50.93943755401025]
We propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition.
We design different auxiliary neural architectures focusing on learnable pre-trained feature enhancement.
Our methods outperform existing ASR tuning architectures and their extension with self-supervised losses.
arXiv Detail & Related papers (2023-01-19T02:37:56Z)
- Parameter-Efficient Neural Reranking for Cross-Lingual and Multilingual Retrieval [66.69799641522133]
State-of-the-art neural (re)rankers are notoriously data-hungry.
Current approaches typically transfer rankers trained on English data to other languages and cross-lingual setups by means of multilingual encoders.
We show that two parameter-efficient approaches to cross-lingual transfer, namely Sparse Fine-Tuning Masks (SFTMs) and Adapters, allow for a more lightweight and more effective zero-shot transfer.
arXiv Detail & Related papers (2022-04-05T15:44:27Z)
- Semi-supervised transfer learning for language expansion of end-to-end speech recognition models to low-resource languages [19.44975351652865]
We propose a three-stage training methodology to improve the speech recognition accuracy of low-resource languages.
We leverage a well-trained English model, an unlabeled text corpus, and an unlabeled audio corpus via transfer learning, TTS augmentation, and SSL, respectively.
Overall, our two-pass speech recognition system with Monotonic Chunkwise Attention (MoA) in the first pass achieves a WER reduction of 42% relative to the baseline.
arXiv Detail & Related papers (2021-11-19T05:09:16Z)
- From Zero to Hero: On the Limitations of Zero-Shot Cross-Lingual Transfer with Multilingual Transformers [62.637055980148816]
Massively multilingual transformers pretrained with language modeling objectives have become a de facto default transfer paradigm for NLP.
We show that cross-lingual transfer via massively multilingual transformers is substantially less effective in resource-lean scenarios and for distant languages.
arXiv Detail & Related papers (2020-05-01T22:04:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences arising from its use.