Strategies for improving low resource speech to text translation relying
on pre-trained ASR models
- URL: http://arxiv.org/abs/2306.00208v1
- Date: Wed, 31 May 2023 21:58:07 GMT
- Title: Strategies for improving low resource speech to text translation relying
on pre-trained ASR models
- Authors: Santosh Kesiraju, Marek Sarvas, Tomas Pavlicek, Cecile Macaire,
Alejandro Ciuba
- Abstract summary: This paper presents techniques and findings for improving the performance of low-resource speech to text translation (ST).
We conducted experiments on both simulated and real low-resource setups, on the language pairs English - Portuguese and Tamasheq - French, respectively.
- Score: 59.90106959717875
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents techniques and findings for improving the
performance of low-resource speech to text translation (ST). We conducted
experiments on both simulated and real low-resource setups, on the language
pairs English - Portuguese and Tamasheq - French, respectively. Using the
encoder-decoder framework for ST, our results show that a multilingual
automatic speech recognition system acts as a good initialization under
low-resource scenarios. Furthermore, using CTC as an additional objective for
translation during training and decoding helps to reorder the internal
representations and improves the final translation. Through our experiments,
we try to identify the various factors (initializations, objectives, and
hyper-parameters) that contribute the most to improvements in low-resource
setups. With only 300 hours of pre-training data, our model achieved a BLEU
score of 7.3 on the Tamasheq - French data, outperforming prior published
work from IWSLT 2022 by 1.6 points.
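Since the core recipe is a pre-trained multilingual ASR initialization combined with an auxiliary CTC objective, a short PyTorch sketch may help make the training objective concrete. This is a minimal illustration under assumed tensor shapes, padding conventions, and interpolation weight (`ctc_weight`), not the authors' exact implementation; the checkpoint name and model variable in the usage comment are hypothetical.

```python
# Minimal sketch (assumptions, not the paper's exact recipe): interpolate a
# frame-level CTC loss computed on the encoder with the decoder's
# cross-entropy (attention) loss, as in hybrid CTC/attention training.
import torch
import torch.nn as nn
import torch.nn.functional as F


class JointCTCAttentionLoss(nn.Module):
    """Joint CTC + attention objective for fine-tuning an encoder-decoder
    ST model initialized from a pre-trained ASR system."""

    def __init__(self, blank_id: int = 0, pad_id: int = -100,
                 ctc_weight: float = 0.3):
        super().__init__()
        self.pad_id = pad_id
        self.ctc_weight = ctc_weight
        self.ctc = nn.CTCLoss(blank=blank_id, zero_infinity=True)
        self.att = nn.CrossEntropyLoss(ignore_index=pad_id)

    def forward(self, enc_logits, enc_lengths, dec_logits, targets,
                target_lengths):
        # enc_logits: (T, B, V) frame-level logits from a linear CTC head on the encoder
        # dec_logits: (B, L, V) token-level logits from the autoregressive decoder
        # targets:    (B, L) target translation token ids, padded with pad_id
        ctc_targets = targets.masked_fill(targets.eq(self.pad_id), 0)
        ctc_loss = self.ctc(F.log_softmax(enc_logits, dim=-1),
                            ctc_targets, enc_lengths, target_lengths)
        att_loss = self.att(dec_logits.reshape(-1, dec_logits.size(-1)),
                            targets.reshape(-1))
        return self.ctc_weight * ctc_loss + (1.0 - self.ctc_weight) * att_loss


# Hypothetical usage: start from a multilingual ASR checkpoint, then fine-tune
# for ST with the interpolated objective. strict=False tolerates layers that
# differ between the ASR and ST models (e.g. the output vocabulary).
# st_model.load_state_dict(torch.load("multilingual_asr.pt"), strict=False)
criterion = JointCTCAttentionLoss(ctc_weight=0.3)
```

In hybrid CTC/attention setups the CTC scores are typically also interpolated with the decoder scores at decoding time, which is how the auxiliary objective can influence word order in the final translation.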
Related papers
- Chain-of-Translation Prompting (CoTR): A Novel Prompting Technique for Low Resource Languages [0.4499833362998489]
Chain of Translation Prompting (CoTR) is a novel strategy designed to enhance the performance of language models in low-resource languages.
CoTR restructures prompts to first translate the input context from a low-resource language into a higher-resource language, such as English.
We demonstrate the effectiveness of this method through a case study on the low-resource Indic language Marathi.
arXiv Detail & Related papers (2024-09-06T17:15:17Z)
- Embedded Translations for Low-resource Automated Glossing [11.964276799347642]
We augment a hard-attentional neural model with embedded translation information extracted from interlinear glossed text.
We introduce a character-level decoder for generating glossed output.
Our results highlight the critical role of translation information in boosting the system's performance.
arXiv Detail & Related papers (2024-03-13T02:23:13Z)
- Improving Massively Multilingual ASR With Auxiliary CTC Objectives [40.10307386370194]
We introduce our work on improving performance on FLEURS, a 102-language open ASR benchmark.
We investigate techniques inspired by recent Connectionist Temporal Classification (CTC) studies to help the model handle the large number of languages.
Our state-of-the-art systems, which use self-supervised models with the Conformer architecture, improve over prior results on FLEURS by a relative 28.4% CER reduction.
arXiv Detail & Related papers (2023-02-24T18:59:51Z)
- From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition [50.93943755401025]
We propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition.
We design different auxiliary neural architectures focusing on learnable pre-trained feature enhancement.
Our methods outperform existing ASR tuning architectures and their extension with self-supervised losses.
arXiv Detail & Related papers (2023-01-19T02:37:56Z)
- No Language Left Behind: Scaling Human-Centered Machine Translation [69.28110770760506]
We create datasets and models aimed at narrowing the performance gap between low and high-resource languages.
We propose multiple architectural and training improvements to counteract overfitting while training on thousands of tasks.
Our model achieves an improvement of 44% BLEU relative to the previous state-of-the-art.
arXiv Detail & Related papers (2022-07-11T07:33:36Z)
- The USYD-JD Speech Translation System for IWSLT 2021 [85.64797317290349]
This paper describes the University of Sydney & JD's joint submission to the IWSLT 2021 low-resource speech translation task.
We trained our models with the officially provided ASR and MT datasets.
To achieve better translation performance, we explored the most recent effective strategies, including back translation, knowledge distillation, multi-feature reranking and transductive finetuning.
arXiv Detail & Related papers (2021-07-24T09:53:34Z)
- Continual Mixed-Language Pre-Training for Extremely Low-Resource Neural Machine Translation [53.22775597051498]
We present a continual pre-training framework on mBART to effectively adapt it to unseen languages.
Results show that our method can consistently improve the fine-tuning performance upon the mBART baseline.
Our approach also boosts the performance on translation pairs where both languages are seen in the original mBART's pre-training.
arXiv Detail & Related papers (2021-05-09T14:49:07Z)
- Fine-tuning BERT for Low-Resource Natural Language Understanding via Active Learning [30.5853328612593]
In this work, we explore fine-tuning methods for BERT -- a pre-trained Transformer-based language model.
Our experimental results show an advantage in model performance by maximizing the approximate knowledge gain of the model.
We analyze the benefits of freezing layers of the language model during fine-tuning to reduce the number of trainable parameters.
arXiv Detail & Related papers (2020-12-04T08:34:39Z)
- Exploiting News Article Structure for Automatic Corpus Generation of Entailment Datasets [1.859931123372708]
First, we propose a methodology for automatically producing benchmark datasets for low-resource languages using published news articles.
Second, we produce new pretrained transformers based on the ELECTRA technique to further alleviate the resource scarcity in Filipino.
Third, we perform analyses on transfer learning techniques to shed light on their true performance when operating in low-data domains.
arXiv Detail & Related papers (2020-10-22T10:09:10Z)
- Improving Cross-Lingual Transfer Learning for End-to-End Speech Recognition with Speech Translation [63.16500026845157]
We introduce speech-to-text translation as an auxiliary task to incorporate additional knowledge of the target language.
We show that training ST with human translations is not necessary.
Even with pseudo-labels from a low-resource MT system (200K examples), ST-enhanced transfer brings up to an 8.9% WER reduction over direct transfer.
arXiv Detail & Related papers (2020-06-09T19:34:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.