NAVER LABS Europe's Multilingual Speech Translation Systems for the
IWSLT 2023 Low-Resource Track
- URL: http://arxiv.org/abs/2306.07763v1
- Date: Tue, 13 Jun 2023 13:22:30 GMT
- Title: NAVER LABS Europe's Multilingual Speech Translation Systems for the
IWSLT 2023 Low-Resource Track
- Authors: Edward Gow-Smith, Alexandre Berard, Marcely Zanon Boito, Ioan
Calapodescu
- Abstract summary: This paper presents NAVER LABS Europe's systems for Tamasheq-French and Quechua-Spanish speech translation in the IWSLT 2023 Low-Resource track.
Our work attempts to maximize translation quality in low-resource settings using multilingual parameter-efficient solutions.
- Score: 78.80683163990446
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper presents NAVER LABS Europe's systems for Tamasheq-French and
Quechua-Spanish speech translation in the IWSLT 2023 Low-Resource track. Our
work attempts to maximize translation quality in low-resource settings using
multilingual parameter-efficient solutions that leverage strong pre-trained
models. Our primary submission for Tamasheq outperforms the previous state of
the art by 7.5 BLEU points on the IWSLT 2022 test set, and achieves 23.6 BLEU
on this year's test set, outperforming the second best participant by 7.7
points. For Quechua, we also rank first and achieve 17.7 BLEU, despite having
only two hours of translation data. Finally, we show that our proposed
multilingual architecture is also competitive for high-resource languages,
outperforming the best unconstrained submission to the IWSLT 2021 Multilingual
track, despite using much less training data and compute.
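The abstract's central technique, parameter-efficient adaptation of frozen pre-trained models, can be sketched briefly. The following is a generic PyTorch illustration of bottleneck adapters, with a toy Transformer encoder standing in for the pre-trained speech model; class names and sizes are illustrative, not the authors' implementation.
```python
# Generic bottleneck-adapter sketch (not the authors' code): the pre-trained
# layers are frozen and only small adapters inserted after each layer train.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, ReLU, up-project, residual add."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))

class AdaptedEncoder(nn.Module):
    """A stack of frozen pre-trained layers, each followed by a trainable adapter."""
    def __init__(self, layers, dim: int):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        self.adapters = nn.ModuleList(Adapter(dim) for _ in layers)

    def forward(self, x):
        for layer, adapter in zip(self.layers, self.adapters):
            x = adapter(layer(x))
        return x

dim = 256  # toy size; a real speech encoder would be much larger
pretrained = [nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
              for _ in range(6)]           # stand-in for pre-trained layers
for layer in pretrained:
    for p in layer.parameters():
        p.requires_grad = False            # pre-trained weights stay frozen

encoder = AdaptedEncoder(pretrained, dim)
out = encoder(torch.randn(2, 50, dim))     # (batch, time, dim)
trainable = sum(p.numel() for p in encoder.parameters() if p.requires_grad)
total = sum(p.numel() for p in encoder.parameters())
print(f"trainable: {trainable} / {total} parameters")
```
Since only the adapters receive gradients, each language pair can be served by a small set of adapter weights on top of one shared frozen model, which is the appeal in low-resource settings.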
Related papers
- NusaMT-7B: Machine Translation for Low-Resource Indonesian Languages with Large Language Models [2.186901738997927]
This paper introduces NusaMT-7B, an LLM-based machine translation model for low-resource Indonesian languages.
Our approach integrates continued pre-training on monolingual data, supervised fine-tuning (SFT), self-learning, and an LLM-based data cleaner to reduce noise in parallel sentences.
Our results show that fine-tuned LLMs can enhance translation quality for low-resource languages, aiding in linguistic preservation and cross-cultural communication.
arXiv Detail & Related papers (2024-10-10T11:33:25Z)
- KIT's Multilingual Speech Translation System for IWSLT 2023 [58.5152569458259]
We describe our speech translation system for the multilingual track of IWSLT 2023.
The task requires translation into 10 languages with varying amounts of resources.
Our cascaded speech system substantially outperforms its end-to-end counterpart on scientific talk translation.
arXiv Detail & Related papers (2023-06-08T16:13:20Z)
- Strategies for improving low resource speech to text translation relying on pre-trained ASR models [59.90106959717875]
This paper presents techniques and findings for improving the performance of low-resource speech-to-text translation (ST).
We conducted experiments on both simulated and real low-resource setups, on the language pairs English-Portuguese and Tamasheq-French respectively.
arXiv Detail & Related papers (2023-05-31T21:58:07Z)
- Tencent AI Lab - Shanghai Jiao Tong University Low-Resource Translation System for the WMT22 Translation Task [49.916963624249355]
This paper describes Tencent AI Lab - Shanghai Jiao Tong University (TAL-SJTU) Low-Resource Translation systems for the WMT22 shared task.
We participate in the general translation task on English$\Leftrightarrow$Livonian.
Our system is based on M2M100 with novel techniques that adapt it to the target language pair.
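For context, M2M100 is available through the HuggingFace transformers API, sketched below. Livonian is not among M2M100's pre-trained languages (which is why the paper needs adaptation techniques), so this sketch uses a supported pair, English to German, purely as a starting point.
```python
# Hedged sketch: vanilla M2M100 through HuggingFace transformers. Livonian is
# not a pre-trained M2M100 language, so English->German stands in here.
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")
model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")

tokenizer.src_lang = "en"                      # source language tag
inputs = tokenizer("Hello, world!", return_tensors="pt")
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.get_lang_id("de"))  # force target language
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```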
arXiv Detail & Related papers (2022-10-17T04:34:09Z)
- The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task [92.5087402621697]
This paper describes the submission of our end-to-end YiTrans speech translation system for the IWSLT 2022 offline task.
The YiTrans system is built on large-scale pre-trained encoder-decoder models.
Our final submissions rank first for the English-German and English-Chinese end-to-end systems in terms of the automatic evaluation metric.
arXiv Detail & Related papers (2022-06-12T16:13:01Z)
- IMS' Systems for the IWSLT 2021 Low-Resource Speech Translation Task [38.899667657333595]
This paper describes the submission to the IWSLT 2021 Low-Resource Speech Translation Shared Task by IMS team.
We utilize state-of-the-art models combined with several data augmentation, multi-task and transfer learning approaches for the automatic speech recognition (ASR) and machine translation (MT) steps of our cascaded system.
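A minimal sketch of a cascaded ST pipeline of this shape (ASR followed by MT), using illustrative HuggingFace pipeline checkpoints rather than the models from the submission:
```python
# Illustrative cascade: transcribe speech, then translate the transcript.
# Checkpoints below are placeholders, not the models used in the submission.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
mt = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")

transcript = asr("talk.wav")["text"]                  # step 1: speech -> text
translation = mt(transcript)[0]["translation_text"]   # step 2: text -> target
print(translation)
```
The cascade's strength is that each stage can be trained, augmented, and swapped independently, which is what lets data augmentation and transfer learning be applied per component.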
arXiv Detail & Related papers (2021-06-30T13:29:19Z)
- The Volctrans Neural Speech Translation System for IWSLT 2021 [26.058205594318405]
This paper describes the systems submitted to IWSLT 2021 by the Volctrans team.
For offline speech translation, our best end-to-end model achieves an 8.1 BLEU improvement over the benchmark.
For text-to-text simultaneous translation, we explore the best practice to optimize the wait-k model.
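The wait-k policy referenced here reads k source tokens before emitting the first target token, then alternates one READ and one WRITE. A minimal sketch, where translate_prefix is a hypothetical stand-in for an MT model that emits the next target token given the current source prefix:
```python
# Hypothetical wait-k loop: read k source tokens, then alternate one WRITE
# per READ; translate_prefix stands in for an MT model's next-token step.
def wait_k_decode(source_stream, k, translate_prefix, eos="</s>", max_len=200):
    source, target = [], []
    for token in source_stream:
        source.append(token)                                 # READ
        if len(source) >= k and target[-1:] != [eos]:
            target.append(translate_prefix(source, target))  # WRITE
    while target[-1:] != [eos] and len(target) < max_len:
        target.append(translate_prefix(source, target))      # drain the tail
    return target
```
Larger k trades latency for quality; letting k exceed the source length recovers full-sentence translation.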
arXiv Detail & Related papers (2021-05-16T00:11:59Z)
- Bilingual Dictionary-based Language Model Pretraining for Neural Machine Translation [0.0]
We incorporate the translation information from dictionaries into the pretraining process and propose a novel Bilingual Dictionary-based Language Model (BDLM).
We evaluate our BDLM in Chinese, English, and Romanian.
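The summary does not specify how dictionary translations enter pretraining; one common way to inject such a signal, shown here purely as an illustrative guess rather than the BDLM method, is dictionary-based code-switching of the pretraining text:
```python
# Illustrative guess only: inject dictionary signal by code-switching the
# pretraining text, replacing dictionary words with their translations.
import random

def code_switch(tokens, dictionary, p=0.15, seed=0):
    rng = random.Random(seed)
    return [dictionary[t] if t in dictionary and rng.random() < p else t
            for t in tokens]

zh_en = {"你好": "hello", "世界": "world"}   # toy bilingual dictionary
print(code_switch(["你好", "，", "世界", "！"], zh_en, p=0.5))
```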
arXiv Detail & Related papers (2021-03-12T02:01:22Z)
- Multilingual Speech Translation with Efficient Finetuning of Pretrained Models [82.22294901727933]
Minimalistic LNA (LayerNorm and Attention) finetuning can achieve zero-shot cross-lingual and cross-modality transfer; a minimal sketch of this scheme follows the citation below.
Our approach demonstrates strong zero-shot performance in a many-to-many multilingual model.
arXiv Detail & Related papers (2020-10-24T08:15:08Z)
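LNA finetuning updates only the LayerNorm and attention parameters of the pre-trained model. A minimal sketch, assuming PyTorch module types; matching all attention modules here is a simplification of the paper's exact parameter selection:
```python
# Minimal LNA sketch: freeze everything, then unfreeze LayerNorm and
# attention modules. Matching all attention is a simplification.
import torch.nn as nn

def lna_finetune(model: nn.Module) -> nn.Module:
    for p in model.parameters():
        p.requires_grad = False
    for module in model.modules():
        if isinstance(module, (nn.LayerNorm, nn.MultiheadAttention)):
            for p in module.parameters():
                p.requires_grad = True
    return model

model = lna_finetune(nn.Transformer(d_model=256, batch_first=True))
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable} / {total} parameters")
```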