Speech Translation with Foundation Models and Optimal Transport: UPC at
IWSLT23
- URL: http://arxiv.org/abs/2306.01327v1
- Date: Fri, 2 Jun 2023 07:48:37 GMT
- Title: Speech Translation with Foundation Models and Optimal Transport: UPC at
IWSLT23
- Authors: Ioannis Tsiamas, Gerard I. Gállego, José A. R. Fonollosa, Marta R.
Costa-jussà
- Abstract summary: This paper describes the submission of the UPC Machine Translation group to the IWSLT 2023 Offline Speech Translation task.
Our Speech Translation systems utilize foundation models for speech (wav2vec 2.0) and text (mBART50).
We incorporate a Siamese pretraining step of the speech and text encoders with CTC and Optimal Transport, to adapt the speech representations to the space of the text model.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper describes the submission of the UPC Machine Translation group to
the IWSLT 2023 Offline Speech Translation task. Our Speech Translation systems
utilize foundation models for speech (wav2vec 2.0) and text (mBART50). We
incorporate a Siamese pretraining step of the speech and text encoders with CTC
and Optimal Transport, to adapt the speech representations to the space of the
text model, thus maximizing transfer learning from MT. After this pretraining,
we fine-tune our system end-to-end on ST, with Cross Entropy and Knowledge
Distillation. Apart from the available ST corpora, we create synthetic data
with SegAugment to better adapt our models to the custom segmentations of the
IWSLT test sets. Our best single model obtains 31.2 BLEU points on MuST-C
tst-COMMON, 29.8 points on IWSLT.tst2020 and 33.4 points on the newly released
IWSLT.ACLdev2023.
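The Siamese pretraining step described above aligns speech encoder outputs with the text encoder's representation space using an Optimal Transport loss. A minimal sketch of such an OT alignment loss, using entropy-regularized Sinkhorn iterations with uniform marginals, is shown below. This is an illustration of the general technique only, not the authors' implementation: the function names, the uniform-marginal setup, and the cost normalization are all assumptions, and the actual system couples this with a CTC objective.

```python
import numpy as np

def sinkhorn_ot(cost, n_iters=50, eps=0.1):
    """Entropy-regularized OT between uniform marginals (Sinkhorn).

    cost: (m, n) pairwise cost matrix between speech frames and text tokens.
    Returns the transport plan and the OT loss <plan, cost>.
    """
    m, n = cost.shape
    K = np.exp(-cost / eps)        # Gibbs kernel
    a = np.full(m, 1.0 / m)        # uniform marginal over speech frames
    b = np.full(n, 1.0 / n)        # uniform marginal over text tokens
    v = np.ones(n)
    for _ in range(n_iters):       # alternating marginal projections
        u = a / (K @ v)
        v = b / (K.T @ u)
    plan = (u[:, None] * K) * v[None, :]
    return plan, float((plan * cost).sum())

def ot_alignment_loss(speech_repr, text_repr):
    """OT loss pulling speech representations toward the text space."""
    diff = speech_repr[:, None, :] - text_repr[None, :, :]
    cost = (diff ** 2).sum(-1)            # squared Euclidean cost
    cost = cost / (cost.max() + 1e-9)     # normalize for numerical stability
    _, loss = sinkhorn_ot(cost)
    return loss
```

Minimizing this loss during pretraining moves the speech encoder's output distribution toward the mBART50 text encoder's, which is what enables the transfer from MT.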
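The fine-tuning stage combines Cross Entropy on the gold translations with Knowledge Distillation from the MT model. A common form of this objective interpolates label cross-entropy with a KL term against the teacher's distribution; the sketch below shows that generic recipe (the interpolation weight `lam` and function names are illustrative assumptions, not details from the paper).

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    z = logits - logits.max(-1, keepdims=True)
    return z - np.log(np.exp(z).sum(-1, keepdims=True))

def kd_loss(student_logits, teacher_logits, targets, lam=0.5):
    """Interpolate cross-entropy on gold targets with KL to the teacher.

    student_logits, teacher_logits: (T, V) per-position vocabulary logits.
    targets: (T,) gold token ids.
    """
    logp_s = log_softmax(student_logits)
    logp_t = log_softmax(teacher_logits)
    ce = -logp_s[np.arange(len(targets)), targets].mean()
    p_t = np.exp(logp_t)
    kl = (p_t * (logp_t - logp_s)).sum(-1).mean()  # KL(teacher || student)
    return (1 - lam) * ce + lam * kl
```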
Related papers
- CMU's IWSLT 2024 Simultaneous Speech Translation System
This paper describes CMU's submission to the IWSLT 2024 Simultaneous Speech Translation (SST) task for translating English speech to German text in a streaming manner.
Our end-to-end speech-to-text (ST) system integrates the WavLM speech encoder, a modality adapter, and the Llama2-7B-Base model as the decoder.
arXiv: 2024-08-14
- NAIST Simultaneous Speech Translation System for IWSLT 2024
This paper describes NAIST's submission to the simultaneous track of the IWSLT 2024 Evaluation Campaign.
We develop a multilingual end-to-end speech-to-text translation model combining two pre-trained language models, HuBERT and mBART.
We trained this model with two decoding policies, Local Agreement (LA) and AlignAtt.
Our speech-to-speech translation method is a cascade of the above speech-to-text model and an incremental text-to-speech (TTS) module.
arXiv: 2024-06-30
- Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?
Two-pass direct speech-to-speech translation (S2ST) models decompose the task into speech-to-text translation (S2TT) and text-to-speech (TTS) synthesis.
In this paper, we introduce a composite S2ST model named ComSpeech, which can seamlessly integrate any pretrained S2TT and TTS models into a direct S2ST model.
We also propose a novel training method ComSpeech-ZS that solely utilizes S2TT and TTS data.
arXiv: 2024-06-11
- KIT's Multilingual Speech Translation System for IWSLT 2023
We describe our speech translation system for the multilingual track of IWSLT 2023.
The task requires translation into 10 languages of varying amounts of resources.
Our cascaded speech system substantially outperforms its end-to-end counterpart on scientific talk translation.
arXiv: 2023-06-08
- Textless Speech-to-Speech Translation With Limited Parallel Data
PFB is a framework for training textless S2ST models that require just dozens of hours of parallel speech data.
We train and evaluate our models for English-to-German, German-to-English and Marathi-to-English translation on three different domains.
arXiv: 2023-05-24
- BJTU-WeChat's Systems for the WMT22 Chat Translation Task
This paper introduces the joint submission of the Beijing Jiaotong University and WeChat AI to the WMT'22 chat translation task for English-German.
Based on the Transformer, we apply several effective variants.
Our systems achieve COMET scores of 0.810 and 0.946.
arXiv: 2022-11-28
- The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task
This paper describes the submission of our end-to-end YiTrans speech translation system for the IWSLT 2022 offline task.
The YiTrans system is built on large-scale pre-trained encoder-decoder models.
Our final submissions rank first on English-German and English-Chinese end-to-end systems in terms of the automatic evaluation metric.
arXiv: 2022-06-12
- The HW-TSC's Offline Speech Translation Systems for IWSLT 2021 Evaluation
This paper describes our participation in the IWSLT 2021 offline speech translation task.
Our system was built in a cascade form, including a speaker diarization module, an Automatic Speech Recognition (ASR) module and a Machine Translation (MT) module.
Our method achieves 24.6 BLEU score on the 2021 test set.
arXiv: 2021-08-09
- The NiuTrans End-to-End Speech Translation System for IWSLT 2021 Offline Task
This paper describes the submission of the NiuTrans end-to-end speech translation system for the IWSLT 2021 offline task.
We use a Transformer-based model architecture and enhance it with Conformer blocks, relative position encoding, and stacked acoustic and textual encoding.
We achieve 33.84 BLEU points on the MuST-C En-De test set, which shows the enormous potential of the end-to-end model.
arXiv: 2021-07-06
- UPC's Speech Translation System for IWSLT 2021
This paper describes the submission to the IWSLT 2021 offline speech translation task by the UPC Machine Translation group.
The task consists of building a system capable of translating English audio recordings extracted from TED talks into German text.
Our submission is an end-to-end speech translation system, which combines pre-trained models with coupling modules between the encoder and decoder.
arXiv: 2021-05-10
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.