SimulSeamless: FBK at IWSLT 2024 Simultaneous Speech Translation
- URL: http://arxiv.org/abs/2406.14177v1
- Date: Thu, 20 Jun 2024 10:34:46 GMT
- Title: SimulSeamless: FBK at IWSLT 2024 Simultaneous Speech Translation
- Authors: Sara Papi, Marco Gaido, Matteo Negri, Luisa Bentivogli
- Abstract summary: This paper describes FBK's participation in the Simultaneous Translation Evaluation Campaign at IWSLT 2024.
The SeamlessM4T model is used "off-the-shelf" and its simultaneous inference is enabled through the adoption of AlignAtt.
SimulSeamless covers more than 143 source languages and 200 target languages.
- Score: 23.75894159181602
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This paper describes FBK's participation in the Simultaneous Translation Evaluation Campaign at IWSLT 2024. For this year's submission in the speech-to-text translation (ST) sub-track, we propose SimulSeamless, which is realized by combining AlignAtt and SeamlessM4T in its medium configuration. The SeamlessM4T model is used "off-the-shelf" and its simultaneous inference is enabled through the adoption of AlignAtt, a SimulST policy based on cross-attention that can be applied without any retraining or adaptation of the underlying model for the simultaneous task. We participated in all the Shared Task languages (English->{German, Japanese, Chinese}, and Czech->English), achieving acceptable or even better results compared to last year's submissions. SimulSeamless, covering more than 143 source languages and 200 target languages, is released at: https://github.com/hlt-mt/FBK-fairseq/.
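The abstract names AlignAtt without describing it, so here is a minimal Python sketch of its cross-attention decision rule, following how the policy is described in the AlignAtt paper. The function name, tensor layout, and the frame threshold f are illustrative assumptions, not the released FBK-fairseq implementation.

```python
import torch

def alignatt_emit(cross_attn: torch.Tensor, num_frames: int, f: int = 8):
    """Sketch of the AlignAtt policy decision (illustrative names).

    cross_attn: [num_new_tokens, num_frames] decoder cross-attention over
                the audio frames received so far, averaged over heads/layers.
    num_frames: number of audio frames available so far.
    f:          number of trailing frames considered "too recent".

    Returns the indices of the candidate tokens that can be emitted now.
    """
    emitted = []
    for t in range(cross_attn.size(0)):
        # Frame this candidate token attends to the most.
        top_frame = cross_attn[t].argmax().item()
        # If the token relies on one of the last f frames, the audio
        # context is still unstable: stop emitting and read more speech.
        if top_frame >= num_frames - f:
            break
        emitted.append(t)
    return emitted
```

In use, decoding is re-run after every new speech chunk, and only the tokens whose strongest attention lands outside the last f frames are emitted before reading more audio.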
Related papers
- CMU's IWSLT 2024 Simultaneous Speech Translation System [80.15755988907506]
This paper describes CMU's submission to the IWSLT 2024 Simultaneous Speech Translation (SST) task for translating English speech to German text in a streaming manner.
Our end-to-end speech-to-text (ST) system integrates the WavLM speech encoder, a modality adapter, and the Llama2-7B-Base model as the decoder.
arXiv Detail & Related papers (2024-08-14T10:44:51Z)
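The CMU entry above couples a speech encoder to an LLM decoder through a modality adapter. As a rough illustration of that component, here is a hypothetical PyTorch sketch; the dimensions, 4x temporal downsampling, and layer choices are assumptions, not CMU's reported configuration.

```python
import torch
import torch.nn as nn

class ModalityAdapter(nn.Module):
    """Illustrative adapter mapping speech-encoder states (e.g., WavLM,
    1024-dim) into an LLM embedding space (e.g., Llama2-7B, 4096-dim).
    All sizes here are assumptions for the sketch."""

    def __init__(self, d_speech=1024, d_llm=4096, downsample=4):
        super().__init__()
        self.downsample = downsample
        self.proj = nn.Sequential(
            nn.Linear(d_speech * downsample, d_llm),
            nn.GELU(),
            nn.Linear(d_llm, d_llm),
        )

    def forward(self, x):  # x: [batch, frames, d_speech]
        b, t, d = x.shape
        t = t - t % self.downsample  # drop ragged tail frames
        # Stack every `downsample` consecutive frames, then project
        # into the decoder's embedding space.
        x = x[:, :t].reshape(b, t // self.downsample, d * self.downsample)
        return self.proj(x)  # [batch, frames // downsample, d_llm]
```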
- NAIST Simultaneous Speech Translation System for IWSLT 2024 [18.77311658086372]
This paper describes NAIST's submission to the simultaneous track of the IWSLT 2024 Evaluation Campaign.
We develop a multilingual end-to-end speech-to-text translation model combining two pre-trained language models, HuBERT and mBART.
We trained this model with two decoding policies, Local Agreement (LA) and AlignAtt.
Our speech-to-speech translation method is a cascade of the above speech-to-text model and an incremental text-to-speech (TTS) module.
arXiv Detail & Related papers (2024-06-30T20:41:02Z)
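Local Agreement, one of the two policies in the NAIST entry above, commits only the prefix on which hypotheses from consecutive speech chunks agree. A minimal sketch of the LA-2 variant follows; the function and variable names are illustrative, not NAIST's code.

```python
def local_agreement(prev_hyp, curr_hyp, committed):
    """Commit the longest shared prefix of the last two hypotheses.

    prev_hyp, curr_hyp: token lists decoded after consecutive chunks.
    committed: number of tokens already emitted to the user.
    Returns (newly emitted tokens, updated commit point).
    """
    agreed = committed
    # Extend the agreed prefix while both hypotheses still match.
    while (agreed < len(prev_hyp) and agreed < len(curr_hyp)
           and prev_hyp[agreed] == curr_hyp[agreed]):
        agreed += 1
    return curr_hyp[committed:agreed], agreed


# Toy usage: the second chunk confirms "wir haben", so it is emitted.
new, point = local_agreement(["wir", "haben", "ein"],
                             ["wir", "haben", "das", "Modell"], 0)
print(new, point)  # ['wir', 'haben'] 2
```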
- SeamlessM4T: Massively Multilingual & Multimodal Machine Translation [90.71078166159295]
We introduce SeamlessM4T, a single model that supports speech-to-speech translation, speech-to-text translation, text-to-text translation, and automatic speech recognition for up to 100 languages.
We developed the first multilingual system capable of translating from and into English for both speech and text.
On FLEURS, SeamlessM4T sets a new standard for translations into multiple target languages, achieving an improvement of 20% BLEU over the previous SOTA in direct speech-to-text translation.
arXiv Detail & Related papers (2023-08-22T17:44:18Z)
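Since SimulSeamless uses SeamlessM4T "off-the-shelf", a plain offline speech-to-text call is a useful baseline to keep in mind. The sketch below assumes the Hugging Face transformers integration and the facebook/hf-seamless-m4t-medium checkpoint, not the authors' fairseq setup.

```python
# Offline (non-simultaneous) speech-to-text translation with SeamlessM4T
# via Hugging Face transformers. Checkpoint name and API are assumptions
# based on the public transformers integration, not the IWSLT submission.
import torch
from transformers import AutoProcessor, SeamlessM4TForSpeechToText

ckpt = "facebook/hf-seamless-m4t-medium"
processor = AutoProcessor.from_pretrained(ckpt)
model = SeamlessM4TForSpeechToText.from_pretrained(ckpt)

# One second of dummy 16 kHz audio; replace with real speech samples.
audio = torch.zeros(16000)
inputs = processor(audios=audio, sampling_rate=16000, return_tensors="pt")

# Translate English speech into German text ("deu" is the 3-letter code).
tokens = model.generate(**inputs, tgt_lang="deu")
print(processor.decode(tokens[0], skip_special_tokens=True))
```

AlignAtt then replaces this single offline generate call with incremental decoding over growing audio prefixes.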
- KIT's Multilingual Speech Translation System for IWSLT 2023 [58.5152569458259]
We describe our speech translation system for the multilingual track of IWSLT 2023.
The task requires translation into 10 languages of varying amounts of resources.
Our cascaded speech system substantially outperforms its end-to-end counterpart on scientific talk translation.
arXiv Detail & Related papers (2023-06-08T16:13:20Z)
- ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation [79.66359274050885]
We present ComSL, a speech-language model built atop a composite architecture of public pretrained speech-only and language-only models.
Our approach has demonstrated effectiveness in end-to-end speech-to-text translation tasks.
arXiv Detail & Related papers (2023-05-24T07:42:15Z)
- A Twitter BERT Approach for Offensive Language Detection in Marathi [0.7874708385247353]
This paper describes our work on Offensive Language Identification in the low-resource Indic language Marathi.
We evaluate different monolingual and multilingual BERT models on this classification task, focusing on BERT models pre-trained with social media datasets.
MahaTweetBERT, a BERT model pre-trained on Marathi tweets, outperforms all other models when fine-tuned on the combined dataset (HASOC 2021 + HASOC 2022 + MahaHate), reaching an F1 score of 98.43 on the HASOC 2022 test set.
arXiv Detail & Related papers (2022-12-20T07:22:45Z)
- TSMind: Alibaba and Soochow University's Submission to the WMT22 Translation Suggestion Task [16.986003476984965]
This paper describes the joint submission of Alibaba and Soochow University, TSMind, to the WMT 2022 Shared Task on Translation Suggestion.
We adopt the paradigm of fine-tuning large-scale pre-trained models on the downstream task.
Considering the task's condition of limited use of training data, we follow the data augmentation strategies proposed by WeTS to boost our TS model performance.
arXiv Detail & Related papers (2022-11-16T15:43:31Z)
- Tencent AI Lab - Shanghai Jiao Tong University Low-Resource Translation System for the WMT22 Translation Task [49.916963624249355]
This paper describes Tencent AI Lab - Shanghai Jiao Tong University (TAL-SJTU) Low-Resource Translation systems for the WMT22 shared task.
We participate in the general translation task on English↔Livonian.
Our system is based on M2M100 with novel techniques that adapt it to the target language pair.
arXiv Detail & Related papers (2022-10-17T04:34:09Z)
- Anticipation-free Training for Simultaneous Translation [70.85761141178597]
Simultaneous translation (SimulMT) speeds up the translation process by starting to translate before the source sentence is completely available.
Existing methods increase latency or introduce adaptive read-write policies for SimulMT models to handle local reordering and improve translation quality.
We propose a new framework that decomposes the translation process into the monotonic translation step and the reordering step.
arXiv Detail & Related papers (2022-01-30T16:29:37Z)