Direct Models for Simultaneous Translation and Automatic Subtitling: FBK@IWSLT2023
- URL: http://arxiv.org/abs/2309.15554v1
- Date: Wed, 27 Sep 2023 10:24:42 GMT
- Title: Direct Models for Simultaneous Translation and Automatic Subtitling: FBK@IWSLT2023
- Authors: Sara Papi, Marco Gaido, Matteo Negri
- Abstract summary: This paper describes FBK's participation in the Simultaneous Translation and Automatic Subtitling tracks of the IWSLT 2023 Evaluation Campaign.
Our submission focused on the use of direct architectures to perform both tasks.
Our English-German SimulST system achieves lower computationally aware latency than the top-ranked systems of the 2021 and 2022 rounds of the task.
- Score: 26.001878009713597
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper describes FBK's participation in the Simultaneous Translation
and Automatic Subtitling tracks of the IWSLT 2023 Evaluation Campaign. Our
submission focused on the use of direct architectures to perform both tasks:
for the simultaneous one, we leveraged the knowledge already acquired by
offline-trained models and directly applied a policy to obtain real-time
inference; for the subtitling one, we adapted the direct ST model to produce
well-formed subtitles and exploited the same architecture to produce the
timestamps needed to synchronize the subtitles with the audiovisual content.
Our English-German SimulST system achieves lower computationally aware latency
than the top-ranked systems of the 2021 and 2022 rounds of the task, with
gains of up to 3.5 BLEU. Our automatic subtitling system outperforms the only
existing solution based on a direct system by 3.7 and 1.7 SubER on
English-German and English-Spanish, respectively.
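As an illustration of the subtitling side, the sketch below shows how a direct model's output with subtitle break tokens (`<eob>` for end-of-block, `<eol>` for end-of-line, as in FBK's line of subtitling work) can be turned into SRT entries. It is a minimal sketch, not the submission's code: the per-block timestamps are assumed to come from elsewhere (e.g., the model's audio-translation alignment), and all helper names are illustrative.

```python
# Minimal sketch: turning a direct ST model's output with <eob>/<eol> break
# tokens into SRT subtitles. Per-block timestamps are assumed to be obtained
# separately; names below are illustrative, not the submission's actual code.

def to_srt(tokens: list[str], block_times: list[tuple[float, float]]) -> str:
    """tokens: decoded words plus <eol> (line break) and <eob> (subtitle
    block break) markers; block_times: (start, end) seconds per block."""
    blocks, lines, line = [], [], []
    for tok in tokens:
        if tok == "<eol>":        # line break within the current subtitle
            lines.append(" ".join(line)); line = []
        elif tok == "<eob>":      # subtitle block is complete
            lines.append(" ".join(line)); blocks.append("\n".join(lines))
            line, lines = [], []
        else:
            line.append(tok)
    if line:                      # flush a trailing, unterminated block
        lines.append(" ".join(line))
    if lines:
        blocks.append("\n".join(lines))

    def fmt(t: float) -> str:     # SRT timestamp: HH:MM:SS,mmm
        h, rem = divmod(int(t * 1000), 3600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02}:{m:02}:{s:02},{ms:03}"

    entries = []
    for i, (text, (start, end)) in enumerate(zip(blocks, block_times), 1):
        entries.append(f"{i}\n{fmt(start)} --> {fmt(end)}\n{text}\n")
    return "\n".join(entries)
```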
Related papers
- MLLP-VRAIN UPV system for the IWSLT 2025 Simultaneous Speech Translation task [7.247809853198223]
This work describes the participation of the MLLP-VRAIN research group in the shared task of the IWSLT 2025 Simultaneous Speech Translation track.
Our submission addresses the unique challenges of real-time translation of long-form speech by developing a modular cascade system.
arXiv Detail & Related papers (2025-06-23T16:44:01Z)
- Simultaneous Translation with Offline Speech and LLM Models in CUNI Submission to IWSLT 2025 [0.0]
This paper describes the Charles University submission to the Simultaneous Speech Translation Task of the IWSLT 2025.
We cover all four language pairs with a direct or cascade approach.
The backbone of our systems is the offline Whisper speech model, which we use for both translation and transcription in simultaneous mode with the state-of-the-art simultaneous policy AlignAtt.
arXiv Detail & Related papers (2025-06-20T15:27:44Z)
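AlignAtt, the policy referenced here (from the same family of attention-based policies leveraged in the FBK submission above), turns the decoder's cross-attention into a stop/emit decision. A minimal sketch of the rule, with assumed shapes and names rather than the authors' implementation:

```python
# Sketch of the AlignAtt decision rule: emit a candidate target token only
# if its cross-attention peak falls OUTSIDE the last `frame_margin` encoder
# frames; otherwise the token likely depends on speech not yet heard, so we
# stop and wait for more audio. Shapes and names are assumptions.
import numpy as np

def alignatt_prefix_len(cross_attn: np.ndarray, frame_margin: int) -> int:
    """cross_attn: (n_candidate_tokens, n_encoder_frames) attention weights
    from a decoder pass over the audio received so far. Returns how many
    leading candidate tokens are safe to emit now."""
    n_frames = cross_attn.shape[1]
    for t in range(cross_attn.shape[0]):
        if int(np.argmax(cross_attn[t])) >= n_frames - frame_margin:
            return t  # token attends to the newest frames: hold it back
    return cross_attn.shape[0]  # the whole candidate hypothesis is stable
```

At each step, the offline model is re-run on the growing audio buffer with the already-emitted prefix forced, and only the prefix this rule allows is appended to the output.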
- CMU's IWSLT 2024 Simultaneous Speech Translation System [80.15755988907506]
This paper describes CMU's submission to the IWSLT 2024 Simultaneous Speech Translation (SST) task for translating English speech to German text in a streaming manner.
Our end-to-end speech-to-text (ST) system integrates the WavLM speech encoder, a modality adapter, and the Llama2-7B-Base model as the decoder.
arXiv Detail & Related papers (2024-08-14T10:44:51Z)
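A stack of this shape (pretrained speech encoder, small modality adapter, LLM decoder) can be sketched as follows; the dimensions, downsampling factor, and module names are illustrative assumptions, not CMU's actual configuration:

```python
# Sketch of a speech-encoder -> modality-adapter -> LLM-decoder stack of the
# kind described (WavLM -> adapter -> Llama2). All dimensions and names are
# illustrative assumptions, not the submission's configuration.
import torch
import torch.nn as nn

class ModalityAdapter(nn.Module):
    """Shortens the speech feature sequence and projects it into the LLM's
    embedding space so speech states can be consumed as a decoder prefix."""
    def __init__(self, enc_dim=1024, llm_dim=4096, stride=4):
        super().__init__()
        self.down = nn.Conv1d(enc_dim, enc_dim, kernel_size=stride, stride=stride)
        self.proj = nn.Linear(enc_dim, llm_dim)

    def forward(self, enc_out: torch.Tensor) -> torch.Tensor:
        # enc_out: (batch, frames, enc_dim) from the speech encoder
        x = self.down(enc_out.transpose(1, 2)).transpose(1, 2)  # 4x shorter
        return self.proj(x)  # (batch, frames // stride, llm_dim)

# Usage: prepend the adapted speech states to the target-side token
# embeddings and let the LLM decode the translation.
adapter = ModalityAdapter()
speech_states = torch.randn(1, 200, 1024)  # stand-in for WavLM output
prefix = adapter(speech_states)            # (1, 50, 4096) LLM-ready prefix
```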
- NAIST Simultaneous Speech Translation System for IWSLT 2024 [18.77311658086372]
This paper describes NAIST's submission to the simultaneous track of the IWSLT 2024 Evaluation Campaign.
We develop a multilingual end-to-end speech-to-text translation model combining two pre-trained language models, HuBERT and mBART.
We trained this model with two decoding policies, Local Agreement (LA) and AlignAtt.
Our speech-to-speech translation method is a cascade of the above speech-to-text model and an incremental text-to-speech (TTS) module.
arXiv Detail & Related papers (2024-06-30T20:41:02Z)
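Of the two policies mentioned, Local Agreement is the simpler to picture: after each new audio chunk the model re-decodes, and only the prefix on which two consecutive hypotheses agree is committed. A minimal sketch (hypothetical helper, not NAIST's code):

```python
# Sketch of the Local Agreement (LA) policy: after each new audio chunk,
# re-decode and emit only the longest common prefix shared by the current
# and the previous hypothesis, beyond what was already committed.
def local_agreement(prev_hyp: list[str], curr_hyp: list[str],
                    n_committed: int) -> list[str]:
    """Returns the newly stable tokens that can be emitted now."""
    agreed = 0
    for a, b in zip(prev_hyp, curr_hyp):
        if a != b:
            break
        agreed += 1
    return curr_hyp[n_committed:agreed]  # empty if nothing new stabilized
```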
- KIT's Multilingual Speech Translation System for IWSLT 2023 [58.5152569458259]
We describe our speech translation system for the multilingual track of IWSLT 2023.
The task requires translation into 10 languages with varying amounts of resources.
Our cascaded speech system substantially outperforms its end-to-end counterpart on scientific talk translation.
arXiv Detail & Related papers (2023-06-08T16:13:20Z)
- BJTU-WeChat's Systems for the WMT22 Chat Translation Task [66.81525961469494]
This paper introduces the joint submission of the Beijing Jiaotong University and WeChat AI to the WMT'22 chat translation task for English-German.
Based on the Transformer, we apply several effective variants.
Our systems achieve COMET scores of 0.810 and 0.946.
arXiv Detail & Related papers (2022-11-28T02:35:04Z)
- The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task [92.5087402621697]
This paper describes the submission of our end-to-end YiTrans speech translation system for the IWSLT 2022 offline task.
The YiTrans system is built on large-scale pre-trained encoder-decoder models.
Our final submissions rank first among end-to-end systems on English-German and English-Chinese in terms of the automatic evaluation metric.
arXiv Detail & Related papers (2022-06-12T16:13:01Z)
- ON-TRAC Consortium Systems for the IWSLT 2022 Dialect and Low-resource Speech Translation Tasks [8.651248939672769]
This paper describes the ON-TRAC Consortium translation systems developed for two challenge tracks featured in the Evaluation Campaign of IWSLT 2022: low-resource and dialect speech translation.
We build an end-to-end model as our joint primary submission, and compare it against cascaded models that leverage a large fine-tuned wav2vec 2.0 model for ASR.
Our results highlight that self-supervised models trained on smaller sets of target data are more effective for low-resource end-to-end ST fine-tuning than large off-the-shelf models.
arXiv Detail & Related papers (2022-05-04T10:36:57Z)
- CUNI-KIT System for Simultaneous Speech Translation Task at IWSLT 2022 [59.39104119817371]
We apply strategies to utilize an offline model in a simultaneous setting without the need to modify the original model.
Our onlinization algorithm achieves quality almost on par with the offline setting while cutting latency roughly threefold on the test set.
arXiv Detail & Related papers (2022-04-12T18:30:20Z)
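A common way to onlinize an unmodified offline model, in the spirit described here, is chunked incremental decoding: re-decode the growing audio buffer with the committed prefix forced, then commit newly stable tokens. A hedged sketch where `decode` and `stable_prefix_len` are placeholders for the real model call and stability rule (e.g., local agreement or hold-n), not CUNI-KIT's exact algorithm:

```python
# Sketch of onlinizing an unmodified offline ST model: accumulate audio,
# re-decode with the committed prefix forced, and commit newly stable
# tokens. `decode` and `stable_prefix_len` stand in for the real model
# call and stability rule.
def simultaneous_run(audio_chunks, decode, stable_prefix_len):
    audio, committed, hyp = [], [], []
    for chunk in audio_chunks:
        audio.append(chunk)
        # Offline model, unchanged: full decode over all audio so far,
        # constrained to continue the already-emitted translation.
        hyp = decode(audio, forced_prefix=committed)
        stable = stable_prefix_len(hyp)        # how much of hyp is safe
        new_tokens = hyp[len(committed):stable]
        committed.extend(new_tokens)
        yield new_tokens                       # emit incrementally
    yield hyp[len(committed):]                 # flush the tail at the end
```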
- The USYD-JD Speech Translation System for IWSLT 2021 [85.64797317290349]
This paper describes the University of Sydney & JD's joint submission to the IWSLT 2021 low-resource speech translation task.
We trained our models with the officially provided ASR and MT datasets.
To achieve better translation performance, we explored the most recent effective strategies, including back translation, knowledge distillation, multi-feature reranking and transductive finetuning.
arXiv Detail & Related papers (2021-07-24T09:53:34Z)
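Of these strategies, multi-feature reranking is easy to make concrete: each n-best candidate is scored with a weighted sum of feature scores and the argmax is kept. A generic sketch; the feature set and weights are illustrative, not the submission's:

```python
# Generic sketch of multi-feature n-best reranking: combine several scores
# per candidate (e.g., forward model score, right-to-left score, LM score)
# with tuned weights and keep the best candidate. The features and weights
# are illustrative, not the submission's exact setup.
from typing import Callable

def rerank(nbest: list[str],
           features: list[Callable[[str], float]],
           weights: list[float]) -> str:
    """features: functions mapping a hypothesis to a float score."""
    def total(hyp: str) -> float:
        return sum(w * f(hyp) for w, f in zip(weights, features))
    return max(nbest, key=total)
```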
- UPC's Speech Translation System for IWSLT 2021 [2.099922236065961]
This paper describes the submission to the IWSLT 2021 offline speech translation task by the UPC Machine Translation group.
The task consists of building a system capable of translating English audio recordings extracted from TED talks into German text.
Our submission is an end-to-end speech translation system, which combines pre-trained models with coupling modules between the encoder and decoder.
arXiv Detail & Related papers (2021-05-10T17:04:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.