Simultaneous Translation with Offline Speech and LLM Models in CUNI Submission to IWSLT 2025
- URL: http://arxiv.org/abs/2506.17077v1
- Date: Fri, 20 Jun 2025 15:27:44 GMT
- Title: Simultaneous Translation with Offline Speech and LLM Models in CUNI Submission to IWSLT 2025
- Authors: Dominik Macháček, Peter Polák,
- Abstract summary: This paper describes the Charles University submission to the Simultaneous Speech Translation Task of IWSLT 2025. We cover all four language pairs with a direct or cascade approach. The backbone of our systems is the offline Whisper speech model, which we use for both translation and transcription in simultaneous mode with the state-of-the-art simultaneous policy AlignAtt.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper describes the Charles University submission to the Simultaneous Speech Translation Task of IWSLT 2025. We cover all four language pairs with a direct or cascade approach. The backbone of our systems is the offline Whisper speech model, which we use for both translation and transcription in simultaneous mode with the state-of-the-art simultaneous policy AlignAtt. We further improve performance by prompting the model with in-domain terminology, and we incorporate context. Our cascaded systems additionally use EuroLLM for unbounded simultaneous translation. Compared to the Organizers' baseline, our systems improve by 2 BLEU points on Czech-to-English and by 13-22 BLEU points on English-to-German, Chinese, and Japanese on the development sets. Additionally, we propose a new, enhanced measure of speech recognition latency.
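The AlignAtt policy mentioned in the abstract decides, token by token, whether an offline model's candidate output is safe to emit in streaming mode by inspecting its cross-attention over the audio frames: if the attention peak falls within the last few frames, the token likely depends on audio that is still incomplete, so the system waits. A minimal sketch of that decision rule (not the authors' implementation; the function name and threshold are illustrative):

```python
def alignatt_emit(cross_attention, frame_threshold=2):
    """AlignAtt-style emission test for one candidate target token.

    cross_attention: list of attention weights, one per input audio frame,
        taken from the decoder's cross-attention for the candidate token.
    frame_threshold: how many trailing frames count as "too recent".

    Returns True if the token may be emitted now, False if the system
    should wait for more audio before committing to it.
    """
    # Frame that the candidate token attends to most strongly.
    peak_frame = max(range(len(cross_attention)),
                     key=cross_attention.__getitem__)
    # Emit only if the peak is NOT inside the last `frame_threshold` frames.
    return peak_frame < len(cross_attention) - frame_threshold
```

In a full system this test is applied to each newly generated token in turn; generation stops at the first unsafe token, more audio is read, and decoding resumes from the committed prefix.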
Related papers
- KIT's Offline Speech Translation and Instruction Following Submission for IWSLT 2025 [56.61209412965054]
We present the Karlsruhe Institute of Technology's submissions for the Offline ST and Instruction Following (IF) tracks. We propose a pipeline that employs multiple automatic speech recognition systems, whose outputs are fused using an LLM with document-level context. For the IF track, we develop an end-to-end model that integrates a speech encoder with an LLM to perform a wide range of instruction-following tasks.
arXiv Detail & Related papers (2025-05-19T12:21:29Z) - CMU's IWSLT 2024 Simultaneous Speech Translation System [80.15755988907506]
This paper describes CMU's submission to the IWSLT 2024 Simultaneous Speech Translation (SST) task for translating English speech to German text in a streaming manner.
Our end-to-end speech-to-text (ST) system integrates the WavLM speech encoder, a modality adapter, and the Llama2-7B-Base model as the decoder.
arXiv Detail & Related papers (2024-08-14T10:44:51Z) - NAIST Simultaneous Speech Translation System for IWSLT 2024 [18.77311658086372]
This paper describes NAIST's submission to the simultaneous track of the IWSLT 2024 Evaluation Campaign.
We develop a multilingual end-to-end speech-to-text translation model combining two pre-trained language models, HuBERT and mBART.
We trained this model with two decoding policies, Local Agreement (LA) and AlignAtt.
Our speech-to-speech translation method is a cascade of the above speech-to-text model and an incremental text-to-speech (TTS) module.
arXiv Detail & Related papers (2024-06-30T20:41:02Z) - SeamlessM4T: Massively Multilingual & Multimodal Machine Translation [90.71078166159295]
We introduce SeamlessM4T, a single model that supports speech-to-speech translation, speech-to-text translation, text-to-text translation, and automatic speech recognition for up to 100 languages.
We developed the first multilingual system capable of translating from and into English for both speech and text.
On FLEURS, SeamlessM4T sets a new standard for translations into multiple target languages, achieving an improvement of 20% BLEU over the previous SOTA in direct speech-to-text translation.
arXiv Detail & Related papers (2023-08-22T17:44:18Z) - BJTU-WeChat's Systems for the WMT22 Chat Translation Task [66.81525961469494]
This paper introduces the joint submission of the Beijing Jiaotong University and WeChat AI to the WMT'22 chat translation task for English-German.
Based on the Transformer, we apply several effective variants.
Our systems achieve 0.810 and 0.946 COMET scores.
arXiv Detail & Related papers (2022-11-28T02:35:04Z) - The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task [92.5087402621697]
This paper describes the submission of our end-to-end YiTrans speech translation system for the IWSLT 2022 offline task.
The YiTrans system is built on large-scale pre-trained encoder-decoder models.
Our final submissions rank first on English-German and English-Chinese end-to-end systems in terms of the automatic evaluation metric.
arXiv Detail & Related papers (2022-06-12T16:13:01Z) - CUNI-KIT System for Simultaneous Speech Translation Task at IWSLT 2022 [59.39104119817371]
We apply strategies to utilize an offline model in a simultaneous setting without the need to modify the original model.
Our onlinization algorithm is almost on par with the offline setting while being 3x faster than offline in terms of latency on the test set.
arXiv Detail & Related papers (2022-04-12T18:30:20Z) - Decision Attentive Regularization to Improve Simultaneous Speech Translation Systems [12.152208198444182]
Simultaneous Speech-to-text Translation (SimulST) systems translate source speech in tandem with the speaker using partial input.
Recent works have tried to leverage the text translation task to improve the performance of Speech Translation (ST) in the offline domain.
Motivated by these improvements, we propose to add Decision Attentive Regularization (DAR) to Monotonic Multihead Attention (MMA) based SimulST systems.
arXiv Detail & Related papers (2021-10-13T08:33:31Z) - CUNI systems for WMT21: Multilingual Low-Resource Translation for Indo-European Languages Shared Task [0.0]
We show that using a joint model for multiple similar language pairs improves translation quality in each pair.
We also demonstrate that character-level bilingual models are competitive for very similar language pairs.
arXiv Detail & Related papers (2021-09-20T08:10:39Z) - The Volctrans Neural Speech Translation System for IWSLT 2021 [26.058205594318405]
This paper describes the systems submitted to IWSLT 2021 by the Volctrans team.
For offline speech translation, our best end-to-end model achieves 8.1 BLEU improvements over the benchmark.
For text-to-text simultaneous translation, we explore the best practice to optimize the wait-k model.
arXiv Detail & Related papers (2021-05-16T00:11:59Z) - ON-TRAC Consortium for End-to-End and Simultaneous Speech Translation Challenge Tasks at IWSLT 2020 [25.024259342365934]
ON-TRAC Consortium is composed of researchers from three French academic laboratories.
Attention-based encoder-decoder models, trained end-to-end, were used for our submissions to the offline speech translation track.
In the simultaneous speech translation track, we build on Transformer-based wait-k models for the text-to-text subtask.
arXiv Detail & Related papers (2020-05-24T23:44:45Z)
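Several of the systems above (Volctrans, ON-TRAC) build on the wait-k policy, in which the decoder lags the source by a fixed number of tokens: it first reads k source tokens, then alternates between writing one target token and reading one more source token. A minimal sketch of that decoding loop, with a stub standing in for the real translation model (the function names are illustrative):

```python
def wait_k_policy(source_stream, translate_prefix, k=3):
    """Wait-k simultaneous decoding loop (sketch).

    source_stream: iterable yielding source tokens as they arrive.
    translate_prefix: stub mapping (source_prefix, n_emitted) to the next
        target token, or None when translation is finished; a real system
        would query an MT model constrained to the committed prefix here.
    k: how many source tokens the target is allowed to lag behind.
    """
    source, target = [], []
    src_iter = iter(source_stream)
    exhausted = False
    while True:
        # READ until we are k source tokens ahead of the target
        # (or the input stream ends).
        while not exhausted and len(source) - len(target) < k:
            try:
                source.append(next(src_iter))
            except StopIteration:
                exhausted = True
        # WRITE one target token conditioned on the current source prefix.
        token = translate_prefix(source, len(target))
        if token is None:
            break
        target.append(token)
        if exhausted and len(target) >= len(source):
            break
    return target
```

With k=2 and a copy "model", the loop reads two tokens before the first write and then strictly alternates reads and writes, which is the behavior the wait-k papers optimize for latency.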
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.