CUNI-KIT System for Simultaneous Speech Translation Task at IWSLT 2022
- URL: http://arxiv.org/abs/2204.06028v1
- Date: Tue, 12 Apr 2022 18:30:20 GMT
- Title: CUNI-KIT System for Simultaneous Speech Translation Task at IWSLT 2022
- Authors: Peter Polák, Ngoc-Quan Ngoc, Tuan-Nam Nguyen, Danni Liu, Carlos
Mullov, Jan Niehues, Ondřej Bojar, Alexander Waibel
- Abstract summary: We apply strategies to utilize an offline model in a simultaneous setting without the need to modify the original model.
Our onlinization algorithm is almost on par with the offline setting while being 3x faster than offline in terms of latency on the test set.
- Score: 59.39104119817371
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we describe our submission to the Simultaneous Speech
Translation at IWSLT 2022. We explore strategies to utilize an offline model in
a simultaneous setting without the need to modify the original model. In our
experiments, we show that our onlinization algorithm is almost on par with the
offline setting while being 3x faster than offline in terms of latency on the
test set. We make our system publicly available.
Related papers
- Simultaneous Translation with Offline Speech and LLM Models in CUNI Submission to IWSLT 2025 [0.0]
This paper describes Charles University submission to the Simultaneous Speech Translation Task of the IWSLT 2025.
We cover all four language pairs with a direct or cascade approach.
The backbone of our systems is the offline Whisper speech model, which we use for both translation and transcription in simultaneous mode with the state-of-the-art simultaneous policy AlignAtt.
arXiv Detail & Related papers (2025-06-20T15:27:44Z) - CMU's IWSLT 2024 Simultaneous Speech Translation System [80.15755988907506]
This paper describes CMU's submission to the IWSLT 2024 Simultaneous Speech Translation (SST) task for translating English speech to German text in a streaming manner.
Our end-to-end speech-to-text (ST) system integrates the WavLM speech encoder, a modality adapter, and the Llama2-7B-Base model as the decoder.
arXiv Detail & Related papers (2024-08-14T10:44:51Z) - Direct Models for Simultaneous Translation and Automatic Subtitling:
FBK@IWSLT2023 [26.001878009713597]
This paper describes the FBK's participation in the Simultaneous Translation and Automatic Subtitling tracks of the IWSLT 2023 Evaluation Campaign.
Our submission focused on the use of direct architectures to perform both tasks.
Our English-German SimulST system shows a reduced computational-aware latency compared to the one achieved by the top-ranked systems in the 2021 and 2022 rounds of the task.
arXiv Detail & Related papers (2023-09-27T10:24:42Z) - Incremental Blockwise Beam Search for Simultaneous Speech Translation
with Controllable Quality-Latency Tradeoff [49.75167556773752]
Blockwise self-attentional encoder models have emerged as one promising end-to-end approach to simultaneous speech translation.
We propose a modified incremental blockwise beam search incorporating local agreement or hold-$n$ policies for quality-latency control.
arXiv Detail & Related papers (2023-09-20T14:59:06Z) - Speech Translation with Foundation Models and Optimal Transport: UPC at
IWSLT23 [0.0]
This paper describes the submission of the UPC Machine Translation group to the IWSLT 2023 Offline Speech Translation task.
Our Speech Translation systems utilize foundation models for speech (wav2vec 2.0) and text (mBART50).
We incorporate a Siamese pretraining step of the speech and text encoders with CTC and Optimal Transport, to adapt the speech representations to the space of the text model.
arXiv Detail & Related papers (2023-06-02T07:48:37Z) - Learning When to Speak: Latency and Quality Trade-offs for Simultaneous
Speech-to-Speech Translation with Offline Models [18.34485337755259]
We introduce a system for simultaneous S2ST targeting real-world use cases.
Our system supports translation from 57 languages to English with tunable parameters for dynamically adjusting the latency of the output.
We show that these policies achieve offline-level accuracy with minimal increases in latency over a Greedy (wait-$k$) baseline.
arXiv Detail & Related papers (2023-06-01T23:29:23Z) - Textless Speech-to-Speech Translation With Limited Parallel Data [51.3588490789084]
PFB is a framework for training textless S2ST models that require just dozens of hours of parallel speech data.
We train and evaluate our models for English-to-German, German-to-English and Marathi-to-English translation on three different domains.
arXiv Detail & Related papers (2023-05-24T17:59:05Z) - Tencent AI Lab - Shanghai Jiao Tong University Low-Resource Translation
System for the WMT22 Translation Task [49.916963624249355]
This paper describes Tencent AI Lab - Shanghai Jiao Tong University (TAL-SJTU) Low-Resource Translation systems for the WMT22 shared task.
We participate in the general translation task on English$\Leftrightarrow$Livonian.
Our system is based on M2M100 with novel techniques that adapt it to the target language pair.
arXiv Detail & Related papers (2022-10-17T04:34:09Z) - The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline
Shared Task [92.5087402621697]
This paper describes the submission of our end-to-end YiTrans speech translation system for the IWSLT 2022 offline task.
The YiTrans system is built on large-scale pre-trained encoder-decoder models.
Our final submissions rank first on English-German and English-Chinese end-to-end systems in terms of the automatic evaluation metric.
arXiv Detail & Related papers (2022-06-12T16:13:01Z) - The Volctrans Neural Speech Translation System for IWSLT 2021 [26.058205594318405]
This paper describes the systems submitted to IWSLT 2021 by the Volctrans team.
For offline speech translation, our best end-to-end model achieves 8.1 BLEU improvements over the benchmark.
For text-to-text simultaneous translation, we explore the best practice to optimize the wait-k model.
arXiv Detail & Related papers (2021-05-16T00:11:59Z) - SimulEval: An Evaluation Toolkit for Simultaneous Translation [59.02724214432792]
Simultaneous translation on both text and speech focuses on a real-time and low-latency scenario.
SimulEval is an easy-to-use and general evaluation toolkit for both simultaneous text and speech translation.
arXiv Detail & Related papers (2020-07-31T17:44:41Z)