The USTC-NELSLIP Systems for Simultaneous Speech Translation Task at
IWSLT 2021
- URL: http://arxiv.org/abs/2107.00279v1
- Date: Thu, 1 Jul 2021 08:09:00 GMT
- Title: The USTC-NELSLIP Systems for Simultaneous Speech Translation Task at
IWSLT 2021
- Authors: Dan Liu, Mengge Du, Xiaoxi Li, Yuchen Hu, Lirong Dai
- Abstract summary: This paper describes USTC-NELSLIP's submissions to the IWSLT 2021 Simultaneous Speech Translation task.
We proposed a novel simultaneous translation model, Cross Attention Augmented Transducer (CAAT), which extends conventional RNN-T to sequence-to-sequence tasks.
Experiments on speech-to-text (S2T) and text-to-text (T2T) simultaneous translation tasks show that CAAT achieves better quality-latency trade-offs.
- Score: 36.95800637790494
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper describes USTC-NELSLIP's submissions to the IWSLT 2021 Simultaneous
Speech Translation task. We proposed a novel simultaneous translation model,
Cross Attention Augmented Transducer (CAAT), which extends conventional RNN-T
to sequence-to-sequence tasks without monotonic constraints, e.g., simultaneous
translation. Experiments on speech-to-text (S2T) and text-to-text (T2T)
simultaneous translation tasks show that CAAT achieves better quality-latency
trade-offs than wait-k, one of the previous state-of-the-art approaches. Based
on the CAAT architecture and data augmentation, we build S2T and
T2T simultaneous translation systems in this evaluation campaign. Compared to
last year's optimal systems, our S2T simultaneous translation system improves
by an average of 11.3 BLEU for all latency regimes, and our T2T simultaneous
translation system improves by an average of 4.6 BLEU.
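The abstract compares CAAT against wait-k, a fixed read/write schedule that reads k source tokens before emitting each target token. As a rough illustration of that baseline policy (not of CAAT itself), here is a minimal sketch; `translate_step` is a hypothetical stand-in for a real incremental decoder.

```python
def wait_k_decode(source_tokens, k, translate_step):
    """Wait-k policy: READ k source tokens first, then alternate WRITE/READ.

    translate_step(source_prefix, target_so_far) returns the next target
    token, or None to signal end of sentence. Returns the emitted tokens.
    """
    target = []
    read = min(k, len(source_tokens))      # initial READ of k tokens
    while True:
        token = translate_step(source_tokens[:read], target)
        if token is None:                  # decoder signals end of sentence
            break
        target.append(token)               # WRITE one target token
        if read < len(source_tokens):      # READ one more source token
            read += 1
    return target
```

Larger k delays the first output but gives the decoder more context, which is exactly the quality-latency trade-off the paper evaluates.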
Related papers
- CMU's IWSLT 2024 Simultaneous Speech Translation System [80.15755988907506]
This paper describes CMU's submission to the IWSLT 2024 Simultaneous Speech Translation (SST) task for translating English speech to German text in a streaming manner.
Our end-to-end speech-to-text (ST) system integrates the WavLM speech encoder, a modality adapter, and the Llama2-7B-Base model as the decoder.
arXiv Detail & Related papers (2024-08-14T10:44:51Z) - Translation-Enhanced Multilingual Text-to-Image Generation [61.41730893884428]
Research on text-to-image generation (TTI) still predominantly focuses on the English language.
In this work, we investigate multilingual TTI and the current potential of neural machine translation (NMT) to bootstrap mTTI systems.
We propose Ensemble Adapter (EnsAd), a novel parameter-efficient approach that learns to weigh and consolidate the multilingual text knowledge within the mTTI framework.
arXiv Detail & Related papers (2023-05-30T17:03:52Z) - The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline
Shared Task [92.5087402621697]
This paper describes the submission of our end-to-end YiTrans speech translation system for the IWSLT 2022 offline task.
The YiTrans system is built on large-scale pre-trained encoder-decoder models.
Our final submissions rank first on English-German and English-Chinese end-to-end systems in terms of the automatic evaluation metric.
arXiv Detail & Related papers (2022-06-12T16:13:01Z) - Incremental Speech Synthesis For Speech-To-Speech Translation [23.951060578077445]
We focus on improving the incremental synthesis performance of TTS models.
With a simple data augmentation strategy based on prefixes, we are able to improve the incremental TTS quality to approach offline performance.
We propose latency metrics tailored to S2ST applications, and investigate methods for latency reduction in this context.
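For context on what such latency metrics measure, a common baseline in simultaneous translation is Average Lagging (Ma et al., 2019); the S2ST-tailored metrics this paper proposes are not necessarily identical to it. A minimal sketch:

```python
def average_lagging(delays, src_len, tgt_len):
    """Average Lagging: mean number of source tokens by which the system
    lags behind an ideal zero-wait policy.

    delays[t] = number of source tokens read before emitting target token t+1.
    """
    gamma = tgt_len / src_len              # target-to-source length ratio
    total, tau = 0.0, 0
    for t, g in enumerate(delays):
        total += g - t / gamma
        tau = t + 1
        if g >= src_len:                   # stop at the first target token
            break                          # emitted after the full source
    return total / tau
```

For a wait-k schedule on equal-length sentences, AL comes out to k, which matches the intuition that the output trails the input by k tokens.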
arXiv Detail & Related papers (2021-10-15T17:20:28Z) - The Volctrans GLAT System: Non-autoregressive Translation Meets WMT21 [25.41660831320743]
We build a parallel (i.e., non-autoregressive) translation system using the Glancing Transformer.
Our system achieves the best BLEU score (35.0) on the German->English translation task, outperforming all strong autoregressive counterparts.
arXiv Detail & Related papers (2021-09-23T09:41:44Z) - The USYD-JD Speech Translation System for IWSLT 2021 [85.64797317290349]
This paper describes the University of Sydney & JD's joint submission to the IWSLT 2021 low-resource speech translation task.
We trained our models with the officially provided ASR and MT datasets.
To achieve better translation performance, we explored the most recent effective strategies, including back translation, knowledge distillation, multi-feature reranking and transductive finetuning.
arXiv Detail & Related papers (2021-07-24T09:53:34Z) - IMS' Systems for the IWSLT 2021 Low-Resource Speech Translation Task [38.899667657333595]
This paper describes the submission to the IWSLT 2021 Low-Resource Speech Translation Shared Task by IMS team.
We utilize state-of-the-art models combined with several data augmentation, multi-task and transfer learning approaches for the automatic speech recognition (ASR) and machine translation (MT) steps of our cascaded system.
arXiv Detail & Related papers (2021-06-30T13:29:19Z) - The Volctrans Neural Speech Translation System for IWSLT 2021 [26.058205594318405]
This paper describes the systems submitted to IWSLT 2021 by the Volctrans team.
For offline speech translation, our best end-to-end model achieves an 8.1 BLEU improvement over the benchmark.
For text-to-text simultaneous translation, we explore the best practice to optimize the wait-k model.
arXiv Detail & Related papers (2021-05-16T00:11:59Z) - Unsupervised Bitext Mining and Translation via Self-trained Contextual
Embeddings [51.47607125262885]
We describe an unsupervised method to create pseudo-parallel corpora for machine translation (MT) from unaligned text.
We use multilingual BERT to create source and target sentence embeddings for nearest-neighbor search and adapt the model via self-training.
We validate our technique by extracting parallel sentence pairs on the BUCC 2017 bitext mining task and observe up to a 24.5 point increase (absolute) in F1 scores over previous unsupervised methods.
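The core retrieval step described above, pairing each source sentence with its nearest target sentence in embedding space, can be sketched as follows. This is a simplified cosine-similarity version; the paper's actual pipeline uses multilingual BERT embeddings, margin-based scoring, and self-training, none of which are reproduced here.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def mine_bitext(src_embs, tgt_embs, threshold=0.8):
    """For each source embedding, pick the nearest target by cosine
    similarity and keep the pair if the score clears the threshold."""
    pairs = []
    for i, s in enumerate(src_embs):
        scores = [cosine(s, t) for t in tgt_embs]
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] >= threshold:
            pairs.append((i, j, scores[j]))
    return pairs
```

The threshold filters out source sentences with no good translation on the target side, which is what keeps the mined pseudo-parallel corpus usable for MT training.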
arXiv Detail & Related papers (2020-10-15T14:04:03Z) - ON-TRAC Consortium for End-to-End and Simultaneous Speech Translation
Challenge Tasks at IWSLT 2020 [25.024259342365934]
ON-TRAC Consortium is composed of researchers from three French academic laboratories.
Attention-based encoder-decoder models, trained end-to-end, were used for our submissions to the offline speech translation track.
In the simultaneous speech translation track, we build on Transformer-based wait-k models for the text-to-text subtask.
arXiv Detail & Related papers (2020-05-24T23:44:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.