KIT's Multilingual Speech Translation System for IWSLT 2023
- URL: http://arxiv.org/abs/2306.05320v3
- Date: Wed, 12 Jul 2023 04:41:47 GMT
- Title: KIT's Multilingual Speech Translation System for IWSLT 2023
- Authors: Danni Liu, Thai Binh Nguyen, Sai Koneru, Enes Yavuz Ugan, Ngoc-Quan
Pham, Tuan-Nam Nguyen, Tu Anh Dinh, Carlos Mullov, Alexander Waibel, Jan
Niehues
- Abstract summary: We describe our speech translation system for the multilingual track of IWSLT 2023.
The task requires translation into 10 languages of varying amounts of resources.
Our cascaded speech system substantially outperforms its end-to-end counterpart on scientific talk translation.
- Score: 58.5152569458259
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many existing speech translation benchmarks focus on native-English speech in
high-quality recording conditions, which often do not match the conditions in
real-life use-cases. In this paper, we describe our speech translation system
for the multilingual track of IWSLT 2023, which evaluates translation quality
on scientific conference talks. The test condition features accented input
speech and terminology-dense contents. The task requires translation into 10
languages of varying amounts of resources. In absence of training data from the
target domain, we use a retrieval-based approach (kNN-MT) for effective
adaptation (+0.8 BLEU for speech translation). We also use adapters to easily
integrate incremental training data from data augmentation, and show that it
matches the performance of re-training. We observe that cascaded systems are
more easily adaptable towards specific target domains, due to their separate
modules. Our cascaded speech system substantially outperforms its end-to-end
counterpart on scientific talk translation, although their performance remains
similar on TED talks.
Related papers
- NAIST Simultaneous Speech Translation System for IWSLT 2024 [18.77311658086372]
This paper describes NAIST's submission to the simultaneous track of the IWSLT 2024 Evaluation Campaign.
We develop a multilingual end-to-end speech-to-text translation model combining two pre-trained language models, HuBERT and mBART.
We trained this model with two decoding policies, Local Agreement (LA) and AlignAtt.
Our speech-to-speech translation method is a cascade of the above speech-to-text model and an incremental text-to-speech (TTS) module.
arXiv Detail & Related papers (2024-06-30T20:41:02Z) - An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios [76.11409260727459]
This paper explores the language adaptation capability of ZMM-TTS, a recent SSL-based multilingual TTS system.
We demonstrate that the similarity in phonetics between the pre-training and target languages, as well as the language category, affects the target language's adaptation performance.
arXiv Detail & Related papers (2024-06-13T08:16:52Z) - ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text
Translation [79.66359274050885]
We present ComSL, a speech-language model built atop a composite architecture of public pretrained speech-only and language-only models.
Our approach has demonstrated effectiveness in end-to-end speech-to-text translation tasks.
arXiv Detail & Related papers (2023-05-24T07:42:15Z) - The Interpreter Understands Your Meaning: End-to-end Spoken Language
Understanding Aided by Speech Translation [13.352795145385645]
Speech translation (ST) is a good means of pretraining speech models for end-to-end spoken language understanding.
We show that our models reach higher performance over baselines on monolingual and multilingual intent classification.
We also create new benchmark datasets for speech summarization and low-resource/zero-shot transfer from English to French or Spanish.
arXiv Detail & Related papers (2023-05-16T17:53:03Z) - BJTU-WeChat's Systems for the WMT22 Chat Translation Task [66.81525961469494]
This paper introduces the joint submission of the Beijing Jiaotong University and WeChat AI to the WMT'22 chat translation task for English-German.
Based on the Transformer, we apply several effective variants.
Our systems achieve 0.810 and 0.946 COMET scores.
arXiv Detail & Related papers (2022-11-28T02:35:04Z) - Simple and Effective Unsupervised Speech Translation [68.25022245914363]
We study a simple and effective approach to build speech translation systems without labeled data.
We present an unsupervised domain adaptation technique for pre-trained speech models.
Experiments show that unsupervised speech-to-text translation outperforms the previous unsupervised state of the art.
arXiv Detail & Related papers (2022-10-18T22:26:13Z) - Enhanced Direct Speech-to-Speech Translation Using Self-supervised
Pre-training and Data Augmentation [76.13334392868208]
Direct speech-to-speech translation (S2ST) models suffer from data scarcity issues.
In this work, we explore self-supervised pre-training with unlabeled speech data and data augmentation to tackle this issue.
arXiv Detail & Related papers (2022-04-06T17:59:22Z) - FST: the FAIR Speech Translation System for the IWSLT21 Multilingual
Shared Task [36.51221186190272]
We describe our end-to-end multilingual speech translation system submitted to the IWSLT 2021 evaluation campaign.
Our system is built by leveraging transfer learning across modalities, tasks and languages.
arXiv Detail & Related papers (2021-07-14T19:43:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.