Multilingual Simultaneous Speech Translation
- URL: http://arxiv.org/abs/2203.14835v2
- Date: Tue, 29 Mar 2022 07:55:11 GMT
- Title: Multilingual Simultaneous Speech Translation
- Authors: Shashank Subramanya, Jan Niehues
- Abstract summary: A common approach to building online spoken language translation systems is to leverage models built for offline speech translation.
We investigate the ability of multilingual models and different architectures to perform online speech translation.
- Score: 12.376309678270275
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Applications designed for simultaneous speech translation during events such as conferences or meetings need to balance quality and lag when displaying translated text in order to deliver a good user experience. One common approach to building online spoken language translation systems is to leverage models built for offline speech translation. Building on a technique for adapting end-to-end monolingual models, we investigate the ability of multilingual models and different architectures (end-to-end and cascade) to perform online speech translation. On the multilingual TEDx corpus, we show that the approach generalizes to different architectures: we see similar latency reductions (40% relative) across languages and architectures, but the end-to-end architecture incurs smaller translation quality losses after adaptation to the online setting. Furthermore, the approach even scales to zero-shot directions.
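The quality/lag trade-off described in the abstract is usually quantified with an explicit latency metric. The abstract does not name the metric behind the 40% relative figure, so the sketch below is only an illustration: it implements Average Lagging (AL), a standard latency measure in simultaneous translation, and evaluates it on a toy wait-k read/write policy. All names and numbers in the snippet are assumptions for demonstration, not values from the paper.

```python
# Illustrative sketch only: the paper does not specify its latency metric in
# the abstract. Average Lagging (AL) is a standard choice in simultaneous
# translation and is used here purely to make the quality/lag trade-off concrete.

def average_lagging(g, src_len, tgt_len):
    """AL latency metric. g[j] = number of source tokens read before
    emitting target token j (0-indexed); lower AL = earlier commitment."""
    gamma = tgt_len / src_len  # target-to-source length ratio
    # tau indexes the first target token emitted only after the full source was read
    tau = next((j for j, gj in enumerate(g) if gj >= src_len), tgt_len - 1)
    return sum(g[j] - j / gamma for j in range(tau + 1)) / (tau + 1)

# A wait-k policy reads k source tokens, then alternates read/write,
# giving g[j] = min(k + j, src_len).
src_len = tgt_len = 10
g = [min(3 + j, src_len) for j in range(tgt_len)]
print(average_lagging(g, src_len, tgt_len))  # -> 3.0 (AL of wait-3 at gamma=1)
```

Under such a metric, a 40% relative latency reduction would correspond to, for example, AL dropping from 5 to 3 source tokens.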
Related papers
- A Context-aware Framework for Translation-mediated Conversations [29.169155271343083]
We present a framework to improve large language model-based translation systems by incorporating contextual information in bilingual conversational settings.
We validate both components of our framework on two task-oriented domains: customer chat and user-assistant interaction.
Our framework consistently results in better translations than state-of-the-art systems like GPT-4o and TowerInstruct.
arXiv Detail & Related papers (2024-12-05T14:41:05Z)
- TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation [97.54885207518946]
We introduce a novel model framework TransVIP that leverages diverse datasets in a cascade fashion.
We propose two separate encoders to preserve the speaker's voice characteristics and isochrony from the source speech during the translation process.
Our experiments on the French-English language pair demonstrate that our model outperforms the current state-of-the-art speech-to-speech translation model.
arXiv Detail & Related papers (2024-05-28T04:11:37Z)
- Do Multilingual Language Models Think Better in English? [24.713751471567395]
Translate-test is a popular technique to improve the performance of multilingual language models.
In this work, we introduce a new approach called self-translate, which removes the need for an external translation system.
arXiv Detail & Related papers (2023-08-02T15:29:22Z)
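As a hedged illustration of the self-translate idea summarized in the entry above (the multilingual model translates its own input into English before solving the task, with no external MT system), here is a minimal sketch. The generate stub and prompt templates are assumptions for demonstration; the paper's actual prompting setup may differ.

```python
# Hypothetical sketch of self-translate: the same multilingual LM first
# translates the input to English, then answers in English. `generate` is a
# stub standing in for a real LM call; the prompts are illustrative assumptions.

def generate(prompt: str) -> str:
    # Placeholder for an actual multilingual LM; returns canned text here
    # so the pipeline runs end to end.
    return "[model output for: " + prompt[:40] + "...]"

def self_translate(task_input: str) -> str:
    # Step 1: the model translates its own input into English (no external MT).
    english = generate(f"Translate to English:\n{task_input}\nTranslation:")
    # Step 2: the model solves the task on its own English translation.
    return generate(f"{english}\nAnswer:")

print(self_translate("¿Cuál es la capital de Francia?"))
```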
- Improving Language Model Integration for Neural Machine Translation [43.85486035238116]
We show that accounting for the implicit language model significantly boosts the performance of language model fusion.
arXiv Detail & Related papers (2023-06-08T10:00:19Z)
- Language Model Tokenizers Introduce Unfairness Between Languages [98.92630681729518]
We show how disparity in the treatment of different languages arises at the tokenization stage, well before a model is even invoked.
Even character-level and byte-level models exhibit encoding-length differences of more than 4x for some language pairs.
We make the case that we should train future language models using multilingually fair subword tokenizers.
arXiv Detail & Related papers (2023-05-17T14:17:57Z)
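The byte-level disparity noted in the entry above can be made concrete: for a byte-level model, the encoding length of a sentence is simply its UTF-8 byte count, so parallel sentences can be compared directly. A minimal sketch, using toy sentences rather than the paper's evaluation data:

```python
# Illustrative check of byte-level encoding disparity: parallel sentences are
# compared by UTF-8 byte count. The sentences below are toy examples, not the
# paper's evaluation data.

parallel = {
    "en": "Hello, how are you today?",
    "de": "Hallo, wie geht es dir heute?",
    "my": "မင်္ဂလာပါ၊ ဒီနေ့ နေကောင်းလား။",  # Burmese: several bytes per character
}

lengths = {lang: len(text.encode("utf-8")) for lang, text in parallel.items()}
for lang, n in lengths.items():
    print(f"{lang}: {n} bytes")
print("max/min ratio:", max(lengths.values()) / min(lengths.values()))
```

Scripts such as Burmese take several UTF-8 bytes per character, which is one source of the multi-fold encoding-length gaps the entry describes.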
- Building Multilingual Machine Translation Systems That Serve Arbitrary X-Y Translations [75.73028056136778]
We show how to practically build MNMT systems that serve arbitrary X-Y translation directions.
We also examine our proposed approach in an extremely large-scale data setting to accommodate practical deployment scenarios.
arXiv Detail & Related papers (2022-06-30T02:18:15Z)
- Cross-lingual Transferring of Pre-trained Contextualized Language Models [73.97131976850424]
We propose a novel cross-lingual model transferring framework for PrLMs: TreLM.
To handle the symbol order and sequence length differences between languages, we propose an intermediate "TRILayer" structure.
We show that the proposed framework significantly outperforms language models trained from scratch on limited data, in both performance and efficiency.
arXiv Detail & Related papers (2021-07-27T06:51:13Z)
- Adaptive Sparse Transformer for Multilingual Translation [18.017674093519332]
A known challenge of multilingual models is negative interference between languages.
We propose an adaptive and sparse architecture for multilingual modeling.
Our model outperforms strong baselines in terms of translation quality without increasing the inference cost.
arXiv Detail & Related papers (2021-04-15T10:31:07Z)
- Bridging the Modality Gap for Speech-to-Text Translation [57.47099674461832]
End-to-end speech translation aims to translate speech in one language into text in another language in an end-to-end manner.
Most existing methods employ an encoder-decoder structure with a single encoder to learn acoustic representation and semantic information simultaneously.
We propose a Speech-to-Text Adaptation for Speech Translation model which aims to improve the end-to-end model performance by bridging the modality gap between speech and text.
arXiv Detail & Related papers (2020-10-28T12:33:04Z)
- Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation [81.7786241489002]
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations.
We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics.
We propose random online backtranslation to enforce the translation of unseen training language pairs.
arXiv Detail & Related papers (2020-04-24T17:21:32Z)
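The random online backtranslation idea in the last entry can be sketched as a data-augmentation step inside the training loop: for each training pair, the current model back-translates the target sentence into a randomly chosen language, producing a synthetic example for an otherwise unseen direction. The snippet below is a minimal sketch with a stub translate function; the names and exact pairing scheme are assumptions, not the paper's implementation.

```python
import random

# Hypothetical sketch of random online back-translation: given a target-side
# sentence y in language tgt, pick a random intermediate language z and
# back-translate y into z with the current model, creating a synthetic
# example for the unseen direction z -> tgt. `translate` is a stub for the
# model's own decoding step.

LANGS = ["de", "fr", "es", "ru", "zh"]

def translate(text: str, src: str, tgt: str) -> str:
    return f"[{tgt} translation of: {text}]"  # placeholder for model decoding

def robt_example(y: str, tgt: str):
    z = random.choice([l for l in LANGS if l != tgt])
    y_z = translate(y, src=tgt, tgt=z)  # online back-translation step
    return (y_z, z, y, tgt)             # train on this synthetic z -> tgt pair

print(robt_example("Guten Morgen", "de"))
```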
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.