Dynamic Masking for Improved Stability in Spoken Language Translation
- URL: http://arxiv.org/abs/2006.00249v2
- Date: Mon, 31 May 2021 22:04:56 GMT
- Title: Dynamic Masking for Improved Stability in Spoken Language Translation
- Authors: Yuekun Yao and Barry Haddow
- Abstract summary: In retranslation-based spoken language translation, a possible solution to output "flicker" is to add a fixed delay, or "mask", to the output of the MT system.
We show how this mask can be set dynamically, improving the latency-flicker trade-off without sacrificing translation quality.
- Score: 8.591381243212712
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For spoken language translation (SLT) in live scenarios such as conferences,
lectures and meetings, it is desirable to show the translation to the user as
quickly as possible, avoiding an annoying lag between speaker and translated
captions. In other words, we would like low-latency, online SLT. If we assume a
pipeline of automatic speech recognition (ASR) and machine translation (MT)
then a viable approach to online SLT is to pair an online ASR system with a
retranslation strategy, where the MT system re-translates every update received
from ASR. However, this can result in annoying "flicker" as the MT system
updates its translation. A possible solution is to add a fixed delay, or "mask",
to the output of the MT system, but a fixed global mask introduces
undesirable latency to the output. We show how this mask can be set
dynamically, improving the latency-flicker trade-off without sacrificing
translation quality.
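The fixed-versus-dynamic mask idea can be sketched in a few lines. This is a hypothetical illustration, not the paper's method: the authors set the mask with a learned predictor, whereas the proxy below simply withholds tokens that disagree with the previous retranslation hypothesis. All function names are invented for this sketch.

```python
# Sketch of output masking in a retranslation pipeline. A "mask" hides the
# last k tokens of each MT hypothesis so unstable suffixes never flicker
# on screen. Fixed masking always hides k tokens (constant extra latency);
# the dynamic proxy here hides only tokens that changed since the previous
# update. This is a simplified stand-in for the paper's learned mask.

def common_prefix_len(a, b):
    """Length of the shared token prefix of two hypotheses."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def fixed_mask(hyp, k):
    """Always withhold the last k tokens, even when they are stable."""
    return hyp[:max(0, len(hyp) - k)]

def dynamic_mask(prev_hyp, hyp):
    """Withhold only the suffix that disagrees with the previous
    hypothesis, so stable output is shown with no extra delay."""
    return hyp[:common_prefix_len(prev_hyp, hyp)]

prev = "the cat sits".split()
new = "the cat sat on the mat".split()
print(fixed_mask(new, 2))       # ['the', 'cat', 'sat', 'on']
print(dynamic_mask(prev, new))  # ['the', 'cat']
```

With a fixed mask the stable tokens "sat on" are needlessly delayed; the dynamic variant delays only the region that actually changed, which is the latency-flicker trade-off the abstract refers to.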
Related papers
- Blending LLMs into Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024 [61.189875635090225]
Large Language Models (LLMs) are currently under exploration for various tasks, including Automatic Speech Recognition (ASR), Machine Translation (MT), and even End-to-End Speech Translation (ST).
arXiv Detail & Related papers (2024-06-24T16:38:17Z) - DiariST: Streaming Speech Translation with Speaker Diarization [53.595990270899414]
We propose DiariST, the first streaming speech translation (ST) and speaker diarization (SD) solution.
It is built upon a neural transducer-based streaming ST system and integrates token-level serialized output training and t-vectors.
Our system achieves strong ST and SD performance compared to offline systems based on Whisper, while performing streaming inference for overlapping speech.
arXiv Detail & Related papers (2023-09-14T19:33:27Z) - Simultaneous Translation for Unsegmented Input: A Sliding Window
Approach [8.651762907847848]
We present a sliding window approach to translate raw ASR outputs (online or offline) without needing to rely on an automatic segmenter.
Experiments on English-to-German and English-to-Czech show that our approach improves by 1.3–2.0 BLEU points over the usual ASR-segmenter pipeline.
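The sliding-window idea summarized above can be illustrated with a minimal segmentation sketch. This is a hypothetical simplification: the paper's actual window sizing and merging of overlapping translations are more involved, and the function and parameter names here are invented.

```python
# Minimal sketch of sliding-window segmentation over an unsegmented ASR
# token stream. Each overlapping window would be translated independently
# and the overlapping translations merged, avoiding any automatic
# segmenter. Window size and stride are illustrative choices.

def sliding_windows(tokens, size, stride):
    """Yield overlapping token windows covering the whole stream."""
    for start in range(0, max(1, len(tokens) - size + stride), stride):
        yield tokens[start:start + size]

stream = "no punctuation no segmentation just a stream of words".split()
for window in sliding_windows(stream, size=5, stride=3):
    print(" ".join(window))
```

The stride being smaller than the window size is what produces the overlap that a merging step can later exploit to stitch translations together.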
arXiv Detail & Related papers (2022-10-18T11:07:28Z) - Discrete Cross-Modal Alignment Enables Zero-Shot Speech Translation [71.35243644890537]
End-to-end Speech Translation (ST) aims at translating the source language speech into target language text without generating the intermediate transcriptions.
Existing zero-shot methods fail to align the two modalities of speech and text into a shared semantic space.
We propose a novel Discrete Cross-Modal Alignment (DCMA) method that employs a shared discrete vocabulary space to accommodate and match both modalities of speech and text.
arXiv Detail & Related papers (2022-10-18T03:06:47Z) - Parameter-Efficient Neural Reranking for Cross-Lingual and Multilingual
Retrieval [66.69799641522133]
State-of-the-art neural (re)rankers are notoriously data hungry.
Current approaches typically transfer rankers trained on English data to other languages and cross-lingual setups by means of multilingual encoders.
We show that two parameter-efficient approaches to cross-lingual transfer, namely Sparse Fine-Tuning Masks (SFTMs) and Adapters, allow for a more lightweight and more effective zero-shot transfer.
arXiv Detail & Related papers (2022-04-05T15:44:27Z) - Anticipation-free Training for Simultaneous Translation [70.85761141178597]
Simultaneous translation (SimulMT) speeds up the translation process by starting to translate before the source sentence is completely available.
Existing methods increase latency or introduce adaptive read-write policies for SimulMT models to handle local reordering and improve translation quality.
We propose a new framework that decomposes the translation process into the monotonic translation step and the reordering step.
arXiv Detail & Related papers (2022-01-30T16:29:37Z) - Multilingual Unsupervised Neural Machine Translation with Denoising
Adapters [77.80790405710819]
We consider the problem of multilingual unsupervised machine translation, translating to and from languages that only have monolingual data.
For this problem the standard procedure so far to leverage the monolingual data is back-translation, which is computationally costly and hard to tune.
In this paper we propose instead to use denoising adapters, adapter layers with a denoising objective, on top of pre-trained mBART-50.
arXiv Detail & Related papers (2021-10-20T10:18:29Z) - MeetDot: Videoconferencing with Live Translation Captions [18.60812558978417]
We present MeetDot, a videoconferencing system with live translation captions overlaid on screen.
Our system supports speech and captions in 4 languages and combines automatic speech recognition (ASR) and machine translation (MT) in a cascade.
We implement several features to enhance user experience and reduce their cognitive load, such as smooth scrolling captions and reducing caption flicker.
arXiv Detail & Related papers (2021-09-20T14:34:14Z) - A Technical Report: BUT Speech Translation Systems [2.9327503320877457]
The paper describes BUT's speech translation systems.
The systems are English→German offline speech translation systems.
A large degradation is observed when translating ASR hypothesis compared to the oracle input text.
arXiv Detail & Related papers (2020-10-22T10:52:31Z) - Cascaded Models With Cyclic Feedback For Direct Speech Translation [14.839931533868176]
We present a technique that allows cascades of automatic speech recognition (ASR) and machine translation (MT) to exploit in-domain direct speech translation data.
A comparison to end-to-end speech translation using components of identical architecture and the same data shows gains of up to 3.8 BLEU points on LibriVoxDeEn and up to 5.1 BLEU points on CoVoST for German-to-English speech translation.
arXiv Detail & Related papers (2020-10-21T17:18:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.