Related papers: Presenting Simultaneous Translation in Limited Space

URL: http://arxiv.org/abs/2009.09016v1
Date: Fri, 18 Sep 2020 18:37:03 GMT
Title: Presenting Simultaneous Translation in Limited Space
Authors: Dominik Macháček, Ondřej Bojar
Abstract summary: Some methods of automatic simultaneous translation of long-form speech allow revisions of outputs, trading accuracy for low latency. Subtitles must be shown promptly, incrementally, and with adequate time for reading. We propose a way to estimate the overall usability of the combination of automatic translation and subtitling by measuring quality, latency, and stability on a test set.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Some methods of automatic simultaneous translation of long-form speech allow revisions of outputs, trading accuracy for low latency. Deploying these systems for users faces the problem of presenting subtitles in a limited space, such as two lines on a television screen. The subtitles must be shown promptly, incrementally, and with adequate time for reading. We provide an algorithm for subtitling. Furthermore, we propose a way to estimate the overall usability of the combination of automatic translation and subtitling by measuring quality, latency, and stability on a test set, and we propose an improved measure of translation latency.

Related papers

High-Fidelity Simultaneous Speech-To-Speech Translation [75.69884829562591]
We introduce Hibiki, a decoder-only model for simultaneous speech translation. Hibiki leverages a multistream language model to synchronously process source and target speech, and jointly produces text and audio tokens to perform speech-to-text and speech-to-speech translation.
arXiv Detail & Related papers (2025-02-05T17:18:55Z)
Average Token Delay: A Duration-aware Latency Metric for Simultaneous Translation [16.954965417930254]
We propose a novel latency evaluation metric for simultaneous translation called Average Token Delay (ATD). We demonstrate its effectiveness through analyses simulating user-side latency based on Ear-Voice Span (EVS).
arXiv Detail & Related papers (2023-11-24T08:53:52Z)
Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff [49.75167556773752]
Blockwise self-attentional encoder models have emerged as one promising end-to-end approach to simultaneous speech translation. We propose a modified incremental blockwise beam search incorporating local agreement or hold-$n$ policies for quality-latency control.
arXiv Detail & Related papers (2023-09-20T14:59:06Z)
DiariST: Streaming Speech Translation with Speaker Diarization [53.595990270899414]
We propose DiariST, the first streaming ST and SD solution. It is built upon a neural transducer-based streaming ST system and integrates token-level serialized output training and t-vector. Our system achieves a strong ST and SD capability compared to offline systems based on Whisper, while performing streaming inference for overlapping speech.
arXiv Detail & Related papers (2023-09-14T19:33:27Z)
Improving Isochronous Machine Translation with Target Factors and Auxiliary Counters [71.02335065794384]
We introduce target factors in a transformer model to predict durations jointly with target language phoneme sequences. We show that our model improves translation quality and isochrony compared to previous work.
arXiv Detail & Related papers (2023-05-22T16:36:04Z)
Average Token Delay: A Latency Metric for Simultaneous Translation [21.142539715996673]
We propose a novel latency evaluation metric called Average Token Delay (ATD). We discuss the advantage of ATD using simulated examples and also investigate the differences between ATD and Average Lagging with simultaneous translation experiments.
arXiv Detail & Related papers (2022-11-22T06:45:13Z)
Data-Driven Adaptive Simultaneous Machine Translation [51.01779863078624]
We propose a novel and efficient training scheme for adaptive SimulMT. Our method outperforms all strong baselines in terms of translation quality and latency.
arXiv Detail & Related papers (2022-04-27T02:40:21Z)
Optimizing Latency for Online Video Captioning Using Audio-Visual Transformers [54.705393237822044]
This paper proposes a novel approach to optimize each caption's output timing based on a trade-off between latency and caption quality. An audio-visual Transformer is trained to generate ground-truth captions using only a small portion of all video frames. A CNN-based timing detector is also trained to detect a proper output timing, where the captions generated by the two Transformers become sufficiently close to each other.
arXiv Detail & Related papers (2021-08-04T16:20:00Z)
Simultaneous Speech Translation for Live Subtitling: from Delay to Display [13.35771688595446]
We explore the feasibility of simultaneous speech translation (SimulST) for live subtitling. We adapt SimulST systems to predict subtitle breaks along with the translation. We propose a display mode that exploits the predicted break structure by presenting the subtitles in scrolling lines.
arXiv Detail & Related papers (2021-07-19T12:35:49Z)
Stream-level Latency Evaluation for Simultaneous Machine Translation [5.50178437495268]
Simultaneous machine translation has recently gained traction thanks to significant quality improvements and the advent of streaming applications. This work proposes a stream-level adaptation of the current latency measures based on a re-segmentation approach applied to the output translation.
arXiv Detail & Related papers (2021-04-18T11:16:17Z)
SimulMT to SimulST: Adapting Simultaneous Text Translation to End-to-End Simultaneous Speech Translation [23.685648804345984]
Simultaneous text translation and end-to-end speech translation have recently made great progress but little work has combined these tasks together. We investigate how to adapt simultaneous text translation methods such as wait-k and monotonic multihead attention to end-to-end simultaneous speech translation by introducing a pre-decision module. A detailed analysis is provided on the latency-quality trade-offs of combining fixed and flexible pre-decision with fixed and flexible policies.
arXiv Detail & Related papers (2020-11-03T22:47:58Z)
SimulEval: An Evaluation Toolkit for Simultaneous Translation [59.02724214432792]
Simultaneous translation on both text and speech focuses on a real-time and low-latency scenario. SimulEval is an easy-to-use and general evaluation toolkit for both simultaneous text and speech translation.
arXiv Detail & Related papers (2020-07-31T17:44:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.