Incremental Blockwise Beam Search for Simultaneous Speech Translation
with Controllable Quality-Latency Tradeoff
- URL: http://arxiv.org/abs/2309.11379v1
- Date: Wed, 20 Sep 2023 14:59:06 GMT
- Title: Incremental Blockwise Beam Search for Simultaneous Speech Translation
with Controllable Quality-Latency Tradeoff
- Authors: Peter Polák, Brian Yan, Shinji Watanabe, Alex Waibel, Ondřej Bojar
- Abstract summary: Blockwise self-attentional encoder models have emerged as one promising end-to-end approach to simultaneous speech translation.
We propose a modified incremental blockwise beam search incorporating local agreement or hold-$n$ policies for quality-latency control.
- Score: 49.75167556773752
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Blockwise self-attentional encoder models have recently emerged as one
promising end-to-end approach to simultaneous speech translation. These models
employ a blockwise beam search with hypothesis reliability scoring to determine
when to wait for more input speech before translating further. However, this
method maintains multiple hypotheses until the entire speech input is consumed
-- this scheme cannot directly show a single incremental translation
to users. Further, this method lacks mechanisms for controlling the
quality vs. latency tradeoff. We propose a modified incremental blockwise beam
search incorporating local agreement or hold-$n$ policies for quality-latency
control. We apply our framework to models trained for online or offline
translation and demonstrate that both types can be effectively used in online
mode.
Experimental results on MuST-C show 0.6-3.6 BLEU improvement without changing
latency or 0.8-1.4 s latency improvement without changing quality.
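The abstract names the two policies without defining them. As a rough illustration only, the Python sketch below follows their usual definitions in the simultaneous translation literature, which is an assumption here and not the authors' implementation: hold-$n$ withholds the last $n$ tokens of the current best hypothesis, and local agreement shows only the prefix on which the hypotheses from two consecutive input chunks agree. The function names `hold_n` and `local_agreement`, and the token-list representation, are hypothetical.

```python
from typing import List, Sequence


def hold_n(best_hypothesis: Sequence[str], n: int, is_final: bool = False) -> List[str]:
    """Hold-n (assumed definition): drop the last n tokens of the best hypothesis.

    When the input is finished (is_final), nothing is withheld.
    """
    if is_final or n <= 0:
        return list(best_hypothesis)
    keep = max(len(best_hypothesis) - n, 0)
    return list(best_hypothesis[:keep])


def local_agreement(previous: Sequence[str], current: Sequence[str]) -> List[str]:
    """Local agreement (assumed definition): longest common prefix of the
    best hypotheses produced after two consecutive input chunks."""
    prefix: List[str] = []
    for prev_token, cur_token in zip(previous, current):
        if prev_token != cur_token:
            break
        prefix.append(prev_token)
    return prefix


if __name__ == "__main__":
    # Hypotheses after two consecutive speech chunks (made-up tokens).
    chunk_1 = ["the", "cat", "sat"]
    chunk_2 = ["the", "cat", "is", "sitting"]
    print(hold_n(chunk_2, n=2))               # tokens considered stable under hold-2
    print(local_agreement(chunk_1, chunk_2))  # tokens both hypotheses agree on
```

In both sketches, a larger $n$ or a stricter agreement requirement shows fewer tokens per step, which is the direction in which such policies trade latency for stability.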
Related papers
- FASST: Fast LLM-based Simultaneous Speech Translation [9.65638081954595]
Simultaneous speech translation (SST) takes streaming speech input and generates text translation on the fly.
We propose FASST, a fast large language model based method for streaming speech translation.
Experiment results show that FASST achieves the best quality-latency trade-off.
arXiv Detail & Related papers (2024-08-18T10:12:39Z)
- CTC-based Non-autoregressive Speech Translation [51.37920141751813]
We investigate the potential of connectionist temporal classification for non-autoregressive speech translation.
We develop a model consisting of two encoders that are guided by CTC to predict the source and target texts.
Experiments on the MuST-C benchmarks show that our NAST model achieves an average BLEU score of 29.5 with a speed-up of 5.67$\times$.
arXiv Detail & Related papers (2023-05-27T03:54:09Z)
- A Template-based Method for Constrained Neural Machine Translation [100.02590022551718]
We propose a template-based method that can yield results with high translation quality and match accuracy while keeping the decoding speed.
The generation and derivation of the template can be learned through one sequence-to-sequence training framework.
Experimental results show that the proposed template-based methods can outperform several representative baselines in lexically and structurally constrained translation tasks.
arXiv Detail & Related papers (2022-05-23T12:24:34Z)
- Non-Autoregressive Neural Machine Translation: A Call for Clarity [3.1447111126465]
We take a step back and revisit several techniques that have been proposed for improving non-autoregressive translation models.
We provide novel insights for establishing strong baselines using length prediction or CTC-based architecture variants.
We contribute standardized BLEU, chrF++, and TER scores using sacreBLEU on four translation tasks.
arXiv Detail & Related papers (2022-05-21T12:15:22Z)
- Teaching BERT to Wait: Balancing Accuracy and Latency for Streaming Disfluency Detection [3.884530687475798]
A streaming BERT-based sequence tagging model is capable of detecting disfluencies in real time.
The model attains state-of-the-art latency and stability scores when compared with recent work on incremental disfluency detection.
arXiv Detail & Related papers (2022-05-02T02:13:24Z)
- Anticipation-free Training for Simultaneous Translation [70.85761141178597]
Simultaneous translation (SimulMT) speeds up the translation process by starting to translate before the source sentence is completely available.
Existing methods increase latency or introduce adaptive read-write policies for SimulMT models to handle local reordering and improve translation quality.
We propose a new framework that decomposes the translation process into the monotonic translation step and the reordering step.
arXiv Detail & Related papers (2022-01-30T16:29:37Z)
- SimulSLT: End-to-End Simultaneous Sign Language Translation [55.54237194555432]
Existing sign language translation methods need to read all the videos before starting the translation.
We propose SimulSLT, the first end-to-end simultaneous sign language translation model.
SimulSLT achieves BLEU scores that exceed the latest end-to-end non-simultaneous sign language translation model.
arXiv Detail & Related papers (2021-12-08T11:04:52Z)
- Streaming Models for Joint Speech Recognition and Translation [11.657994715914748]
We develop an end-to-end streaming ST model based on a re-translation approach and compare against standard cascading approaches.
We also introduce a novel inference method for the joint case, interleaving both transcript and translation in generation and removing the need to use separate decoders.
arXiv Detail & Related papers (2021-01-22T15:16:54Z)
- Presenting Simultaneous Translation in Limited Space [0.0]
Some methods of automatic simultaneous translation of a long-form speech allow revisions of outputs, trading accuracy for low latency.
Subtitling must be shown promptly, incrementally, and with adequate time for reading.
We propose a way to estimate the overall usability of the combination of automatic translation and subtitling by measuring the quality, latency, and stability on a test set.
arXiv Detail & Related papers (2020-09-18T18:37:03Z)
- SimulEval: An Evaluation Toolkit for Simultaneous Translation [59.02724214432792]
Simultaneous translation on both text and speech focuses on a real-time and low-latency scenario.
SimulEval is an easy-to-use and general evaluation toolkit for both simultaneous text and speech translation.
arXiv Detail & Related papers (2020-07-31T17:44:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.