SimulMT to SimulST: Adapting Simultaneous Text Translation to End-to-End
Simultaneous Speech Translation
- URL: http://arxiv.org/abs/2011.02048v1
- Date: Tue, 3 Nov 2020 22:47:58 GMT
- Title: SimulMT to SimulST: Adapting Simultaneous Text Translation to End-to-End
Simultaneous Speech Translation
- Authors: Xutai Ma, Juan Pino, Philipp Koehn
- Abstract summary: Simultaneous text translation and end-to-end speech translation have recently made great progress, but little work has combined the two tasks.
We investigate how to adapt simultaneous text translation methods such as wait-k and monotonic multihead attention to end-to-end simultaneous speech translation by introducing a pre-decision module.
A detailed analysis is provided on the latency-quality trade-offs of combining fixed and flexible pre-decision with fixed and flexible policies.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Simultaneous text translation and end-to-end speech translation have recently
made great progress, but little work has combined the two tasks. We
investigate how to adapt simultaneous text translation methods such as wait-k
and monotonic multihead attention to end-to-end simultaneous speech translation
by introducing a pre-decision module. A detailed analysis is provided on the
latency-quality trade-offs of combining fixed and flexible pre-decision with
fixed and flexible policies. We also design a novel computation-aware latency
metric, adapted from Average Lagging.
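The wait-k policy mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `translate_step` is a hypothetical stand-in for an incremental decoder, and the continuous speech stream is assumed to have already been segmented into discrete source units (the role the paper's pre-decision module plays).

```python
def wait_k_policy(source_units, k, translate_step):
    """Decode under a wait-k schedule: read k source units first,
    then alternate between writing one target token and reading
    one more source unit.

    translate_step(prefix, num_emitted) -> next target token, or
    None when the translation is complete (hypothetical API).
    """
    target = []
    read = min(k, len(source_units))      # initial wait of k units
    while True:
        token = translate_step(source_units[:read], len(target))
        if token is None:                 # decoder signals end of output
            break
        target.append(token)
        if read < len(source_units):      # READ one more unit after each WRITE
            read += 1
    return target
```

With a toy `translate_step` that copies the t-th visible source unit, a wait-2 run on four units emits each token one unit behind schedule after the initial two-unit wait.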
Related papers
- A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Speech Translation [48.84039953531355]
We propose a novel non-autoregressive generation framework for simultaneous speech translation (NAST-S2X)
NAST-S2X integrates speech-to-text and speech-to-speech tasks into a unified end-to-end framework.
It achieves high-quality simultaneous interpretation within a delay of less than 3 seconds and provides a 28 times decoding speedup in offline generation.
arXiv Detail & Related papers (2024-06-11T04:25:48Z)
- Shiftable Context: Addressing Training-Inference Context Mismatch in Simultaneous Speech Translation [0.17188280334580192]
Transformer models using segment-based processing have been an effective architecture for simultaneous speech translation.
We propose Shiftable Context to ensure consistent segment and context sizes are maintained throughout training and inference.
arXiv Detail & Related papers (2023-07-03T22:11:51Z)
- M-Adapter: Modality Adaptation for End-to-End Speech-to-Text Translation [66.92823764664206]
We propose M-Adapter, a novel Transformer-based module, to adapt speech representations to text.
While shrinking the speech sequence, M-Adapter produces features desired for speech-to-text translation.
Our experimental results show that our model outperforms a strong baseline by up to 1 BLEU.
arXiv Detail & Related papers (2022-07-03T04:26:53Z)
- Data-Driven Adaptive Simultaneous Machine Translation [51.01779863078624]
We propose a novel and efficient training scheme for adaptive SimulMT.
Our method outperforms all strong baselines in terms of translation quality and latency.
arXiv Detail & Related papers (2022-04-27T02:40:21Z)
- STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation [37.51435498386953]
We propose the Speech-TExt Manifold Mixup (STEMM) method to calibrate such discrepancy.
Experiments on MuST-C speech translation benchmark and further analysis show that our method effectively alleviates the cross-modal representation discrepancy.
arXiv Detail & Related papers (2022-03-20T01:49:53Z)
- Anticipation-free Training for Simultaneous Translation [70.85761141178597]
Simultaneous translation (SimulMT) speeds up the translation process by starting to translate before the source sentence is completely available.
Existing methods increase latency or introduce adaptive read-write policies for SimulMT models to handle local reordering and improve translation quality.
We propose a new framework that decomposes the translation process into the monotonic translation step and the reordering step.
arXiv Detail & Related papers (2022-01-30T16:29:37Z)
- RealTranS: End-to-End Simultaneous Speech Translation with Convolutional Weighted-Shrinking Transformer [33.876412404781846]
RealTranS is an end-to-end model for simultaneous speech translation.
It maps speech features into text space with a weighted-shrinking operation and a semantic encoder.
Experiments show that RealTranS with the Wait-K-Stride-N strategy outperforms prior end-to-end models.
arXiv Detail & Related papers (2021-06-09T06:35:46Z)
- Bridging the Modality Gap for Speech-to-Text Translation [57.47099674461832]
End-to-end speech translation aims to translate speech in one language into text in another language in an end-to-end manner.
Most existing methods employ an encoder-decoder structure with a single encoder to learn acoustic representation and semantic information simultaneously.
We propose a Speech-to-Text Adaptation for Speech Translation model which aims to improve the end-to-end model performance by bridging the modality gap between speech and text.
arXiv Detail & Related papers (2020-10-28T12:33:04Z)
- SimulEval: An Evaluation Toolkit for Simultaneous Translation [59.02724214432792]
Simultaneous translation on both text and speech focuses on a real-time and low-latency scenario.
SimulEval is an easy-to-use and general evaluation toolkit for both simultaneous text and speech translation.
arXiv Detail & Related papers (2020-07-31T17:44:41Z)
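Average Lagging is one of the latency metrics reported by toolkits such as SimulEval. A rough sketch of the standard text-based formulation is below, computed from the per-token delays g(t) (the number of source units read before emitting target token t); note this is an illustration only, and the computation-aware variant proposed in the main paper replaces these ideal delays with wall-clock measurements.

```python
def average_lagging(delays, src_len, tgt_len):
    """Standard Average Lagging.

    delays[t-1] = g(t): number of source units read before
    emitting target token t. Lower AL means lower latency.
    """
    gamma = tgt_len / src_len                 # target/source length ratio
    # tau: first target step at which the full source has been read
    tau = next((t for t, g in enumerate(delays, 1) if g >= src_len),
               len(delays))
    # average of how far each emission lags behind an ideal
    # translator that emits at rate gamma
    return sum(delays[t - 1] - (t - 1) / gamma
               for t in range(1, tau + 1)) / tau
```

For a wait-2 policy on equal-length sequences of four units, the delays are [2, 3, 4, 4] and AL comes out to 2.0, matching the intuition that AL is roughly k when source and target lengths agree.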
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.