Universal Simultaneous Machine Translation with Mixture-of-Experts Wait-k Policy
- URL: http://arxiv.org/abs/2109.05238v2
- Date: Tue, 14 Sep 2021 01:31:39 GMT
- Title: Universal Simultaneous Machine Translation with Mixture-of-Experts Wait-k Policy
- Authors: Shaolei Zhang, Yang Feng
- Abstract summary: Simultaneous machine translation (SiMT) generates translation before reading the entire source sentence.
Previous methods usually need to train multiple SiMT models for different latency levels, resulting in large computational costs.
We propose a universal SiMT model with Mixture-of-Experts Wait-k Policy to achieve the best translation quality under arbitrary latency.
- Score: 6.487736084189248
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Simultaneous machine translation (SiMT) generates translation before reading
the entire source sentence and hence it has to trade off between translation
quality and latency. To fulfill the requirements of different translation
quality and latency in practical applications, the previous methods usually
need to train multiple SiMT models for different latency levels, resulting in
large computational costs. In this paper, we propose a universal SiMT model
with Mixture-of-Experts Wait-k Policy to achieve the best translation quality
under arbitrary latency with only one trained model. Specifically, our method
employs multi-head attention to accomplish the mixture of experts where each
head is treated as a wait-k expert with its own waiting words number, and given
a test latency and source inputs, the weights of the experts are accordingly
adjusted to produce the best translation. Experiments on three datasets show
that our method outperforms all the strong baselines under different latency,
including the state-of-the-art adaptive policy.
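The idea of treating each attention head as a wait-k expert can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`wait_k_mask`, `multi_head_wait_k_masks`, `mix_heads`) are hypothetical, and the expert weights are passed in as plain numbers because the abstract does not say how they are computed from the test latency and the source inputs. The sketch assumes the standard wait-k convention, in which target step i (0-indexed) may attend to the first i + k source tokens.

```python
import numpy as np

def wait_k_mask(tgt_len, src_len, k):
    """Boolean mask: entry [i, j] is True when target step i (0-indexed)
    may attend to source token j under a wait-k policy."""
    i = np.arange(tgt_len)[:, None]
    j = np.arange(src_len)[None, :]
    # Step i sees the first i + k source tokens, capped at the source length.
    return j < np.minimum(i + k, src_len)

def multi_head_wait_k_masks(tgt_len, src_len, head_ks):
    """One mask per head; each head plays the role of a wait-k expert
    with its own number of waiting words (assumed values in head_ks)."""
    return np.stack([wait_k_mask(tgt_len, src_len, k) for k in head_ks])

def mix_heads(head_outputs, weights):
    """Weighted average of per-head outputs along the first axis.
    The weights are illustrative inputs, normalized here to sum to 1."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return np.tensordot(w, np.asarray(head_outputs, dtype=float), axes=1)

if __name__ == "__main__":
    masks = multi_head_wait_k_masks(tgt_len=3, src_len=5, head_ks=[1, 3])
    print(masks.astype(int))
```

With a small k a head sees little of the source at each step, while a head with a larger k sees more of it; mixing the heads with different weights is one way to picture how a single model could cover several latency levels.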
Related papers
- Towards Zero-Shot Multimodal Machine Translation [64.9141931372384]
We propose a method to bypass the need for fully supervised data to train multimodal machine translation systems.
Our method, called ZeroMMT, consists in adapting a strong text-only machine translation (MT) model by training it on a mixture of two objectives.
To prove that our method generalizes to languages with no fully supervised training data available, we extend the CoMMuTE evaluation dataset to three new languages: Arabic, Russian and Chinese.
arXiv Detail & Related papers (2024-07-18T15:20:31Z)
- Adaptive Policy with Wait-$k$ Model for Simultaneous Translation [20.45004823667775]
Simultaneous machine translation (SiMT) requires a robust read/write policy in conjunction with a high-quality translation model.
Traditional methods rely on either a fixed wait-$k$ policy coupled with a standalone wait-$k$ translation model, or an adaptive policy jointly trained with the translation model.
We propose a more flexible approach by decoupling the adaptive policy model from the translation model.
arXiv Detail & Related papers (2023-10-23T12:16:32Z)
- Simultaneous Machine Translation with Tailored Reference [35.46823126036308]
Simultaneous machine translation (SiMT) generates translation while reading the whole source sentence.
Existing SiMT models are typically trained using the same reference disregarding the varying amounts of available source information at different latency.
We propose a novel method that provides tailored reference for the SiMT models trained at different latency by rephrasing the ground-truth.
arXiv Detail & Related papers (2023-10-20T15:32:26Z)
- Learning Optimal Policy for Simultaneous Machine Translation via Binary Search [17.802607889752736]
Simultaneous machine translation (SiMT) starts to output translation while reading the source sentence.
The policy determines the number of source tokens read during the translation of each target token.
We present a new method for constructing the optimal policy online via binary search.
arXiv Detail & Related papers (2023-05-22T07:03:06Z)
- Improving Simultaneous Machine Translation with Monolingual Data [94.1085601198393]
Simultaneous machine translation (SiMT) is usually done via sequence-level knowledge distillation (Seq-KD) from a full-sentence neural machine translation (NMT) model.
We propose to leverage monolingual data to improve SiMT, which trains a SiMT student on the combination of bilingual data and external monolingual data distilled by Seq-KD.
arXiv Detail & Related papers (2022-12-02T14:13:53Z)
- Data-Driven Adaptive Simultaneous Machine Translation [51.01779863078624]
We propose a novel and efficient training scheme for adaptive SimulMT.
Our method outperforms all strong baselines in terms of translation quality and latency.
arXiv Detail & Related papers (2022-04-27T02:40:21Z)
- Learning to Generalize to More: Continuous Semantic Augmentation for Neural Machine Translation [50.54059385277964]
We present a novel data augmentation paradigm termed Continuous Semantic Augmentation (CsaNMT).
CsaNMT augments each training instance with an adjacency region that could cover adequate variants of literal expression under the same meaning.
arXiv Detail & Related papers (2022-04-14T08:16:28Z)
- Measuring Uncertainty in Translation Quality Evaluation (TQE) [62.997667081978825]
This work aims to correctly estimate confidence intervals (Brown et al., 2001) depending on the sample size of the translated text.
The methodology draws on Bernoulli Statistical Distribution Modelling (BSDM) and Monte Carlo Sampling Analysis (MCSA).
arXiv Detail & Related papers (2021-11-15T12:09:08Z)
- Exploiting Curriculum Learning in Unsupervised Neural Machine Translation [28.75229367700697]
We propose a curriculum learning method to gradually utilize pseudo bi-texts based on their quality from multiple granularities.
Experimental results on WMT 14 En-Fr, WMT 16 En-De, WMT 16 En-Ro, and LDC En-Zh translation tasks demonstrate that the proposed method achieves consistent improvements with faster convergence speed.
arXiv Detail & Related papers (2021-09-23T07:18:06Z)
- SimulEval: An Evaluation Toolkit for Simultaneous Translation [59.02724214432792]
Simultaneous translation on both text and speech focuses on a real-time and low-latency scenario.
SimulEval is an easy-to-use and general evaluation toolkit for both simultaneous text and speech translation.
arXiv Detail & Related papers (2020-07-31T17:44:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.