Simultaneous Translation Policies: From Fixed to Adaptive
- URL: http://arxiv.org/abs/2004.13169v2
- Date: Sat, 2 May 2020 07:18:51 GMT
- Title: Simultaneous Translation Policies: From Fixed to Adaptive
- Authors: Baigong Zheng, Kaibo Liu, Renjie Zheng, Mingbo Ma, Hairong Liu, Liang
Huang
- Abstract summary: We design an algorithm to achieve adaptive policies via a simple composition of a set of fixed policies.
Our adaptive policies can outperform fixed ones by up to 4 BLEU points at the same latency.
They even surpass the BLEU score of full-sentence translation in greedy mode.
- Score: 29.699912674525056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adaptive policies are better than fixed policies for simultaneous
translation, since they can flexibly balance the tradeoff between translation
quality and latency based on the current context information. But previous
methods for obtaining adaptive policies either rely on a complicated training
process or underperform simple fixed policies. We design an algorithm to
achieve adaptive policies via a simple heuristic composition of a set of fixed
policies. Experiments on Chinese -> English and German -> English show that our
adaptive policies can outperform fixed ones by up to 4 BLEU points at the same
latency, and, more surprisingly, they even surpass the BLEU score of
full-sentence translation in greedy mode (and come very close to beam mode),
but with much lower latency.
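The abstract does not spell out the composition rule, so the following is only a rough sketch of the idea: a set of fixed wait-k policies (from an aggressive wait-k_min to a conservative wait-k_max) is composed into an adaptive read/write policy by a per-step confidence test. The `WaitKModel` interface, the threshold `rho`, and the forced READ/WRITE bounds are illustrative assumptions, not the paper's exact algorithm.

```python
# Illustrative sketch only: composing fixed wait-k policies into an adaptive
# policy via a confidence threshold. Interface and thresholds are assumptions.
from typing import Dict, Iterator, List, Protocol


class WaitKModel(Protocol):
    def next_token_probs(self, src: List[str], tgt: List[str]) -> Dict[str, float]:
        """Return next-target-token probabilities given the current prefixes."""


def adaptive_translate(
    src_tokens: Iterator[str],
    model: WaitKModel,
    k_min: int = 1,       # most aggressive fixed policy (wait-1)
    k_max: int = 9,       # most conservative fixed policy (wait-9)
    rho: float = 0.6,     # WRITE only when the model is at least this confident
    eos: str = "</s>",
) -> List[str]:
    src: List[str] = []
    tgt: List[str] = []
    src_done = False
    while True:
        lag = len(src) - len(tgt)  # which fixed wait-k policy we are effectively following
        # Force a READ while below the most aggressive policy's lag.
        if lag < k_min and not src_done:
            nxt = next(src_tokens, None)
            if nxt is None:
                src_done = True
            else:
                src.append(nxt)
            continue
        probs = model.next_token_probs(src, tgt)
        token, p = max(probs.items(), key=lambda kv: kv[1])
        # WRITE if confident, or if forced by the most conservative policy / end of source.
        if p >= rho or lag >= k_max or src_done:
            if token == eos:
                return tgt
            tgt.append(token)
        else:
            # Otherwise READ one more source token.
            nxt = next(src_tokens, None)
            if nxt is None:
                src_done = True
            else:
                src.append(nxt)
```

In this reading, a larger lag corresponds to a more conservative fixed policy, and the confidence threshold decides when to move between them; tuning `rho` trades translation quality against latency.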
Related papers
- Projected Off-Policy Q-Learning (POP-QL) for Stabilizing Offline Reinforcement Learning [57.83919813698673]
Projected Off-Policy Q-Learning (POP-QL) is a novel actor-critic algorithm that simultaneously reweights off-policy samples and constrains the policy to prevent divergence and reduce value-approximation error.
In our experiments, POP-QL not only shows competitive performance on standard benchmarks, but also outperforms competing methods in tasks where the data-collection policy is significantly sub-optimal.
arXiv Detail & Related papers (2023-11-25T00:30:58Z)
- Supported Trust Region Optimization for Offline Reinforcement Learning [59.43508325943592]
We propose Supported Trust Region optimization (STR) which performs trust region policy optimization with the policy constrained within the support of the behavior policy.
We show that, when assuming no approximation and sampling error, STR guarantees strict policy improvement until convergence to the optimal support-constrained policy in the dataset.
arXiv Detail & Related papers (2023-11-15T13:16:16Z)
- Learning Optimal Policy for Simultaneous Machine Translation via Binary Search [17.802607889752736]
Simultaneous machine translation (SiMT) starts to output translation while reading the source sentence.
The policy determines the number of source tokens read during the translation of each target token.
We present a new method for constructing the optimal policy online via binary search; a sketch of the idea follows this entry.
arXiv Detail & Related papers (2023-05-22T07:03:06Z)
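The binary-search entry above does not describe its algorithm, so the following is only a hedged sketch of one way such a policy could be constructed: for each target position, binary-search the smallest source-prefix length whose score clears a threshold, assuming a scoring function `score(t, j)` that is non-decreasing in the number of source tokens read. The function names and the monotone-policy constraint are assumptions for the example.

```python
# Hedged sketch, not the paper's algorithm: binary-search, per target position,
# the smallest number of source tokens to read before writing that token,
# assuming score(t, j) never decreases as more source (j) is read.
from typing import Callable, List


def policy_by_binary_search(
    num_target: int,
    num_source: int,
    score: Callable[[int, int], float],  # score(target_pos, source_tokens_read)
    threshold: float,
) -> List[int]:
    """Return, for each target position, how many source tokens to read first."""
    reads: List[int] = []
    prev = 0
    for t in range(num_target):
        lo, hi = max(prev, 1), num_source  # reads must not decrease across positions
        while lo < hi:
            mid = (lo + hi) // 2
            if score(t, mid) >= threshold:
                hi = mid          # enough source already read; try fewer
            else:
                lo = mid + 1      # not enough source; read more
        reads.append(lo)
        prev = lo
    return reads
```

Keeping the reads non-decreasing across target positions is what makes the result a valid read/write policy rather than an independent choice per token.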
- LEAPT: Learning Adaptive Prefix-to-prefix Translation For Simultaneous Machine Translation [6.411228564798412]
Simultaneous machine translation is useful in many live scenarios but very challenging due to the trade-off between accuracy and latency.
We propose a novel adaptive training policy called LEAPT, which allows our machine translation model to learn how to translate source prefixes and make use of the future context.
arXiv Detail & Related papers (2023-03-21T11:17:37Z)
- Turning Fixed to Adaptive: Integrating Post-Evaluation into Simultaneous Machine Translation [17.802607889752736]
Simultaneous machine translation (SiMT) starts its translation before reading the whole source sentence.
We propose a method to obtain an adaptive policy by integrating post-evaluation into a fixed policy.
arXiv Detail & Related papers (2022-10-21T11:57:14Z)
- Data-Driven Adaptive Simultaneous Machine Translation [51.01779863078624]
We propose a novel and efficient training scheme for adaptive SimulMT.
Our method outperforms all strong baselines in terms of translation quality and latency.
arXiv Detail & Related papers (2022-04-27T02:40:21Z)
- Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech Translation [75.86581380817464]
A SimulST system generally includes two components: the pre-decision that aggregates the speech information and the policy that decides to read or write.
This paper proposes to model the adaptive policy by adapting the Continuous Integrate-and-Fire (CIF) mechanism; a minimal sketch of the CIF firing rule follows this entry.
Compared with monotonic multihead attention (MMA), our method has the advantage of simpler computation, superior quality at low latency, and better generalization to long utterances.
arXiv Detail & Related papers (2022-03-22T23:33:18Z)
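For readers unfamiliar with CIF, the sketch below shows the standard integrate-and-fire rule that the entry above builds on: per-frame weights are accumulated, and a boundary "fires" each time the accumulator crosses 1.0, which an adaptive SimulST policy can treat as a signal to WRITE the next target token. The weight source and the threshold value follow the usual CIF convention and are not details taken from this abstract.

```python
# Standard Continuous Integrate-and-Fire (CIF) accumulation sketch: fire a
# boundary each time the running sum of per-frame weights crosses the threshold.
# The per-frame weights (alphas) would come from the encoder; here they are inputs.
from typing import List, Tuple


def cif_segment(alphas: List[float], threshold: float = 1.0) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) spans, one per fired boundary."""
    spans: List[Tuple[int, int]] = []
    acc = 0.0
    start = 0
    for i, a in enumerate(alphas):
        acc += a
        if acc >= threshold:   # enough information integrated: fire
            spans.append((start, i))
            acc -= threshold   # carry over the residual weight
            start = i + 1
    return spans


# Example: weights from a 6-frame utterance produce two fired segments.
print(cif_segment([0.2, 0.5, 0.4, 0.3, 0.3, 0.5]))  # [(0, 2), (3, 5)]
```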
- Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
arXiv Detail & Related papers (2021-12-30T12:20:46Z)
- DDPG++: Striving for Simplicity in Continuous-control Off-Policy Reinforcement Learning [95.60782037764928]
We show that simple Deterministic Policy Gradient works remarkably well as long as the overestimation bias is controlled.
Second, we pinpoint training instabilities, typical of off-policy algorithms, to the greedy policy update step.
Third, we show that ideas from the propensity estimation literature can be used to importance-sample transitions from the replay buffer and update the policy to prevent deterioration of performance.
arXiv Detail & Related papers (2020-06-26T20:21:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.