Simultaneous Translation Policies: From Fixed to Adaptive
- URL: http://arxiv.org/abs/2004.13169v2
- Date: Sat, 2 May 2020 07:18:51 GMT
- Title: Simultaneous Translation Policies: From Fixed to Adaptive
- Authors: Baigong Zheng, Kaibo Liu, Renjie Zheng, Mingbo Ma, Hairong Liu, Liang
Huang
- Abstract summary: We design an algorithm to achieve adaptive policies via a simple composition of a set of fixed policies.
Our adaptive policies can outperform fixed ones by up to 4 BLEU points at the same latency.
They even surpass the BLEU score of full-sentence translation in greedy mode.
- Score: 29.699912674525056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adaptive policies are better than fixed policies for simultaneous
translation, since they can flexibly balance the tradeoff between translation
quality and latency based on the current context information. But previous
methods for obtaining adaptive policies either rely on a complicated training
process or underperform simple fixed policies. We design an algorithm to
achieve adaptive policies via a simple heuristic composition of a set of fixed
policies. Experiments on Chinese -> English and German -> English show that our
adaptive policies can outperform fixed ones by up to 4 BLEU points at the same
latency, and, more surprisingly, they even surpass the BLEU score of
full-sentence translation in greedy mode (and come very close to beam mode),
but with much lower latency.
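The abstract does not spell out the composition rule, so the following is only a rough sketch of the idea: a set of fixed wait-k policies (from an aggressive wait-k_min to a conservative wait-k_max) is composed into an adaptive read/write policy by a per-step confidence test. The `WaitKModel` interface, the threshold `rho`, and the forced READ/WRITE bounds are illustrative assumptions, not the paper's exact algorithm.

```python
# Illustrative sketch only: composing fixed wait-k policies into an adaptive
# policy via a confidence threshold. Interface and thresholds are assumptions.
from typing import Dict, Iterator, List, Protocol


class WaitKModel(Protocol):
    def next_token_probs(self, src: List[str], tgt: List[str]) -> Dict[str, float]:
        """Return next-target-token probabilities given the current prefixes."""


def adaptive_translate(
    src_tokens: Iterator[str],
    model: WaitKModel,
    k_min: int = 1,       # most aggressive fixed policy (wait-1)
    k_max: int = 9,       # most conservative fixed policy (wait-9)
    rho: float = 0.6,     # WRITE only when the model is at least this confident
    eos: str = "</s>",
) -> List[str]:
    src: List[str] = []
    tgt: List[str] = []
    src_done = False
    while True:
        lag = len(src) - len(tgt)  # which fixed wait-k policy we are effectively following
        # Force a READ while below the most aggressive policy's lag.
        if lag < k_min and not src_done:
            nxt = next(src_tokens, None)
            if nxt is None:
                src_done = True
            else:
                src.append(nxt)
            continue
        probs = model.next_token_probs(src, tgt)
        token, p = max(probs.items(), key=lambda kv: kv[1])
        # WRITE if confident, or if forced by the most conservative policy / end of source.
        if p >= rho or lag >= k_max or src_done:
            if token == eos:
                return tgt
            tgt.append(token)
        else:
            # Otherwise READ one more source token.
            nxt = next(src_tokens, None)
            if nxt is None:
                src_done = True
            else:
                src.append(nxt)
```

In this reading, a larger lag corresponds to a more conservative fixed policy, and the confidence threshold decides when to move between them; tuning `rho` trades translation quality against latency.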
Related papers
- Projected Off-Policy Q-Learning (POP-QL) for Stabilizing Offline Reinforcement Learning [57.83919813698673]
Projected Off-Policy Q-Learning (POP-QL) is a novel actor-critic algorithm that simultaneously reweights off-policy samples and constrains the policy to prevent divergence and reduce value-approximation error.
In our experiments, POP-QL not only shows competitive performance on standard benchmarks, but also outperforms competing methods in tasks where the data-collection policy is significantly sub-optimal.
arXiv Detail & Related papers (2023-11-25T00:30:58Z)
- Supported Trust Region Optimization for Offline Reinforcement Learning [59.43508325943592]
We propose Supported Trust Region optimization (STR) which performs trust region policy optimization with the policy constrained within the support of the behavior policy.
We show that, when assuming no approximation and sampling error, STR guarantees strict policy improvement until convergence to the optimal support-constrained policy in the dataset.
arXiv Detail & Related papers (2023-11-15T13:16:16Z)
- Learning Optimal Policy for Simultaneous Machine Translation via Binary Search [17.802607889752736]
Simultaneous machine translation (SiMT) starts to output translation while reading the source sentence.
The policy determines the number of source tokens read during the translation of each target token.
We present a new method for constructing the optimal policy online via binary search; a sketch of the idea follows this entry.
arXiv Detail & Related papers (2023-05-22T07:03:06Z)
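The binary-search entry above does not describe its algorithm, so the following is only a hedged sketch of one way such a policy could be constructed: for each target position, binary-search the smallest source-prefix length whose score clears a threshold, assuming a scoring function `score(t, j)` that is non-decreasing in the number of source tokens read. The function names and the monotone-policy constraint are assumptions for the example.

```python
# Hedged sketch, not the paper's algorithm: binary-search, per target position,
# the smallest number of source tokens to read before writing that token,
# assuming score(t, j) never decreases as more source (j) is read.
from typing import Callable, List


def policy_by_binary_search(
    num_target: int,
    num_source: int,
    score: Callable[[int, int], float],  # score(target_pos, source_tokens_read)
    threshold: float,
) -> List[int]:
    """Return, for each target position, how many source tokens to read first."""
    reads: List[int] = []
    prev = 0
    for t in range(num_target):
        lo, hi = max(prev, 1), num_source  # reads must not decrease across positions
        while lo < hi:
            mid = (lo + hi) // 2
            if score(t, mid) >= threshold:
                hi = mid          # enough source already read; try fewer
            else:
                lo = mid + 1      # not enough source; read more
        reads.append(lo)
        prev = lo
    return reads
```

Keeping the reads non-decreasing across target positions is what makes the result a valid read/write policy rather than an independent choice per token.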
- LEAPT: Learning Adaptive Prefix-to-prefix Translation For Simultaneous Machine Translation [6.411228564798412]
Simultaneous machine translation is useful in many live scenarios but very challenging due to the trade-off between accuracy and latency.
We propose a novel adaptive training policy called LEAPT, which allows our machine translation model to learn how to translate source prefixes and make use of the future context.
arXiv Detail & Related papers (2023-03-21T11:17:37Z)
- Turning Fixed to Adaptive: Integrating Post-Evaluation into Simultaneous Machine Translation [17.802607889752736]
Simultaneous machine translation (SiMT) starts its translation before reading the whole source sentence.
We propose a method to obtain an adaptive policy by integrating post-evaluation into a fixed policy.
arXiv Detail & Related papers (2022-10-21T11:57:14Z)
- Data-Driven Adaptive Simultaneous Machine Translation [51.01779863078624]
We propose a novel and efficient training scheme for adaptive SimulMT.
Our method outperforms all strong baselines in terms of translation quality and latency.
arXiv Detail & Related papers (2022-04-27T02:40:21Z)
- Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech Translation [75.86581380817464]
A SimulST system generally includes two components: the pre-decision that aggregates the speech information and the policy that decides to read or write.
This paper proposes to model the adaptive policy by adapting the Continuous Integrate-and-Fire (CIF) mechanism; a minimal sketch of the CIF firing rule follows this entry.
Compared with monotonic multihead attention (MMA), our method has the advantage of simpler computation, superior quality at low latency, and better generalization to long utterances.
arXiv Detail & Related papers (2022-03-22T23:33:18Z)
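For readers unfamiliar with CIF, the sketch below shows the standard integrate-and-fire rule that the entry above builds on: per-frame weights are accumulated, and a boundary "fires" each time the accumulator crosses 1.0, which an adaptive SimulST policy can treat as a signal to WRITE the next target token. The weight source and the threshold value follow the usual CIF convention and are not details taken from this abstract.

```python
# Standard Continuous Integrate-and-Fire (CIF) accumulation sketch: fire a
# boundary each time the running sum of per-frame weights crosses the threshold.
# The per-frame weights (alphas) would come from the encoder; here they are inputs.
from typing import List, Tuple


def cif_segment(alphas: List[float], threshold: float = 1.0) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) spans, one per fired boundary."""
    spans: List[Tuple[int, int]] = []
    acc = 0.0
    start = 0
    for i, a in enumerate(alphas):
        acc += a
        if acc >= threshold:   # enough information integrated: fire
            spans.append((start, i))
            acc -= threshold   # carry over the residual weight
            start = i + 1
    return spans


# Example: weights from a 6-frame utterance produce two fired segments.
print(cif_segment([0.2, 0.5, 0.4, 0.3, 0.3, 0.5]))  # [(0, 2), (3, 5)]
```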
- Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
arXiv Detail & Related papers (2021-12-30T12:20:46Z)
- DDPG++: Striving for Simplicity in Continuous-control Off-Policy Reinforcement Learning [95.60782037764928]
We show that simple Deterministic Policy Gradient works remarkably well as long as the overestimation bias is controlled.
Second, we pinpoint training instabilities, typical of off-policy algorithms, to the greedy policy update step.
Third, we show that ideas from the propensity estimation literature can be used to importance-sample transitions from the replay buffer and update the policy to prevent deterioration of performance.
arXiv Detail & Related papers (2020-06-26T20:21:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.