Turning Fixed to Adaptive: Integrating Post-Evaluation into Simultaneous
Machine Translation
- URL: http://arxiv.org/abs/2210.11900v1
- Date: Fri, 21 Oct 2022 11:57:14 GMT
- Title: Turning Fixed to Adaptive: Integrating Post-Evaluation into Simultaneous
Machine Translation
- Authors: Shoutao Guo, Shaolei Zhang, Yang Feng
- Abstract summary: Simultaneous machine translation (SiMT) starts its translation before reading the whole source sentence.
We propose a method that performs an adaptive policy by integrating post-evaluation into a fixed policy.
- Score: 17.802607889752736
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Simultaneous machine translation (SiMT) starts its translation before reading
the whole source sentence and employs either fixed or adaptive policy to
generate the target sentence. Compared to the fixed policy, the adaptive policy
achieves better latency-quality tradeoffs by adopting a flexible translation
policy. If the policy can evaluate the rationality of an action before taking
it, the probability of incorrect actions will decrease. However, previous
methods do not evaluate actions before taking them. In this paper, we propose
a method that performs an adaptive policy by integrating post-evaluation into
a fixed policy. Specifically, whenever a candidate token is generated, our
model will evaluate the rationality of the next action by measuring the change
in the source content. Our model will then take different actions based on the
evaluation results. Experiments on three translation tasks show that our method
can exceed strong baselines at all latency levels.
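To make the decoding loop concrete, below is a minimal, hypothetical Python sketch of a read/write policy of this kind: a fixed wait-k warm-up followed by a post-evaluation step that keeps reading source whenever the content supporting the candidate token still looks unstable. The model interface (`candidate_token`, `source_change`) and the threshold are illustrative assumptions, not the paper's actual implementation.

```python
def simultaneous_decode(model, source_tokens, k=3, change_threshold=0.5, max_len=100):
    """Adaptive SiMT decoding: fixed wait-k warm-up plus a post-evaluation step.

    Hypothetical sketch: `model.candidate_token` and `model.source_change`
    stand in for the paper's translation model and its evaluation signal.
    """
    read = min(k, len(source_tokens))   # wait-k warm-up: read k source tokens first
    tgt = []                            # target tokens written so far

    while len(tgt) < max_len:
        src = source_tokens[:read]

        # Generate a candidate target token from the current source prefix.
        candidate = model.candidate_token(src, tgt)

        # Post-evaluation: estimate how much the source content supporting the
        # candidate changes (illustrative measure, e.g. an attention-mass shift
        # between consecutive source prefixes). A large change means the next
        # WRITE action is likely premature.
        change = model.source_change(src, tgt, candidate)

        if change > change_threshold and read < len(source_tokens):
            read += 1                   # READ: the source context is not yet stable
        else:
            tgt.append(candidate)       # WRITE: commit the candidate token
            if candidate == "</s>":
                break
    return tgt
```

In this sketch the only adaptive ingredient is the post-evaluation check; with `change_threshold` set to infinity the loop degenerates back to the fixed wait-k schedule.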
Related papers
- Off-Policy Evaluation for Large Action Spaces via Policy Convolution [60.6953713877886]
Policy Convolution family of estimators uses latent structure within actions to strategically convolve the logging and target policies.
Experiments on synthetic and benchmark datasets demonstrate remarkable mean squared error (MSE) improvements when using PC.
arXiv Detail & Related papers (2023-10-24T01:00:01Z)
- Adaptive Policy with Wait-$k$ Model for Simultaneous Translation [20.45004823667775]
Simultaneous machine translation (SiMT) requires a robust read/write policy in conjunction with a high-quality translation model.
Traditional methods rely on either a fixed wait-$k$ policy coupled with a standalone wait-$k$ translation model, or an adaptive policy jointly trained with the translation model.
We propose a more flexible approach by decoupling the adaptive policy model from the translation model.
arXiv Detail & Related papers (2023-10-23T12:16:32Z)
- Learning Optimal Policy for Simultaneous Machine Translation via Binary Search [17.802607889752736]
Simultaneous machine translation (SiMT) starts to output translation while reading the source sentence.
The policy determines the number of source tokens read during the translation of each target token.
We present a new method for constructing the optimal policy online via binary search.
arXiv Detail & Related papers (2023-05-22T07:03:06Z)
- Conformal Off-Policy Evaluation in Markov Decision Processes [53.786439742572995]
Reinforcement Learning aims at identifying and evaluating efficient control policies from data.
Most methods for this learning task, referred to as Off-Policy Evaluation (OPE), do not come with accuracy and certainty guarantees.
We present a novel OPE method based on Conformal Prediction that outputs an interval containing the true reward of the target policy with a prescribed level of certainty.
arXiv Detail & Related papers (2023-04-05T16:45:11Z)
- Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech Translation [75.86581380817464]
A SimulST system generally includes two components: the pre-decision that aggregates the speech information and the policy that decides to read or write.
This paper proposes to model the adaptive policy by adapting the Continuous Integrate-and-Fire (CIF).
Compared with monotonic multihead attention (MMA), our method has the advantage of simpler computation, superior quality at low latency, and better generalization to long utterances.
arXiv Detail & Related papers (2022-03-22T23:33:18Z)
- Sayer: Using Implicit Feedback to Optimize System Policies [63.992191765269396]
We develop a methodology that leverages implicit feedback to evaluate and train new system policies.
Sayer builds on two ideas from reinforcement learning to leverage data collected by an existing policy.
We show that Sayer can evaluate arbitrary policies accurately, and train new policies that outperform the production policies.
arXiv Detail & Related papers (2021-10-28T04:16:56Z)
- Variance Penalized On-Policy and Off-Policy Actor-Critic [60.06593931848165]
We propose on-policy and off-policy actor-critic algorithms that optimize a performance criterion involving both mean and variance in the return.
Our approach not only performs on par with actor-critic and prior variance-penalization baselines in terms of expected return, but also generates trajectories which have lower variance in the return.
arXiv Detail & Related papers (2021-02-03T10:06:16Z)
- Off-Policy Evaluation of Bandit Algorithm from Dependent Samples under Batch Update Policy [8.807587076209566]
The goal of off-policy evaluation (OPE) is to evaluate a new policy using historical data obtained via a behavior policy.
Because the contextual bandit updates the policy based on past observations, the samples are not independent and identically distributed.
This paper tackles this problem by constructing an estimator from a martingale difference sequence (MDS) for the dependent samples.
arXiv Detail & Related papers (2020-10-23T15:22:57Z)
- Efficient Evaluation of Natural Stochastic Policies in Offline Reinforcement Learning [80.42316902296832]
We study the efficient off-policy evaluation of natural policies, which are defined in terms of deviations from the behavior policy.
This is a departure from the literature on off-policy evaluation where most work consider the evaluation of explicitly specified policies.
arXiv Detail & Related papers (2020-06-06T15:08:24Z)
- Simultaneous Translation Policies: From Fixed to Adaptive [29.699912674525056]
We design an algorithm to achieve adaptive policies via a simple composition of a set of fixed policies (an illustrative sketch of one such composition appears after this list).
Our algorithm can outperform fixed ones by up to 4 BLEU points for the same latency.
It even surpasses the BLEU score of full-sentence translation in the greedy mode.
arXiv Detail & Related papers (2020-04-27T20:56:20Z)
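The following Python sketch illustrates one plausible way a set of fixed wait-k models could be composed into an adaptive policy, as described in the "Simultaneous Translation Policies: From Fixed to Adaptive" entry above: write when the wait-k model matching the current lag is confident enough, otherwise read more source. The model interface, the dictionary of per-k models, and the confidence threshold `rho` are assumptions for illustration, not that paper's actual algorithm.

```python
def composed_policy_decode(wait_k_models, source_tokens, k_min=1, k_max=10,
                           rho=0.6, max_len=100):
    """Toy composition of fixed wait-k policies into an adaptive policy.

    `wait_k_models` is assumed to map each k in [k_min, k_max] to a model whose
    predict(source_prefix, target_prefix) returns (next_token, probability).
    """
    read = min(k_min, len(source_tokens))          # source tokens read so far
    tgt = []                                       # target tokens written so far

    while len(tgt) < max_len:
        lag = read - len(tgt)                      # effective wait-k at this step
        k = max(k_min, min(k_max, lag))
        token, prob = wait_k_models[k].predict(source_tokens[:read], tgt)

        if prob >= rho or read == len(source_tokens) or lag >= k_max:
            tgt.append(token)                      # WRITE: the fixed policy is confident
            if token == "</s>":
                break
        else:
            read += 1                              # READ: defer to a larger wait-k
    return tgt
```

Lowering `rho` makes the composed policy write earlier (lower latency), while raising it makes it behave more like a large fixed wait-k policy.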