Related papers: Reinforcement Learning for Edit-Based Non-Autoregressive Neural Machine Translation

URL: http://arxiv.org/abs/2405.01280v2
Date: Tue, 2 Jul 2024 13:41:56 GMT
Title: Reinforcement Learning for Edit-Based Non-Autoregressive Neural Machine Translation
Authors: Hao Wang, Tetsuro Morimura, Ukyo Honda, Daisuke Kawahara
Abstract summary: Non-autoregressive (NAR) language models are known for their low latency in neural machine translation (NMT). A performance gap exists between NAR and autoregressive models due to the large decoding space and the difficulty of capturing dependency between target words accurately. We apply reinforcement learning (RL) to Levenshtein Transformer, a representative edit-based NAR model, demonstrating that RL with self-generated data can enhance the performance of edit-based NAR models.
Score: 15.632419297059993
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Non-autoregressive (NAR) language models are known for their low latency in neural machine translation (NMT). However, a performance gap exists between NAR and autoregressive models due to the large decoding space and difficulty in capturing dependency between target words accurately. Compounding this, preparing appropriate training data for NAR models is a non-trivial task, often exacerbating exposure bias. To address these challenges, we apply reinforcement learning (RL) to Levenshtein Transformer, a representative edit-based NAR model, demonstrating that RL with self-generated data can enhance the performance of edit-based NAR models. We explore two RL approaches: stepwise reward maximization and episodic reward maximization. We discuss the respective pros and cons of these two approaches and empirically verify them. Moreover, we experimentally investigate the impact of temperature setting on performance, confirming the importance of proper temperature setting for NAR models' training.

Related papers

Behavior Injection: Preparing Language Models for Reinforcement Learning [24.46625106928253]
Reinforcement fine-tuning (RFT) has emerged as a powerful post-training technique to incentivize the reasoning ability of large language models (LLMs). LLMs can respond very inconsistently to RFT: some show substantial performance gains, while others plateau or even degrade. We propose behavior injection, a task-agnostic data-augmentation scheme applied prior to RL.
arXiv Detail & Related papers (2025-05-25T00:54:50Z)
Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining [74.83412846804977]
Reinforcement learning (RL)-based fine-tuning has become a crucial step in post-training language models. We present a systematic end-to-end study of RL fine-tuning for mathematical reasoning by training models entirely from scratch.
arXiv Detail & Related papers (2025-04-10T17:15:53Z)
Training Language Models to Self-Correct via Reinforcement Learning [98.35197671595343]
Self-correction has been found to be largely ineffective in modern large language models (LLMs). We develop a multi-turn online reinforcement learning approach, SCoRe, that significantly improves an LLM's self-correction ability using entirely self-generated data. We find that SCoRe achieves state-of-the-art self-correction performance, improving the base models' self-correction by 15.6% and 9.1% respectively on MATH and HumanEval.
arXiv Detail & Related papers (2024-09-19T17:16:21Z)
Leveraging Diverse Modeling Contexts with Collaborating Learning for Neural Machine Translation [26.823126615724888]
Autoregressive (AR) and non-autoregressive (NAR) models are two types of generative models for neural machine translation (NMT). We propose a novel generic collaborative learning method, DCMCL, where AR and NAR models are treated as collaborators instead of teachers and students.
arXiv Detail & Related papers (2024-02-28T15:55:02Z)
Compressing Deep Reinforcement Learning Networks with a Dynamic Structured Pruning Method for Autonomous Driving [63.155562267383864]
Deep reinforcement learning (DRL) has shown remarkable success in complex autonomous driving scenarios. DRL models inevitably bring high memory consumption and computation, which hinders their wide deployment in resource-limited autonomous driving devices. We introduce a novel dynamic structured pruning approach that gradually removes a DRL model's unimportant neurons during the training stage.
arXiv Detail & Related papers (2024-02-07T09:00:30Z)
Distilling Autoregressive Models to Obtain High-Performance Non-Autoregressive Solvers for Vehicle Routing Problems with Faster Inference Speed [8.184624214651283]
We propose a generic Guided Non-Autoregressive Knowledge Distillation (GNARKD) method to obtain high-performance NAR models having a low inference latency. We evaluate GNARKD by applying it to three widely adopted AR models to obtain NAR VRP solvers for both synthesized and real-world instances.
arXiv Detail & Related papers (2023-12-19T07:13:32Z)
Learning a model is paramount for sample efficiency in reinforcement learning control of PDEs [5.488334211013093]
We show that learning an actuated model in parallel to training the RL agent significantly reduces the total amount of required data sampled from the real system. We also show that iteratively updating the model is of major importance to avoid biases in the RL training.
arXiv Detail & Related papers (2023-02-14T16:14:39Z)
Non-Autoregressive Machine Translation: It's Not as Fast as it Seems [84.47091735503979]
We point out flaws in the evaluation methodology present in the literature on NAR models. We compare NAR models with other widely used methods for improving efficiency. We call for more realistic and extensive evaluation of NAR models in future work.
arXiv Detail & Related papers (2022-05-04T09:30:17Z)
A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation [59.64193903397301]
Non-autoregressive (NAR) models simultaneously generate multiple outputs in a sequence, which significantly reduces inference time at the cost of an accuracy drop compared to autoregressive baselines. We conduct a comparative study of various NAR modeling methods for end-to-end automatic speech recognition (ASR). The results on various tasks provide interesting findings for developing an understanding of NAR ASR, such as the accuracy-speed trade-off and robustness against long-form utterances.
arXiv Detail & Related papers (2021-10-11T13:05:06Z)
TSNAT: Two-Step Non-Autoregressvie Transformer Models for Speech Recognition [69.68154370877615]
Non-autoregressive (NAR) models can get rid of the temporal dependency between the output tokens and predict the entire output sequence in at least one step. We propose a new model named the two-step non-autoregressive transformer (TSNAT). The results show that the TSNAT can achieve competitive performance with the AR model and outperform many complicated NAR models.
arXiv Detail & Related papers (2021-04-04T02:34:55Z)
Improving Non-autoregressive Neural Machine Translation with Monolingual Data [13.43438045177293]
Non-autoregressive (NAR) neural machine translation is usually done via knowledge distillation from an autoregressive (AR) model. We leverage large monolingual corpora to improve the NAR model's performance.
arXiv Detail & Related papers (2020-05-02T22:24:52Z)
A Study of Non-autoregressive Model for Sequence Generation [147.89525760170923]
Non-autoregressive (NAR) models generate all the tokens of a sequence in parallel. We propose knowledge distillation and source-target alignment to bridge the gap between AR and NAR models.
arXiv Detail & Related papers (2020-04-22T09:16:09Z)

This list is automatically generated from the titles and abstracts of the papers on this site.