Nested-Wasserstein Self-Imitation Learning for Sequence Generation
- URL: http://arxiv.org/abs/2001.06944v1
- Date: Mon, 20 Jan 2020 02:19:13 GMT
- Title: Nested-Wasserstein Self-Imitation Learning for Sequence Generation
- Authors: Ruiyi Zhang, Changyou Chen, Zhe Gan, Zheng Wen, Wenlin Wang, Lawrence
Carin
- Abstract summary: We propose the concept of nested-Wasserstein distance for distributional semantic matching.
A novel nested-Wasserstein self-imitation learning framework is developed, encouraging the model to exploit historical high-rewarded sequences.
- Score: 158.19606942252284
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) has been widely studied for improving
sequence-generation models. However, the conventional rewards used for RL
training typically cannot capture sufficient semantic information and thus
induce model bias. Further, the sparse and delayed rewards make RL exploration
inefficient. To alleviate these issues, we propose the concept of
nested-Wasserstein distance for distributional semantic matching. To further
exploit it, a novel nested-Wasserstein self-imitation learning framework is
developed, encouraging the model to exploit historical high-rewarded sequences
for enhanced exploration and better semantic matching. Our solution can be
understood as approximately executing proximal policy optimization with
Wasserstein trust-regions. Experiments on a variety of unconditional and
conditional sequence-generation tasks demonstrate that the proposed approach
consistently leads to improved performance.
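The core idea admits a compact sketch: keep a buffer of historically high-rewarded sequences and shape the sparse task reward with an optimal-transport matching score against them. The sketch below is a simplification under stated assumptions; `sinkhorn_reward` stands in for the paper's nested-Wasserstein distance with a single word-level Sinkhorn matching, and all names (`SelfImitationBuffer`, `shaped_reward`, `lam`) are hypothetical, not the authors' implementation.

```python
# Minimal sketch of Wasserstein-based self-imitation (hypothetical names;
# the paper's nested-Wasserstein composes OT at the word and sequence
# levels, approximated here by one Sinkhorn matching over word embeddings).
import heapq
import numpy as np

def sinkhorn_reward(gen_emb, ref_emb, eps=0.1, iters=50):
    """Soft-matching reward between two sequences of word embeddings via
    entropy-regularized optimal transport (Sinkhorn iterations)."""
    gen = gen_emb / np.linalg.norm(gen_emb, axis=1, keepdims=True)
    ref = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    C = 1.0 - gen @ ref.T                    # cosine cost matrix
    K = np.exp(-C / eps)
    u = np.ones(len(gen)) / len(gen)         # uniform source marginal
    v = np.ones(len(ref)) / len(ref)         # uniform target marginal
    a, b = u.copy(), v.copy()
    for _ in range(iters):                   # Sinkhorn fixed-point updates
        a = u / (K @ b)
        b = v / (K.T @ a)
    T = np.diag(a) @ K @ np.diag(b)          # soft transport plan
    return 1.0 - float((T * C).sum())        # high reward = good matching

class SelfImitationBuffer:
    """Keeps the top-k highest-rewarded sequences seen during training."""
    def __init__(self, k=64):
        self.k, self.heap = k, []
    def add(self, reward, seq_emb):
        item = (reward, id(seq_emb), seq_emb)   # id() breaks reward ties
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, item)
        else:
            heapq.heappushpop(self.heap, item)  # evict lowest reward
    def sample(self, rng):
        return self.heap[rng.integers(len(self.heap))][2]

def shaped_reward(task_reward, gen_emb, buffer, rng, lam=0.5):
    """Mix the sparse task reward with an imitation reward that pulls
    generations toward historical high-reward sequences."""
    if not buffer.heap:
        return task_reward
    ref_emb = buffer.sample(rng)
    return (1 - lam) * task_reward + lam * sinkhorn_reward(gen_emb, ref_emb)
```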
Related papers
- Scores as Actions: a framework of fine-tuning diffusion models by continuous-time reinforcement learning [9.025671446527694]
Reinforcement Learning from human feedback (RLHF) has been shown to be a promising direction for aligning generative models with human intent.
We formulate the task of fine-tuning diffusion models, with reward functions learned from human feedback, as an exploratory continuous-time control problem.
We develop the corresponding continuous-time RL theory for policy optimization and regularization under assumptions of stochastic differential equations.
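As a rough illustration of the continuous-time view, the sketch below treats one Euler-Maruyama step of a variance-preserving reverse SDE as a controlled transition whose action is the score-network output; the unit-beta schedule and all names are assumptions, not the paper's exact formulation.

```python
# Sketch: the reverse-time denoising SDE as a controlled process whose
# control (action) is the score estimate, stepped by Euler-Maruyama.
import numpy as np

def reverse_sde_step(x, t, score_fn, dt=1e-2, rng=None):
    """One step (t -> t - dt) of a VP reverse SDE with unit beta:
    x <- x + (0.5 * x + score(x, t)) * dt + sqrt(dt) * noise."""
    rng = rng or np.random.default_rng(0)
    a = score_fn(x, t)                       # action = score network output
    return x + (0.5 * x + a) * dt + np.sqrt(dt) * rng.standard_normal(x.shape)

# usage with a toy score for a standard-normal target: score(x, t) = -x
x = np.ones(3)
for step in range(100):
    x = reverse_sde_step(x, t=1.0 - step * 1e-2, score_fn=lambda x, t: -x)
```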
arXiv Detail & Related papers (2024-09-12T21:12:21Z)
- Learning Off-policy with Model-based Intrinsic Motivation For Active Online Exploration [15.463313629574111]
This paper investigates how to achieve sample-efficient exploration in continuous control tasks.
We introduce an RL algorithm that incorporates a predictive model and off-policy learning elements.
We derive an intrinsic reward without incurring additional parameter overhead.
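One standard way to obtain such a bonus with no extra parameters is to reuse the predictive model's own one-step error, as in the hedged sketch below (interfaces are hypothetical; the paper's exact intrinsic reward may differ).

```python
# Sketch: reuse an existing dynamics model to derive an intrinsic reward,
# e.g. the model's one-step prediction error as a novelty bonus.
import numpy as np

def intrinsic_reward(model, s, a, s_next, scale=0.1):
    """Novelty bonus = squared prediction error of the learned model."""
    s_pred = model.predict(s, a)      # same model already used for learning
    return scale * float(np.mean((s_pred - s_next) ** 2))

def shaped(r_ext, model, s, a, s_next):
    """Extrinsic reward plus the model-derived exploration bonus."""
    return r_ext + intrinsic_reward(model, s, a, s_next)
```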
arXiv Detail & Related papers (2024-03-31T11:39:11Z)
- Deep Generative Symbolic Regression [83.04219479605801]
Symbolic regression aims to discover concise closed-form mathematical equations from data.
Existing methods, ranging from search to reinforcement learning, fail to scale with the number of input variables.
We propose an instantiation of our framework, Deep Generative Symbolic Regression.
arXiv Detail & Related papers (2023-12-30T17:05:31Z)
- Reinforcement Replaces Supervision: Query focused Summarization using Deep Reinforcement Learning [43.123290672073814]
We consider systems that generate summaries from one or more documents based on a query.
Motivated by the insight that Reinforcement Learning (RL) provides a generalization to Supervised Learning (SL) for Natural Language Generation, we use an RL-based approach for this task.
We develop multiple Policy Gradient networks, trained on various reward signals: ROUGE, BLEU, and Semantic Similarity.
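A minimal version of this recipe is REINFORCE with a text-overlap reward, sketched below; the set-based ROUGE-1 approximation and the function names are simplifications, not the paper's networks.

```python
# Sketch of policy-gradient training on a ROUGE-like reward signal.
import torch

def rouge1_f(candidate_tokens, reference_tokens):
    """Set-based approximation of ROUGE-1 F1 (unigram overlap)."""
    cand, ref = set(candidate_tokens), set(reference_tokens)
    overlap = len(cand & ref)
    if overlap == 0:
        return 0.0
    p, r = overlap / len(cand), overlap / len(ref)
    return 2 * p * r / (p + r)

def reinforce_loss(log_probs, reward, baseline=0.0):
    """log_probs: (T,) log pi(y_t | y_<t, x) of the sampled summary.
    Scales the sequence log-likelihood by the baselined reward."""
    return -(reward - baseline) * log_probs.sum()

# usage: sample a summary, score it against the reference, backprop:
# loss = reinforce_loss(log_probs, rouge1_f(sample, reference))
```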
arXiv Detail & Related papers (2023-11-29T10:38:16Z)
- STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning [111.75423966239092]
We propose an exploration incentive in terms of the integral probability metric (IPM) between a current estimate of the transition model and the unknown optimal.
Based on the kernelized Stein discrepancy (KSD), we develop a novel algorithm, STEERING (STEin information dirEcted exploration for model-based Reinforcement LearnING).
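For concreteness, a V-statistic estimator of the squared KSD with an RBF kernel looks as follows; this is the generic estimator with an assumed bandwidth, not the paper's full algorithm.

```python
# Generic KSD^2 V-statistic with an RBF kernel k(x, y) = exp(-|x-y|^2 / 2h^2).
import numpy as np

def ksd_vstat(X, score_fn, h=1.0):
    """KSD^2 between samples X (n, d) and a target density with
    score function score_fn(x) = grad log p(x)."""
    n, d = X.shape
    S = np.stack([score_fn(x) for x in X])        # (n, d) target scores
    diff = X[:, None, :] - X[None, :, :]          # (n, n, d) pairwise x - y
    sq = (diff ** 2).sum(-1)                      # squared distances
    K = np.exp(-sq / (2 * h**2))                  # RBF kernel matrix
    # Stein kernel u(x, y) assembled from kernel derivatives:
    term1 = (S @ S.T) * K                                  # s(x).s(y) k
    term2 = np.einsum('id,ijd->ij', S, diff) / h**2 * K    # s(x).grad_y k
    term3 = -np.einsum('jd,ijd->ij', S, diff) / h**2 * K   # s(y).grad_x k
    term4 = (d / h**2 - sq / h**4) * K                     # tr(grad_x grad_y k)
    return float((term1 + term2 + term3 + term4).mean())

# sanity check: for a standard-normal target, score(x) = -x, and KSD^2
# should be near zero when the samples really come from N(0, I)
rng = np.random.default_rng(0)
print(ksd_vstat(rng.standard_normal((200, 2)), score_fn=lambda x: -x))
```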
arXiv Detail & Related papers (2023-01-28T00:49:28Z)
- Reinforcement Learning from Diverse Human Preferences [68.4294547285359]
This paper develops a method for crowd-sourcing preference labels and learning from diverse human preferences.
The proposed method is tested on a variety of tasks in DMcontrol and Meta-world.
It has shown consistent and significant improvements over existing preference-based RL algorithms when learning from diverse feedback.
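The preference-learning backbone this line of work builds on is typically a Bradley-Terry likelihood over pairs of trajectory segments, sketched below with hypothetical interfaces.

```python
# Sketch of standard preference-based reward learning (Bradley-Terry).
import torch

def preference_loss(reward_model, seg_a, seg_b, label):
    """label = 1 if annotators preferred segment A, 0 if B.
    Segments are (T, obs_dim) tensors; per-step rewards are summed."""
    ra = reward_model(seg_a).sum()
    rb = reward_model(seg_b).sum()
    logits = ra - rb                      # Bradley-Terry preference logit
    target = torch.tensor(float(label))
    return torch.nn.functional.binary_cross_entropy_with_logits(logits, target)
```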
arXiv Detail & Related papers (2023-01-27T15:18:54Z)
- Generative Slate Recommendation with Reinforcement Learning [49.75985313698214]
Reinforcement learning algorithms can be used to optimize user engagement in recommender systems.
However, RL approaches are intractable in the slate recommendation scenario.
In that setting, an action corresponds to a slate that may contain any combination of items.
In this work we propose to encode slates in a continuous, low-dimensional latent space learned by a variational auto-encoder.
We are able to (i) relax assumptions required by previous work, and (ii) improve the quality of the action selection by modeling full slates.
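A minimal sketch of the latent-slate idea, assuming a hypothetical decoder: the policy emits a continuous latent vector, and a learned decoder expands it into a full slate, so the RL action space is low-dimensional rather than combinatorial.

```python
# Sketch: act in a learned slate latent space instead of over item sets.
import torch

class SlateDecoder(torch.nn.Module):
    """Maps a latent action z to item scores for each slate slot."""
    def __init__(self, z_dim=16, n_items=1000, slate_size=5):
        super().__init__()
        self.net = torch.nn.Linear(z_dim, slate_size * n_items)
        self.slate_size, self.n_items = slate_size, n_items

    def forward(self, z):
        logits = self.net(z).view(self.slate_size, self.n_items)
        return logits.argmax(dim=-1)   # greedy item per slot (a real
                                       # system would deduplicate slots)

decoder = SlateDecoder()
z = torch.randn(16)                    # continuous latent action
slate = decoder(z)                     # tensor of 5 item ids
```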
arXiv Detail & Related papers (2023-01-20T15:28:09Z)
- Hyperbolic Deep Reinforcement Learning [8.983647543608226]
We propose a new class of deep reinforcement learning algorithms that model latent representations in hyperbolic space.
We empirically validate our framework by applying it to popular on-policy and off-policy RL algorithms on the Procgen and Atari 100K benchmarks.
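The key operation in such models is projecting Euclidean features into hyperbolic space, e.g. via the exponential map at the origin of the Poincaré ball, as in the sketch below (curvature and shapes are assumptions).

```python
# Sketch: exponential map at the origin of the Poincare ball (curvature c).
import torch

def expmap0(v, c=1.0, eps=1e-6):
    """exp_0(v) = tanh(sqrt(c) * ||v||) * v / (sqrt(c) * ||v||),
    mapping Euclidean vectors to points inside the ball (norm < 1/sqrt(c))."""
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(c**0.5 * norm) * v / (c**0.5 * norm)

feat = torch.randn(4, 8)          # Euclidean features from an encoder
hyp = expmap0(feat)               # hyperbolic latent representations
```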
arXiv Detail & Related papers (2022-10-04T12:03:04Z)
- Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a more flexible solution by allowing users to plug in arbitrary task metrics as reward.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
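In the soft Q-learning view, each decoding step carries vocabulary-sized Q-values, and the soft state value is a temperature-scaled log-sum-exp; a one-step soft Bellman target can be sketched as follows (shapes and names hypothetical).

```python
# Sketch: one-step soft Bellman target for token-level Q-learning.
import torch

def soft_bellman_target(q_next, reward, done, tau=1.0, gamma=1.0):
    """q_next: (vocab,) Q-values at the next decoding step.
    Soft value V(s') = tau * logsumexp(Q(s', .) / tau)."""
    v_next = tau * torch.logsumexp(q_next / tau, dim=-1)
    return reward + (1.0 - float(done)) * gamma * v_next

# regression loss for the sampled token a_t:
# loss = (q[a_t] - soft_bellman_target(q_next, r, done)) ** 2
```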
arXiv Detail & Related papers (2021-06-14T18:48:40Z)
- Learning Deep-Latent Hierarchies by Stacking Wasserstein Autoencoders [22.54887526392739]
We propose a novel approach to training models with deep-latent hierarchies based on Optimal Transport.
We show that our method enables the generative model to fully leverage its deep-latent hierarchy, avoiding the well-known "latent variable collapse" issue of VAEs.
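The Wasserstein-autoencoder ingredient being stacked replaces the per-sample KL of a VAE with a divergence between the aggregate posterior and the prior, commonly an MMD penalty, sketched below with an assumed RBF kernel.

```python
# Sketch: MMD^2 penalty matching encoded latents to prior samples (WAE-MMD).
import torch

def mmd_rbf(z_q, z_p, h=1.0):
    """MMD^2 between encoder latents z_q and prior samples z_p (RBF kernel)."""
    def k(a, b):
        sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return torch.exp(-sq / (2 * h**2))
    return k(z_q, z_q).mean() + k(z_p, z_p).mean() - 2 * k(z_q, z_p).mean()

# wae_loss = recon_loss + lam * mmd_rbf(z_q, torch.randn_like(z_q))
```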
arXiv Detail & Related papers (2020-10-07T15:04:20Z)