Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning
for Task-oriented Dialogue Systems
- URL: http://arxiv.org/abs/2302.10342v1
- Date: Mon, 20 Feb 2023 22:10:04 GMT
- Title: Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning
for Task-oriented Dialogue Systems
- Authors: Yihao Feng, Shentao Yang, Shujian Zhang, Jianguo Zhang, Caiming Xiong,
Mingyuan Zhou, Huan Wang
- Abstract summary: Reinforcement learning (RL) techniques can naturally be utilized to train dialogue strategies to achieve user-specific goals.
This paper aims at answering the question of how to efficiently learn and leverage a reward function for training end-to-end (E2E) ToD agents.
- Score: 111.80916118530398
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When learning task-oriented dialogue (ToD) agents, reinforcement learning
(RL) techniques can naturally be utilized to train dialogue strategies to
achieve user-specific goals. Prior works mainly focus on adopting advanced RL
techniques to train the ToD agents, while the design of the reward function is
not well studied. This paper aims at answering the question of how to
efficiently learn and leverage a reward function for training end-to-end (E2E)
ToD agents. Specifically, we introduce two generalized objectives for
reward-function learning, inspired by the classical learning-to-rank
literature. Further, we utilize the learned reward function to guide the
training of the E2E ToD agent. With the proposed techniques, we achieve
competitive results on the E2E response-generation task on the MultiWOZ 2.0
dataset. Source code and checkpoints are publicly released at
https://github.com/Shentao-YANG/Fantastic_Reward_ICLR2023.
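The abstract does not spell the objectives out, but as a rough illustration of the learning-to-rank idea, a pairwise (Bradley-Terry style) reward-learning loss can be sketched as below; the `RewardModel` architecture, feature dimension, and random data are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Toy reward model scoring an encoded dialogue response."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

def pairwise_ranking_loss(model, preferred, dispreferred):
    """Bradley-Terry style objective: push r(preferred) above r(dispreferred)."""
    return -F.logsigmoid(model(preferred) - model(dispreferred)).mean()

# Illustrative usage with random features standing in for encoded responses.
model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
preferred, dispreferred = torch.randn(32, 128), torch.randn(32, 128)
loss = pairwise_ranking_loss(model, preferred, dispreferred)
opt.zero_grad(); loss.backward(); opt.step()
```

The learned scalar reward can then serve as the signal that guides the training of the E2E ToD agent, as the abstract describes.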
Related papers
- A Large Language Model-Driven Reward Design Framework via Dynamic Feedback for Reinforcement Learning [25.82540393199001]
CARD is a reward design framework that iteratively generates and improves reward-function code.
CARD includes a Coder that generates and verifies the code, while an Evaluator provides dynamic feedback to guide the Coder in improving the code.
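The summary suggests a generate-verify-refine loop; a minimal sketch follows, where `generate_reward_code` and `evaluate_policy` are hypothetical stand-ins for the Coder and Evaluator, not CARD's actual interface.

```python
def generate_reward_code(task: str, feedback: str) -> str:
    # Placeholder for an LLM "Coder" call that emits reward-function source code.
    return "def reward(obs): return 0.0"

def evaluate_policy(code: str) -> tuple:
    # Placeholder "Evaluator": trains/rolls out a policy under the candidate
    # reward and returns a scalar score plus textual feedback for the Coder.
    return 0.0, "reward is constant; add shaping terms"

def reward_design_loop(task: str, iterations: int = 5) -> str:
    """Generate-verify-refine loop in the spirit of CARD (illustrative only)."""
    best_code, best_score, feedback = "", float("-inf"), ""
    for _ in range(iterations):
        code = generate_reward_code(task, feedback)  # Coder proposes code
        score, feedback = evaluate_policy(code)      # Evaluator gives dynamic feedback
        if score > best_score:
            best_code, best_score = code, score
    return best_code
```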
arXiv Detail & Related papers (2024-10-18T17:51:51Z)
- Learning Reward for Robot Skills Using Large Language Models via Self-Alignment [11.639973274337274]
Large Language Models (LLM) contain valuable task-related knowledge that can potentially aid in the learning of reward functions.
We propose a method to learn rewards more efficiently in the absence of human supervision.
arXiv Detail & Related papers (2024-05-12T04:57:43Z)
- Reward Design with Language Models [27.24197025688919]
Reward design in reinforcement learning (RL) is challenging, since specifying human notions of desired behavior via a reward function may be difficult, or may require expert demonstrations.
Can we instead cheaply design rewards using a natural language interface?
This paper explores how to simplify reward design by prompting a large language model (LLM) such as GPT-3 as a proxy reward function.
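A minimal sketch of the prompting idea, with `llm_label` as a hypothetical stand-in for a GPT-3-style completion call (not the paper's actual interface):

```python
def llm_label(prompt: str) -> str:
    # Placeholder for a GPT-3-style completion API returning "yes" or "no".
    return "yes"

def proxy_reward(task_description: str, episode_transcript: str) -> float:
    """Ask an LLM whether an episode satisfies the described objective."""
    prompt = (
        f"Task: {task_description}\n"
        f"Episode:\n{episode_transcript}\n"
        "Did the agent satisfy the task objective? Answer yes or no."
    )
    return 1.0 if llm_label(prompt).strip().lower().startswith("yes") else 0.0
```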
arXiv Detail & Related papers (2023-02-27T22:09:35Z)
- Read and Reap the Rewards: Learning to Play Atari with the Help of Instruction Manuals [69.76245723797368]
Read and Reward speeds up RL algorithms on Atari games by reading manuals released by the Atari game developers.
Various RL algorithms obtain significant improvement in performance and training speed when assisted by our design.
arXiv Detail & Related papers (2023-02-09T05:47:03Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL) -- inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward that measures novelty based on the learned reward.
Our experiments show that an exploration bonus derived from uncertainty in the learned reward improves both the feedback- and sample-efficiency of preference-based RL algorithms.
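One common way to instantiate such a bonus is disagreement across an ensemble of learned reward models; the sketch below assumes that design, which may differ from the paper's exact estimator.

```python
import torch

def exploration_bonus(reward_ensemble, state_action: torch.Tensor) -> torch.Tensor:
    """Intrinsic reward = standard deviation across learned reward models."""
    preds = torch.stack([r(state_action) for r in reward_ensemble], dim=0)
    return preds.std(dim=0)  # high disagreement => novel => worth exploring

# Illustrative usage with tiny linear reward heads.
ensemble = [torch.nn.Linear(8, 1) for _ in range(5)]
sa = torch.randn(4, 8)                   # batch of state-action features
bonus = exploration_bonus(ensemble, sa)  # shape: (4, 1)
```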
arXiv Detail & Related papers (2022-05-24T23:22:10Z)
- Retrieval-Augmented Reinforcement Learning [63.32076191982944]
We train a network to map a dataset of past experiences to optimal behavior.
The retrieval process is trained to retrieve information from the dataset that may be useful in the current context.
We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
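A minimal sketch of the retrieval step, using cosine similarity over embedded past experience; the actual retrieval process in the paper is trained, so this is only a simplified assumption.

```python
import torch
import torch.nn.functional as F

def retrieve(query: torch.Tensor, memory: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Return the k past-experience embeddings most similar to the query."""
    sims = F.cosine_similarity(memory, query.unsqueeze(0), dim=-1)
    _, idx = sims.topk(k)
    return memory[idx]  # retrieved context to condition the agent on

memory = torch.randn(1000, 64)     # embeddings of past transitions
query = torch.randn(64)            # embedding of the current context
context = retrieve(query, memory)  # shape: (5, 64)
```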
arXiv Detail & Related papers (2022-02-17T02:44:05Z)
- Successor Feature Neural Episodic Control [17.706998080391635]
A longstanding goal in reinforcement learning is to build intelligent agents that show fast learning and flexible transfer of skills, akin to humans and animals.
This paper investigates the integration of two frameworks for tackling those goals: episodic control and successor features.
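For reference, successor features factor the value function as Q(s, a) = psi(s, a) . w, which is what makes skill transfer cheap; a minimal sketch:

```python
import torch

def sf_q_value(psi: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    """Successor features: Q(s, a) = psi(s, a) . w, where psi predicts
    discounted future state features and w encodes the task's reward weights."""
    return psi @ w

psi = torch.randn(4, 16)  # successor features for 4 candidate actions
w = torch.randn(16)       # task reward weights (reward = phi . w)
q = sf_q_value(psi, w)    # one Q-value per action; transfer = swap in a new w
```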
arXiv Detail & Related papers (2021-11-04T19:14:43Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
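A minimal sketch of the relabeling idea in the title: recompute stored rewards with the latest learned reward model so off-policy data stays consistent (illustrative, not PEBBLE's full algorithm).

```python
import torch

def relabel_replay_buffer(buffer, reward_model):
    """Overwrite stored rewards with the current learned reward estimate."""
    with torch.no_grad():
        for transition in buffer:
            sa = torch.cat([transition["state"], transition["action"]])
            transition["reward"] = reward_model(sa).item()

# Illustrative usage with a tiny buffer and a linear reward head.
reward_model = torch.nn.Linear(10, 1)
buffer = [{"state": torch.randn(8), "action": torch.randn(2), "reward": 0.0}]
relabel_replay_buffer(buffer, reward_model)
```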
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.