Operator Deep Q-Learning: Zero-Shot Reward Transferring in Reinforcement
Learning
- URL: http://arxiv.org/abs/2201.00236v1
- Date: Sat, 1 Jan 2022 19:52:38 GMT
- Title: Operator Deep Q-Learning: Zero-Shot Reward Transferring in Reinforcement
Learning
- Authors: Ziyang Tang, Yihao Feng, Qiang Liu
- Abstract summary: Reinforcement learning (RL) has drawn increasing interest in recent years due to its tremendous success in various applications.
Standard RL algorithms can only be applied to a single reward function and cannot adapt quickly to an unseen reward function.
We advocate a general operator view of reinforcement learning, which enables us to directly approximate the operator that maps from reward function to value function.
- Score: 20.12564350629561
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) has drawn increasing interest in recent years
due to its tremendous success in various applications. However, standard RL
algorithms can only be applied to a single reward function and cannot adapt
quickly to an unseen reward function. In this paper, we advocate a general
operator view of reinforcement learning, which enables us to directly
approximate the operator that maps from reward function to value function. The
benefit of learning the operator is that we can incorporate any new reward
function as input and attain its corresponding value function in a zero-shot
manner. To approximate this special type of operator, we design a number of
novel operator neural network architectures based on its theoretical
properties. Our designs of operator networks outperform existing methods and
the standard design of general-purpose operator networks, and we demonstrate
the benefit of our operator deep Q-learning framework on a range of tasks,
including reward transferring for offline policy evaluation (OPE) and reward
transferring for offline policy optimization.
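To make the operator view concrete, the following minimal sketch (written for this summary, not the authors' released code; the class name, dimensions, and softmax normalization are illustrative assumptions) shows one plausible operator network. It represents a reward function by its values on a set of anchor state-action pairs and predicts Q-values as a reward-independent weighted sum of those samples, so the output is linear in the reward and a new reward function can be evaluated zero-shot by swapping in its values at the anchors.

import torch
import torch.nn as nn


class LinearOperatorQNet(nn.Module):
    """Maps (reward samples at anchors, query state-action) -> Q-value estimate."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.query_embed = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.anchor_embed = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))

    def forward(self, query_sa, anchor_sa, anchor_rewards):
        # query_sa: (B, state_dim + action_dim), anchor_sa: (N, state_dim + action_dim),
        # anchor_rewards: (N,) values of the reward function at the anchor pairs.
        q = self.query_embed(query_sa)                              # (B, H)
        k = self.anchor_embed(anchor_sa)                            # (N, H)
        # Attention-style weights that do not depend on the reward values,
        # so the predicted Q is linear in the input reward function.
        w = torch.softmax(q @ k.t() / k.shape[-1] ** 0.5, dim=-1)   # (B, N)
        return w @ anchor_rewards                                   # (B,) Q(s, a)


# Zero-shot transfer: evaluate a brand-new reward function by plugging in its
# values at the anchors; no retraining of the operator network is required.
net = LinearOperatorQNet(state_dim=4, action_dim=2)
anchor_sa = torch.randn(64, 6)
new_reward = torch.randn(64)
q_values = net(torch.randn(8, 6), anchor_sa, new_reward)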
Related papers
- A Large Language Model-Driven Reward Design Framework via Dynamic Feedback for Reinforcement Learning [25.82540393199001]
CARD is a Reward Design framework that iteratively generates and improves reward function code.
CARD includes a Coder that generates and verifies the code, while an Evaluator provides dynamic feedback to guide the Coder in improving the code.
arXiv Detail & Related papers (2024-10-18T17:51:51Z)
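A hedged sketch of the generate-verify-improve loop described above; the helper callables (llm_generate_reward_code, verify_code, train_and_evaluate) are hypothetical placeholders standing in for the LLM Coder, its verification step, and the Evaluator's dynamic feedback, and are not part of any released CARD interface.

from typing import Callable


def design_reward(task_description: str,
                  llm_generate_reward_code: Callable[[str, str], str],
                  verify_code: Callable[[str], bool],
                  train_and_evaluate: Callable[[str], str],
                  max_iterations: int = 5) -> str:
    """Iteratively generate and refine reward-function code using dynamic feedback."""
    feedback, reward_code = "", ""
    for _ in range(max_iterations):
        # Coder: propose (or revise) reward-function code from the task and feedback.
        reward_code = llm_generate_reward_code(task_description, feedback)
        # Coder-side verification: reject code that fails basic checks before training.
        if not verify_code(reward_code):
            feedback = "The generated reward code failed verification; fix it."
            continue
        # Evaluator: train a policy under the candidate reward and return textual
        # feedback (e.g. task metrics) that guides the next revision.
        feedback = train_and_evaluate(reward_code)
    return reward_code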
- Iterated $Q$-Network: Beyond One-Step Bellman Updates in Deep Reinforcement Learning [19.4531905603925]
i-QN is a principled approach that enables multiple consecutive Bellman updates by learning a tailored sequence of action-value functions.
We show that i-QN is theoretically grounded and that it can be seamlessly used in value-based and actor-critic methods.
arXiv Detail & Related papers (2024-03-04T15:07:33Z)
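A minimal sketch of the iterated-update idea, assuming a chain of K action-value networks where each Q_k is regressed toward the Bellman backup of its predecessor; the use of target copies and the max backup are standard DQN-style assumptions rather than details taken from the paper.

import torch
import torch.nn as nn


def iterated_q_loss(q_nets, target_nets, batch, gamma=0.99):
    # q_nets / target_nets: lists of K networks mapping states -> action values.
    # batch: (s, a, r, s_next, done) tensors from a replay buffer.
    s, a, r, s_next, done = batch
    loss = 0.0
    for k, q_net in enumerate(q_nets):
        # Q_k chases the Bellman backup of the previous network in the chain,
        # so one gradient step advances K consecutive Bellman updates.
        prev = target_nets[k - 1] if k > 0 else target_nets[0]
        with torch.no_grad():
            target = r + gamma * (1.0 - done) * prev(s_next).max(dim=1).values
        q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        loss = loss + nn.functional.mse_loss(q_sa, target)
    return loss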
- Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings [107.1837163643886]
We present a functional reward encoding (FRE) as a general, scalable solution to the zero-shot RL problem.
Our main idea is to learn functional representations of any arbitrary tasks by encoding their state-reward samples.
We empirically show that FRE agents trained on diverse random unsupervised reward functions can generalize to solve novel tasks.
arXiv Detail & Related papers (2024-02-27T01:59:02Z)
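A minimal sketch of a functional reward encoding in this spirit, assuming a task is summarized by a handful of (state, reward) samples and a permutation-invariant encoder pools them into a latent task code z on which the policy and value networks can be conditioned; the architecture below is illustrative, not the paper's.

import torch
import torch.nn as nn


class FunctionalRewardEncoder(nn.Module):
    def __init__(self, state_dim: int, latent_dim: int = 32, hidden: int = 128):
        super().__init__()
        self.point_net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.head = nn.Linear(hidden, latent_dim)

    def forward(self, states, rewards):
        # states: (N, state_dim); rewards: (N,) samples of the task's reward function.
        x = torch.cat([states, rewards.unsqueeze(-1)], dim=-1)
        # Mean pooling makes the encoding invariant to the order of the samples.
        return self.head(self.point_net(x).mean(dim=0))  # (latent_dim,) task code z


# At test time a novel reward function is described only by a few samples;
# the resulting z is fed to a z-conditioned policy without further training.
enc = FunctionalRewardEncoder(state_dim=4)
z = enc(torch.randn(16, 4), torch.randn(16))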
- Dense Reward for Free in Reinforcement Learning from Human Feedback [64.92448888346125]
We leverage the fact that the reward model contains more information than just its scalar output.
We use the reward model's attention weights to redistribute the reward along the whole completion.
Empirically, we show that it stabilises training, accelerates the rate of learning, and, in practical cases, may lead to better local optima.
arXiv Detail & Related papers (2024-02-01T17:10:35Z)
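A minimal sketch of the redistribution step, assuming the reward model's attention weights over the completion tokens have already been extracted; normalizing them preserves the original episodic return while providing per-token credit.

import numpy as np


def redistribute_reward(scalar_reward: float, attention_weights: np.ndarray) -> np.ndarray:
    """attention_weights: (T,) nonnegative weights over the T completion tokens."""
    w = attention_weights / (attention_weights.sum() + 1e-8)
    # Per-token dense rewards that sum to the original scalar reward, so the
    # return of the episode is unchanged while credit is assigned per step.
    return scalar_reward * w


dense = redistribute_reward(1.5, np.array([0.1, 0.05, 0.6, 0.25]))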
- REBEL: A Regularization-Based Solution for Reward Overoptimization in Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and user intentions, values, or social norms can be catastrophic in the real world.
Current methods to mitigate this misalignment work by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z)
- A Novel Variational Lower Bound for Inverse Reinforcement Learning [5.370126167091961]
Inverse reinforcement learning (IRL) seeks to learn the reward function from expert trajectories.
We present a new Variational Lower Bound for IRL (VLB-IRL).
Our method simultaneously learns the reward function and policy under the learned reward function.
arXiv Detail & Related papers (2023-11-07T03:50:43Z)
- Curricular Subgoals for Inverse Reinforcement Learning [21.038691420095525]
Inverse Reinforcement Learning (IRL) aims to reconstruct the reward function from expert demonstrations to facilitate policy learning.
Existing IRL methods mainly focus on learning global reward functions to minimize the trajectory difference between the imitator and the expert.
We propose a novel Curricular Subgoal-based Inverse Reinforcement Learning framework that explicitly disentangles one task into several local subgoals to guide agent imitation.
arXiv Detail & Related papers (2023-06-14T04:06:41Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL): inferring the reward function of an agent from observations of its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- Information Directed Reward Learning for Reinforcement Learning [64.33774245655401]
We learn a model of the reward function that allows standard RL algorithms to achieve high expected return with as few expert queries as possible.
In contrast to prior active reward learning methods designed for specific types of queries, IDRL naturally accommodates different query types.
We support our findings with extensive evaluations in multiple environments and with different types of queries.
arXiv Detail & Related papers (2021-02-24T18:46:42Z)
- Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping [71.214923471669]
Reward shaping is an effective technique for incorporating domain knowledge into reinforcement learning (RL).
In this paper, we consider the problem of adaptively utilizing a given shaping reward function.
Experiments in sparse-reward cartpole and MuJoCo environments show that our algorithms can fully exploit beneficial shaping rewards.
arXiv Detail & Related papers (2020-11-05T05:34:14Z)
- Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning [22.242379207077217]
We show how to expose the reward function's code to the RL agent so it can exploit the function's internal structure to learn optimal policies.
First, we propose reward machines, a type of finite state machine that supports the specification of reward functions.
We then describe different methodologies to exploit this structure to support learning, including automated reward shaping, task decomposition, and counterfactual reasoning with off-policy learning.
arXiv Detail & Related papers (2020-10-06T00:10:16Z)
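A minimal sketch of a reward machine as described above: a finite-state machine whose transitions fire on high-level events and emit the reward, making the reward function's internal structure available to the agent. The example machine and event labels are illustrative.

class RewardMachine:
    def __init__(self, initial_state, transitions):
        # transitions: {(rm_state, event): (next_rm_state, reward)}
        self.state = initial_state
        self.transitions = transitions

    def step(self, event):
        # Unlisted (state, event) pairs keep the machine in place with zero reward.
        self.state, reward = self.transitions.get((self.state, event), (self.state, 0.0))
        return reward


# "Deliver coffee": pick up coffee first, then reach the office to get reward.
rm = RewardMachine("u0", {
    ("u0", "coffee"): ("u1", 0.0),
    ("u1", "office"): ("u_done", 1.0),
})
print(rm.step("office"), rm.step("coffee"), rm.step("office"))  # 0.0 0.0 1.0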