Operator Deep Q-Learning: Zero-Shot Reward Transferring in Reinforcement
Learning
- URL: http://arxiv.org/abs/2201.00236v1
- Date: Sat, 1 Jan 2022 19:52:38 GMT
- Title: Operator Deep Q-Learning: Zero-Shot Reward Transferring in Reinforcement
Learning
- Authors: Ziyang Tang, Yihao Feng, Qiang Liu
- Abstract summary: Reinforcement learning (RL) has drawn increasing interest in recent years due to its tremendous success in various applications.
Standard RL algorithms can only be applied to a single reward function and cannot adapt quickly to an unseen reward function.
We advocate a general operator view of reinforcement learning, which enables us to directly approximate the operator that maps from reward function to value function.
- Score: 20.12564350629561
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) has drawn increasing interest in recent years
due to its tremendous success in various applications. However, standard RL
algorithms can only be applied to a single reward function and cannot adapt
quickly to an unseen reward function. In this paper, we advocate a general
operator view of reinforcement learning, which enables us to directly
approximate the operator that maps from reward function to value function. The
benefit of learning the operator is that we can incorporate any new reward
function as input and attain its corresponding value function in a zero-shot
manner. To approximate this special type of operator, we design a number of
novel operator neural network architectures based on its theoretical
properties. Our designs of operator networks outperform existing methods and
the standard design of general-purpose operator networks, and we demonstrate
the benefit of our operator deep Q-learning framework on a range of tasks,
including reward transferring for offline policy evaluation (OPE) and reward
transferring for offline policy optimization.
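To make the operator view concrete, the following minimal sketch (written for this summary, not the authors' released code; the class name, dimensions, and softmax normalization are illustrative assumptions) shows one plausible operator network. It represents a reward function by its values on a set of anchor state-action pairs and predicts Q-values as a reward-independent weighted sum of those samples, so the output is linear in the reward and a new reward function can be evaluated zero-shot by swapping in its values at the anchors.

import torch
import torch.nn as nn


class LinearOperatorQNet(nn.Module):
    """Maps (reward samples at anchors, query state-action) -> Q-value estimate."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.query_embed = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.anchor_embed = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))

    def forward(self, query_sa, anchor_sa, anchor_rewards):
        # query_sa: (B, state_dim + action_dim), anchor_sa: (N, state_dim + action_dim),
        # anchor_rewards: (N,) values of the reward function at the anchor pairs.
        q = self.query_embed(query_sa)                              # (B, H)
        k = self.anchor_embed(anchor_sa)                            # (N, H)
        # Attention-style weights that do not depend on the reward values,
        # so the predicted Q is linear in the input reward function.
        w = torch.softmax(q @ k.t() / k.shape[-1] ** 0.5, dim=-1)   # (B, N)
        return w @ anchor_rewards                                   # (B,) Q(s, a)


# Zero-shot transfer: evaluate a brand-new reward function by plugging in its
# values at the anchors; no retraining of the operator network is required.
net = LinearOperatorQNet(state_dim=4, action_dim=2)
anchor_sa = torch.randn(64, 6)
new_reward = torch.randn(64)
q_values = net(torch.randn(8, 6), anchor_sa, new_reward)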
Related papers
- A Large Language Model-Driven Reward Design Framework via Dynamic Feedback for Reinforcement Learning [25.82540393199001]
CARD is a Reward Design framework that iteratively generates and improves reward function code.
CARD includes a Coder that generates and verifies the code, while an Evaluator provides dynamic feedback to guide the Coder in improving the code.
arXiv Detail & Related papers (2024-10-18T17:51:51Z)
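A hedged sketch of the generate-verify-improve loop described above; the helper callables (llm_generate_reward_code, verify_code, train_and_evaluate) are hypothetical placeholders standing in for the LLM Coder, its verification step, and the Evaluator's dynamic feedback, and are not part of any released CARD interface.

from typing import Callable


def design_reward(task_description: str,
                  llm_generate_reward_code: Callable[[str, str], str],
                  verify_code: Callable[[str], bool],
                  train_and_evaluate: Callable[[str], str],
                  max_iterations: int = 5) -> str:
    """Iteratively generate and refine reward-function code using dynamic feedback."""
    feedback, reward_code = "", ""
    for _ in range(max_iterations):
        # Coder: propose (or revise) reward-function code from the task and feedback.
        reward_code = llm_generate_reward_code(task_description, feedback)
        # Coder-side verification: reject code that fails basic checks before training.
        if not verify_code(reward_code):
            feedback = "The generated reward code failed verification; fix it."
            continue
        # Evaluator: train a policy under the candidate reward and return textual
        # feedback (e.g. task metrics) that guides the next revision.
        feedback = train_and_evaluate(reward_code)
    return reward_code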
- Iterated $Q$-Network: Beyond One-Step Bellman Updates in Deep Reinforcement Learning [19.4531905603925]
i-QN is a principled approach that enables multiple consecutive Bellman updates by learning a tailored sequence of action-value functions.
We show that i-QN is theoretically grounded and that it can be seamlessly used in value-based and actor-critic methods.
arXiv Detail & Related papers (2024-03-04T15:07:33Z)
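A minimal sketch of the iterated-update idea, assuming a chain of K action-value networks where each Q_k is regressed toward the Bellman backup of its predecessor; the use of target copies and the max backup are standard DQN-style assumptions rather than details taken from the paper.

import torch
import torch.nn as nn


def iterated_q_loss(q_nets, target_nets, batch, gamma=0.99):
    # q_nets / target_nets: lists of K networks mapping states -> action values.
    # batch: (s, a, r, s_next, done) tensors from a replay buffer.
    s, a, r, s_next, done = batch
    loss = 0.0
    for k, q_net in enumerate(q_nets):
        # Q_k chases the Bellman backup of the previous network in the chain,
        # so one gradient step advances K consecutive Bellman updates.
        prev = target_nets[k - 1] if k > 0 else target_nets[0]
        with torch.no_grad():
            target = r + gamma * (1.0 - done) * prev(s_next).max(dim=1).values
        q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        loss = loss + nn.functional.mse_loss(q_sa, target)
    return loss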
- Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings [107.1837163643886]
We present a functional reward encoding (FRE) as a general, scalable solution to the zero-shot RL problem.
Our main idea is to learn functional representations of any arbitrary tasks by encoding their state-reward samples.
We empirically show that FRE agents trained on diverse random unsupervised reward functions can generalize to solve novel tasks.
arXiv Detail & Related papers (2024-02-27T01:59:02Z)
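A minimal sketch of a functional reward encoding in this spirit, assuming a task is summarized by a handful of (state, reward) samples and a permutation-invariant encoder pools them into a latent task code z on which the policy and value networks can be conditioned; the architecture below is illustrative, not the paper's.

import torch
import torch.nn as nn


class FunctionalRewardEncoder(nn.Module):
    def __init__(self, state_dim: int, latent_dim: int = 32, hidden: int = 128):
        super().__init__()
        self.point_net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.head = nn.Linear(hidden, latent_dim)

    def forward(self, states, rewards):
        # states: (N, state_dim); rewards: (N,) samples of the task's reward function.
        x = torch.cat([states, rewards.unsqueeze(-1)], dim=-1)
        # Mean pooling makes the encoding invariant to the order of the samples.
        return self.head(self.point_net(x).mean(dim=0))  # (latent_dim,) task code z


# At test time a novel reward function is described only by a few samples;
# the resulting z is fed to a z-conditioned policy without further training.
enc = FunctionalRewardEncoder(state_dim=4)
z = enc(torch.randn(16, 4), torch.randn(16))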
- Dense Reward for Free in Reinforcement Learning from Human Feedback [64.92448888346125]
We leverage the fact that the reward model contains more information than just its scalar output.
We use the reward model's attention weights to redistribute the reward along the whole completion.
Empirically, we show that it stabilises training, accelerates the rate of learning, and, in practical cases, may lead to better local optima.
arXiv Detail & Related papers (2024-02-01T17:10:35Z)
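A minimal sketch of the redistribution step, assuming the reward model's attention weights over the completion tokens have already been extracted; normalizing them preserves the original episodic return while providing per-token credit.

import numpy as np


def redistribute_reward(scalar_reward: float, attention_weights: np.ndarray) -> np.ndarray:
    """attention_weights: (T,) nonnegative weights over the T completion tokens."""
    w = attention_weights / (attention_weights.sum() + 1e-8)
    # Per-token dense rewards that sum to the original scalar reward, so the
    # return of the episode is unchanged while credit is assigned per step.
    return scalar_reward * w


dense = redistribute_reward(1.5, np.array([0.1, 0.05, 0.6, 0.25]))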
- REBEL: A Regularization-Based Solution for Reward Overoptimization in Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and user intentions, values, or social norms can be catastrophic in the real world.
Current methods to mitigate this misalignment work by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z)
- A Novel Variational Lower Bound for Inverse Reinforcement Learning [5.370126167091961]
Inverse reinforcement learning (IRL) seeks to learn the reward function from expert trajectories.
We present a new Variational Lower Bound for IRL (VLB-IRL).
Our method simultaneously learns the reward function and policy under the learned reward function.
arXiv Detail & Related papers (2023-11-07T03:50:43Z)
- Curricular Subgoals for Inverse Reinforcement Learning [21.038691420095525]
Inverse Reinforcement Learning (IRL) aims to reconstruct the reward function from expert demonstrations to facilitate policy learning.
Existing IRL methods mainly focus on learning global reward functions to minimize the trajectory difference between the imitator and the expert.
We propose a novel Curricular Subgoal-based Inverse Reinforcement Learning framework that explicitly disentangles one task into several local subgoals to guide agent imitation.
arXiv Detail & Related papers (2023-06-14T04:06:41Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL): inferring the reward function of an agent from observations of its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- Information Directed Reward Learning for Reinforcement Learning [64.33774245655401]
We learn a model of the reward function that allows standard RL algorithms to achieve high expected return with as few expert queries as possible.
In contrast to prior active reward learning methods designed for specific types of queries, IDRL naturally accommodates different query types.
We support our findings with extensive evaluations in multiple environments and with different types of queries.
arXiv Detail & Related papers (2021-02-24T18:46:42Z)
- Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping [71.214923471669]
Reward shaping is an effective technique for incorporating domain knowledge into reinforcement learning (RL).
In this paper, we consider the problem of adaptively utilizing a given shaping reward function.
Experiments in sparse-reward cartpole and MuJoCo environments show that our algorithms can fully exploit beneficial shaping rewards.
arXiv Detail & Related papers (2020-11-05T05:34:14Z)
- Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning [22.242379207077217]
We show how to expose the reward function's code to the RL agent so it can exploit the function's internal structure to learn optimal policies.
First, we propose reward machines, a type of finite state machine that supports the specification of reward functions.
We then describe different methodologies to exploit this structure to support learning, including automated reward shaping, task decomposition, and counterfactual reasoning with off-policy learning.
arXiv Detail & Related papers (2020-10-06T00:10:16Z)
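A minimal sketch of a reward machine as described above: a finite-state machine whose transitions fire on high-level events and emit the reward, making the reward function's internal structure available to the agent. The example machine and event labels are illustrative.

class RewardMachine:
    def __init__(self, initial_state, transitions):
        # transitions: {(rm_state, event): (next_rm_state, reward)}
        self.state = initial_state
        self.transitions = transitions

    def step(self, event):
        # Unlisted (state, event) pairs keep the machine in place with zero reward.
        self.state, reward = self.transitions.get((self.state, event), (self.state, 0.0))
        return reward


# "Deliver coffee": pick up coffee first, then reach the office to get reward.
rm = RewardMachine("u0", {
    ("u0", "coffee"): ("u1", 0.0),
    ("u1", "office"): ("u_done", 1.0),
})
print(rm.step("office"), rm.step("coffee"), rm.step("office"))  # 0.0 0.0 1.0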