A Hierarchical Bayesian Approach to Inverse Reinforcement Learning with
Symbolic Reward Machines
- URL: http://arxiv.org/abs/2204.09772v1
- Date: Wed, 20 Apr 2022 20:22:00 GMT
- Title: A Hierarchical Bayesian Approach to Inverse Reinforcement Learning with
Symbolic Reward Machines
- Authors: Weichao Zhou, Wenchao Li
- Abstract summary: A misspecified reward can degrade sample efficiency and induce undesired behaviors in reinforcement learning problems.
We propose symbolic reward machines for incorporating high-level task knowledge when specifying the reward signals.
- Score: 7.661766773170363
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A misspecified reward can degrade sample efficiency and induce undesired
behaviors in reinforcement learning (RL) problems. We propose symbolic reward
machines for incorporating high-level task knowledge when specifying the reward
signals. Symbolic reward machines augment existing reward machine formalism by
allowing transitions to carry predicates and symbolic reward outputs. This
formalism lends itself well to inverse reinforcement learning, where the key
challenge is determining appropriate assignments to the symbolic values from a
few expert demonstrations. We propose a hierarchical Bayesian approach for
inferring the most likely assignments such that the concretized reward machine
can discriminate expert demonstrated trajectories from other trajectories with
high accuracy. Experimental results show that learned reward machines can
significantly improve training efficiency for complex RL tasks and generalize
well across different task environment configurations.
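The abstract above does not come with code; as a rough sketch of the formalism it describes, the following Python snippet models a reward machine whose transitions carry predicates over the environment state and emit symbolic reward outputs, and then concretizes it under a candidate assignment of the symbolic values. All class names, predicates, and numbers here are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only -- not the authors' implementation.
# A symbolic reward machine: a finite automaton whose transitions are guarded
# by predicates over the environment state and emit *symbolic* reward values
# (names such as "r_goal") rather than fixed numbers.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Transition:
    src: str                               # source machine state
    dst: str                               # destination machine state
    predicate: Callable[[dict], bool]      # high-level condition on the env state
    symbolic_reward: str                   # symbolic value, e.g. "r_step", "r_goal"

class SymbolicRewardMachine:
    def __init__(self, initial: str, transitions: List[Transition]):
        self.initial = initial
        self.transitions = transitions
        self.state = initial

    def reset(self):
        self.state = self.initial

    def step(self, env_state: dict, assignment: Dict[str, float]) -> float:
        """Advance the machine on one environment state and return the concrete
        reward under a candidate assignment of the symbolic values."""
        for t in self.transitions:
            if t.src == self.state and t.predicate(env_state):
                self.state = t.dst
                return assignment[t.symbolic_reward]
        return assignment.get("r_default", 0.0)

# Toy machine for a hypothetical "reach the goal, avoid the trap" task.
machine = SymbolicRewardMachine(
    initial="u0",
    transitions=[
        Transition("u0", "u_goal", lambda s: s["at_goal"], "r_goal"),
        Transition("u0", "u_trap", lambda s: s["in_trap"], "r_trap"),
    ],
)

# Inverse RL then amounts to inferring an assignment (picked by hand here)
# under which expert trajectories score higher than other trajectories.
assignment = {"r_goal": 10.0, "r_trap": -5.0, "r_default": -0.1}
machine.reset()
print(machine.step({"at_goal": False, "in_trap": False}, assignment))  # -0.1
print(machine.step({"at_goal": True, "in_trap": False}, assignment))   # 10.0
```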
Related papers
- Beyond Simple Sum of Delayed Rewards: Non-Markovian Reward Modeling for Reinforcement Learning [44.770495418026734]
Reinforcement Learning (RL) empowers agents to acquire various skills by learning from reward signals.
Traditional methods assume the existence of underlying Markovian rewards and that the observed delayed reward is simply the sum of instance-level rewards.
We propose the Composite Delayed Reward Transformer (CoDeTr), which incorporates a specialized in-sequence attention mechanism for modeling composite delayed rewards.
arXiv Detail & Related papers (2024-10-26T13:12:27Z)
- Maximally Permissive Reward Machines [8.425937972214667]
We propose a new approach to synthesising reward machines based on the set of partial order plans for a goal.
We prove that learning using such "maximally permissive" reward machines results in higher rewards than learning using RMs based on a single plan.
arXiv Detail & Related papers (2024-08-15T09:59:26Z)
- Dense Reward for Free in Reinforcement Learning from Human Feedback [64.92448888346125]
We leverage the fact that the reward model contains more information than just its scalar output.
We use these attention weights to redistribute the reward along the whole completion.
Empirically, we show that it stabilises training, accelerates the rate of learning, and, in practical cases, may lead to better local optima.
arXiv Detail & Related papers (2024-02-01T17:10:35Z)
- Deep Reinforcement Learning from Hierarchical Preference Design [99.46415116087259]
This paper shows that, by exploiting certain structures, one can ease the reward design process.
We propose a hierarchical reward modeling framework, HERON, for two scenarios: (I) the feedback signals naturally present a hierarchy; (II) the reward is sparse, but less important surrogate feedback is available to help policy learning.
arXiv Detail & Related papers (2023-09-06T00:44:29Z)
- Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning [55.2080971216584]
We present AIRS: Automatic Intrinsic Reward Shaping, which intelligently and adaptively provides high-quality intrinsic rewards to enhance exploration in reinforcement learning (RL).
We develop an intrinsic reward toolkit to provide efficient and reliable implementations of diverse intrinsic reward approaches.
arXiv Detail & Related papers (2023-01-26T01:06:46Z)
- Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity [114.88145406445483]
Reinforcement learning provides an automated framework for learning behaviors from high-level reward specifications.
In practice, the choice of reward function can be crucial for good results.
arXiv Detail & Related papers (2022-10-18T04:21:25Z)
- Model-Free Reinforcement Learning for Symbolic Automata-encoded Objectives [0.0]
Reinforcement learning (RL) is a popular approach for robotic path planning in uncertain environments.
Poorly designed rewards can lead to policies that achieve maximal reward yet fail to satisfy the desired task objectives or are unsafe.
We propose using formal specifications in the form of symbolic automata.
arXiv Detail & Related papers (2022-02-04T21:54:36Z)
- The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models [85.68751244243823]
Reward hacking -- where RL agents exploit gaps in misspecified reward functions -- has been widely observed, but not yet systematically studied.
We investigate reward hacking as a function of agent capabilities: model capacity, action space resolution, observation space noise, and training time.
We find instances of phase transitions: capability thresholds at which the agent's behavior qualitatively shifts, leading to a sharp decrease in the true reward.
arXiv Detail & Related papers (2022-01-10T18:58:52Z)
- Programmatic Reward Design by Example [7.188571996124112]
A misspecified or poorly designed reward can result in low sample efficiency and undesired behaviors.
We propose the idea of programmatic reward design, i.e., using programs to specify the reward functions in reinforcement learning environments.
A major contribution of this paper is a probabilistic framework that can infer the best candidate programmatic reward function from expert demonstrations.
arXiv Detail & Related papers (2021-12-14T05:46:24Z)
- Residual Reinforcement Learning from Demonstrations [51.56457466788513]
Residual reinforcement learning (RL) has been proposed as a way to solve challenging robotic tasks by adapting control actions from a conventional feedback controller to maximize a reward signal.
We extend the residual formulation to learn from visual inputs and sparse rewards using demonstrations.
Our experimental evaluation on simulated manipulation tasks on a 6-DoF UR5 arm and a 28-DoF dexterous hand demonstrates that residual RL from demonstrations is able to generalize to unseen environment conditions more flexibly than either behavioral cloning or RL fine-tuning.
arXiv Detail & Related papers (2021-06-15T11:16:49Z)
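To make the residual formulation in the last entry concrete, here is a minimal sketch of the action composition it relies on, assuming a 1-D proportional base controller and a linear stand-in for the learned residual policy; all names and numbers are hypothetical and are not the authors' code.

```python
# Minimal residual-RL sketch: executed action = base controller + learned residual.
# Illustrative only; a real residual policy would be a trained neural network.
import numpy as np

def base_controller(state: np.ndarray, setpoint: float = 0.0, kp: float = 1.5) -> np.ndarray:
    """Conventional feedback controller: proportional control toward a setpoint."""
    return kp * (setpoint - state)

def residual_policy(state: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Learned residual correction (a linear policy standing in for a network)."""
    return theta @ state

def act(state: np.ndarray, theta: np.ndarray) -> np.ndarray:
    # Residual RL executes the base controller's action plus a learned correction
    # trained (e.g., from sparse rewards and demonstrations) to improve the task reward.
    return base_controller(state) + residual_policy(state, theta)

state = np.array([0.4])
theta = np.zeros((1, 1))      # before training, the residual contributes nothing
print(act(state, theta))      # equals the base controller's action
```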
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.