Programmatic Reward Design by Example
- URL: http://arxiv.org/abs/2112.08438v1
- Date: Tue, 14 Dec 2021 05:46:24 GMT
- Title: Programmatic Reward Design by Example
- Authors: Weichao Zhou, Wenchao Li
- Abstract summary: A misspecified or poorly designed reward can result in low sample efficiency and undesired behaviors.
We propose the idea of programmatic reward design, i.e., using programs to specify the reward functions in reinforcement learning environments.
A major contribution of this paper is a probabilistic framework that can infer the best candidate programmatic reward function from expert demonstrations.
- Score: 7.188571996124112
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reward design is a fundamental problem in reinforcement learning (RL). A
misspecified or poorly designed reward can result in low sample efficiency and
undesired behaviors. In this paper, we propose the idea of programmatic reward
design, i.e., using programs to specify the reward functions in RL
environments. Programs allow human engineers to express sub-goals and complex
task scenarios in a structured and interpretable way. The challenge of
programmatic reward design, however, is that while humans can provide the
high-level structures, properly setting the low-level details, such as the
right amount of reward for a specific sub-task, remains difficult. A major
contribution of this paper is a probabilistic framework that can infer the best
candidate programmatic reward function from expert demonstrations. Inspired by
recent generative-adversarial approaches, our framework searches for the most
likely programmatic reward function under which the optimally generated
trajectories cannot be differentiated from the demonstrated trajectories.
Experimental results show that programmatic reward functions learned using this
framework can significantly outperform those learned using existing reward
learning algorithms, and enable RL agents to achieve state-of-the-art
performance on highly complex tasks.
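To make the idea concrete, here is a minimal sketch of what a programmatic reward function might look like for a hypothetical "pick up the key, then open the door" task. The predicates, parameter names, and default values below are illustrative assumptions, not the authors' actual programs: the engineer writes the interpretable sub-goal structure, while the low-level reward values are left as free parameters to be inferred from expert demonstrations.

```python
from dataclasses import dataclass

@dataclass
class State:
    # Hypothetical state abstraction for a key-door task.
    has_key: bool = False
    door_open: bool = False

@dataclass
class RewardParams:
    # Low-level details left as free parameters; a framework like the one
    # proposed in the paper would infer good values from demonstrations
    # instead of relying on hand-tuning.
    w_key: float = 1.0     # bonus for achieving the "pick up key" sub-goal
    w_door: float = 5.0    # bonus for completing the task (door opened)
    w_step: float = -0.01  # per-step penalty encouraging short trajectories

def programmatic_reward(state: State, next_state: State, p: RewardParams) -> float:
    """Sub-goals and task scenarios expressed as an interpretable program."""
    if next_state.door_open and not state.door_open:
        return p.w_door
    if next_state.has_key and not state.has_key:
        return p.w_key
    return p.w_step

# Example: the key-pickup transition earns w_key under the default parameters.
r = programmatic_reward(State(), State(has_key=True), RewardParams())  # r == 1.0
```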
Related papers
- Sample-Efficient Curriculum Reinforcement Learning for Complex Reward Functions [5.78463306498655]
Reinforcement learning (RL) shows promise in control problems, but its practical application is often hindered by the complexity arising from intricate reward functions with constraints.
We propose a novel two-stage reward curriculum combined with a flexible replay buffer that adaptively samples experiences.
Our approach first learns on a subset of rewards before transitioning to the full reward, allowing the agent to learn trade-offs between objectives and constraints.
arXiv Detail & Related papers (2024-10-22T08:07:44Z)
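As a rough illustration of the two-stage idea summarized above (not the paper's exact formulation, and omitting its adaptive replay buffer), the reward presented to the agent could simply switch from a subset of terms to the full reward between stages:

```python
def curriculum_reward(objective_reward: float,
                      constraint_penalty: float,
                      stage: int) -> float:
    """Illustrative two-stage reward curriculum: stage 1 trains on the main
    objective only; stage 2 adds the constraint terms so the agent can learn
    the trade-off between objectives and constraints."""
    if stage == 1:
        return objective_reward
    return objective_reward + constraint_penalty  # penalty assumed to be <= 0
```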
- Inverse Reinforcement Learning with Sub-optimal Experts [56.553106680769474]
We study the theoretical properties of the class of reward functions that are compatible with a given set of experts.
Our results show that the presence of multiple sub-optimal experts can significantly shrink the set of compatible rewards.
We analyze a uniform sampling algorithm that is minimax optimal whenever the sub-optimal experts' performance level is sufficiently close to that of the optimal agent.
arXiv Detail & Related papers (2024-01-08T12:39:25Z)
- Behavior Alignment via Reward Function Optimization [23.92721220310242]
We introduce a new framework that integrates auxiliary rewards reflecting a designer's domain knowledge with the environment's primary rewards.
We evaluate our method's efficacy on a diverse set of tasks, from small-scale experiments to high-dimensional control challenges.
arXiv Detail & Related papers (2023-10-29T13:45:07Z)
- Deep Reinforcement Learning from Hierarchical Preference Design [99.46415116087259]
This paper shows that by exploiting certain structures, one can ease the reward design process.
We propose a hierarchical reward modeling framework, HERON, for two scenarios: (I) the feedback signals naturally exhibit a hierarchy; (II) the reward is sparse, but less important surrogate feedback is available to help policy learning.
arXiv Detail & Related papers (2023-09-06T00:44:29Z)
- Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning [55.2080971216584]
We present AIRS: Automatic Intrinsic Reward Shaping, which intelligently and adaptively provides high-quality intrinsic rewards to enhance exploration in reinforcement learning (RL).
We develop an intrinsic reward toolkit to provide efficient and reliable implementations of diverse intrinsic reward approaches.
arXiv Detail & Related papers (2023-01-26T01:06:46Z)
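For background on the entry above, the snippet below sketches one common form of intrinsic reward, a count-based novelty bonus added to the extrinsic reward; it is a generic illustration of intrinsic reward shaping, not the AIRS algorithm or its toolkit.

```python
from collections import defaultdict
import math

class CountBasedBonus:
    """Generic count-based intrinsic reward: rarely visited states earn a
    larger exploration bonus. Illustrative only; AIRS selects and shapes
    intrinsic rewards adaptively rather than fixing a single scheme."""
    def __init__(self, beta: float = 0.05):
        self.beta = beta
        self.visit_counts = defaultdict(int)

    def shaped_reward(self, state_key, extrinsic_reward: float) -> float:
        self.visit_counts[state_key] += 1
        intrinsic = self.beta / math.sqrt(self.visit_counts[state_key])
        return extrinsic_reward + intrinsic
```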
- Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity [114.88145406445483]
Reinforcement learning provides an automated framework for learning behaviors from high-level reward specifications.
In practice, the choice of reward function can be crucial for good results.
arXiv Detail & Related papers (2022-10-18T04:21:25Z)
- A Hierarchical Bayesian Approach to Inverse Reinforcement Learning with Symbolic Reward Machines [7.661766773170363]
A misspecified reward can degrade sample efficiency and induce undesired behaviors in reinforcement learning problems.
We propose symbolic reward machines for incorporating high-level task knowledge when specifying the reward signals.
arXiv Detail & Related papers (2022-04-20T20:22:00Z)
- Demonstration-efficient Inverse Reinforcement Learning in Procedurally Generated Environments [137.86426963572214]
Inverse Reinforcement Learning can extrapolate reward functions from expert demonstrations.
We show that our approach, DE-AIRL, is demonstration-efficient and still able to extrapolate reward functions which generalize to the fully procedural domain.
arXiv Detail & Related papers (2020-12-04T11:18:02Z)
- Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping [71.214923471669]
Reward shaping is an effective technique for incorporating domain knowledge into reinforcement learning (RL).
In this paper, we consider the problem of adaptively utilizing a given shaping reward function.
Experiments in sparse-reward cartpole and MuJoCo environments show that our algorithms can fully exploit beneficial shaping rewards.
arXiv Detail & Related papers (2020-11-05T05:34:14Z)
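A minimal sketch of the setting in the entry above: the agent optimizes the environment reward plus a weighted shaping term, and adaptive utilization means the weight is learned rather than fixed by hand (the learning procedure itself is omitted; the function name and signature are illustrative).

```python
def utilized_reward(env_reward: float, shaping_reward: float, weight: float) -> float:
    """Combine the true task reward with a given shaping reward. In the
    adaptive setting, `weight` is not fixed by hand: it is adjusted during
    training, e.g. driven toward 0 when the shaping signal turns out to be
    misleading and toward a larger value when it is beneficial."""
    return env_reward + weight * shaping_reward
```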
- Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning [22.242379207077217]
We show how to expose the reward function's code to the RL agent so that it can exploit the function's internal structure to learn optimal policies.
First, we propose reward machines, a type of finite state machine that supports the specification of reward functions.
We then describe different methodologies to exploit this structure to support learning, including automated reward shaping, task decomposition, and counterfactual reasoning with off-policy learning.
arXiv Detail & Related papers (2020-10-06T00:10:16Z)
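To illustrate the reward machine idea from the last entry, here is a minimal sketch of a two-sub-goal reward machine; the events 'key' and 'door', the state names, and the reward values are hypothetical. An agent that can see this structure can, for instance, condition its policy on the machine state or use it for reward shaping and task decomposition.

```python
class SimpleRewardMachine:
    """Minimal reward machine: a finite state machine whose transitions fire
    on high-level events detected in the environment and emit rewards.
    Hypothetical task: first observe event 'key', then event 'door'."""
    def __init__(self):
        self.state = "u0"
        # (machine_state, event) -> (next_machine_state, reward)
        self.transitions = {
            ("u0", "key"):  ("u1", 0.0),
            ("u1", "door"): ("u_accept", 1.0),
        }

    def step(self, event: str) -> float:
        next_state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0))  # other events: stay put
        self.state = next_state
        return reward

# Example: rewards along the event sequence key -> door are [0.0, 1.0].
rm = SimpleRewardMachine()
rewards = [rm.step(e) for e in ["key", "door"]]
```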
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.