Programmatic Reward Design by Example
- URL: http://arxiv.org/abs/2112.08438v1
- Date: Tue, 14 Dec 2021 05:46:24 GMT
- Title: Programmatic Reward Design by Example
- Authors: Weichao Zhou, Wenchao Li
- Abstract summary: A misspecified or poorly designed reward can result in low sample efficiency and undesired behaviors.
We propose the idea of programmatic reward design, i.e., using programs to specify the reward functions in reinforcement learning environments.
A major contribution of this paper is a probabilistic framework that can infer the best candidate programmatic reward function from expert demonstrations.
- Score: 7.188571996124112
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reward design is a fundamental problem in reinforcement learning (RL). A
misspecified or poorly designed reward can result in low sample efficiency and
undesired behaviors. In this paper, we propose the idea of programmatic reward
design, i.e., using programs to specify the reward functions in RL
environments. Programs allow human engineers to express sub-goals and complex
task scenarios in a structured and interpretable way. The challenge of
programmatic reward design, however, is that while humans can provide the
high-level structures, properly setting the low-level details, such as the
right amount of reward for a specific sub-task, remains difficult. A major
contribution of this paper is a probabilistic framework that can infer the best
candidate programmatic reward function from expert demonstrations. Inspired by
recent generative-adversarial approaches, our framework searches for the most
likely programmatic reward function under which the optimally generated
trajectories cannot be differentiated from the demonstrated trajectories.
Experimental results show that programmatic reward functions learned using this
framework can significantly outperform those learned using existing reward
learning algorithms, and enable RL agents to achieve state-of-the-art
performance on highly complex tasks.
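To make the idea concrete, here is a minimal sketch of what a programmatic reward function might look like for a hypothetical "pick up the key, then open the door" task. The predicates, parameter names, and default values below are illustrative assumptions, not the authors' actual programs: the engineer writes the interpretable sub-goal structure, while the low-level reward values are left as free parameters to be inferred from expert demonstrations.

```python
from dataclasses import dataclass

@dataclass
class State:
    # Hypothetical state abstraction for a key-door task.
    has_key: bool = False
    door_open: bool = False

@dataclass
class RewardParams:
    # Low-level details left as free parameters; a framework like the one
    # proposed in the paper would infer good values from demonstrations
    # instead of relying on hand-tuning.
    w_key: float = 1.0     # bonus for achieving the "pick up key" sub-goal
    w_door: float = 5.0    # bonus for completing the task (door opened)
    w_step: float = -0.01  # per-step penalty encouraging short trajectories

def programmatic_reward(state: State, next_state: State, p: RewardParams) -> float:
    """Sub-goals and task scenarios expressed as an interpretable program."""
    if next_state.door_open and not state.door_open:
        return p.w_door
    if next_state.has_key and not state.has_key:
        return p.w_key
    return p.w_step

# Example: the key-pickup transition earns w_key under the default parameters.
r = programmatic_reward(State(), State(has_key=True), RewardParams())  # r == 1.0
```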
Related papers
- Sample-Efficient Curriculum Reinforcement Learning for Complex Reward Functions [5.78463306498655]
Reinforcement learning (RL) shows promise in control problems, but its practical application is often hindered by the complexity arising from intricate reward functions with constraints.
We propose a novel two-stage reward curriculum combined with a flexible replay buffer that adaptively samples experiences.
Our approach first learns on a subset of rewards before transitioning to the full reward, allowing the agent to learn trade-offs between objectives and constraints.
arXiv Detail & Related papers (2024-10-22T08:07:44Z)
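As a rough illustration of the two-stage idea summarized above (not the paper's exact formulation, and omitting its adaptive replay buffer), the reward presented to the agent could simply switch from a subset of terms to the full reward between stages:

```python
def curriculum_reward(objective_reward: float,
                      constraint_penalty: float,
                      stage: int) -> float:
    """Illustrative two-stage reward curriculum: stage 1 trains on the main
    objective only; stage 2 adds the constraint terms so the agent can learn
    the trade-off between objectives and constraints."""
    if stage == 1:
        return objective_reward
    return objective_reward + constraint_penalty  # penalty assumed to be <= 0
```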
- Inverse Reinforcement Learning with Sub-optimal Experts [56.553106680769474]
We study the theoretical properties of the class of reward functions that are compatible with a given set of experts.
Our results show that the presence of multiple sub-optimal experts can significantly shrink the set of compatible rewards.
We analyze a uniform sampling algorithm that is minimax optimal whenever the sub-optimal experts' performance level is sufficiently close to that of the optimal agent.
arXiv Detail & Related papers (2024-01-08T12:39:25Z)
- Behavior Alignment via Reward Function Optimization [23.92721220310242]
We introduce a new framework that integrates auxiliary rewards reflecting a designer's domain knowledge with the environment's primary rewards.
We evaluate our method's efficacy on a diverse set of tasks, from small-scale experiments to high-dimensional control challenges.
arXiv Detail & Related papers (2023-10-29T13:45:07Z)
- Deep Reinforcement Learning from Hierarchical Preference Design [99.46415116087259]
This paper shows that by exploiting certain structures, one can ease the reward design process.
We propose a hierarchical reward modeling framework, HERON, for two scenarios: (I) the feedback signals naturally exhibit a hierarchy; (II) the reward is sparse, but less important surrogate feedback is available to help policy learning.
arXiv Detail & Related papers (2023-09-06T00:44:29Z)
- Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning [55.2080971216584]
We present AIRS: Automatic Intrinsic Reward Shaping, which intelligently and adaptively provides high-quality intrinsic rewards to enhance exploration in reinforcement learning (RL).
We develop an intrinsic reward toolkit to provide efficient and reliable implementations of diverse intrinsic reward approaches.
arXiv Detail & Related papers (2023-01-26T01:06:46Z)
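For background on the entry above, the snippet below sketches one common form of intrinsic reward, a count-based novelty bonus added to the extrinsic reward; it is a generic illustration of intrinsic reward shaping, not the AIRS algorithm or its toolkit.

```python
from collections import defaultdict
import math

class CountBasedBonus:
    """Generic count-based intrinsic reward: rarely visited states earn a
    larger exploration bonus. Illustrative only; AIRS selects and shapes
    intrinsic rewards adaptively rather than fixing a single scheme."""
    def __init__(self, beta: float = 0.05):
        self.beta = beta
        self.visit_counts = defaultdict(int)

    def shaped_reward(self, state_key, extrinsic_reward: float) -> float:
        self.visit_counts[state_key] += 1
        intrinsic = self.beta / math.sqrt(self.visit_counts[state_key])
        return extrinsic_reward + intrinsic
```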
- Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity [114.88145406445483]
Reinforcement learning provides an automated framework for learning behaviors from high-level reward specifications.
In practice, the choice of reward function can be crucial for good results.
arXiv Detail & Related papers (2022-10-18T04:21:25Z)
- A Hierarchical Bayesian Approach to Inverse Reinforcement Learning with Symbolic Reward Machines [7.661766773170363]
A misspecified reward can degrade sample efficiency and induce undesired behaviors in reinforcement learning problems.
We propose symbolic reward machines for incorporating high-level task knowledge when specifying the reward signals.
arXiv Detail & Related papers (2022-04-20T20:22:00Z)
- Demonstration-efficient Inverse Reinforcement Learning in Procedurally Generated Environments [137.86426963572214]
Inverse Reinforcement Learning can extrapolate reward functions from expert demonstrations.
We show that our approach, DE-AIRL, is demonstration-efficient and still able to extrapolate reward functions which generalize to the fully procedural domain.
arXiv Detail & Related papers (2020-12-04T11:18:02Z)
- Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping [71.214923471669]
Reward shaping is an effective technique for incorporating domain knowledge into reinforcement learning (RL).
In this paper, we consider the problem of adaptively utilizing a given shaping reward function.
Experiments in sparse-reward cartpole and MuJoCo environments show that our algorithms can fully exploit beneficial shaping rewards.
arXiv Detail & Related papers (2020-11-05T05:34:14Z)
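A minimal sketch of the setting in the entry above: the agent optimizes the environment reward plus a weighted shaping term, and adaptive utilization means the weight is learned rather than fixed by hand (the learning procedure itself is omitted; the function name and signature are illustrative).

```python
def utilized_reward(env_reward: float, shaping_reward: float, weight: float) -> float:
    """Combine the true task reward with a given shaping reward. In the
    adaptive setting, `weight` is not fixed by hand: it is adjusted during
    training, e.g. driven toward 0 when the shaping signal turns out to be
    misleading and toward a larger value when it is beneficial."""
    return env_reward + weight * shaping_reward
```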
- Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning [22.242379207077217]
We show how to expose the reward function's code to the RL agent so that it can exploit the function's internal structure to learn optimal policies.
First, we propose reward machines, a type of finite state machine that supports the specification of reward functions.
We then describe different methodologies to exploit this structure to support learning, including automated reward shaping, task decomposition, and counterfactual reasoning with off-policy learning.
arXiv Detail & Related papers (2020-10-06T00:10:16Z)
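To illustrate the reward machine idea from the last entry, here is a minimal sketch of a two-sub-goal reward machine; the events 'key' and 'door', the state names, and the reward values are hypothetical. An agent that can see this structure can, for instance, condition its policy on the machine state or use it for reward shaping and task decomposition.

```python
class SimpleRewardMachine:
    """Minimal reward machine: a finite state machine whose transitions fire
    on high-level events detected in the environment and emit rewards.
    Hypothetical task: first observe event 'key', then event 'door'."""
    def __init__(self):
        self.state = "u0"
        # (machine_state, event) -> (next_machine_state, reward)
        self.transitions = {
            ("u0", "key"):  ("u1", 0.0),
            ("u1", "door"): ("u_accept", 1.0),
        }

    def step(self, event: str) -> float:
        next_state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0))  # other events: stay put
        self.state = next_state
        return reward

# Example: rewards along the event sequence key -> door are [0.0, 1.0].
rm = SimpleRewardMachine()
rewards = [rm.step(e) for e in ["key", "door"]]
```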
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.