Assisted Robust Reward Design
- URL: http://arxiv.org/abs/2111.09884v1
- Date: Thu, 18 Nov 2021 18:59:33 GMT
- Title: Assisted Robust Reward Design
- Authors: Jerry Zhi-Yang He, Anca D. Dragan
- Abstract summary: In practice, reward design is an iterative process: the designer chooses a reward, eventually encounters an "edge-case" environment where the reward incentivizes the wrong behavior, revises the reward, and repeats.
We propose that the robot not take the specified reward for granted, but rather have uncertainty about it, and account for the future design iterations as future evidence.
We test this method in a simplified autonomous driving task and find that it more quickly improves the car's behavior in held-out environments by proposing environments that are "edge cases" for the current reward.
- Score: 33.55440481096258
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-world robotic tasks require complex reward functions. When we define the
problem the robot needs to solve, we pretend that a designer specifies this
complex reward exactly, and it is set in stone from then on. In practice,
however, reward design is an iterative process: the designer chooses a reward,
eventually encounters an "edge-case" environment where the reward incentivizes
the wrong behavior, revises the reward, and repeats. What would it mean to
rethink robotics problems to formally account for this iterative nature of
reward design? We propose that the robot not take the specified reward for
granted, but rather have uncertainty about it, and account for the future
design iterations as future evidence. We contribute an Assisted Reward Design
method that speeds up the design process by anticipating and influencing this
future evidence: rather than letting the designer eventually encounter failure
cases and revise the reward then, the method actively exposes the designer to
such environments during the development phase. We test this method in a
simplified autonomous driving task and find that it more quickly improves the
car's behavior in held-out environments by proposing environments that are
"edge cases" for the current reward.
Related papers
- Constraints as Rewards: Reinforcement Learning for Robots without Reward Functions [0.0]
Reinforcement learning has become an essential tool for generating complex robotic behaviors.
To learn such behaviors, it is necessary to design a reward function that describes the task.
In this paper, we propose the concept of Constraints as Rewards (CaR).
arXiv Detail & Related papers (2025-01-08T01:59:47Z) - Synthesis of Reward Machines for Multi-Agent Equilibrium Design (Full Version) [2.2099217573031678]
We study the problem of equilibrium design using dynamic incentive structures, known as reward machines.
We show how reward machines can be used to represent dynamic incentives that allocate rewards in a manner that optimises the designer's goal.
We present two variants of the problem: strong and weak. We demonstrate that both can be solved in polynomial time using a Turing machine equipped with an NP oracle.
arXiv Detail & Related papers (2024-08-19T15:17:58Z) - Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft [88.80684763462384]
This paper introduces an advanced learning system, named Auto MC-Reward, that leverages Large Language Models (LLMs) to automatically design dense reward functions.
Experiments demonstrate a significant improvement in the success rate and learning efficiency of our agents in complex tasks in Minecraft.
arXiv Detail & Related papers (2023-12-14T18:58:12Z) - Go Beyond Imagination: Maximizing Episodic Reachability with World
Models [68.91647544080097]
In this paper, we introduce a new intrinsic reward design called GoBI - Go Beyond Imagination.
We apply learned world models to generate predicted future states with random actions.
Our method greatly outperforms previous state-of-the-art methods on 12 of the most challenging Minigrid navigation tasks.
arXiv Detail & Related papers (2023-08-25T20:30:20Z) - Unpacking Reward Shaping: Understanding the Benefits of Reward
Engineering on Sample Complexity [114.88145406445483]
Reinforcement learning provides an automated framework for learning behaviors from high-level reward specifications.
In practice the choice of reward function can be crucial for good results.
arXiv Detail & Related papers (2022-10-18T04:21:25Z) - Automatic Reward Design via Learning Motivation-Consistent Intrinsic
Rewards [46.068337522093096]
We introduce the concept of motivation which captures the underlying goal of maximizing certain rewards.
Our method performs better than the state-of-the-art methods in handling problems of delayed reward, exploration, and credit assignment.
arXiv Detail & Related papers (2022-07-29T14:52:02Z) - Programmatic Reward Design by Example [7.188571996124112]
A misspecified or poorly designed reward can result in low sample efficiency and undesired behaviors.
We propose the idea of programmatic reward design, i.e., using programs to specify the reward functions in reinforcement learning environments (a toy illustration of a reward written as a program appears after this list).
A major contribution of this paper is a probabilistic framework that can infer the best candidate programmatic reward function from expert demonstrations.
arXiv Detail & Related papers (2021-12-14T05:46:24Z) - Reward (Mis)design for Autonomous Driving [89.2504219865973]
We develop 8 simple sanity checks for identifying flaws in reward functions.
The checks are applied to reward functions from past work on reinforcement learning for autonomous driving.
We explore promising directions that may help future researchers design reward functions for AD.
arXiv Detail & Related papers (2021-04-28T17:41:35Z)