Automatic Reward Design via Learning Motivation-Consistent Intrinsic Rewards
- URL: http://arxiv.org/abs/2207.14722v1
- Date: Fri, 29 Jul 2022 14:52:02 GMT
- Title: Automatic Reward Design via Learning Motivation-Consistent Intrinsic Rewards
- Authors: Yixiang Wang, Yujing Hu, Feng Wu, Yingfeng Chen
- Abstract summary: We introduce the concept of motivation which captures the underlying goal of maximizing certain rewards.
Our method performs better than the state-of-the-art methods in handling problems of delayed reward, exploration, and credit assignment.
- Score: 46.068337522093096
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reward design is a critical part of the application of reinforcement
learning, the performance of which strongly depends on how well the reward
signal frames the goal of the designer and how well the signal assesses
progress in reaching that goal. In many cases, the extrinsic rewards provided
by the environment (e.g., win or loss of a game) are very sparse and make it
difficult to train agents directly. In practice, researchers usually assist the agent's learning by adding auxiliary rewards. However, designing auxiliary rewards often turns into a trial-and-error search for reward settings that produce acceptable results. In this paper, we propose to automatically generate goal-consistent intrinsic rewards for the agent to learn, such that maximizing them also maximizes the expected cumulative extrinsic rewards. To this end, we introduce the concept of motivation, which captures the underlying goal of maximizing certain rewards, and propose a motivation-based reward design method. The basic idea is to shape the intrinsic rewards by
minimizing the distance between the intrinsic and extrinsic motivations. We
conduct extensive experiments and show that our method performs better than the
state-of-the-art methods in handling problems of delayed reward, exploration,
and credit assignment.
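As a rough illustration of this idea (not the authors' implementation), the Python sketch below takes the "motivation" induced by a reward signal to be the policy-gradient direction that signal produces, and trains an intrinsic-reward network so that the direction induced by the intrinsic reward stays close to the one induced by the extrinsic return; the gradient-matching objective and all names are assumptions made for the example.

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 4, 2
policy = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(), nn.Linear(32, act_dim))
reward_net = nn.Sequential(nn.Linear(obs_dim + act_dim, 32), nn.Tanh(), nn.Linear(32, 1))
opt_pi = torch.optim.Adam(policy.parameters(), lr=1e-3)
opt_r = torch.optim.Adam(reward_net.parameters(), lr=1e-3)

def motivation(log_probs, weights):
    # REINFORCE-style surrogate; its gradient w.r.t. the policy is read as the
    # "motivation" induced by the weighting signal (extrinsic return or intrinsic reward).
    loss = -(log_probs * weights).mean()
    return torch.autograd.grad(loss, list(policy.parameters()), create_graph=True)

def update(obs, act_onehot, log_probs, ext_returns):
    g_ext = motivation(log_probs, ext_returns)                      # extrinsic motivation
    r_int = reward_net(torch.cat([obs, act_onehot], -1)).squeeze(-1)
    g_int = motivation(log_probs, r_int)                            # intrinsic motivation
    # Shape the intrinsic reward by minimizing the distance between the two motivations.
    dist = sum(((gi - ge.detach()) ** 2).sum() for gi, ge in zip(g_int, g_ext))
    opt_r.zero_grad(); dist.backward(retain_graph=True); opt_r.step()
    # The agent itself learns from the (goal-consistent) intrinsic reward.
    opt_pi.zero_grad(); (-(log_probs * r_int.detach()).mean()).backward(); opt_pi.step()

# Tiny synthetic batch, just to exercise the update.
obs = torch.randn(8, obs_dim)
acts = torch.randint(act_dim, (8,))
act_onehot = nn.functional.one_hot(acts, act_dim).float()
log_probs = torch.log_softmax(policy(obs), -1).gather(1, acts[:, None]).squeeze(1)
update(obs, act_onehot, log_probs, torch.randn(8))
```

Under this reading, the agent only ever optimizes the learned intrinsic reward, which is dense, while the matching step keeps that reward consistent with the sparse extrinsic goal.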
Related papers
- Sample-Efficient Curriculum Reinforcement Learning for Complex Reward Functions [5.78463306498655]
Reinforcement learning (RL) shows promise in control problems, but its practical application is often hindered by the complexity arising from intricate reward functions with constraints.
We propose a novel two-stage reward curriculum combined with a flexible replay buffer that adaptively samples experiences.
Our approach first learns on a subset of rewards before transitioning to the full reward, allowing the agent to learn trade-offs between objectives and constraints.
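A minimal sketch of such a two-stage reward curriculum (the staging criterion, reward terms, and class name below are illustrative assumptions, not the paper's implementation):

```python
class RewardCurriculum:
    def __init__(self, stage1_terms, all_terms, success_threshold=0.8):
        self.stage1_terms = stage1_terms
        self.all_terms = all_terms
        self.success_threshold = success_threshold
        self.stage = 1

    def compute(self, reward_terms):
        """reward_terms: dict mapping term name -> value for the current step."""
        active = self.stage1_terms if self.stage == 1 else self.all_terms
        return sum(reward_terms[name] for name in active)

    def maybe_advance(self, recent_success_rate):
        # Transition to the full reward once the first stage is handled well enough.
        if self.stage == 1 and recent_success_rate >= self.success_threshold:
            self.stage = 2

# Hypothetical terms: learn to reach the goal first, add constraint penalties later.
curr = RewardCurriculum(["goal"], ["goal", "energy_penalty", "collision_penalty"])
r = curr.compute({"goal": 1.0, "energy_penalty": -0.1, "collision_penalty": 0.0})
```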
arXiv Detail & Related papers (2024-10-22T08:07:44Z) - Informativeness of Reward Functions in Reinforcement Learning [34.40155383189179]
We study the problem of designing informative reward functions so that the designed rewards speed up the agent's convergence.
Existing works have considered several different reward design formulations.
We propose a reward informativeness criterion that adapts w.r.t. the agent's current policy and can be optimized under specified structural constraints.
arXiv Detail & Related papers (2024-02-10T18:36:42Z) - Dense Reward for Free in Reinforcement Learning from Human Feedback [64.92448888346125]
We leverage the fact that the reward model contains more information than just its scalar output.
We use these attention weights to redistribute the reward along the whole completion.
Empirically, we show that it stabilises training, accelerates the rate of learning, and, in practical cases, may lead to better local optima.
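As an illustration of this redistribution idea (how the attention weights are pooled and normalized here is an assumption, not the paper's exact recipe), a sequence-level reward can be spread over completion tokens like so:

```python
import numpy as np

def redistribute_reward(scalar_reward, attn_weights):
    """attn_weights: attention mass the reward model places on each completion
    token (e.g., averaged over heads/layers at the scoring position)."""
    w = np.asarray(attn_weights, dtype=float)
    w = w / w.sum()                     # normalize so the dense rewards sum
    return scalar_reward * w            # back to the original scalar reward

# Hypothetical example: 5 completion tokens, reward-model score 0.8.
dense = redistribute_reward(0.8, [0.1, 0.4, 0.1, 0.3, 0.1])
print(dense, dense.sum())               # per-token rewards that sum to 0.8
```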
arXiv Detail & Related papers (2024-02-01T17:10:35Z) - DreamSmooth: Improving Model-based Reinforcement Learning via Reward
Smoothing [60.21269454707625]
DreamSmooth learns to predict a temporally-smoothed reward, instead of the exact reward at the given timestep.
We show that DreamSmooth achieves state-of-the-art performance on long-horizon sparse-reward tasks.
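A minimal sketch of training a reward predictor on temporally smoothed targets; the Gaussian kernel and window size below are illustrative choices rather than the paper's configuration:

```python
import numpy as np

def smooth_rewards(rewards, sigma=2.0, radius=5):
    """Convolve an episode's reward sequence with a truncated Gaussian kernel."""
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()
    return np.convolve(rewards, kernel, mode="same")

# Sparse reward: a single success signal at the end of a 20-step episode.
rewards = np.zeros(20); rewards[-1] = 1.0
targets = smooth_rewards(rewards)       # dense, easier-to-predict targets
```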
arXiv Detail & Related papers (2023-11-02T17:57:38Z) - Behavior Alignment via Reward Function Optimization [23.92721220310242]
We introduce a new framework that integrates auxiliary rewards reflecting a designer's domain knowledge with the environment's primary rewards.
We evaluate our method's efficacy on a diverse set of tasks, from small-scale experiments to high-dimensional control challenges.
arXiv Detail & Related papers (2023-10-29T13:45:07Z) - Go Beyond Imagination: Maximizing Episodic Reachability with World
Models [68.91647544080097]
In this paper, we introduce a new intrinsic reward design called GoBI - Go Beyond Imagination.
We apply learned world models to generate predicted future states with random actions.
Our method greatly outperforms previous state-of-the-art methods on 12 of the most challenging Minigrid navigation tasks.
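A rough sketch of a reachability-style bonus computed from world-model rollouts with random actions; the discretization, rollout counts, and the use of the raw novel-state count as the bonus are assumptions made for illustration:

```python
import numpy as np

def reachability_bonus(state, world_model, episodic_memory,
                       n_rollouts=16, horizon=3, n_actions=4):
    """Imagine short random-action futures and reward states that can reach
    many predicted states not yet seen in this episode."""
    novel = set()
    for _ in range(n_rollouts):
        s = state
        for _ in range(horizon):
            a = np.random.randint(n_actions)
            s = world_model(s, a)                      # predicted next state
            key = tuple(np.round(s, 1))                # coarse discretization
            if key not in episodic_memory:
                novel.add(key)
    episodic_memory.add(tuple(np.round(state, 1)))     # remember the visited state
    return len(novel)

# Hypothetical linear dynamics standing in for a learned world model.
world_model = lambda s, a: 0.9 * np.asarray(s) + 0.1 * a
memory = set()
r_int = reachability_bonus(np.zeros(2), world_model, memory)
```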
arXiv Detail & Related papers (2023-08-25T20:30:20Z) - Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement
Learning [55.2080971216584]
We present AIRS (Automatic Intrinsic Reward Shaping), which intelligently and adaptively provides high-quality intrinsic rewards to enhance exploration in reinforcement learning (RL).
We develop an intrinsic reward toolkit to provide efficient and reliable implementations of diverse intrinsic reward approaches.
arXiv Detail & Related papers (2023-01-26T01:06:46Z) - Unpacking Reward Shaping: Understanding the Benefits of Reward
Engineering on Sample Complexity [114.88145406445483]
Reinforcement learning provides an automated framework for learning behaviors from high-level reward specifications.
In practice the choice of reward function can be crucial for good results.
arXiv Detail & Related papers (2022-10-18T04:21:25Z) - Designing Rewards for Fast Learning [18.032654606016447]
We look at how reward-design choices impact learning speed and seek to identify principles of good reward design that quickly induce target behavior.
We propose a linear-programming-based algorithm that efficiently finds a reward function maximizing the action gap and minimizing the subjective discount.
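A toy sketch of the action-gap idea in the single-state (bandit) case, where Q(a) = r(a): the LP below only maximizes the gap under reward bounds and omits the discount and multi-state parts of the paper's formulation.

```python
import numpy as np
from scipy.optimize import linprog

n_actions, target = 3, 0
# Decision variables: r[0..n-1] and the gap g; objective: maximize g.
c = np.zeros(n_actions + 1); c[-1] = -1.0
# Constraints r[a] - r[target] + g <= 0 for every non-target action a.
A_ub, b_ub = [], []
for a in range(n_actions):
    if a == target:
        continue
    row = np.zeros(n_actions + 1)
    row[a], row[target], row[-1] = 1.0, -1.0, 1.0
    A_ub.append(row); b_ub.append(0.0)
bounds = [(-1.0, 1.0)] * n_actions + [(0.0, None)]
res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=bounds)
rewards, gap = res.x[:n_actions], res.x[-1]   # e.g., r[target]=1, others=-1, gap=2
```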
arXiv Detail & Related papers (2022-05-30T19:48:52Z) - Self-Supervised Exploration via Latent Bayesian Surprise [4.088019409160893]
In this work, we propose a curiosity-based bonus as intrinsic reward for Reinforcement Learning.
We extensively evaluate our model by measuring the agent's performance in terms of environment exploration.
Our model is cheap and empirically shows state-of-the-art performance on several problems.
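As a rough illustration of a Bayesian-surprise-style bonus (the diagonal-Gaussian latent model here is an assumption, not taken from the paper), the intrinsic reward can be the KL divergence between the latent posterior and prior:

```python
import numpy as np

def surprise_bonus(mu_post, std_post, mu_prior, std_prior):
    """KL( N(mu_post, std_post^2) || N(mu_prior, std_prior^2) ), summed over dims."""
    var_post, var_prior = std_post ** 2, std_prior ** 2
    kl = (np.log(std_prior / std_post)
          + (var_post + (mu_post - mu_prior) ** 2) / (2 * var_prior) - 0.5)
    return kl.sum()

# Hypothetical 4-dimensional latent: larger posterior shifts give larger bonuses.
r_int = surprise_bonus(np.array([0.5, 0.0, -0.3, 0.2]), np.full(4, 0.4),
                       np.zeros(4), np.ones(4))
```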
arXiv Detail & Related papers (2021-04-15T14:40:16Z)