Hindsight Reward Tweaking via Conditional Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2109.02332v1
- Date: Mon, 6 Sep 2021 10:06:48 GMT
- Title: Hindsight Reward Tweaking via Conditional Deep Reinforcement Learning
- Authors: Ning Wei, Jiahua Liang, Di Xie and Shiliang Pu
- Abstract summary: We propose a novel paradigm for deep reinforcement learning to model the influences of reward functions within a near-optimal space.
We demonstrate the feasibility of this approach and study one of its potential applications, policy performance boosting, on multiple MuJoCo tasks.
- Score: 37.61951923445689
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Designing optimal reward functions has long been desired but remains extremely
difficult in reinforcement learning (RL). For modern complex tasks,
sophisticated reward functions are widely used to simplify policy learning, yet
even a tiny adjustment to them is expensive to evaluate due to the drastically
increasing cost of training. To this end, we propose a hindsight reward
tweaking approach by designing a novel paradigm for deep reinforcement learning
to model the influences of reward functions within a near-optimal space. We
simply extend the input observation with a condition vector linearly correlated
with the effective environment reward parameters, and train the model in the
conventional manner except that reward configurations are randomized, obtaining a
hyper-policy whose characteristics are sensitively regulated over the condition
space. We demonstrate the feasibility of this approach and study one of its
potential applications, policy performance boosting, on multiple MuJoCo tasks.
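The recipe described in the abstract is simple enough to illustrate. Below is a minimal, self-contained sketch (not the authors' implementation): an environment exposes separate raw reward terms, a condition vector of reward weights is appended to the observation, the effective reward is the weighted mix of those terms, and the weights are re-randomized every episode so that a single trained policy acts as a hyper-policy over the condition space. The toy environment, the linear policy, and all names (ToyEnv, sample_condition, theta) are illustrative assumptions; the paper trains conventional deep RL agents on MuJoCo tasks.

```python
# Minimal sketch of hindsight reward tweaking (illustrative, not the authors' code).
# A condition vector w of reward weights is appended to the observation, and w is
# re-randomized each episode, yielding a hyper-policy modulated by the condition space.

import numpy as np

rng = np.random.default_rng(0)


class ToyEnv:
    """Stand-in environment exposing separate raw reward terms (assumed, not MuJoCo)."""

    def __init__(self):
        self.state = np.zeros(4)

    def reset(self):
        self.state = rng.normal(size=4)
        return self.state

    def step(self, action):
        self.state = self.state + 0.1 * action + 0.01 * rng.normal(size=4)
        # Two raw reward terms whose weighting we want to tweak in hindsight.
        reward_terms = np.array([
            -np.sum(self.state ** 2),   # "progress"-style term
            -np.sum(action ** 2),       # "energy"-style penalty
        ])
        return self.state, reward_terms


def sample_condition():
    """Sample reward weights from an assumed near-optimal range of configurations."""
    return rng.uniform(0.5, 1.5, size=2)


def policy(obs_with_condition, theta):
    """Tiny linear policy standing in for a deep network; condition-aware by construction."""
    return np.tanh(theta @ obs_with_condition)


obs_dim, cond_dim, act_dim = 4, 2, 4
theta = 0.01 * rng.normal(size=(act_dim, obs_dim + cond_dim))

for episode in range(5):
    w = sample_condition()                    # randomized reward configuration
    env = ToyEnv()
    obs = env.reset()
    ep_return = 0.0
    for t in range(50):
        obs_c = np.concatenate([obs, w])      # condition vector appended to observation
        action = policy(obs_c, theta)
        obs, reward_terms = env.step(action)
        ep_return += float(w @ reward_terms)  # effective reward: linear mix of raw terms
    # A real implementation would update theta here (e.g. PPO/SAC) on (obs_c, action, reward).
    print(f"episode {episode}: condition={w.round(2)}, return={ep_return:.2f}")
```

At evaluation time the condition vector can then be swept in hindsight to pick the reward configuration under which the hyper-policy performs best, without retraining, which is the policy performance boosting use case the paper studies.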
Related papers
- Sample-Efficient Curriculum Reinforcement Learning for Complex Reward Functions [5.78463306498655]
Reinforcement learning (RL) shows promise in control problems, but its practical application is often hindered by the complexity arising from intricate reward functions with constraints.
We propose a novel two-stage reward curriculum combined with a flexible replay buffer that adaptively samples experiences.
Our approach first learns on a subset of rewards before transitioning to the full reward, allowing the agent to learn trade-offs between objectives and constraints.
arXiv Detail & Related papers (2024-10-22T08:07:44Z)
- ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization [41.074747242532695]
Online Reward Selection and Policy Optimization (ORSO) is a novel approach that frames shaping reward selection as an online model selection problem.
ORSO employs principled exploration strategies to automatically identify promising shaping reward functions without human intervention.
We demonstrate ORSO's effectiveness across various continuous control tasks using the Isaac Gym simulator.
arXiv Detail & Related papers (2024-10-17T17:55:05Z)
- Behavior Alignment via Reward Function Optimization [23.92721220310242]
We introduce a new framework that integrates auxiliary rewards reflecting a designer's domain knowledge with the environment's primary rewards.
We evaluate our method's efficacy on a diverse set of tasks, from small-scale experiments to high-dimensional control challenges.
arXiv Detail & Related papers (2023-10-29T13:45:07Z)
- Internally Rewarded Reinforcement Learning [22.01249652558878]
We study a class of reinforcement learning problems where the reward signals for policy learning are generated by an internal reward model.
We show that the proposed reward function can consistently stabilize the training process by reducing the impact of reward noise.
arXiv Detail & Related papers (2023-02-01T06:25:46Z)
- Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity [114.88145406445483]
Reinforcement learning provides an automated framework for learning behaviors from high-level reward specifications.
In practice, the choice of reward function can be crucial for good results.
arXiv Detail & Related papers (2022-10-18T04:21:25Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL): inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- Online reinforcement learning with sparse rewards through an active inference capsule [62.997667081978825]
This paper introduces an active inference agent which minimizes the novel free energy of the expected future.
Our model is capable of solving sparse-reward problems with very high sample efficiency.
We also introduce a novel method for approximating the prior model from the reward function, which simplifies the expression of complex objectives.
arXiv Detail & Related papers (2021-06-04T10:03:36Z)
- Generative Adversarial Reward Learning for Generalized Behavior Tendency Inference [71.11416263370823]
We propose a generative inverse reinforcement learning approach for user behavioral preference modelling.
Our model can automatically learn the rewards from users' actions based on a discriminative actor-critic network and Wasserstein GAN.
arXiv Detail & Related papers (2021-05-03T13:14:25Z)
- Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping [71.214923471669]
Reward shaping is an effective technique for incorporating domain knowledge into reinforcement learning (RL).
In this paper, we consider the problem of adaptively utilizing a given shaping reward function.
Experiments in sparse-reward cartpole and MuJoCo environments show that our algorithms can fully exploit beneficial shaping rewards.
arXiv Detail & Related papers (2020-11-05T05:34:14Z)