Behavior Alignment via Reward Function Optimization
- URL: http://arxiv.org/abs/2310.19007v2
- Date: Tue, 31 Oct 2023 04:58:20 GMT
- Title: Behavior Alignment via Reward Function Optimization
- Authors: Dhawal Gupta, Yash Chandak, Scott M. Jordan, Philip S. Thomas, Bruno Castro da Silva
- Abstract summary: We introduce a new framework that integrates auxiliary rewards reflecting a designer's domain knowledge with the environment's primary rewards.
We evaluate our method's efficacy on a diverse set of tasks, from small-scale experiments to high-dimensional control challenges.
- Score: 23.92721220310242
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Designing reward functions for efficiently guiding reinforcement learning
(RL) agents toward specific behaviors is a complex task. This is challenging
since it requires the identification of reward structures that are not sparse
and that avoid inadvertently inducing undesirable behaviors. Naively modifying
the reward structure to offer denser and more frequent feedback can lead to
unintended outcomes and promote behaviors that are not aligned with the
designer's intended goal. Although potential-based reward shaping is often
suggested as a remedy, we systematically investigate settings where deploying
it often significantly impairs performance. To address these issues, we
introduce a new framework that uses a bi-level objective to learn
\emph{behavior alignment reward functions}. These functions integrate auxiliary
rewards reflecting a designer's heuristics and domain knowledge with the
environment's primary rewards. Our approach automatically determines the most
effective way to blend these types of feedback, thereby enhancing robustness
against heuristic reward misspecification. Remarkably, it can also adapt an
agent's policy optimization process to mitigate suboptimalities resulting from
limitations and biases inherent in the underlying RL algorithms. We evaluate
our method's efficacy on a diverse set of tasks, from small-scale experiments
to high-dimensional control challenges. We investigate heuristic auxiliary
rewards of varying quality -- some of which are beneficial and others
detrimental to the learning process. Our results show that our framework offers
a robust and principled way to integrate designer-specified heuristics. It not
only addresses key shortcomings of existing approaches but also consistently
leads to high-performing solutions, even when given misaligned or
poorly-specified auxiliary reward functions.
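The abstract references two technical ingredients: potential-based reward shaping, in which the primary reward is augmented with a term F(s, a, s') = gamma * Phi(s') - Phi(s) for some potential function Phi (Ng, Harada & Russell, 1999), and a bi-level objective in which an inner loop optimizes a policy under a blended "behavior alignment" reward while an outer loop tunes the blend so that the resulting policy performs well under the environment's primary reward alone. The sketch below is a minimal toy illustration of that bi-level structure, not the paper's algorithm: the chain MDP, the distance-based heuristic, tabular Q-learning as the inner learner, and random search as the outer optimizer are all assumptions made for this example.

```python
"""Toy sketch of a bi-level blend of primary and heuristic rewards.

Inner loop: learn a policy on
    r_blend = r_env + w_aux * heuristic(s') + w_phi * (gamma*Phi(s') - Phi(s)).
Outer loop: choose (w_aux, w_phi) to maximize the *primary* return of the
learned policy. Illustrative only; not the paper's actual algorithm.
"""
import numpy as np

N_STATES, GOAL, GAMMA = 10, 9, 0.95
rng = np.random.default_rng(0)

def env_step(s, a):
    """Chain MDP: action 1 moves right, action 0 moves left; reward 1 only at the goal."""
    s_next = min(max(s + (1 if a == 1 else -1), 0), N_STATES - 1)
    return s_next, (1.0 if s_next == GOAL else 0.0), s_next == GOAL

def heuristic(s):
    """Designer heuristic (possibly misspecified): states closer to the goal look better."""
    return -abs(GOAL - s) / N_STATES

def train_policy(weights, episodes=300, alpha=0.2, eps=0.1):
    """Inner loop: tabular Q-learning on the blended 'behavior alignment' reward."""
    w_aux, w_phi = weights
    Q = np.zeros((N_STATES, 2))
    for _ in range(episodes):
        s, done, t = 0, False, 0
        while not done and t < 50:
            a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(Q[s]))
            s_next, r_env, done = env_step(s, a)
            shaping = GAMMA * heuristic(s_next) - heuristic(s)  # potential-based shaping term
            r_blend = r_env + w_aux * heuristic(s_next) + w_phi * shaping
            target = r_blend + (0.0 if done else GAMMA * Q[s_next].max())
            Q[s, a] += alpha * (target - Q[s, a])
            s, t = s_next, t + 1
    return Q

def primary_return(Q, horizon=50):
    """Outer objective: discounted return of the greedy policy under the primary reward only."""
    s, done, t, total = 0, False, 0, 0.0
    while not done and t < horizon:
        s, r_env, done = env_step(s, int(np.argmax(Q[s])))
        total += (GAMMA ** t) * r_env
        t += 1
    return total

# Outer loop: crude random search over blend weights, standing in for the
# paper's bi-level optimization of the behavior alignment reward function.
best_w, best_val = None, -np.inf
for _ in range(20):
    w = rng.uniform(-1.0, 1.0, size=2)
    val = primary_return(train_policy(w))
    if val > best_val:
        best_w, best_val = w, val
print(f"best blend weights {best_w}, primary return {best_val:.3f}")
```

Because the outer objective is evaluated on the primary reward only, a harmful heuristic simply sees its weight pushed toward zero in this toy setup, which mirrors (in a much cruder form) the robustness to misspecified auxiliary rewards that the abstract emphasizes.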
Related papers
- REBEL: A Regularization-Based Solution for Reward Overoptimization in Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and user intentions, values, or social norms can be catastrophic in the real world.
Current methods to mitigate this misalignment work by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z) - Iterative Reward Shaping using Human Feedback for Correcting Reward Misspecification [15.453123084827089]
ITERS is an iterative reward shaping approach using human feedback for mitigating the effects of a misspecified reward function.
We evaluate ITERS in three environments and show that it can successfully correct misspecified reward functions.
arXiv Detail & Related papers (2023-08-30T11:45:40Z) - Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity [114.88145406445483]
Reinforcement learning provides an automated framework for learning behaviors from high-level reward specifications.
In practice the choice of reward function can be crucial for good results.
arXiv Detail & Related papers (2022-10-18T04:21:25Z) - Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL) -- inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z) - Admissible Policy Teaching through Reward Design [32.39785256112934]
We study reward design strategies for incentivizing a reinforcement learning agent to adopt a policy from a set of admissible policies.
The goal of the reward designer is to modify the underlying reward function cost-efficiently while ensuring that any approximately optimal deterministic policy under the new reward function is admissible.
arXiv Detail & Related papers (2022-01-06T18:49:57Z) - Hindsight Reward Tweaking via Conditional Deep Reinforcement Learning [37.61951923445689]
We propose a novel paradigm for deep reinforcement learning to model the influences of reward functions within a near-optimal space.
We demonstrate the feasibility of this approach and study one of its potential applications, policy performance boosting, on multiple MuJoCo tasks.
arXiv Detail & Related papers (2021-09-06T10:06:48Z) - Policy Gradient Bayesian Robust Optimization for Imitation Learning [49.881386773269746]
We derive a novel policy gradient-style robust optimization approach, PG-BROIL, to balance expected performance and risk.
Results suggest PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse.
arXiv Detail & Related papers (2021-06-11T16:49:15Z) - Outcome-Driven Reinforcement Learning via Variational Inference [95.82770132618862]
We discuss a new perspective on reinforcement learning, recasting it as the problem of inferring actions that achieve desired outcomes, rather than a problem of maximizing rewards.
To solve the resulting outcome-directed inference problem, we establish a novel variational inference formulation that allows us to derive a well-shaped reward function.
We empirically demonstrate that this method eliminates the need to design reward functions and leads to effective goal-directed behaviors.
arXiv Detail & Related papers (2021-04-20T18:16:21Z) - Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping [71.214923471669]
Reward shaping is an effective technique for incorporating domain knowledge into reinforcement learning (RL).
In this paper, we consider the problem of adaptively utilizing a given shaping reward function.
Experiments in sparse-reward cartpole and MuJoCo environments show that our algorithms can fully exploit beneficial shaping rewards.
arXiv Detail & Related papers (2020-11-05T05:34:14Z)