On The Fragility of Learned Reward Functions
- URL: http://arxiv.org/abs/2301.03652v1
- Date: Mon, 9 Jan 2023 19:45:38 GMT
- Title: On The Fragility of Learned Reward Functions
- Authors: Lev McKinney, Yawen Duan, David Krueger, Adam Gleave
- Abstract summary: We study the causes of relearning failures in the domain of preference-based reward learning.
Based on our findings, we emphasize the need for more retraining-based evaluations in the literature.
- Score: 4.826574398803286
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reward functions are notoriously difficult to specify, especially for tasks
with complex goals. Reward learning approaches attempt to infer reward
functions from human feedback and preferences. Prior works on reward learning
have mainly focused on the performance of policies trained alongside the reward
function. This practice, however, may fail to detect learned rewards that are
not capable of training new policies from scratch and thus do not capture the
intended behavior. Our work focuses on demonstrating and studying the causes of
these relearning failures in the domain of preference-based reward learning. We
demonstrate with experiments in tabular and continuous control environments
that the severity of relearning failures can be sensitive to changes in reward
model design and the trajectory dataset composition. Based on our findings, we
emphasize the need for more retraining-based evaluations in the literature.
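The abstract's core argument is that a learned reward should be judged by whether a policy retrained from scratch on it still achieves the intended behavior, not only by the policy optimized alongside it. Below is a minimal sketch of that retraining-based evaluation; the toy chain MDP, the synthetic Bradley-Terry preference labels, the tabular reward model, and the Q-learning hyperparameters are all illustrative assumptions and not the paper's actual experimental setup.

```python
# Hedged sketch: preference-based reward learning on a toy chain MDP,
# followed by a retraining-based evaluation of the learned reward.
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS, HORIZON = 8, 2, 10
TRUE_R = np.linspace(0.0, 1.0, N_STATES)  # ground-truth reward grows to the right

def step(s, a):
    """Chain dynamics: action 1 moves right, action 0 moves left (clipped)."""
    return min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)

def rollout(policy):
    """Roll out one trajectory of visited states under a (state -> action) policy."""
    s, traj = 0, []
    for _ in range(HORIZON):
        traj.append(s)
        s = step(s, policy(s))
    return traj

def true_return(traj):
    return float(sum(TRUE_R[s] for s in traj))

# 1) Collect trajectory pairs with synthetic Bradley-Terry-style preference labels.
random_policy = lambda s: int(rng.integers(N_ACTIONS))
pairs = [(rollout(random_policy), rollout(random_policy)) for _ in range(200)]
labels = [1.0 if true_return(t1) > true_return(t2) else 0.0 for t1, t2 in pairs]

# 2) Fit a tabular reward model r_hat by gradient descent on the logistic preference loss.
r_hat = np.zeros(N_STATES)
lr = 0.5
for _ in range(500):
    grad = np.zeros(N_STATES)
    for (t1, t2), y in zip(pairs, labels):
        d = sum(r_hat[s] for s in t1) - sum(r_hat[s] for s in t2)
        p = 1.0 / (1.0 + np.exp(-d))  # modeled probability that trajectory 1 is preferred
        for s in t1:
            grad[s] += y - p
        for s in t2:
            grad[s] -= y - p
    r_hat += lr * grad / len(pairs)

# 3) Retraining-based evaluation: train a *fresh* policy from scratch on r_hat alone,
#    then score it with the ground-truth reward it never observed.
def q_learning(reward, episodes=2000, eps=0.2, alpha=0.1, gamma=0.95):
    Q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(episodes):
        s = 0
        for _ in range(HORIZON):
            a = int(rng.integers(N_ACTIONS)) if rng.random() < eps else int(Q[s].argmax())
            s2 = step(s, a)
            Q[s, a] += alpha * (reward[s2] + gamma * Q[s2].max() - Q[s, a])
            s = s2
    return lambda s: int(Q[s].argmax())

retrained_policy = q_learning(r_hat)
print("ground-truth return of a policy retrained from scratch on the learned reward:",
      true_return(rollout(retrained_policy)))
```

A policy-alongside evaluation would instead report the return of a policy optimized jointly with the reward model; a gap between that score and the score of the freshly retrained policy is the kind of relearning failure the abstract describes.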
Related papers
- Sample-Efficient Curriculum Reinforcement Learning for Complex Reward Functions [5.78463306498655]
Reinforcement learning (RL) shows promise in control problems, but its practical application is often hindered by the complexity arising from intricate reward functions with constraints.
We propose a novel two-stage reward curriculum combined with a flexible replay buffer that adaptively samples experiences.
Our approach first learns on a subset of rewards before transitioning to the full reward, allowing the agent to learn trade-offs between objectives and constraints.
arXiv Detail & Related papers (2024-10-22T08:07:44Z)
- Behavior Alignment via Reward Function Optimization [23.92721220310242]
We introduce a new framework that integrates auxiliary rewards reflecting a designer's domain knowledge with the environment's primary rewards.
We evaluate our method's efficacy on a diverse set of tasks, from small-scale experiments to high-dimensional control challenges.
arXiv Detail & Related papers (2023-10-29T13:45:07Z)
- Iterative Reward Shaping using Human Feedback for Correcting Reward Misspecification [15.453123084827089]
ITERS is an iterative reward shaping approach using human feedback for mitigating the effects of a misspecified reward function.
We evaluate ITERS in three environments and show that it can successfully correct misspecified reward functions.
arXiv Detail & Related papers (2023-08-30T11:45:40Z)
- Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity [114.88145406445483]
Reinforcement learning provides an automated framework for learning behaviors from high-level reward specifications.
In practice, the choice of reward function can be crucial for good results.
arXiv Detail & Related papers (2022-10-18T04:21:25Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL): inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- Imitating Past Successes can be Very Suboptimal [145.70788608016755]
We show that existing outcome-conditioned imitation learning methods do not necessarily improve the policy, and that a simple modification yields a method that does guarantee policy improvement.
Our aim is not to develop an entirely new method, but rather to explain how a variant of outcome-conditioned imitation learning can be used to maximize rewards.
arXiv Detail & Related papers (2022-06-07T15:13:43Z)
- Transferable Reward Learning by Dynamics-Agnostic Discriminator Ensemble [8.857776147129464]
Recovering a reward function from expert demonstrations is a fundamental problem in reinforcement learning.
We present a dynamics-agnostic discriminator-ensemble reward learning method capable of learning both state-action and state-only reward functions.
arXiv Detail & Related papers (2022-06-01T05:16:39Z)
- Causal Confusion and Reward Misidentification in Preference-Based Reward Learning [33.944367978407904]
We study causal confusion and reward misidentification when learning from preferences.
We find that the presence of non-causal distractor features, noise in the stated preferences, and partial state observability can all exacerbate reward misidentification.
arXiv Detail & Related papers (2022-04-13T18:41:41Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods (the experience-relabeling idea is sketched after this list).
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Semi-supervised reward learning for offline reinforcement learning [71.6909757718301]
Training agents usually requires reward functions, but rewards are seldom available in practice and their engineering is challenging and laborious.
We propose semi-supervised learning algorithms that learn from limited annotations and incorporate unlabelled data.
In our experiments with a simulated robotic arm, we greatly improve upon behavioural cloning and closely approach the performance achieved with ground truth rewards.
arXiv Detail & Related papers (2020-12-12T20:06:15Z)
- Experience Replay with Likelihood-free Importance Weights [123.52005591531194]
We propose to reweight experiences based on their likelihood under the stationary distribution of the current policy.
We apply the proposed approach empirically on two competitive methods, Soft Actor Critic (SAC) and Twin Delayed Deep Deterministic policy gradient (TD3).
arXiv Detail & Related papers (2020-06-23T17:17:44Z)
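The PEBBLE entry above mentions relabeling experience. The sketch below shows one plausible shape of that idea under assumed interfaces: an off-policy replay buffer whose stored rewards are recomputed whenever the learned reward model is updated. The class, the transition layout, and the reward_model signature are illustrative assumptions, not PEBBLE's actual implementation.

```python
# Hedged sketch of reward relabeling in an off-policy replay buffer.
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A transition is (state, action, reward_under_current_model, next_state).
Transition = Tuple[tuple, int, float, tuple]

@dataclass
class RelabelingReplayBuffer:
    reward_model: Callable[[tuple, int], float]   # hypothetical learned reward r_hat(s, a)
    storage: List[Transition] = field(default_factory=list)

    def add(self, state, action, next_state):
        # Store the reward as estimated by the *current* reward model.
        self.storage.append((state, action, self.reward_model(state, action), next_state))

    def relabel(self, new_reward_model):
        # After the reward model is retrained on fresh preference feedback,
        # overwrite every stored reward so old transitions match the new estimate.
        self.reward_model = new_reward_model
        self.storage = [(s, a, new_reward_model(s, a), s2) for (s, a, _, s2) in self.storage]
```

An off-policy learner such as SAC would keep sampling minibatches from the buffer as usual; relabeling keeps stale transitions consistent with the evolving reward estimate instead of discarding them.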
This list is automatically generated from the titles and abstracts of the papers in this site.