A Generalized Acquisition Function for Preference-based Reward Learning
- URL: http://arxiv.org/abs/2403.06003v1
- Date: Sat, 9 Mar 2024 20:32:17 GMT
- Title: A Generalized Acquisition Function for Preference-based Reward Learning
- Authors: Evan Ellis, Gaurav R. Ghosal, Stuart J. Russell, Anca Dragan, Erdem Bıyık
- Abstract summary: Preference-based reward learning is a popular technique for teaching robots and autonomous systems how a human user wants them to perform a task.
Previous works have shown that actively synthesizing preference queries to maximize information gain about the reward function parameters improves data efficiency.
We show that it is possible to optimize for learning the reward function up to a behavioral equivalence class, such as inducing the same ranking over behaviors, distribution over choices, or other related definitions of what makes two rewards similar.
- Score: 12.158619866176487
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Preference-based reward learning is a popular technique for teaching robots
and autonomous systems how a human user wants them to perform a task. Previous
works have shown that actively synthesizing preference queries to maximize
information gain about the reward function parameters improves data efficiency.
The information gain criterion focuses on precisely identifying all parameters
of the reward function. This can potentially be wasteful as many parameters may
result in the same reward, and many rewards may result in the same behavior in
the downstream tasks. Instead, we show that it is possible to optimize for
learning the reward function up to a behavioral equivalence class, such as
inducing the same ranking over behaviors, distribution over choices, or other
related definitions of what makes two rewards similar. We introduce a tractable
framework that can capture such definitions of similarity. Our experiments in a
synthetic environment, an assistive robotics environment with domain transfer,
and a natural language processing problem with real datasets demonstrate the
superior performance of our querying method over the state-of-the-art
information gain method.
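As a rough illustration of the idea in the abstract, the sketch below contrasts the standard parameter-level information gain criterion with an acquisition that only scores what a query's answer reveals about a behavioral summary of the reward (here, the ranking it induces over a fixed trajectory set). It assumes a linear reward model, a Bradley-Terry choice model, and a particle approximation of the posterior; the toy setup, names, and constants are illustrative assumptions, not the paper's implementation.
```python
# Minimal sketch: parameter-level vs. behavior-level information gain for
# choosing preference queries. Illustrative only; not the paper's code.
import itertools
import numpy as np

rng = np.random.default_rng(0)

D = 3                                    # reward feature dimension (toy choice)
N_PARTICLES = 200                        # samples approximating the posterior over weights w
particles = rng.normal(size=(N_PARTICLES, D))
particles /= np.linalg.norm(particles, axis=1, keepdims=True)

trajectories = rng.normal(size=(6, D))   # feature vectors phi(xi) of candidate trajectories

def choice_prob(w, phi_a, phi_b, beta=5.0):
    """Bradley-Terry / Boltzmann probability that the user prefers trajectory a over b."""
    return 1.0 / (1.0 + np.exp(-beta * (w @ (phi_a - phi_b))))

def entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

def expected_info_gain(phi_a, phi_b, labels):
    """Mutual information between the query answer and the *label* of the true particle.
    Identity labels recover the usual parameter-level criterion; behavior labels only
    reward distinguishing particles that induce different behavior."""
    prior = np.full(N_PARTICLES, 1.0 / N_PARTICLES)
    p_a = np.array([choice_prob(w, phi_a, phi_b) for w in particles])

    def label_entropy(weights):
        mass = {}
        for lab, w_i in zip(labels, weights):
            mass[lab] = mass.get(lab, 0.0) + w_i
        return entropy(np.array(list(mass.values())))

    gain = label_entropy(prior)
    for per_particle in (p_a, 1.0 - p_a):             # answer "a" / answer "b"
        answer_prob = float((prior * per_particle).sum())
        posterior = prior * per_particle / max(answer_prob, 1e-12)
        gain -= answer_prob * label_entropy(posterior)
    return gain

def behavior_of(w):
    """A behavioral summary of a reward: the ranking it induces over the trajectory set."""
    return tuple(np.argsort(w @ trajectories.T))

identity_labels = list(range(N_PARTICLES))             # every parameter vector is its own class
behavior_labels = [behavior_of(w) for w in particles]  # behavioral equivalence classes

def score(ij, labels):
    i, j = ij
    return expected_info_gain(trajectories[i], trajectories[j], labels)

pairs = list(itertools.combinations(range(len(trajectories)), 2))
best_param = max(pairs, key=lambda ij: score(ij, identity_labels))
best_behavior = max(pairs, key=lambda ij: score(ij, behavior_labels))
print("query picked by parameter info gain:  ", best_param)
print("query picked by behavioral info gain: ", best_behavior)
```
The two criteria can disagree: a query that splits parameter hypotheses which all induce the same ranking has zero value under the behavioral criterion, which is the kind of wasted query the abstract describes.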
Related papers
- Adaptive Language-Guided Abstraction from Contrastive Explanations [53.48583372522492]
It is necessary to determine which features of the environment are relevant before determining how these features should be used to compute reward.
End-to-end methods for joint feature and reward learning often yield brittle reward functions that are sensitive to spurious state features.
This paper describes a method named ALGAE, which alternates between using language models to iteratively identify human-meaningful features and learning reward weights over those features.
arXiv Detail & Related papers (2024-09-12T16:51:58Z)
- A Pattern Language for Machine Learning Tasks [0.0]
We view objective functions as constraints on the behaviour of learners.
We develop a formal graphical language that allows us to separate the core tasks of a behaviour from its implementation details.
As a proof of concept, we design a novel task that enables converting classifiers into generative models, which we call "manipulators".
arXiv Detail & Related papers (2024-07-02T16:50:27Z)
- Pragmatic Feature Preferences: Learning Reward-Relevant Preferences from Human Input [17.131441665935128]
We study how to extract fine-grained data about why an example is preferred, which is useful for learning more accurate reward models.
Our findings suggest that incorporating pragmatic feature preferences is a promising approach for more efficient user-aligned reward learning.
arXiv Detail & Related papers (2024-05-23T16:36:16Z)
- REBEL: A Regularization-Based Solution for Reward Overoptimization in Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and user intentions, values, or social norms can be catastrophic in the real world.
Current methods to mitigate this misalignment work by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL): inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- Invariance in Policy Optimisation and Partial Identifiability in Reward Learning [67.4640841144101]
We characterise the partial identifiability of the reward function given popular reward learning data sources.
We also analyse the impact of this partial identifiability for several downstream tasks, such as policy optimisation.
arXiv Detail & Related papers (2022-03-14T20:19:15Z)
- Dynamics-Aware Comparison of Learned Reward Functions [21.159457412742356]
The ability to learn reward functions plays an important role in enabling the deployment of intelligent agents in the real world.
Reward functions are typically compared by considering the behavior of optimized policies, but this approach conflates deficiencies in the reward function with those of the policy search algorithm used to optimize it.
We propose the Dynamics-Aware Reward Distance (DARD), a new reward pseudometric.
arXiv Detail & Related papers (2022-01-25T03:48:00Z)
- Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification [133.20816939521941]
In the standard Markov decision process formalism, users specify tasks by writing down a reward function.
In many scenarios, the user is unable to describe the task in words or numbers, but can readily provide examples of what the world would look like if the task were solved.
Motivated by this observation, we derive a control algorithm that aims to visit states that have a high probability of leading to successful outcomes, given only examples of successful outcome states.
arXiv Detail & Related papers (2021-03-23T16:19:55Z)
- Learning Reward Functions from Diverse Sources of Human Feedback: Optimally Integrating Demonstrations and Preferences [14.683631546064932]
We present a framework to integrate multiple sources of information, which are either passively or actively collected from human users.
In particular, we present an algorithm that first utilizes user demonstrations to initialize a belief about the reward function, and then actively probes the user with preference queries to zero in on their true reward (a rough sketch of this pipeline appears after this list).
Our approach accounts for the human's ability to provide data: yielding user-friendly preference queries which are also theoretically optimal.
arXiv Detail & Related papers (2020-06-24T22:45:27Z)
- Active Preference-Based Gaussian Process Regression for Reward Learning [42.697198807877925]
One common approach is to learn reward functions from collected expert demonstrations.
We present a preference-based learning approach where, as an alternative, human feedback is given only in the form of comparisons between trajectories.
Our approach enables us to tackle both inflexibility and data-inefficiency problems within a preference-based learning framework.
arXiv Detail & Related papers (2020-05-06T03:29:27Z)
- How Useful is Self-Supervised Pretraining for Visual Tasks? [133.1984299177874]
We evaluate various self-supervised algorithms across a comprehensive array of synthetic datasets and downstream tasks.
Our experiments offer insights into how the utility of self-supervision changes as the number of available labels grows.
arXiv Detail & Related papers (2020-03-31T16:03:22Z)
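Below is a rough sketch of the demonstrations-then-preferences pipeline summarized in the "Learning Reward Functions from Diverse Sources of Human Feedback" entry above: a belief over linear reward weights is initialized from a demonstration via a Boltzmann-rational likelihood and then narrowed with preference queries. The query-selection rule here (pick the pair the belief is most uncertain about) is a simpler stand-in for that paper's information-based criterion; the setup, names, and constants are illustrative assumptions, not the authors' code.
```python
# Sketch: initialize a reward belief from a demonstration, then pick a preference query.
import numpy as np

rng = np.random.default_rng(1)
D, N = 3, 500
particles = rng.normal(size=(N, D))                  # hypotheses for reward weights w
particles /= np.linalg.norm(particles, axis=1, keepdims=True)
belief = np.full(N, 1.0 / N)                         # uniform prior over hypotheses

candidates = rng.normal(size=(8, D))                 # feature vectors of feasible trajectories
# Assume the user demonstrated candidates[0].

# Step 1: update the belief with the demonstration, assuming the demonstrator picks
# trajectory xi with probability proportional to exp(w . phi(xi)) among the candidates.
scores = particles @ candidates.T                    # (N, 8) reward of each candidate per hypothesis
log_lik = scores[:, 0] - np.log(np.exp(scores).sum(axis=1))
belief *= np.exp(log_lik)
belief /= belief.sum()

# Step 2: choose a preference query. As a simple stand-in for an information-based
# criterion, pick the pair whose predicted answer is closest to a coin flip.
def pref_prob(i, j, beta=2.0):
    """Belief-averaged Bradley-Terry probability of preferring candidate i over j."""
    p = 1.0 / (1.0 + np.exp(-beta * (particles @ (candidates[i] - candidates[j]))))
    return float(belief @ p)

pairs = [(i, j) for i in range(len(candidates)) for j in range(i + 1, len(candidates))]
query = min(pairs, key=lambda ij: abs(pref_prob(*ij) - 0.5))
print("preference query to ask next (trajectory indices):", query)

# After the user answers, the belief would be reweighted with the Bradley-Terry
# likelihood of that answer and the query-selection step repeated.
```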