B-Pref: Benchmarking Preference-Based Reinforcement Learning
- URL: http://arxiv.org/abs/2111.03026v1
- Date: Thu, 4 Nov 2021 17:32:06 GMT
- Title: B-Pref: Benchmarking Preference-Based Reinforcement Learning
- Authors: Kimin Lee, Laura Smith, Anca Dragan, Pieter Abbeel
- Abstract summary: We introduce B-Pref, a benchmark specially designed for preference-based RL.
A key challenge with such a benchmark is providing the ability to evaluate candidate algorithms quickly.
B-Pref alleviates this by simulating teachers with a wide array of irrationalities.
- Score: 84.41494283081326
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) requires access to a reward function that
incentivizes the right behavior, but these are notoriously hard to specify for
complex tasks. Preference-based RL provides an alternative: learning policies
using a teacher's preferences without pre-defined rewards, thus overcoming
concerns associated with reward engineering. However, it is difficult to
quantify the progress in preference-based RL due to the lack of a commonly
adopted benchmark. In this paper, we introduce B-Pref: a benchmark specially
designed for preference-based RL. A key challenge with such a benchmark is
providing the ability to evaluate candidate algorithms quickly, which makes
relying on real human input for evaluation prohibitive. At the same time,
simulating human input as giving perfect preferences for the ground truth
reward function is unrealistic. B-Pref alleviates this by simulating teachers
with a wide array of irrationalities, and proposes metrics not solely for
performance but also for robustness to these potential irrationalities. We
showcase the utility of B-Pref by using it to analyze algorithmic design
choices, such as selecting informative queries, for state-of-the-art
preference-based RL algorithms. We hope that B-Pref can serve as a common
starting point to study preference-based RL more systematically. Source code is
available at https://github.com/rll-research/B-Pref.
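The paper's key device is a simulated teacher whose answers deviate from perfect rationality in controlled ways (a rationality/temperature parameter, myopia, random mistakes, skipping uninformative queries, and marking near-ties as equally preferable). The Python sketch below is an unofficial illustration of such a teacher, not the benchmark's implementation; the parameter names (beta, gamma, eps_mistake, thresh_skip, thresh_equal) and the exact weighting are assumptions for illustration only.

```python
import numpy as np

def simulated_teacher_preference(rewards_0, rewards_1, beta=1.0, gamma=0.99,
                                 eps_mistake=0.1, thresh_skip=0.0,
                                 thresh_equal=0.0, rng=np.random):
    """Answer a pairwise query over two trajectory segments.

    rewards_0, rewards_1: per-step ground-truth rewards of the two segments.
    Returns 1 if segment 1 is preferred, 0 if segment 0 is preferred,
    0.5 for "equally preferable", or None if the teacher skips the query.
    """
    H = len(rewards_0)
    # A myopic teacher (gamma < 1) weights recent steps more heavily.
    weights = gamma ** np.arange(H - 1, -1, -1)
    ret_0 = float(np.sum(weights * np.asarray(rewards_0)))
    ret_1 = float(np.sum(weights * np.asarray(rewards_1)))

    # Skip queries in which neither segment looks good enough to judge.
    if max(ret_0, ret_1) < thresh_skip:
        return None
    # Declare a near-tie "equally preferable".
    if abs(ret_0 - ret_1) < thresh_equal:
        return 0.5

    # Stochastic, Bradley-Terry-style choice: large beta approaches a
    # perfectly rational teacher, beta = 0 answers uniformly at random.
    p_prefer_1 = 1.0 / (1.0 + np.exp(-beta * (ret_1 - ret_0)))
    label = 1 if rng.random() < p_prefer_1 else 0

    # With probability eps_mistake the teacher flips its answer outright.
    if rng.random() < eps_mistake:
        label = 1 - label
    return label
```

A perfectly rational ("oracle") teacher corresponds to very large beta with gamma = 1 and eps_mistake = 0; sweeping these knobs is what lets the benchmark report robustness to teacher irrationalities alongside raw task performance.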
Related papers
- Preference Elicitation for Offline Reinforcement Learning [59.136381500967744]
We propose Sim-OPRL, an offline preference-based reinforcement learning algorithm.
Our algorithm employs a pessimistic approach for out-of-distribution data, and an optimistic approach for acquiring informative preferences about the optimal policy.
arXiv Detail & Related papers (2024-06-26T15:59:13Z)
- RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences [23.414135977983953]
Preference-based Reinforcement Learning (PbRL) circumvents the need for reward engineering by harnessing human preferences as the reward signal.
We present RIME, a robust PbRL algorithm for effective reward learning from noisy preferences.
arXiv Detail & Related papers (2024-02-27T07:03:25Z)
- Contrastive Preference Learning: Learning from Human Feedback without RL [71.77024922527642]
We introduce Contrastive Preference Learning (CPL), an algorithm for learning optimal policies from preferences without learning reward functions.
CPL is fully off-policy, uses only a simple contrastive objective, and can be applied to arbitrary MDPs.
arXiv Detail & Related papers (2023-10-20T16:37:56Z)
- Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework where exploratory trajectories that enable accurate learning of hidden reward functions are acquired.
arXiv Detail & Related papers (2023-05-29T15:00:09Z)
- Direct Preference-based Policy Optimization without Reward Modeling [25.230992130108767]
Preference-based reinforcement learning (PbRL) is an approach that enables RL agents to learn from preferences.
We propose a PbRL algorithm that directly learns from preference without requiring any reward modeling.
We show that our algorithm surpasses offline RL methods that learn with ground-truth reward information.
arXiv Detail & Related papers (2023-01-30T12:51:13Z)
- Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward that measures novelty based on the learned reward.
Our experiments show that an exploration bonus derived from uncertainty in the learned reward improves both the feedback efficiency and the sample efficiency of preference-based RL algorithms (see the sketch after this list).
arXiv Detail & Related papers (2022-05-24T23:22:10Z)
- Preference-based Reinforcement Learning with Finite-Time Guarantees [76.88632321436472]
Preference-based Reinforcement Learning (PbRL) replaces reward values in traditional reinforcement learning with preferences to better elicit human opinion on the target objective.
Despite promising results in applications, the theoretical understanding of PbRL is still in its infancy.
We present the first finite-time analysis for general PbRL problems.
arXiv Detail & Related papers (2020-06-16T03:52:41Z)
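Most of the approaches listed above share the same core machinery: one or more reward models are fit to pairwise preference labels with a Bradley-Terry-style cross-entropy loss, and the ensemble's disagreement is then used either to select informative queries (one of the design choices B-Pref analyzes) or as an exploration bonus in the spirit of the reward-uncertainty paper. The PyTorch sketch below is an illustrative reconstruction of that common recipe under assumed shapes and names (RewardModel, preference_loss, and disagreement_scores are hypothetical), not the implementation of any particular paper.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Small MLP mapping a (state, action) pair to a scalar reward estimate."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def preference_loss(model, seg_0, seg_1, labels):
    """Bradley-Terry cross-entropy over pairs of trajectory segments.

    seg_i: dict with 'obs' and 'act' tensors of shape (batch, H, dim).
    labels: (batch,) float tensor with 1.0 if segment 1 is preferred,
            0.0 if segment 0 is preferred, 0.5 for "equally preferable".
    """
    ret_0 = model(seg_0["obs"], seg_0["act"]).sum(dim=1)  # predicted return of segment 0
    ret_1 = model(seg_1["obs"], seg_1["act"]).sum(dim=1)  # predicted return of segment 1
    logits = ret_1 - ret_0
    return nn.functional.binary_cross_entropy_with_logits(logits, labels)

def disagreement_scores(ensemble, seg_0, seg_1):
    """Rank candidate queries by how much an ensemble of reward models disagrees.

    Pairs with high variance across members are treated as the most
    informative queries; a per-step version of the same statistic can
    serve as an intrinsic exploration bonus.
    """
    with torch.no_grad():
        probs = []
        for model in ensemble:
            ret_0 = model(seg_0["obs"], seg_0["act"]).sum(dim=1)
            ret_1 = model(seg_1["obs"], seg_1["act"]).sum(dim=1)
            probs.append(torch.sigmoid(ret_1 - ret_0))
        return torch.stack(probs, dim=0).std(dim=0)  # one disagreement score per pair
```

In a training loop, the highest-scoring pairs would be sent to the (simulated) teacher for labels, each ensemble member would be updated with preference_loss on the labeled pairs, and the policy would be trained against the mean predicted reward, optionally augmented with the disagreement bonus.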
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.