RA-PbRL: Provably Efficient Risk-Aware Preference-Based Reinforcement Learning
- URL: http://arxiv.org/abs/2410.23569v1
- Date: Thu, 31 Oct 2024 02:25:43 GMT
- Title: RA-PbRL: Provably Efficient Risk-Aware Preference-Based Reinforcement Learning
- Authors: Yujie Zhao, Jose Efraim Aguilar Escamilla, Weyl Lu, Huazheng Wang
- Abstract summary: Preference-based Reinforcement Learning (PbRL) studies the problem where agents receive only preferences over pairs of trajectories in each episode.
Traditional risk-aware objectives and algorithms are not applicable in such one-episode-reward settings.
We introduce Risk-Aware-PbRL (RA-PbRL), an algorithm designed to optimize both nested and static quantile risk objectives.
- Score: 7.407106653769627
- Abstract: Preference-based Reinforcement Learning (PbRL) studies the problem where agents receive only preferences over pairs of trajectories in each episode. Traditional approaches in this field have predominantly focused on the mean reward or utility criterion. However, in PbRL scenarios demanding heightened risk awareness, such as in AI systems, healthcare, and agriculture, risk-aware measures are requisite. Traditional risk-aware objectives and algorithms are not applicable in such one-episode-reward settings. To address this, we explore and prove the applicability of two risk-aware objectives to PbRL: nested and static quantile risk objectives. We also introduce Risk-Aware-PbRL (RA-PbRL), an algorithm designed to optimize both nested and static objectives. Additionally, we provide a theoretical analysis of the regret upper bounds, demonstrating that they are sublinear with respect to the number of episodes, and present empirical results to support our findings. Our code is available at https://github.com/aguilarjose11/PbRLNeurips.
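To make the two objectives concrete, here is a standard formulation in our own notation, taking CVaR as the quantile risk measure; the paper's exact definitions may differ in detail:

```latex
% Static objective: a quantile risk measure (here CVaR at level \tau)
% applied once to the whole-trajectory return R^\pi = \sum_{h=1}^{H} r_h
% (Rockafellar--Uryasev form):
\mathrm{CVaR}_\tau(R^\pi)
  = \sup_{b \in \mathbb{R}} \Big\{ b - \tfrac{1}{\tau}\,
    \mathbb{E}\big[(b - R^\pi)_+\big] \Big\}

% Nested objective: the one-step risk map \rho_\tau is applied
% recursively through the Bellman backup instead of once at the end:
V^\pi_{H+1}(s) = 0, \qquad
V^\pi_h(s) = \rho_\tau\big( r_h(s, \pi_h(s)) + V^\pi_{h+1}(s') \big),
\quad s' \sim P_h(\cdot \mid s, \pi_h(s))
```

The two generally disagree: the static objective is risk-sensitive about the total return, while the nested one compounds risk sensitivity at every step.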
Related papers
- Risk-Sensitive RL with Optimized Certainty Equivalents via Reduction to Standard RL [48.1726560631463]
We study Risk-Sensitive Reinforcement Learning with the Optimized Certainty Equivalent (OCE) risk.
We propose two general meta-algorithms via reductions to standard RL.
We show that these reductions learn the optimal risk-sensitive policy in settings where prior algorithms provably fail.
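For context, the standard OCE definition (our notation, not the paper's): for a concave, nondecreasing utility u with u(0) = 0,

```latex
\mathrm{OCE}_u(X) = \sup_{\lambda \in \mathbb{R}}
  \big\{ \lambda + \mathbb{E}\,[\,u(X - \lambda)\,] \big\}
```

Choosing u(t) = -\frac{1}{\alpha}(-t)_+ recovers \mathrm{CVaR}_\alpha, and an exponential utility recovers the entropic risk, so one family covers several common risk measures.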
arXiv Detail & Related papers (2024-03-10T21:45:12Z)
- On the Theory of Risk-Aware Agents: Bridging Actor-Critic and Economics [0.7655800373514546]
Risk-aware Reinforcement Learning algorithms have been shown to outperform risk-neutral counterparts in a variety of continuous-action tasks, yet the theoretical basis for the pessimistic objectives these algorithms employ has not been established.
We propose Dual Actor-Critic (DAC), a risk-aware, model-free algorithm that features two distinct actor networks.
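As an illustration only, here is one plausible reading of a two-actor design; the names (actor_pess, actor_opt, q1, q2) are hypothetical placeholders of ours, not the paper's:

```python
import torch

def actor_losses(state, actor_pess, actor_opt, q1, q2):
    # Pessimistic actor: trained against the lower of twin critic
    # estimates, the usual proxy for a risk-averse value.
    a_p = actor_pess(state)
    loss_pess = -torch.min(q1(state, a_p), q2(state, a_p)).mean()
    # Optimistic actor: trained against the upper estimate, which
    # encourages exploration.
    a_o = actor_opt(state)
    loss_opt = -torch.max(q1(state, a_o), q2(state, a_o)).mean()
    return loss_pess, loss_opt
```

The paper's exact construction and objectives may differ; the sketch only shows how two actors can be driven by differently aggregated critics.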
arXiv Detail & Related papers (2023-10-30T13:28:06Z)
- Risk-Aware Reinforcement Learning through Optimal Transport Theory [4.8951183832371]
This paper pioneers the integration of Optimal Transport (OT) theory with reinforcement learning (RL) to create a risk-aware framework.
Our approach modifies the objective function, ensuring that the resulting policy not only maximizes expected rewards but also respects risk constraints dictated by OT distances.
Our contributions are substantiated with a series of theorems, mapping the relationships between risk distributions, optimal value functions, and policy behaviors.
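Schematically, an OT-constrained objective of the kind described might read as follows (our paraphrase for intuition; the paper's precise formulation may differ):

```latex
\max_{\pi} \; \mathbb{E}_{\pi}\Big[\textstyle\sum_t r_t\Big]
\quad \text{s.t.} \quad
\mathcal{W}\big(\mu_{\pi}, \mu_{\mathrm{ref}}\big) \le \epsilon
```

where \mu_\pi is the return distribution induced by \pi, \mu_{\mathrm{ref}} is a reference risk profile, and \mathcal{W} is a Wasserstein (OT) distance.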
arXiv Detail & Related papers (2023-09-12T13:55:01Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta-algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies, a guide policy and an exploration policy, to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
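A minimal sketch of the roll-in idea, assuming a toy env interface (env.reset / env.step returning three values) that is ours, not the paper's code:

```python
# The guide policy acts for the first h steps of each episode, giving the
# learner a favorable starting state; the learning policy takes over after.
def jsrl_episode(env, guide_policy, learn_policy, h):
    obs, done, transitions = env.reset(), False, []
    t = 0
    while not done:
        policy = guide_policy if t < h else learn_policy
        action = policy(obs)
        next_obs, reward, done = env.step(action)
        transitions.append((obs, action, reward, next_obs, done))
        obs, t = next_obs, t + 1
    return transitions

# A curriculum then shrinks h toward 0 as the learner improves, so the
# learner gradually solves the task from the true initial states.
```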
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- B-Pref: Benchmarking Preference-Based Reinforcement Learning [84.41494283081326]
We introduce B-Pref, a benchmark specially designed for preference-based RL.
A key challenge with such a benchmark is providing the ability to evaluate candidate algorithms quickly.
B-Pref alleviates this by simulating teachers with a wide array of irrationalities.
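A hedged sketch of what such a simulated teacher can look like; the parameter names and defaults are ours, and the benchmark's exact teacher model may differ:

```python
import numpy as np

def sim_teacher(rew0, rew1, beta=1.0, gamma=0.9, eps=0.1,
                skip_thresh=0.0, equal_thresh=0.0):
    # Myopia: weight recent steps of each segment more heavily.
    w = gamma ** np.arange(len(rew0))[::-1]
    r0, r1 = np.dot(w, rew0), np.dot(w, rew1)
    if max(r0, r1) < skip_thresh:      # skip uninformative pairs
        return None
    if abs(r0 - r1) < equal_thresh:    # declare the pair a tie
        return 0.5
    # Boltzmann-rational choice with rationality beta.
    p1 = 1.0 / (1.0 + np.exp(-beta * (r1 - r0)))
    pref = float(np.random.rand() < p1)
    if np.random.rand() < eps:         # occasional outright mistake
        pref = 1.0 - pref
    return pref                        # 1.0 means segment 1 preferred
```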
arXiv Detail & Related papers (2021-11-04T17:32:06Z)
- Risk-Averse Offline Reinforcement Learning [46.383648750385575]
Training Reinforcement Learning (RL) agents in high-stakes applications might be prohibitive due to the risk associated with exploration.
We present the Offline Risk-Averse Actor-Critic (O-RAAC), a model-free RL algorithm that is able to learn risk-averse policies in a fully offline setting.
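One standard building block for such methods, included here as an illustration rather than O-RAAC's exact code: estimating CVaR from a distributional critic's quantile outputs, so the actor can maximize a risk-averse value instead of the mean:

```python
import torch

def cvar_from_quantiles(quantiles: torch.Tensor, alpha: float = 0.1):
    # quantiles: (batch, n_quantiles) samples of the return distribution.
    q_sorted, _ = torch.sort(quantiles, dim=-1)
    k = max(1, int(alpha * q_sorted.shape[-1]))
    # CVaR_alpha ~= mean of the worst alpha-fraction of outcomes.
    return q_sorted[..., :k].mean(dim=-1)
```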
arXiv Detail & Related papers (2021-02-10T10:27:49Z)
- Provably Efficient Algorithms for Multi-Objective Competitive RL [54.22598924633369]
We study multi-objective reinforcement learning (RL) where an agent's reward is represented as a vector.
In settings where an agent competes against opponents, its performance is measured by the distance of its average return vector to a target set.
We develop statistically and computationally efficient algorithms to approach the associated target set.
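In symbols (our paraphrase): if episode k yields the return vector \mathbf{r}_k \in \mathbb{R}^d and \mathcal{C} is the target set, the quantity to be driven to zero is

```latex
\mathrm{dist}\Big( \tfrac{1}{K}\textstyle\sum_{k=1}^{K} \mathbf{r}_k,\;
  \mathcal{C} \Big)
  = \min_{\mathbf{x} \in \mathcal{C}}
    \Big\| \tfrac{1}{K}\textstyle\sum_{k=1}^{K} \mathbf{r}_k - \mathbf{x} \Big\|
```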
arXiv Detail & Related papers (2021-02-05T14:26:00Z)
- Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average reward setting with the variance risk criterion.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
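A hedged reconstruction of the saddle-point objective this describes (notation ours): the variance-constrained problem

```latex
\max_{\pi} \; J(\pi)
\quad \text{s.t.} \quad \mathrm{Var}_{\pi}(R) \le c
```

becomes, with a Lagrange multiplier \lambda \ge 0 and the Fenchel dual x^2 = \max_y (2xy - y^2) applied to (\mathbb{E}_\pi[R])^2 inside the variance,

```latex
\max_{\pi} \min_{\lambda \ge 0} \max_{y} \;
J(\pi) - \lambda\big( \mathbb{E}_{\pi}[R^2]
  - 2y\,\mathbb{E}_{\pi}[R] + y^2 - c \big)
```

which explains the alternating updates of the policy, the multiplier \lambda, and the dual variable y.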
arXiv Detail & Related papers (2020-12-28T05:02:26Z)
- Preference-based Reinforcement Learning with Finite-Time Guarantees [76.88632321436472]
Preference-based Reinforcement Learning (PbRL) replaces the reward values of traditional reinforcement learning with preference feedback, which better elicits human opinion on the target objective.
Despite promising results in applications, the theoretical understanding of PbRL is still in its infancy.
We present the first finite-time analysis for general PbRL problems.
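For context, the preference model most commonly assumed in PbRL is Bradley-Terry, under which trajectory \tau_1 is preferred to \tau_2 with probability

```latex
\Pr(\tau_1 \succ \tau_2)
  = \frac{\exp\big(R(\tau_1)\big)}
         {\exp\big(R(\tau_1)\big) + \exp\big(R(\tau_2)\big)}
  = \sigma\big( R(\tau_1) - R(\tau_2) \big)
```

where R(\tau) is the latent cumulative reward of \tau and \sigma is the logistic function; specific papers, including this one, may use more general link functions.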
arXiv Detail & Related papers (2020-06-16T03:52:41Z)