Fairness in Preference-based Reinforcement Learning
- URL: http://arxiv.org/abs/2306.09995v2
- Date: Fri, 1 Sep 2023 05:04:28 GMT
- Title: Fairness in Preference-based Reinforcement Learning
- Authors: Umer Siddique, Abhinav Sinha, Yongcan Cao
- Abstract summary: We design a new fairness-induced preference-based reinforcement learning (FPbRL) method.
The main idea of FPbRL is to learn vector reward functions associated with multiple objectives via new welfare-based preferences.
Experimental studies show that the proposed FPbRL approach can achieve both efficiency and equity for learning effective and fair policies.
- Score: 2.3388338598125196
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper, we address the issue of fairness in preference-based
reinforcement learning (PbRL) in the presence of multiple objectives. The goal
is to design control policies that optimize multiple objectives while treating
each objective fairly. Toward this end, we design a new fairness-induced
preference-based reinforcement learning (FPbRL) method. The main idea of FPbRL
is to learn vector reward functions associated with the multiple objectives via
new welfare-based preferences rather than the reward-based preferences used in
standard PbRL, coupled with policy learning via maximizing a generalized Gini
welfare function. Finally, we provide experimental studies on three different
environments to show that the proposed FPbRL approach can achieve both
efficiency and equity for learning effective and fair policies.
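The abstract combines two ingredients: welfare-based preferences over vector returns and policy learning that maximizes a generalized Gini welfare function. The sketch below is a minimal, hypothetical illustration of both; the weight vector, number of objectives, and return values are assumptions made for this example and are not taken from the paper.

```python
# Minimal sketch (assumed values, not the authors' code): the generalized Gini
# welfare of a vector return and the welfare-based preference it induces.
import numpy as np

def ggf(vector_return, weights):
    """Generalized Gini welfare: dot product of the ascending-sorted
    per-objective returns with fixed, non-increasing positive weights,
    so that worse-off objectives receive larger weight."""
    return float(np.dot(np.sort(np.asarray(vector_return, dtype=float)), weights))

# Assumed non-increasing weights over three objectives.
w = np.array([0.5, 0.3, 0.2])

# Made-up vector returns of two candidate trajectories.
g_a = [4.0, 4.0, 4.0]   # balanced across objectives (reward sum 12)
g_b = [9.0, 2.0, 2.0]   # larger total (reward sum 13) but unbalanced

# Welfare-based preference: compare welfare values rather than reward sums.
print(ggf(g_a, w), ggf(g_b, w))                                  # 4.0  3.4
print("preferred:", "A" if ggf(g_a, w) > ggf(g_b, w) else "B")   # A
```

Under a reward-based preference the second trajectory, which has the larger reward sum, would be preferred; the welfare-based preference instead favors the more balanced trajectory, which is the behavior the fairness objective is meant to induce.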
Related papers
- MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with
Diverse Human Preferences [101.57443597426374]
Reinforcement Learning from Human Feedback (RLHF) aligns language models to human preferences by employing a singular reward model derived from preference data.
We learn a mixture of preference distributions via an expectation-maximization algorithm to better represent diverse human preferences.
Our algorithm achieves an average improvement of more than 16% in win-rates over conventional RLHF algorithms.
arXiv Detail & Related papers (2024-02-14T03:56:27Z)
- Utility-Based Reinforcement Learning: Unifying Single-objective and Multi-objective Reinforcement Learning [3.292607871053364]
We extend the utility-based paradigm to the context of single-objective reinforcement learning (RL).
We outline multiple potential benefits including the ability to perform multi-policy learning across tasks relating to uncertain objectives, risk-aware RL, discounting, and safe RL.
We also examine the algorithmic implications of adopting a utility-based approach.
arXiv Detail & Related papers (2024-02-05T01:42:28Z) - Provable Offline Preference-Based Reinforcement Learning [95.00042541409901]
We investigate the problem of offline Preference-based Reinforcement Learning (PbRL) with human feedback.
We consider the general reward setting where the reward can be defined over the whole trajectory.
We introduce a new single-policy concentrability coefficient, which can be upper bounded by the per-trajectory concentrability.
arXiv Detail & Related papers (2023-05-24T07:11:26Z) - Reinforcement Learning with Stepwise Fairness Constraints [50.538878453547966]
We introduce the study of reinforcement learning with stepwise fairness constraints.
We provide learning algorithms with strong theoretical guarantees in regard to policy optimality and fairness violation.
arXiv Detail & Related papers (2022-11-08T04:06:23Z) - Variational Empowerment as Representation Learning for Goal-Based
Reinforcement Learning [114.07623388322048]
We discuss how standard goal-conditioned RL (GCRL) is encapsulated by the objective of variational empowerment.
Our work lays a novel foundation from which to evaluate, analyze, and develop representation learning techniques in goal-based RL.
arXiv Detail & Related papers (2021-06-02T18:12:26Z)
- Off-Policy Imitation Learning from Observations [78.30794935265425]
Learning from Observations (LfO) is a practical reinforcement learning scenario from which many applications can benefit.
We propose a sample-efficient LfO approach that enables off-policy optimization in a principled manner.
Our approach is comparable with the state of the art on locomotion tasks in terms of both sample-efficiency and performance.
arXiv Detail & Related papers (2021-02-25T21:33:47Z)
- Learning Fair Policies in Multiobjective (Deep) Reinforcement Learning with Average and Discounted Rewards [15.082715993594121]
We investigate the problem of learning a policy that treats its users equitably.
In this paper, we formulate this novel RL problem, in which an objective function encoding a notion of fairness is optimized.
We describe how several classic deep RL algorithms can be adapted to our fair optimization problem.
arXiv Detail & Related papers (2020-08-18T07:17:53Z)
- Preference-based Reinforcement Learning with Finite-Time Guarantees [76.88632321436472]
Preference-based Reinforcement Learning (PbRL) replaces reward values in traditional reinforcement learning with preference feedback to better elicit human opinion on the target objective (a minimal sketch of this preference model appears after this list).
Despite promising results in applications, the theoretical understanding of PbRL is still in its infancy.
We present the first finite-time analysis for general PbRL problems.
arXiv Detail & Related papers (2020-06-16T03:52:41Z)
- Ready Policy One: World Building Through Active Learning [35.358315617358976]
We introduce Ready Policy One (RP1), a framework that views Model-Based Reinforcement Learning as an active learning problem.
RP1 achieves this by utilizing a hybrid objective function, which crucially adapts during optimization.
We rigorously evaluate our method on a variety of continuous control tasks, and demonstrate statistically significant gains over existing approaches.
arXiv Detail & Related papers (2020-02-07T09:57:53Z)
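Several of the entries above, like the reward-based baseline the main paper contrasts against, build on the standard PbRL recipe of fitting a reward model to pairwise human preferences with a Bradley-Terry-style likelihood. The following is a minimal, hypothetical sketch of that likelihood with made-up trajectory returns; it illustrates the shared building block rather than the implementation of any particular paper listed here.

```python
# Hypothetical sketch of the Bradley-Terry preference likelihood commonly
# used in PbRL to fit a reward model from pairwise trajectory preferences.
import numpy as np

def preference_prob(return_a, return_b):
    """P(trajectory A preferred over B), where each return is the learned
    reward summed over the trajectory."""
    return 1.0 / (1.0 + np.exp(-(return_a - return_b)))

def preference_nll(return_a, return_b, a_preferred):
    """Negative log-likelihood of one human preference label; minimizing
    this over a dataset of labeled pairs trains the reward model."""
    p = preference_prob(return_a, return_b)
    return -np.log(p if a_preferred else 1.0 - p)

# Made-up returns under the current reward model; the annotator chose A.
print(preference_nll(return_a=3.1, return_b=2.4, a_preferred=True))
```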
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.