Learning Fair Policies in Multiobjective (Deep) Reinforcement Learning
with Average and Discounted Rewards
- URL: http://arxiv.org/abs/2008.07773v1
- Date: Tue, 18 Aug 2020 07:17:53 GMT
- Title: Learning Fair Policies in Multiobjective (Deep) Reinforcement Learning
with Average and Discounted Rewards
- Authors: Umer Siddique, Paul Weng, Matthieu Zimmer
- Abstract summary: We investigate the problem of learning a policy that treats its users equitably.
In this paper, we formulate this novel RL problem, in which an objective function encoding a notion of fairness is optimized.
We describe how several classic deep RL algorithms can be adapted to our fair optimization problem.
- Score: 15.082715993594121
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the operations of autonomous systems generally affect simultaneously
several users, it is crucial that their designs account for fairness
considerations. In contrast to standard (deep) reinforcement learning (RL), we
investigate the problem of learning a policy that treats its users equitably.
In this paper, we formulate this novel RL problem, in which an objective
function, which encodes a notion of fairness that we formally define, is
optimized. For this problem, we provide a theoretical discussion where we
examine the case of discounted rewards and that of average rewards. During this
analysis, we notably derive a new result in the standard RL setting, which is
of independent interest: it states a novel bound on how far the average reward
of a policy that is optimal for the discounted reward can fall short of the
optimal average reward. Since learning with discounted rewards is generally easier,
this discussion further justifies finding a fair policy for the average reward
by learning a fair policy for the discounted reward. Thus, we describe how
several classic deep RL algorithms can be adapted to our fair optimization
problem, and we validate our approach with extensive experiments in three
different domains.
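To make the fair objective concrete, below is a minimal Python sketch, assuming one common fairness-encoding welfare function: the generalized Gini social welfare function applied to the vector of per-user returns. The function names (ggf_welfare, discounted_returns, average_returns), the default weights, and the choice of welfare function are illustrative assumptions for this listing, not the paper's exact definitions.

import numpy as np

def ggf_welfare(user_returns, weights=None):
    # Generalized Gini social welfare of a vector of per-user returns.
    # Fairness is encoded by sorting returns in increasing order and giving
    # the largest weight to the worst-off user (weights must be non-increasing),
    # which makes the objective concave and impartial between users.
    returns = np.sort(np.asarray(user_returns, dtype=float))
    if weights is None:
        # Illustrative default: harmonically decreasing weights.
        weights = 1.0 / np.arange(1, returns.size + 1)
    weights = np.asarray(weights, dtype=float)
    assert np.all(np.diff(weights) <= 0), "weights must be non-increasing"
    return float(returns @ weights)

def discounted_returns(reward_vectors, gamma=0.99):
    # Per-user discounted return from one trajectory of vector rewards.
    # reward_vectors has shape (T, n_users): one reward per user per step.
    rewards = np.asarray(reward_vectors, dtype=float)
    discounts = gamma ** np.arange(rewards.shape[0])
    return discounts @ rewards  # shape (n_users,)

def average_returns(reward_vectors):
    # Per-user average reward over a finite trajectory (empirical gain).
    # In the infinite-horizon setting, (1 - gamma) times the discounted value
    # tends to the average reward as gamma approaches 1, which is the classical
    # reason the discounted criterion is used as a proxy for the average one.
    return np.asarray(reward_vectors, dtype=float).mean(axis=0)

# Example: between two policies with the same total reward, the welfare
# objective prefers the one that shares rewards evenly across users.
uneven = discounted_returns([[1.0, 0.0]] * 200)
even = discounted_returns([[0.5, 0.5]] * 200)
print(ggf_welfare(uneven), ggf_welfare(even))  # the even split scores higher

Because such a welfare function is concave and symmetric in its arguments, maximizing it trades some total reward for a more even distribution across users, which is in the spirit of what the abstract calls treating users equitably; the adapted deep RL algorithms then optimize this welfare of the (discounted or average) vector return instead of a scalar return.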
Related papers
- Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning [55.65738319966385]
We propose a novel online algorithm, iterative Nash policy optimization (INPO).
Unlike previous methods, INPO bypasses the need for estimating the expected win rate for individual responses.
With an LLaMA-3-8B-based SFT model, INPO achieves a 42.6% length-controlled win rate on AlpacaEval 2.0 and a 37.8% win rate on Arena-Hard.
arXiv Detail & Related papers (2024-06-30T08:00:34Z) - Fine-Tuning Language Models with Reward Learning on Policy [68.70065254564642]
Reinforcement learning from human feedback (RLHF) has emerged as an effective approach to aligning large language models (LLMs) to human preferences.
Despite its popularity, (fixed) reward models may become inaccurate off-distribution.
We propose reward learning on policy (RLP), an unsupervised framework that refines a reward model using policy samples to keep it on-distribution.
arXiv Detail & Related papers (2024-03-28T10:02:10Z) - Is Inverse Reinforcement Learning Harder than Standard Reinforcement
Learning? A Theoretical Perspective [55.36819597141271]
Inverse Reinforcement Learning (IRL) -- the problem of learning reward functions from demonstrations of an expert policy -- plays a critical role in developing intelligent systems.
This paper provides the first line of results for efficient IRL in vanilla offline and online settings using polynomial samples and runtime.
As an application, we show that the learned rewards can transfer to another target MDP with suitable guarantees.
arXiv Detail & Related papers (2023-11-29T00:09:01Z) - Fairness in Preference-based Reinforcement Learning [2.3388338598125196]
We design a new fairness-induced preference-based reinforcement learning framework, or FPbRL.
The main idea of FPbRL is to learn vector reward functions associated with multiple objectives via new welfare-based preferences.
Experimental studies show that the proposed FPbRL approach can achieve both efficiency and equity for learning effective and fair policies.
arXiv Detail & Related papers (2023-06-16T17:47:36Z) - Achieving Fairness in Multi-Agent Markov Decision Processes Using
Reinforcement Learning [30.605881670761853]
We propose a Reinforcement Learning approach to achieve fairness in finite-horizon episodic MDPs.
We show that such an approach achieves sub-linear regret in terms of the number of episodes.
arXiv Detail & Related papers (2023-06-01T03:43:53Z) - Internally Rewarded Reinforcement Learning [22.01249652558878]
We study a class of reinforcement learning problems where the reward signals for policy learning are generated by an internal reward model.
We show that the proposed reward function can consistently stabilize the training process by reducing the impact of reward noise.
arXiv Detail & Related papers (2023-02-01T06:25:46Z) - Examining average and discounted reward optimality criteria in
reinforcement learning [4.873362301533825]
Two major optimality criteria are average and discounted rewards, where the latter is typically considered an approximation to the former.
While the discounted reward is more popular, it is problematic to apply in environments that have no natural notion of discounting.
Our contributions include a thorough examination of the relationship between average and discounted rewards, as well as a discussion of their pros and cons in RL.
arXiv Detail & Related papers (2021-07-03T05:28:56Z) - Information Directed Reward Learning for Reinforcement Learning [64.33774245655401]
We propose Information Directed Reward Learning (IDRL), which learns a model of the reward function that allows standard RL algorithms to achieve high expected return with as few expert queries as possible.
In contrast to prior active reward learning methods designed for specific types of queries, IDRL naturally accommodates different query types.
We support our findings with extensive evaluations in multiple environments and with different types of queries.
arXiv Detail & Related papers (2021-02-24T18:46:42Z) - Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds
Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average reward setting with the variance risk criterion.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z) - Variational Policy Gradient Method for Reinforcement Learning with
General Utilities [38.54243339632217]
In recent years, reinforcement learning systems with general goals beyond a cumulative sum of rewards have gained traction.
In this paper, we consider policy optimization in Markov Decision Problems, where the objective is a general concave utility function of the state-action occupancy measure.
We derive a new Variational Policy Gradient Theorem for RL with general utilities.
arXiv Detail & Related papers (2020-07-04T17:51:53Z) - Preference-based Reinforcement Learning with Finite-Time Guarantees [76.88632321436472]
Preference-based Reinforcement Learning (PbRL) replaces reward values in traditional reinforcement learning with preferences to better elicit human opinion on the target objective.
Despite promising results in applications, the theoretical understanding of PbRL is still in its infancy.
We present the first finite-time analysis for general PbRL problems.
arXiv Detail & Related papers (2020-06-16T03:52:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides (including all summaries) and is not responsible for any consequences of its use.