Achieving Fairness in Multi-Agent Markov Decision Processes Using Reinforcement Learning
- URL: http://arxiv.org/abs/2306.00324v1
- Date: Thu, 1 Jun 2023 03:43:53 GMT
- Title: Achieving Fairness in Multi-Agent Markov Decision Processes Using Reinforcement Learning
- Authors: Peizhong Ju, Arnob Ghosh, Ness B. Shroff
- Abstract summary: We propose a Reinforcement Learning approach to achieve fairness in finite-horizon episodic MDPs.
We show that such an approach achieves sub-linear regret in terms of the number of episodes.
- Score: 30.605881670761853
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fairness plays a crucial role in various multi-agent systems (e.g.,
communication networks, financial markets, etc.). Many multi-agent dynamical
interactions can be cast as Markov Decision Processes (MDPs). While existing
research has focused on studying fairness in known environments, the
exploration of fairness in such systems for unknown environments remains open.
In this paper, we propose a Reinforcement Learning (RL) approach to achieve
fairness in multi-agent finite-horizon episodic MDPs. Instead of maximizing the
sum of individual agents' value functions, we introduce a fairness function
that ensures equitable rewards across agents. Since the classical Bellman
equation does not hold when the objective is no longer the sum of individual
value functions, traditional approaches cannot be applied. Instead, to explore,
we maintain a confidence region for the unknown environment and propose an
online convex optimization based approach to obtain a policy constrained to
this confidence region. We show that such an approach achieves sub-linear
regret in terms of the number of episodes. Additionally, we provide a probably
approximately correct (PAC) guarantee based on the obtained regret bound. We
also propose an offline RL algorithm and bound the optimality gap with respect
to the optimal fair solution. To mitigate computational complexity, we
introduce a policy-gradient type method for the fair objective. Simulation
experiments also demonstrate the efficacy of our approach.
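The abstract does not pin down the fairness function. As an illustration only, below is a minimal policy-gradient sketch for one common choice, a Nash-social-welfare style objective F(V_1, ..., V_N) = sum_i log V_i, on a toy two-state episodic MDP. Everything here (the environment, the choice of F, the REINFORCE estimator, all hyperparameters) is an assumption for exposition; it is not the paper's algorithm, which additionally maintains a confidence region over the unknown model and solves an online convex optimization within it.

```python
import numpy as np

# Hedged sketch: policy-gradient ascent on a Nash-social-welfare style fair
# objective F(V_1, ..., V_N) = sum_i log V_i over per-agent value functions.
# The toy environment (2 states, 2 actions, horizon H, N agents with distinct
# reward tables) is an illustrative assumption, not the paper's setup.

rng = np.random.default_rng(0)
S, A, H, N = 2, 2, 5, 2                       # states, actions, horizon, agents
P = rng.dirichlet(np.ones(S), size=(S, A))    # transition kernel P[s, a] -> dist over s'
R = rng.uniform(0.1, 1.0, size=(N, S, A))     # per-agent rewards R[i, s, a]
theta = np.zeros((H, S, A))                   # time-dependent softmax policy logits

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def rollout(theta):
    """Sample one episode; return per-agent returns and the (h, s, a) trace."""
    s, trace, G = 0, [], np.zeros(N)
    for h in range(H):
        pi = softmax(theta[h, s])
        a = rng.choice(A, p=pi)
        trace.append((h, s, a))
        G += R[:, s, a]
        s = rng.choice(S, p=P[s, a])
    return G, trace

# REINFORCE on the fair objective: since dF/dV_i = 1/V_i, the per-episode
# score-function weight is sum_i G_i / V_i (values tracked by a running mean).
V = np.ones(N)                                # running per-agent value estimates
lr, beta = 0.05, 0.05
for episode in range(3000):
    G, trace = rollout(theta)
    V = (1 - beta) * V + beta * G             # update value estimates
    w = float(np.sum(G / np.maximum(V, 1e-8)))  # fair-objective weight
    for (h, s, a) in trace:
        pi = softmax(theta[h, s])
        grad_logpi = -pi
        grad_logpi[a] += 1.0                  # d log pi(a|s) / d theta[h, s]
        theta[h, s] += lr * w * grad_logpi

print("estimated per-agent values:", V)
```

The log objective makes agent i's marginal weight 1/V_i, so agents with smaller values receive proportionally larger gradient weight, which is what pushes the returns toward equity across agents.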
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z) - QFree: A Universal Value Function Factorization for Multi-Agent
Reinforcement Learning [2.287186762346021]
We propose QFree, a universal value function factorization method for multi-agent reinforcement learning.
We show that QFree achieves state-of-the-art performance in a general-purpose complex MARL benchmark environment.
arXiv Detail & Related papers (2023-11-01T08:07:16Z) - Mimicking Better by Matching the Approximate Action Distribution [48.95048003354255]
We introduce MAAD, a novel, sample-efficient on-policy algorithm for Imitation Learning from Observations.
We show that it requires considerably fewer interactions to achieve expert performance, outperforming current state-of-the-art on-policy methods.
arXiv Detail & Related papers (2023-06-16T12:43:47Z) - Toward Risk-based Optimistic Exploration for Cooperative Multi-Agent
Reinforcement Learning [9.290757451344673]
We present a risk-based exploration method that leads to collaboratively optimistic behavior by shifting the sampling region of the estimated return distribution.
Our method, built on quantile regression, shows remarkable performance in multi-agent settings requiring cooperative exploration.
arXiv Detail & Related papers (2023-03-03T08:17:57Z) - Latent State Marginalization as a Low-cost Approach for Improving
Exploration [79.12247903178934]
We propose the adoption of latent variable policies within the MaxEnt framework.
We show that latent variable policies naturally emerge under the use of world models with a latent belief state.
We experimentally validate our method on continuous control tasks, showing that effective marginalization can lead to better exploration and more robust training.
arXiv Detail & Related papers (2022-10-03T15:09:12Z) - Sequential Information Design: Markov Persuasion Process and Its
Efficient Reinforcement Learning [156.5667417159582]
This paper proposes a novel model of sequential information design, namely Markov persuasion processes (MPPs).
Planning in MPPs faces the unique challenge of finding a signaling policy that is simultaneously persuasive to the myopic receivers and induces optimal long-term cumulative utilities for the sender.
We design a provably efficient no-regret learning algorithm, the Optimism-Pessimism Principle for Persuasion Process (OP4), which features a novel combination of both optimism and pessimism principles.
arXiv Detail & Related papers (2022-02-22T05:41:43Z) - Convergence Rates of Average-Reward Multi-agent Reinforcement Learning
via Randomized Linear Programming [41.30044824711509]
We focus on the case where the global reward is a sum of local rewards, the joint policy factorizes into agents' marginals, and the state is fully observable.
We develop multi-agent extensions, whereby agents solve their local saddle point problems and then perform local weighted averaging.
We establish that the sample complexity to obtain near-globally optimal solutions matches tight dependencies on the cardinality of the state and action spaces.
arXiv Detail & Related papers (2021-10-22T03:48:41Z) - Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for mean-field control (MFC).
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z) - Softmax with Regularization: Better Value Estimation in Multi-Agent
Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2021-03-22T14:18:39Z)