Learning Risk-Averse Equilibria in Multi-Agent Systems
- URL: http://arxiv.org/abs/2205.15434v1
- Date: Mon, 30 May 2022 21:20:30 GMT
- Title: Learning Risk-Averse Equilibria in Multi-Agent Systems
- Authors: Oliver Slumbers, David Henry Mguni, Stephen McAleer, Jun Wang, Yaodong Yang
- Abstract summary: In multi-agent systems, intelligent agents are tasked with making decisions that have optimal outcomes when the actions of the other agents are as expected.
We introduce a new risk-averse solution concept that allows the learner to accommodate unexpected actions.
We show that our population of agents that approximate a risk-averse equilibrium is particularly effective in the presence of unseen opposing populations.
- Score: 13.25454171233235
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In multi-agent systems, intelligent agents are tasked with making decisions
that have optimal outcomes when the actions of the other agents are as
expected, whilst also being prepared for unexpected behaviour. In this work, we
introduce a new risk-averse solution concept that allows the learner to
accommodate unexpected actions by finding the minimum variance strategy given
any level of expected return. We prove the existence of such a risk-averse
equilibrium, and propose one fictitious-play type learning algorithm for
smaller games that enjoys provable convergence guarantees in certain game
classes (e.g., zero-sum or potential games). Furthermore, we propose an approximation
method for larger games based on iterative population-based training that
generates a population of risk-averse agents. Empirically, our equilibrium is
shown to reduce reward variance, in the sense that off-equilibrium behaviour
has a far smaller impact on our risk-averse agents than on agents playing
other equilibrium solutions. Importantly, we show that
our population of agents that approximate a risk-averse equilibrium is
particularly effective in the presence of unseen opposing populations,
especially when a minimal level of performance must be guaranteed, which is
critical for safety-aware multi-agent systems.
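To make the mean-variance idea concrete, here is a minimal sketch, not the authors' published algorithm: in a small two-action matrix game, a risk-averse best response selects, among mixed strategies whose expected payoff meets a chosen return floor, the one with the lowest payoff variance, and a fictitious-play-style loop pairs two such responders against each other's empirical average strategies. The payoff matrix, the return-floor values, and helper names such as risk_averse_best_response are illustrative assumptions.

```python
# Illustrative sketch (not the paper's algorithm): risk-averse best responses
# in a 2x2 matrix game, combined with a fictitious-play-style loop.
import numpy as np

def payoff_stats(A, p, q):
    """Mean and variance of the row player's payoff when the row player
    mixes with p and the column player mixes with q (independent sampling)."""
    joint = np.outer(p, q)                        # joint distribution over action pairs
    mean = float(np.sum(joint * A))
    var = float(np.sum(joint * A ** 2) - mean ** 2)
    return mean, var

def risk_averse_best_response(A, q, return_floor, grid=201):
    """Among two-action mixed strategies whose expected payoff is at least
    `return_floor`, return the one with minimum payoff variance (coarse grid
    search over the 1-D simplex); fall back to the max-return strategy if the
    floor is infeasible."""
    best_p, best_var = None, np.inf
    fallback_p, best_mean = None, -np.inf
    for x in np.linspace(0.0, 1.0, grid):
        p = np.array([x, 1.0 - x])
        mean, var = payoff_stats(A, p, q)
        if mean > best_mean:
            best_mean, fallback_p = mean, p
        if mean >= return_floor and var < best_var:
            best_p, best_var = p, var
    return best_p if best_p is not None else fallback_p

# Hypothetical 2x2 payoff matrix for the row player; the column player's
# payoffs are the negation (a zero-sum example).
A = np.array([[2.0, -1.0],
              [0.5,  0.5]])

# Fictitious-play-style loop: each player responds to the opponent's
# empirical average strategy with a risk-averse best response.
p_avg = np.array([0.5, 0.5])
q_avg = np.array([0.5, 0.5])
for t in range(1, 501):
    p_t = risk_averse_best_response(A, q_avg, return_floor=0.4)
    q_t = risk_averse_best_response(-A.T, p_avg, return_floor=-0.6)
    p_avg = p_avg + (p_t - p_avg) / t
    q_avg = q_avg + (q_t - q_avg) / t

print("row player (averaged):   ", np.round(p_avg, 3))
print("column player (averaged):", np.round(q_avg, 3))
```

The coarse grid search keeps the sketch self-contained for a two-action player; in general, a minimum-variance strategy at a fixed expected-return level would come from a constrained quadratic program over the full strategy simplex.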
Related papers
- Taming Equilibrium Bias in Risk-Sensitive Multi-Agent Reinforcement Learning [14.571671587217764]
We study risk-sensitive multi-agent reinforcement learning under general-sum Markov games.
We show that using the regret naively adapted from existing literature as a performance metric could induce policies with equilibrium bias.
We propose a novel notion of regret, which we call risk-balanced regret, and show through a lower bound that it overcomes the issue of equilibrium bias.
arXiv Detail & Related papers (2024-05-04T17:47:45Z)
- Learning Diverse Risk Preferences in Population-based Self-play [23.07952140353786]
Current self-play algorithms optimize the agent to maximize expected win-rates against its current or historical copies.
We introduce diversity from the perspective that agents could have diverse risk preferences in the face of uncertainty.
We show that our method achieves comparable or superior performance in competitive games.
arXiv Detail & Related papers (2023-05-19T06:56:02Z)
- Toward Risk-based Optimistic Exploration for Cooperative Multi-Agent Reinforcement Learning [9.290757451344673]
We present a risk-based exploration that leads to collaboratively optimistic behavior by shifting the sampling region of distribution.
Our quantile regression-based method shows remarkable performance in multi-agent settings that require cooperative exploration; a minimal quantile-based sketch appears after this related-papers list.
arXiv Detail & Related papers (2023-03-03T08:17:57Z)
- Regret Minimization and Convergence to Equilibria in General-sum Markov Games [57.568118148036376]
We present the first algorithm for learning in general-sum Markov games that provides sublinear regret guarantees when executed by all agents.
Our algorithm is decentralized, computationally efficient, and does not require any communication between agents.
arXiv Detail & Related papers (2022-07-28T16:27:59Z)
- Risk-Sensitive Bayesian Games for Multi-Agent Reinforcement Learning under Policy Uncertainty [6.471031681646443]
In games with incomplete information, uncertainty arises from a lack of knowledge about a player's own type and the other players' types.
We propose risk-sensitive versions of existing algorithms for risk-neutral learning games.
Our experimental analysis shows that risk-sensitive DAPG performs better than competing algorithms for both social welfare and general-sum games.
arXiv Detail & Related papers (2022-03-18T16:40:30Z)
- Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z)
- Learning Collective Action under Risk Diversity [68.88688248278102]
We investigate the consequences of risk diversity in groups of agents learning to play collective risk dilemmas.
We show that risk diversity significantly reduces overall cooperation and hinders collective target achievement.
Our results highlight the need to align risk perceptions among agents or to develop new learning techniques.
arXiv Detail & Related papers (2022-01-30T18:21:21Z)
- On Assessing The Safety of Reinforcement Learning algorithms Using Formal Methods [6.2822673562306655]
Safety mechanisms such as adversarial training, adversarial detection, and robust learning are not always adapted to all of the disturbances present in the environments where the agent is deployed.
It is therefore necessary to propose new solutions adapted to the learning challenges faced by the agent.
We use reward shaping and a modified Q-learning algorithm as defense mechanisms to improve the agent's policy when facing adversarial perturbations.
arXiv Detail & Related papers (2021-11-08T23:08:34Z)
- Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
- Policy Gradient Bayesian Robust Optimization for Imitation Learning [49.881386773269746]
We derive a novel policy gradient-style robust optimization approach, PG-BROIL, to balance expected performance and risk.
Results suggest PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse.
arXiv Detail & Related papers (2021-06-11T16:49:15Z)
- ERMAS: Becoming Robust to Reward Function Sim-to-Real Gaps in Multi-Agent Simulations [110.72725220033983]
Epsilon-Robust Multi-Agent Simulation (ERMAS) is a framework for learning AI policies that are robust to multi-agent sim-to-real gaps.
ERMAS learns tax policies that are robust to changes in agent risk aversion, improving social welfare by up to 15% in complex spatiotemporal simulations.
arXiv Detail & Related papers (2021-06-10T04:32:20Z)
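As a hedged illustration of the quantile-based, risk-aware action selection mentioned in the cooperative-exploration entry above (not a reimplementation of that paper's method), the sketch below scores actions by averaging different regions of a quantile-represented return distribution: the upper tail gives optimistic values for exploration, the lower tail gives risk-averse values. The array shapes, the random seed, and the helper name tail_mean_value are assumptions made for the example.

```python
# Illustrative sketch only: rank actions by different tails of a
# quantile-based return distribution (upper tail = optimistic/exploratory,
# lower tail = risk-averse); shapes and names are assumptions.
import numpy as np

def tail_mean_value(quantiles, lower=0.0, upper=1.0):
    """Average the quantile estimates whose levels fall inside [lower, upper].
    `quantiles` has shape (num_actions, num_quantiles), sorted along axis 1."""
    n = quantiles.shape[1]
    taus = (np.arange(n) + 0.5) / n          # quantile levels (bin midpoints)
    mask = (taus >= lower) & (taus <= upper)
    return quantiles[:, mask].mean(axis=1)

rng = np.random.default_rng(0)
# Hypothetical per-action quantile estimates for a 3-action agent.
q_dist = np.sort(rng.normal(loc=[[1.0], [1.2], [0.9]], scale=1.5, size=(3, 32)), axis=1)

mean_values = tail_mean_value(q_dist)                      # all quantiles
optimistic_values = tail_mean_value(q_dist, lower=0.75)    # upper tail: explore
risk_averse_values = tail_mean_value(q_dist, upper=0.25)   # lower tail: play safe

print("mean-value action:  ", int(np.argmax(mean_values)))
print("optimistic action:  ", int(np.argmax(optimistic_values)))
print("risk-averse action: ", int(np.argmax(risk_averse_values)))
```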