Risk-Sensitive Bayesian Games for Multi-Agent Reinforcement Learning
under Policy Uncertainty
- URL: http://arxiv.org/abs/2203.10045v1
- Date: Fri, 18 Mar 2022 16:40:30 GMT
- Title: Risk-Sensitive Bayesian Games for Multi-Agent Reinforcement Learning
under Policy Uncertainty
- Authors: Hannes Eriksson, Debabrota Basu, Mina Alibeigi, Christos Dimitrakakis
- Abstract summary: In games with incomplete information, uncertainty arises from the lack of knowledge about a player's own and the other players' types.
We propose risk-sensitive versions of existing algorithms for risk-neutral stochastic games.
Our experimental analysis shows that risk-sensitive DAPG performs better than competing algorithms for both social welfare and general-sum stochastic games.
- Score: 6.471031681646443
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In stochastic games with incomplete information, uncertainty arises from the
lack of knowledge about a player's own and the other players' types, i.e. the utility
function and the policy space, as well as from the inherent stochasticity of the players'
interactions. In the existing literature, risk in stochastic games has been studied in
terms of the inherent uncertainty arising from the variability of transitions and actions.
In this work, we instead focus on the risk associated with the uncertainty over types. We
contrast this with the multi-agent reinforcement learning framework, where the other
agents have fixed stationary policies, and investigate risk-sensitivity due to the
uncertainty about the other agents' adaptive policies. We propose risk-sensitive versions
of algorithms originally proposed for risk-neutral stochastic games, such as Iterated Best
Response (IBR), Fictitious Play (FP), and a general multi-objective gradient approach
using dual ascent (DAPG). Our experimental analysis shows that risk-sensitive DAPG
performs better than competing algorithms for both social welfare and general-sum
stochastic games.
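The abstract does not include an implementation, but the core idea of replacing the risk-neutral expectation over opponent types with a risk measure can be illustrated with a short sketch. The snippet below is a minimal, hypothetical example (not the authors' code): the type posterior, the candidate policies, and the `rollout_return` estimator are assumed placeholders, and the best-response step only loosely mirrors the IBR-style update described above.

```python
import numpy as np

# Minimal sketch (not the authors' code): score a candidate policy under
# uncertainty over the other players' types by replacing the risk-neutral
# expectation with an entropic risk measure.

def entropic_risk(returns, weights, beta):
    """Entropic risk of a discrete return distribution.

    beta < 0 is risk-averse, beta > 0 risk-seeking, and beta -> 0
    recovers the risk-neutral expectation.
    """
    return (1.0 / beta) * np.log(np.sum(weights * np.exp(beta * returns)))

def risk_sensitive_utility(policy, sampled_types, type_weights, rollout_return, beta=-1.0):
    """Risk-adjusted utility of `policy` against a posterior sample of opponent types.

    `rollout_return(policy, opponent_type)` is assumed to estimate the cumulative
    reward of `policy` when the other agents play according to `opponent_type`.
    """
    returns = np.array([rollout_return(policy, t) for t in sampled_types])
    return entropic_risk(returns, np.asarray(type_weights), beta)

def risk_sensitive_best_response(candidate_policies, sampled_types, type_weights,
                                 rollout_return, beta=-1.0):
    """One iterated-best-response-style step: pick the candidate policy with the
    highest risk-adjusted utility over the type posterior."""
    scores = [risk_sensitive_utility(p, sampled_types, type_weights, rollout_return, beta)
              for p in candidate_policies]
    return candidate_policies[int(np.argmax(scores))]
```

Setting beta close to zero recovers the risk-neutral best response, while more negative values penalize policies whose returns vary strongly across plausible opponent types.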
Related papers
- Taming Equilibrium Bias in Risk-Sensitive Multi-Agent Reinforcement Learning [14.571671587217764]
We study risk-sensitive multi-agent reinforcement learning under general-sum Markov games.
We show that using the regret naively adapted from existing literature as a performance metric could induce policies with equilibrium bias.
We propose a novel notion of regret, which we call risk-balanced regret, and show through a lower bound that it overcomes the issue of equilibrium bias.
arXiv Detail & Related papers (2024-05-04T17:47:45Z)
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values; a minimal variance-propagation sketch appears after this list.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC), that can be applied to either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- Learning Diverse Risk Preferences in Population-based Self-play [23.07952140353786]
Current self-play algorithms optimize the agent to maximize expected win-rates against its current or historical copies.
We introduce diversity from the perspective that agents could have diverse risk preferences in the face of uncertainty.
We show that our method achieves comparable or superior performance in competitive games.
arXiv Detail & Related papers (2023-05-19T06:56:02Z)
- One Risk to Rule Them All: A Risk-Sensitive Perspective on Model-Based Offline Reinforcement Learning [25.218430053391884]
We propose risk-sensitivity as a mechanism to jointly address both of these issues.
Risk-aversion to aleatoric uncertainty discourages actions that may result in poor outcomes due to environment stochasticity.
Our experiments show that our algorithm achieves competitive performance on deterministic benchmarks.
arXiv Detail & Related papers (2022-11-30T21:24:11Z)
- Learning Risk-Averse Equilibria in Multi-Agent Systems [13.25454171233235]
In multi-agent systems, intelligent agents are tasked with making decisions that have optimal outcomes when the actions of the other agents are as expected.
We introduce a new risk-averse solution concept that allows the learner to accommodate unexpected actions.
We show that our population of agents that approximate a risk-averse equilibrium is particularly effective in the presence of unseen opposing populations.
arXiv Detail & Related papers (2022-05-30T21:20:30Z)
- A Survey of Risk-Aware Multi-Armed Bandits [84.67376599822569]
We review various risk measures of interest, and comment on their properties.
We consider algorithms for the regret minimization setting, where the exploration-exploitation trade-off manifests.
We conclude by commenting on persisting challenges and fertile areas for future research.
arXiv Detail & Related papers (2022-05-12T02:20:34Z)
- Efficient Risk-Averse Reinforcement Learning [79.61412643761034]
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.
We prove that under certain conditions this inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypass it.
We demonstrate improved risk aversion in maze navigation, autonomous driving, and resource allocation benchmarks.
arXiv Detail & Related papers (2022-05-10T19:40:52Z)
- SAAC: Safe Reinforcement Learning as an Adversarial Game of Actor-Critics [11.132587007566329]
We develop a safe adversarially guided soft actor-critic framework called SAAC.
In SAAC, the adversary aims to break the safety constraint while the RL agent aims to maximize the constrained value function.
We show that SAAC achieves faster convergence, better efficiency, and fewer failures to satisfy the safety constraints.
arXiv Detail & Related papers (2022-04-20T12:32:33Z)
- Robust Policy Learning over Multiple Uncertainty Sets [91.67120465453179]
Reinforcement learning (RL) agents need to be robust to variations in safety-critical environments.
We develop an algorithm that enjoys the benefits of both system identification and robust RL.
arXiv Detail & Related papers (2022-02-14T20:06:28Z)
- Off-Policy Evaluation of Slate Policies under Bayes Risk [70.10677881866047]
We study the problem of off-policy evaluation for slate bandits, for the typical case in which the logging policy factorizes over the slots of the slate.
We show that the risk improvement over PI grows linearly with the number of slots, and linearly with the gap between the arithmetic and the harmonic mean of a set of slot-level divergences.
arXiv Detail & Related papers (2021-01-05T20:07:56Z)
- Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average reward setting with the variance risk criteria.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z)
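For the uncertainty Bellman equation mentioned in the "Model-Based Epistemic Variance of Values" entry above, the following is a minimal, hypothetical tabular sketch rather than the referenced paper's algorithm: it propagates an assumed per-state epistemic uncertainty term through a Bellman-style recursion for a fixed policy and a known policy-induced transition matrix.

```python
import numpy as np

# Hypothetical tabular sketch of an uncertainty-Bellman-style recursion:
#   W(s) = u(s) + gamma^2 * sum_{s'} P_pi(s' | s) * W(s'),
# where u(s) is a local epistemic-uncertainty term and P_pi is the
# transition matrix induced by a fixed policy.

def solve_uncertainty_bellman(P_pi, local_uncertainty, gamma=0.99, iters=1000):
    """Fixed-point iteration for the value-uncertainty vector W."""
    n_states = P_pi.shape[0]
    W = np.zeros(n_states)
    for _ in range(iters):
        W = local_uncertainty + (gamma ** 2) * P_pi @ W
    return W

if __name__ == "__main__":
    # Toy 3-state chain with local uncertainty concentrated in the last state.
    P_pi = np.array([[0.9, 0.1, 0.0],
                     [0.0, 0.9, 0.1],
                     [0.0, 0.0, 1.0]])
    u = np.array([0.0, 0.0, 0.5])
    print(solve_uncertainty_bellman(P_pi, u))
```

Because gamma < 1, the iteration is a contraction and converges to the unique fixed point, which can then be used as a variance-style penalty or bonus during policy optimization.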
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.