Taming Equilibrium Bias in Risk-Sensitive Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2405.02724v1
- Date: Sat, 4 May 2024 17:47:45 GMT
- Title: Taming Equilibrium Bias in Risk-Sensitive Multi-Agent Reinforcement Learning
- Authors: Yingjie Fei, Ruitu Xu,
- Abstract summary: We study risk-sensitive multi-agent reinforcement learning under general-sum Markov games.
We show that using the regret naively adapted from existing literature as a performance metric could induce policies with equilibrium bias.
We propose a novel notion of regret, which we call risk-balanced regret, and show through a lower bound that it overcomes the issue of equilibrium bias.
- Score: 14.571671587217764
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study risk-sensitive multi-agent reinforcement learning under general-sum Markov games, where agents optimize the entropic risk measure of rewards with possibly diverse risk preferences. We show that using the regret naively adapted from existing literature as a performance metric could induce policies with equilibrium bias that favor the most risk-sensitive agents and overlook the other agents. To address such deficiency of the naive regret, we propose a novel notion of regret, which we call risk-balanced regret, and show through a lower bound that it overcomes the issue of equilibrium bias. Furthermore, we develop a self-play algorithm for learning Nash, correlated, and coarse correlated equilibria in risk-sensitive Markov games. We prove that the proposed algorithm attains near-optimal regret guarantees with respect to the risk-balanced regret.
Related papers
- Best Arm Identification with Minimal Regret [55.831935724659175]
Best arm identification problem elegantly amalgamates regret minimization and BAI.
Agent's goal is to identify the best arm with a prescribed confidence level.
Double KL-UCB algorithm achieves optimality as the confidence level tends to zero.
arXiv Detail & Related papers (2024-09-27T16:46:02Z) - Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z) - Regret Bounds for Risk-sensitive Reinforcement Learning with Lipschitz
Dynamic Risk Measures [23.46659319363579]
We present two model-based algorithms applied to emphLipschitz dynamic risk measures.
Notably, our upper bounds demonstrate optimal dependencies on the number of actions and episodes.
arXiv Detail & Related papers (2023-06-04T16:24:19Z) - Regret Minimization and Convergence to Equilibria in General-sum Markov
Games [57.568118148036376]
We present the first algorithm for learning in general-sum Markov games that provides sublinear regret guarantees when executed by all agents.
Our algorithm is decentralized, computationally efficient, and does not require any communication between agents.
arXiv Detail & Related papers (2022-07-28T16:27:59Z) - Learning Risk-Averse Equilibria in Multi-Agent Systems [13.25454171233235]
In multi-agent systems, intelligent agents are tasked with making decisions that have optimal outcomes when the actions of the other agents are as expected.
We introduce a new risk-averse solution concept that allows the learner to accommodate unexpected actions.
We show that our population of agents that approximate a risk-averse equilibrium is particularly effective in the presence of unseen opposing populations.
arXiv Detail & Related papers (2022-05-30T21:20:30Z) - A Survey of Risk-Aware Multi-Armed Bandits [84.67376599822569]
We review various risk measures of interest, and comment on their properties.
We consider algorithms for the regret minimization setting, where the exploration-exploitation trade-off manifests.
We conclude by commenting on persisting challenges and fertile areas for future research.
arXiv Detail & Related papers (2022-05-12T02:20:34Z) - Efficient Risk-Averse Reinforcement Learning [79.61412643761034]
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.
We prove that under certain conditions this inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypass it.
We demonstrate improved risk aversion in maze navigation, autonomous driving, and resource allocation benchmarks.
arXiv Detail & Related papers (2022-05-10T19:40:52Z) - Risk-Sensitive Bayesian Games for Multi-Agent Reinforcement Learning
under Policy Uncertainty [6.471031681646443]
In games with incomplete information, the uncertainty is evoked by the lack of knowledge about a player's own and the other players' types.
We propose risk-sensitive versions of existing algorithms for risk-neutral learning games.
Our experimental analysis shows that risk-sensitive DAPG performs better than competing algorithms for both social welfare and general-sum games.
arXiv Detail & Related papers (2022-03-18T16:40:30Z) - SENTINEL: Taming Uncertainty with Ensemble-based Distributional
Reinforcement Learning [6.587644069410234]
We consider risk-sensitive sequential decision-making in model-based reinforcement learning (RL)
We introduce a novel quantification of risk, namely emphcomposite risk
We experimentally verify that SENTINEL-K estimates the return distribution better, and while used with composite risk estimate, demonstrates better risk-sensitive performance than competing RL algorithms.
arXiv Detail & Related papers (2021-02-22T14:45:39Z) - Bounded Risk-Sensitive Markov Games: Forward Policy Design and Inverse
Reward Learning with Iterative Reasoning and Cumulative Prospect Theory [33.57592649823294]
We investigate the problem of bounded risk-sensitive Markov Game (BRSMG) and its inverse reward learning problem.
We show that humans have bounded intelligence and maximize risk-sensitive utilities in BRSMGs.
The results show that the behaviors of agents demonstrate both risk-averse and risk-seeking characteristics.
arXiv Detail & Related papers (2020-09-03T07:32:32Z) - Learning Bounds for Risk-sensitive Learning [86.50262971918276]
In risk-sensitive learning, one aims to find a hypothesis that minimizes a risk-averse (or risk-seeking) measure of loss.
We study the generalization properties of risk-sensitive learning schemes whose optimand is described via optimized certainty equivalents.
arXiv Detail & Related papers (2020-06-15T05:25:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.