Risk-Sensitive Multi-Agent Reinforcement Learning in Network Aggregative
Markov Games
- URL: http://arxiv.org/abs/2402.05906v1
- Date: Thu, 8 Feb 2024 18:43:27 GMT
- Title: Risk-Sensitive Multi-Agent Reinforcement Learning in Network Aggregative
Markov Games
- Authors: Hafez Ghaemi, Hamed Kebriaei, Alireza Ramezani Moghaddam, Majid Nili
Ahmadabadi
- Abstract summary: We propose a distributed sampling-based actor-critic (AC) algorithm with CPT risk for network aggregative Markov games (NAMGs).
Under a set of assumptions, we prove the convergence of the algorithm to a subjective notion of Markov perfect Nash equilibrium in NAMGs.
Experiments show that subjective policies can be different from the risk-neutral ones.
- Score: 2.85386288555414
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Classical multi-agent reinforcement learning (MARL) assumes risk neutrality
and complete objectivity for agents. However, in settings where agents need to
consider or model human economic or social preferences, a notion of risk must
be incorporated into the RL optimization problem. This will be of greater
importance in MARL where other human or non-human agents are involved, possibly
with their own risk-sensitive policies. In this work, we consider
risk-sensitive and non-cooperative MARL with cumulative prospect theory (CPT),
a non-convex risk measure and a generalization of coherent measures of risk.
CPT is capable of explaining loss aversion in humans and their tendency to
overestimate/underestimate small/large probabilities. We propose a distributed
sampling-based actor-critic (AC) algorithm with CPT risk for network
aggregative Markov games (NAMGs), which we call Distributed Nested CPT-AC.
Under a set of assumptions, we prove the convergence of the algorithm to a
subjective notion of Markov perfect Nash equilibrium in NAMGs. The experimental
results show that subjective CPT policies obtained by our algorithm can be
different from the risk-neutral ones, and agents with a higher loss aversion
are more inclined to socially isolate themselves in an NAMG.
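For context, the following is a minimal sketch of a CPT valuation of a discrete return distribution, assuming the standard Tversky-Kahneman value function and inverse-S probability weighting; the parameter values and the sampling-based estimator used inside Distributed Nested CPT-AC are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Illustrative Tversky-Kahneman parameters; the paper's exact choices may differ.
ALPHA, BETA = 0.88, 0.88       # curvature of the value function on gains / losses
LAMBDA = 2.25                  # loss-aversion coefficient (losses loom larger than gains)
GAMMA_P, GAMMA_M = 0.61, 0.69  # probability-weighting exponents for gains / losses

def weight(p, gamma):
    """Inverse-S weighting: overweights small probabilities, underweights large ones."""
    return p**gamma / (p**gamma + (1.0 - p)**gamma) ** (1.0 / gamma)

def value(x):
    """S-shaped, loss-averse utility measured around the reference point 0."""
    return np.where(x >= 0, np.abs(x)**ALPHA, -LAMBDA * np.abs(x)**BETA)

def cpt_value(outcomes, probs):
    """Rank-dependent CPT value of a discrete distribution over outcomes."""
    x, p = np.asarray(outcomes, float), np.asarray(probs, float)
    order = np.argsort(x)                  # ascending: worst loss ... best gain
    x, p = x[order], p[order]
    total = 0.0
    for mask, gamma, best_first in ((x >= 0, GAMMA_P, True), (x < 0, GAMMA_M, False)):
        xs, ps = (x[mask][::-1], p[mask][::-1]) if best_first else (x[mask], p[mask])
        cum = np.clip(np.cumsum(ps), 0.0, 1.0)
        dw = weight(cum, gamma) - weight(np.clip(cum - ps, 0.0, 1.0), gamma)
        total += np.sum(dw * value(xs))
    return total

# A 5% chance of a large loss: CPT judges the gamble negative although its mean is positive.
print(cpt_value([-10.0, 1.0], [0.05, 0.95]))   # roughly -1.1
print(-10.0 * 0.05 + 1.0 * 0.95)               # risk-neutral expectation, 0.45
```

With these parameters the gamble's CPT value comes out negative (roughly -1.1) while its expectation is 0.45, mirroring the loss aversion and small-probability overweighting the abstract refers to.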
Related papers
- Breaking the Curse of Multiagency in Robust Multi-Agent Reinforcement Learning [37.80275600302316]
Distributionally robust Markov games (RMGs) have been proposed to enhance robustness in MARL.
A notorious yet open challenge is whether RMGs can escape the curse of multiagency.
The proposed algorithm is the first to break the curse of multiagency for RMGs.
arXiv Detail & Related papers (2024-09-30T08:09:41Z)
- Risk-Sensitive RL with Optimized Certainty Equivalents via Reduction to
Standard RL [48.1726560631463]
We study Risk-Sensitive Reinforcement Learning with the Optimized Certainty Equivalent (OCE) risk.
We propose two general meta-algorithms via reductions to standard RL.
We show that it learns the optimal risk-sensitive policy while prior algorithms provably fail.
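As a rough illustration (not the paper's method), the OCE of a sampled return can be estimated directly from its definition, OCE_u(X) = sup over lambda of {lambda + E[u(X - lambda)]}; the grid search and the CVaR-recovering utility below are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: estimate OCE_u(X) = sup_lambda { lambda + E[u(X - lambda)] } from samples
# by a grid search over lambda. Utility choice and grid are illustrative assumptions.
def oce(samples, u, lambdas):
    """Empirical Optimized Certainty Equivalent of sampled returns under utility u."""
    samples = np.asarray(samples, dtype=float)
    return max(lam + u(samples - lam).mean() for lam in lambdas)

alpha = 0.1
u_cvar = lambda t: -np.maximum(-t, 0.0) / alpha   # with this utility, OCE equals CVaR_alpha

returns = np.random.default_rng(1).normal(size=50_000)     # stand-in return samples
grid = np.linspace(returns.min(), returns.max(), 2_000)    # candidate lambda values
print(oce(returns, u_cvar, grid))   # close to CVaR_0.1 of N(0, 1), roughly -1.75
```

Swapping in an exponential utility instead recovers the entropic risk, which is why the OCE family covers several common risk-sensitive objectives.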
arXiv Detail & Related papers (2024-03-10T21:45:12Z)
- Provable Risk-Sensitive Distributional Reinforcement Learning with
General Function Approximation [54.61816424792866]
We introduce a general framework for Risk-Sensitive Distributional Reinforcement Learning (RS-DisRL) with static Lipschitz Risk Measures (LRM) and general function approximation.
We design two innovative meta-algorithms: RS-DisRL-M, a model-based strategy for model-based function approximation, and RS-DisRL-V, a model-free approach for general value function approximation.
arXiv Detail & Related papers (2024-02-28T08:43:18Z)
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- RiskQ: Risk-sensitive Multi-Agent Reinforcement Learning Value Factorization [49.26510528455664]
We introduce the Risk-sensitive Individual-Global-Max (RIGM) principle as a generalization of the Individual-Global-Max (IGM) and Distributional IGM (DIGM) principles.
Extensive experiments show that RiskQ obtains promising performance.
arXiv Detail & Related papers (2023-11-03T07:18:36Z)
- Risk-Averse Reinforcement Learning via Dynamic Time-Consistent Risk
Measures [10.221369785560785]
In this paper, we consider the problem of maximizing the dynamic risk of a sequence of rewards in Markov Decision Processes (MDPs).
Using a convex combination of expectation and conditional value-at-risk (CVaR) as a special one-step conditional risk measure, we reformulate the risk-averse MDP as a risk-neutral counterpart with augmented action space and manipulation on the immediate rewards.
Our numerical studies show that the risk-averse setting can reduce the variance and enhance robustness of the results.
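For intuition, here is a sketch of the one-step risk measure described above, a convex combination of the expectation and the CVaR of sampled rewards; the names c and alpha are illustrative, not the paper's notation.

```python
import numpy as np

# Sketch of a one-step risk measure rho(X) = (1 - c) * E[X] + c * CVaR_alpha(X) for rewards.
def cvar(samples, alpha=0.1):
    """Empirical CVaR of rewards: mean of the worst alpha-fraction of outcomes."""
    samples = np.sort(np.asarray(samples, dtype=float))
    k = max(1, int(np.ceil(alpha * len(samples))))
    return samples[:k].mean()

def one_step_risk(samples, c=0.5, alpha=0.1):
    """Convex combination of expectation and CVaR; c = 0 recovers the risk-neutral objective."""
    samples = np.asarray(samples, dtype=float)
    return (1.0 - c) * samples.mean() + c * cvar(samples, alpha)

rewards = np.random.default_rng(0).normal(loc=1.0, scale=2.0, size=10_000)
print(one_step_risk(rewards, c=0.0))   # risk-neutral mean, about 1.0
print(one_step_risk(rewards, c=0.8))   # leans on the worst 10% of rewards, noticeably lower
```

Increasing c shifts weight onto the lower tail of the reward distribution, which is the mechanism behind the variance reduction and robustness reported in the numerical studies.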
arXiv Detail & Related papers (2023-01-14T21:43:18Z)
- Efficient Risk-Averse Reinforcement Learning [79.61412643761034]
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.
We prove that, under certain conditions, doing so inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypass it.
We demonstrate improved risk aversion in maze navigation, autonomous driving, and resource allocation benchmarks.
arXiv Detail & Related papers (2022-05-10T19:40:52Z)
- How does the Combined Risk Affect the Performance of Unsupervised Domain
Adaptation Approaches? [33.65954640678556]
Unsupervised domain adaptation (UDA) aims to train a target classifier with labeled samples from the source domain and unlabeled samples from the target domain.
E-MixNet employs enhanced mixup, a generic vicinal distribution, on the labeled source samples and pseudo-labeled target samples to calculate a proxy of the combined risk.
arXiv Detail & Related papers (2020-12-30T00:46:57Z)
- Risk-Constrained Thompson Sampling for CVaR Bandits [82.47796318548306]
We consider a popular risk measure in quantitative finance known as the Conditional Value at Risk (CVaR).
We explore the performance of a Thompson Sampling-based algorithm, CVaR-TS, under this risk measure.
arXiv Detail & Related papers (2020-11-16T15:53:22Z)
- Bounded Risk-Sensitive Markov Games: Forward Policy Design and Inverse
Reward Learning with Iterative Reasoning and Cumulative Prospect Theory [33.57592649823294]
We investigate the problem of bounded risk-sensitive Markov Games (BRSMGs) and their inverse reward learning problem.
We model humans as having bounded intelligence and maximizing risk-sensitive utilities in BRSMGs.
The results show that the behaviors of agents demonstrate both risk-averse and risk-seeking characteristics.
arXiv Detail & Related papers (2020-09-03T07:32:32Z)