Bounded Risk-Sensitive Markov Games: Forward Policy Design and Inverse
Reward Learning with Iterative Reasoning and Cumulative Prospect Theory
- URL: http://arxiv.org/abs/2009.01495v7
- Date: Sun, 21 Mar 2021 02:10:20 GMT
- Title: Bounded Risk-Sensitive Markov Games: Forward Policy Design and Inverse
Reward Learning with Iterative Reasoning and Cumulative Prospect Theory
- Authors: Ran Tian, Liting Sun, and Masayoshi Tomizuka
- Abstract summary: We investigate the bounded risk-sensitive Markov Game (BRSMG) and its inverse reward learning problem.
We model humans as having bounded intelligence and maximizing risk-sensitive utilities in BRSMGs.
The results show that the behaviors of agents demonstrate both risk-averse and risk-seeking characteristics.
- Score: 33.57592649823294
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Classical game-theoretic approaches for multi-agent systems in both the
forward policy design problem and the inverse reward learning problem often
make strong rationality assumptions: agents perfectly maximize expected
utilities under uncertainties. Such assumptions, however, substantially
mismatch observed human behaviors such as satisficing with sub-optimal,
risk-seeking, and loss-averse decisions. In this paper, we investigate the
bounded risk-sensitive Markov Game (BRSMG) and its inverse reward learning
problem for modeling realistic human behaviors and learning human
behavioral models. Drawing on iterative reasoning models and cumulative
prospect theory, we model humans as having bounded intelligence and maximizing
risk-sensitive utilities in BRSMGs. Convergence analyses for both the forward
policy design and the inverse reward learning problems are established under
the BRSMG framework. We validate the proposed forward policy design and inverse
reward learning algorithms in a navigation scenario. The results show that the
behaviors of agents demonstrate both risk-averse and risk-seeking
characteristics. Moreover, in the inverse reward learning task, the proposed
bounded risk-sensitive inverse learning algorithm outperforms a baseline
risk-neutral inverse learning algorithm by effectively recovering not only more
accurate reward values but also the intelligence levels and the risk-measure
parameters given demonstrations of agents' interactive behaviors.
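For background, cumulative prospect theory (Tversky and Kahneman, 1992) evaluates outcomes through an S-shaped value function and distorts outcome probabilities with an inverse-S-shaped weighting function. A standard parametrization, which may differ from the exact form adopted in the paper, is

  v(x) = x^{\alpha} for x >= 0,   v(x) = -\lambda (-x)^{\beta} for x < 0
  w(p) = p^{\gamma} / (p^{\gamma} + (1 - p)^{\gamma})^{1/\gamma}

where \lambda > 1 encodes loss aversion and \gamma < 1 overweights small probabilities. Combined with level-k iterative reasoning, in which a level-k agent best-responds to opponents assumed to reason at level k-1, these are the bounded-rationality and risk-sensitivity ingredients that the BRSMG framework builds on.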
Related papers
- Robust Reinforcement Learning with Dynamic Distortion Risk Measures [0.0]
We devise a framework to solve robust risk-aware reinforcement learning problems.
We simultaneously account for environmental uncertainty and risk with a class of dynamic robust distortion risk measures.
We construct an actor-critic algorithm to solve this class of robust risk-aware RL problems.
arXiv Detail & Related papers (2024-09-16T08:54:59Z)
- Taming Equilibrium Bias in Risk-Sensitive Multi-Agent Reinforcement Learning [14.571671587217764]
We study risk-sensitive multi-agent reinforcement learning under general-sum Markov games.
We show that using the regret naively adapted from existing literature as a performance metric could induce policies with equilibrium bias.
We propose a novel notion of regret, which we call risk-balanced regret, and show through a lower bound that it overcomes the issue of equilibrium bias.
arXiv Detail & Related papers (2024-05-04T17:47:45Z)
- Provable Risk-Sensitive Distributional Reinforcement Learning with General Function Approximation [54.61816424792866]
We introduce a general framework on Risk-Sensitive Distributional Reinforcement Learning (RS-DisRL), with static Lipschitz Risk Measures (LRM) and general function approximation.
We design two innovative meta-algorithms: RS-DisRL-M, a model-based strategy for model-based function approximation, and RS-DisRL-V, a model-free approach for general value function approximation.
arXiv Detail & Related papers (2024-02-28T08:43:18Z)
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC), which can be applied to either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- On the Theory of Risk-Aware Agents: Bridging Actor-Critic and Economics [0.7655800373514546]
Risk-aware Reinforcement Learning algorithms were shown to outperform risk-neutral counterparts in a variety of continuous-action tasks.
The theoretical basis for the pessimistic objectives these algorithms employ remains unestablished.
We propose Dual Actor-Critic (DAC) as a risk-aware, model-free algorithm that features two distinct actor networks.
arXiv Detail & Related papers (2023-10-30T13:28:06Z)
- On the Complexity of Adversarial Decision Making [101.14158787665252]
We show that the Decision-Estimation Coefficient is necessary and sufficient to obtain low regret for adversarial decision making.
We provide new structural results that connect the Decision-Estimation Coefficient to variants of other well-known complexity measures.
arXiv Detail & Related papers (2022-06-27T06:20:37Z)
- Efficient Risk-Averse Reinforcement Learning [79.61412643761034]
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.
We prove that under certain conditions this inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypass it.
We demonstrate improved risk aversion in maze navigation, autonomous driving, and resource allocation benchmarks.
arXiv Detail & Related papers (2022-05-10T19:40:52Z)
- Reinforcement Learning with Dynamic Convex Risk Measures [0.0]
We develop an approach for solving time-consistent risk-sensitive optimization problems using model-free reinforcement learning (RL).
We employ a time-consistent dynamic programming principle to determine the value of a particular policy, and develop policy gradient update rules.
arXiv Detail & Related papers (2021-12-26T16:41:05Z)
- Policy Gradient Bayesian Robust Optimization for Imitation Learning [49.881386773269746]
We derive a novel policy gradient-style robust optimization approach, PG-BROIL, to balance expected performance and risk.
Results suggest PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse.
arXiv Detail & Related papers (2021-06-11T16:49:15Z)
- Learning Bounds for Risk-sensitive Learning [86.50262971918276]
In risk-sensitive learning, one aims to find a hypothesis that minimizes a risk-averse (or risk-seeking) measure of loss.
We study the generalization properties of risk-sensitive learning schemes whose optimand is described via optimized certainty equivalents.
arXiv Detail & Related papers (2020-06-15T05:25:02Z)
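As background for the last entry above, optimized certainty equivalents (Ben-Tal and Teboulle) describe a family of risk measures; a standard loss-based form, which may differ from the parametrization used in that paper, is

  \rho_{\phi}(X) = \inf_{\eta} \{ \eta + E[\phi(X - \eta)] \}

for a nondecreasing convex disutility \phi with \phi(0) = 0. Choosing \phi(t) = t recovers the plain expectation, while \phi(t) = \max(t, 0) / (1 - \alpha) recovers CVaR at level \alpha (the Rockafellar-Uryasev representation).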