A Survey of Risk-Aware Multi-Armed Bandits
- URL: http://arxiv.org/abs/2205.05843v1
- Date: Thu, 12 May 2022 02:20:34 GMT
- Title: A Survey of Risk-Aware Multi-Armed Bandits
- Authors: Vincent Y. F. Tan and Prashanth L.A. and Krishna Jagannathan
- Abstract summary: We review various risk measures of interest, and comment on their properties.
We consider algorithms for the regret minimization setting, where the exploration-exploitation trade-off manifests.
We conclude by commenting on persisting challenges and fertile areas for future research.
- Score: 84.67376599822569
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In several applications such as clinical trials and financial portfolio
optimization, the expected value (or the average reward) does not
satisfactorily capture the merits of a drug or a portfolio. In such
applications, risk plays a crucial role, and a risk-aware performance measure
is preferable, so as to capture losses in the case of adverse events. This
survey aims to consolidate and summarise the existing research on risk
measures, specifically in the context of multi-armed bandits. We review various
risk measures of interest, and comment on their properties. Next, we review
existing concentration inequalities for various risk measures. Then, we proceed
to define risk-aware bandit problems. We consider algorithms for the regret
minimization setting, where the exploration-exploitation trade-off manifests,
as well as the best-arm identification setting, which is a pure exploration
problem -- both in the context of risk-sensitive measures. We conclude by
commenting on persisting challenges and fertile areas for future research.
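To make the notion of a risk-aware performance measure concrete, here is a minimal sketch of selecting an arm by its empirical Conditional Value at Risk (CVaR) rather than its mean. The two arm distributions, the sample sizes, and the greedy selection rule are illustrative assumptions, not an algorithm from the survey; the survey covers principled methods with regret guarantees.

```python
import numpy as np

def empirical_cvar(samples, alpha=0.05):
    """Empirical CVaR at level alpha: the mean of the worst
    alpha-fraction of observed rewards (lower tail)."""
    sorted_s = np.sort(np.asarray(samples))
    k = max(1, int(np.ceil(alpha * len(sorted_s))))
    return sorted_s[:k].mean()

rng = np.random.default_rng(0)
# Hypothetical arms: one safe, one with higher mean but heavy spread.
arms = [lambda n: rng.normal(1.0, 0.1, n),   # low mean, low risk
        lambda n: rng.normal(1.2, 2.0, n)]   # higher mean, high risk
histories = [arm(200) for arm in arms]
cvars = [empirical_cvar(h, alpha=0.1) for h in histories]
best = int(np.argmax(cvars))  # risk-aware choice: the safe arm
```

A mean-maximizing learner would prefer the second arm; the CVaR criterion prefers the first because its worst-case outcomes are far milder, which is exactly the distinction motivating risk-aware bandits.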
Related papers
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z) - SafeAR: Safe Algorithmic Recourse by Risk-Aware Policies [2.291948092032746]
We present a method to compute recourse policies that consider variability in cost.
We show how existing recourse desiderata can fail to capture the risk of higher costs.
arXiv Detail & Related papers (2023-08-23T18:12:11Z) - Eliciting Risk Aversion with Inverse Reinforcement Learning via
Interactive Questioning [0.0]
This paper proposes a novel framework for identifying an agent's risk aversion using interactive questioning.
We prove that the agent's risk aversion can be identified as the number of questions tends to infinity, provided the questions are randomly designed.
Our framework has important applications in robo-advising and provides a new approach for identifying an agent's risk preferences.
arXiv Detail & Related papers (2023-08-16T15:17:57Z) - Regret Bounds for Risk-sensitive Reinforcement Learning with Lipschitz
Dynamic Risk Measures [23.46659319363579]
We present two model-based algorithms applied to Lipschitz dynamic risk measures.
Notably, our upper bounds demonstrate optimal dependencies on the number of actions and episodes.
arXiv Detail & Related papers (2023-06-04T16:24:19Z) - Risk-aware linear bandits with convex loss [0.0]
We propose an optimistic UCB algorithm to learn optimal risk-aware actions, with regret guarantees similar to those of generalized linear bandits.
This approach requires solving a convex problem at each round of the algorithm, which can be relaxed by accepting an approximate solution obtained via online gradient descent.
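The online gradient descent relaxation mentioned above can be sketched as a single projected update: step against the gradient of the round's convex loss, then project back onto the feasible set. The L2-ball constraint, step size, and function names here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def ogd_step(x, grad, eta, radius=1.0):
    """One projected online gradient descent step: move against the
    gradient, then project onto an L2 ball of the given radius (a
    common feasible set; the actual constraint set may differ)."""
    x = x - eta * grad
    norm = np.linalg.norm(x)
    if norm > radius:
        x = x * (radius / norm)
    return x
```

Running one such step per round replaces an exact convex solve with a single cheap update, at the cost of only approximating the round's minimizer.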
arXiv Detail & Related papers (2022-09-15T09:09:53Z) - Risk Perspective Exploration in Distributional Reinforcement Learning [10.441880303257468]
We present risk scheduling approaches that explore risk levels and optimistic behaviors from a risk perspective.
We demonstrate the performance enhancement of the DMIX algorithm using risk scheduling in a multi-agent setting.
arXiv Detail & Related papers (2022-06-28T17:37:34Z) - Efficient Risk-Averse Reinforcement Learning [79.61412643761034]
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.
We prove that under certain conditions this inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypass it.
We demonstrate improved risk aversion in maze navigation, autonomous driving, and resource allocation benchmarks.
arXiv Detail & Related papers (2022-05-10T19:40:52Z) - Off-Policy Evaluation of Slate Policies under Bayes Risk [70.10677881866047]
We study the problem of off-policy evaluation for slate bandits, for the typical case in which the logging policy factorizes over the slots of the slate.
We show that the risk improvement over PI grows linearly with the number of slots, and linearly with the gap between the arithmetic and the harmonic mean of a set of slot-level divergences.
arXiv Detail & Related papers (2021-01-05T20:07:56Z) - Risk-Constrained Thompson Sampling for CVaR Bandits [82.47796318548306]
We consider a popular risk measure in quantitative finance known as the Conditional Value at Risk (CVaR).
We explore the performance of a Thompson Sampling-based algorithm CVaR-TS under this risk measure.
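For orientation, the core Thompson Sampling step looks as follows: draw one sample from each arm's posterior and pull the arm with the largest draw. This is a generic Bernoulli-arm sketch with hypothetical counts, not the CVaR-TS algorithm itself, which instead maintains and samples from posteriors over each arm's CVaR.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical observed outcomes for two Bernoulli arms.
successes = np.array([8, 3])
failures  = np.array([2, 7])

# One TS round: a single draw from each Beta posterior, then a
# greedy choice on the sampled values.
theta = rng.beta(1 + successes, 1 + failures)
arm = int(np.argmax(theta))
```

Because exploration arises from posterior randomness rather than explicit bonuses, arms with uncertain risk estimates still get pulled occasionally, which is what a CVaR-aware variant exploits.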
arXiv Detail & Related papers (2020-11-16T15:53:22Z) - Learning Bounds for Risk-sensitive Learning [86.50262971918276]
In risk-sensitive learning, one aims to find a hypothesis that minimizes a risk-averse (or risk-seeking) measure of loss.
We study the generalization properties of risk-sensitive learning schemes whose optimand is described via optimized certainty equivalents.
arXiv Detail & Related papers (2020-06-15T05:25:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.