Near-Optimal MNL Bandits Under Risk Criteria
- URL: http://arxiv.org/abs/2009.12511v3
- Date: Tue, 16 Mar 2021 02:22:23 GMT
- Title: Near-Optimal MNL Bandits Under Risk Criteria
- Authors: Guangyu Xi, Chao Tao and Yuan Zhou
- Abstract summary: We study MNL bandits, a variant of the traditional multi-armed bandit problem, under risk criteria.
We design algorithms for a broad class of risk criteria, including but not limited to the well-known conditional value-at-risk, Sharpe ratio, and entropy risk, and prove that they achieve near-optimal regret.
- Score: 13.251377915797674
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study MNL bandits, a variant of the traditional multi-armed
bandit problem, under risk criteria. Unlike the ordinary expected revenue, risk
criteria are more general objectives widely used in industry and business. We
design algorithms for a broad class of risk criteria, including but not limited
to the well-known conditional value-at-risk, Sharpe ratio, and entropy risk, and
prove that they achieve near-optimal regret. As a complement, we also conduct
experiments with both synthetic and real data to show the empirical performance
of our proposed algorithms.
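The three risk criteria named in the abstract can be illustrated with a small empirical sketch. The sample data, the level alpha, and the risk-aversion parameter theta below are illustrative assumptions, not values from the paper, and the entropy-risk formula uses one common sign convention for rewards:

```python
import numpy as np

def cvar(rewards, alpha=0.1):
    """Empirical CVaR: average of the worst alpha-fraction of rewards."""
    x = np.sort(np.asarray(rewards, dtype=float))
    k = max(1, int(np.ceil(alpha * len(x))))  # number of worst samples kept
    return x[:k].mean()

def sharpe_ratio(rewards):
    """Mean reward divided by its standard deviation."""
    x = np.asarray(rewards, dtype=float)
    return x.mean() / x.std()

def entropic_risk(rewards, theta=1.0):
    """Entropy risk under one common convention: (1/theta) * log E[exp(theta * X)]."""
    x = np.asarray(rewards, dtype=float)
    return np.log(np.mean(np.exp(theta * x))) / theta

revenues = [1.0, 0.5, 2.0, 0.0, 1.5]       # hypothetical per-round revenues
worst_tail = cvar(revenues, alpha=0.2)      # mean of the worst 20% of outcomes
risk_adjusted = sharpe_ratio(revenues)      # reward per unit of volatility
```

In the MNL bandit setting these statistics would be computed over the (random) revenue of an offered assortment rather than a single arm's reward; the sketch only shows the risk measures themselves.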
Related papers
- A Risk-Averse Framework for Non-Stationary Stochastic Multi-Armed
Bandits [0.0]
In areas of high volatility like healthcare or finance, a naive reward approach often does not accurately capture the complexity of the learning problem.
We propose a framework of adaptive risk-aware strategies that operate in non-stationary environments.
arXiv Detail & Related papers (2023-10-24T19:29:13Z) - A Distribution Optimization Framework for Confidence Bounds of Risk
Measures [23.46659319363579]
We present a distribution optimization framework that significantly improves confidence bounds for various risk measures compared to previous methods.
Our framework encompasses popular risk measures such as the entropic risk measure, conditional value at risk (CVaR), spectral risk measure, distortion risk measure, certainty equivalent, and rank-dependent expected utility.
arXiv Detail & Related papers (2023-06-12T12:13:06Z) - Conditionally Risk-Averse Contextual Bandits [8.894935073145252]
Contextual bandits with average-case statistical guarantees are inadequate in risk-averse situations.
We present the first risk-averse contextual bandit algorithm with an online regret guarantee.
We conduct experiments in diverse scenarios where worst-case outcomes should be avoided.
arXiv Detail & Related papers (2022-10-24T19:49:37Z) - Contextual bandits with concave rewards, and an application to fair
ranking [108.48223948875685]
We present the first algorithm with provably vanishing regret for Contextual Bandits with Concave Rewards (CBCR).
We derive a novel reduction from the CBCR regret to the regret of a scalar-reward problem.
Motivated by fairness in recommendation, we describe a special case of CBCR with rankings and fairness-aware objectives.
arXiv Detail & Related papers (2022-10-18T16:11:55Z) - A Survey of Risk-Aware Multi-Armed Bandits [84.67376599822569]
We review various risk measures of interest, and comment on their properties.
We consider algorithms for the regret minimization setting, where the exploration-exploitation trade-off manifests.
We conclude by commenting on persisting challenges and fertile areas for future research.
arXiv Detail & Related papers (2022-05-12T02:20:34Z) - Efficient Risk-Averse Reinforcement Learning [79.61412643761034]
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.
We prove that under certain conditions this inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypass it.
We demonstrate improved risk aversion in maze navigation, autonomous driving, and resource allocation benchmarks.
arXiv Detail & Related papers (2022-05-10T19:40:52Z) - Risk-Constrained Thompson Sampling for CVaR Bandits [82.47796318548306]
We consider a popular risk measure in quantitative finance known as the Conditional Value at Risk (CVaR).
We explore the performance of a Thompson Sampling-based algorithm CVaR-TS under this risk measure.
arXiv Detail & Related papers (2020-11-16T15:53:22Z) - Learning Bounds for Risk-sensitive Learning [86.50262971918276]
In risk-sensitive learning, one aims to find a hypothesis that minimizes a risk-averse (or risk-seeking) measure of loss.
We study the generalization properties of risk-sensitive learning schemes whose optimand is described via optimized certainty equivalents.
arXiv Detail & Related papers (2020-06-15T05:25:02Z) - Thompson Sampling Algorithms for Mean-Variance Bandits [97.43678751629189]
We develop Thompson Sampling-style algorithms for mean-variance MAB.
We also provide comprehensive regret analyses for Gaussian and Bernoulli bandits.
Our algorithms significantly outperform existing LCB-based algorithms for all risk tolerances.
arXiv Detail & Related papers (2020-02-01T15:33:50Z)
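The mean-variance Thompson Sampling idea from the last entry can be sketched for Gaussian arms. The posterior form (unit-variance likelihood), the risk-tolerance parameter rho, and the simulation setup below are illustrative assumptions, not the paper's exact algorithm:

```python
import numpy as np

def mean_variance_ts(true_means, true_stds, horizon=2000, rho=1.0, seed=0):
    """Thompson-Sampling-style play for the mean-variance objective
    mean - rho * variance, with Gaussian arms (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    counts = np.zeros(k)
    sums = np.zeros(k)
    sq_sums = np.zeros(k)
    for t in range(horizon):
        if t < k:                                   # pull each arm once to initialise
            arm = t
        else:
            means = sums / counts
            vars_ = np.maximum(sq_sums / counts - means ** 2, 1e-12)
            # posterior-style sample of each arm's mean; uncertainty shrinks with counts
            sampled = rng.normal(means, 1.0 / np.sqrt(counts))
            arm = int(np.argmax(sampled - rho * vars_))
        reward = rng.normal(true_means[arm], true_stds[arm])
        counts[arm] += 1
        sums[arm] += reward
        sq_sums[arm] += reward * reward
    return counts

# A high-mean but high-variance arm loses to a stable arm once variance is penalised.
counts = mean_variance_ts([1.0, 1.2], [0.1, 1.0], rho=1.0)
```

With rho = 1, the first arm's mean-variance objective (about 0.99) dominates the second's (about 0.2), so the learner should concentrate its pulls on the low-variance arm.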
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.