Thompson Sampling for Gaussian Entropic Risk Bandits
- URL: http://arxiv.org/abs/2105.06960v1
- Date: Fri, 14 May 2021 17:01:02 GMT
- Title: Thompson Sampling for Gaussian Entropic Risk Bandits
- Authors: Ming Liang Ang, Eloise Y. Y. Lim, Joel Q. L. Chang
- Abstract summary: Risk complicates the basic reward-maximising objectives, in part because there is no universally agreed definition of it.
We consider an entropic risk (ER) measure and explore the performance of a Thompson sampling-based algorithm ERTS under this risk measure.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The multi-armed bandit (MAB) problem is a ubiquitous decision-making problem
that exemplifies the exploration-exploitation tradeoff. Standard formulations
exclude risk in decision making. Risk notably complicates the basic
reward-maximising objectives, in part because there is no universally agreed
definition of it. In this paper, we consider an entropic risk (ER) measure and
explore the performance of a Thompson sampling-based algorithm, ERTS, under this
risk measure by providing regret bounds for ERTS and corresponding
instance-dependent lower bounds.
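For Gaussian rewards, the entropic risk measure admits a simple closed form: under the risk-averse convention ER_gamma(X) = -(1/gamma) log E[exp(-gamma X)], a reward X ~ N(mu, sigma^2) has ER_gamma(X) = mu - gamma sigma^2 / 2. The sketch below illustrates this; the function names and the sign convention are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def entropic_risk_gaussian(mu, sigma, gamma):
    """Closed-form entropic risk of a reward X ~ N(mu, sigma^2),
    using the risk-averse convention ER = -(1/gamma) log E[exp(-gamma X)]."""
    return mu - gamma * sigma**2 / 2

def entropic_risk_empirical(samples, gamma):
    """Monte Carlo estimate of the entropic risk from reward samples."""
    x = np.asarray(samples, dtype=float)
    return -np.log(np.mean(np.exp(-gamma * x))) / gamma
```

Note that as gamma -> 0 the entropic risk recovers the mean, so the risk-neutral objective is a limiting case; larger gamma penalises variance more heavily.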
Related papers
- Risk-aware linear bandits with convex loss [0.0]
We propose an optimistic UCB algorithm to learn optimal risk-aware actions, with regret guarantees similar to those of generalized linear bandits.
This approach requires solving a convex problem at each round of the algorithm, which we can relax by allowing only an approximate solution obtained via online gradient descent.
arXiv Detail & Related papers (2022-09-15T09:09:53Z) - A Survey of Risk-Aware Multi-Armed Bandits [84.67376599822569]
We review various risk measures of interest, and comment on their properties.
We consider algorithms for the regret minimization setting, where the exploration-exploitation trade-off manifests.
We conclude by commenting on persisting challenges and fertile areas for future research.
arXiv Detail & Related papers (2022-05-12T02:20:34Z) - Risk Consistent Multi-Class Learning from Label Proportions [64.0125322353281]
This study addresses a multiclass learning from label proportions (MCLLP) setting in which training instances are provided in bags.
Most existing MCLLP methods impose bag-wise constraints on the prediction of instances or assign them pseudo-labels.
A risk-consistent method is proposed for instance classification using the empirical risk minimization framework.
arXiv Detail & Related papers (2022-03-24T03:49:04Z) - A Unifying Theory of Thompson Sampling for Continuous Risk-Averse
Bandits [91.3755431537592]
This paper unifies the analysis of risk-averse Thompson sampling algorithms for the multi-armed bandit problem.
Using the contraction principle in the theory of large deviations, we prove novel concentration bounds for continuous risk functionals.
We show that a wide class of risk functionals as well as "nice" functions of them satisfy the continuity condition.
arXiv Detail & Related papers (2021-08-25T17:09:01Z) - Deep Bandits Show-Off: Simple and Efficient Exploration with Deep
Networks [14.178899938667161]
We introduce Sample Average Uncertainty (SAU), a simple and efficient uncertainty measure for contextual bandits.
Because of its simplicity SAU can be seamlessly applied to deep contextual bandits as a very scalable drop-in replacement for epsilon-greedy exploration.
arXiv Detail & Related papers (2021-05-10T21:45:01Z) - Off-Policy Evaluation of Slate Policies under Bayes Risk [70.10677881866047]
We study the problem of off-policy evaluation for slate bandits, for the typical case in which the logging policy factorizes over the slots of the slate.
We show that the risk improvement over PI grows linearly with the number of slots, and linearly with the gap between the arithmetic and the harmonic mean of a set of slot-level divergences.
arXiv Detail & Related papers (2021-01-05T20:07:56Z) - Risk-Constrained Thompson Sampling for CVaR Bandits [82.47796318548306]
We consider a popular risk measure in quantitative finance known as the Conditional Value at Risk (CVaR).
We explore the performance of a Thompson Sampling-based algorithm CVaR-TS under this risk measure.
arXiv Detail & Related papers (2020-11-16T15:53:22Z) - Thompson Sampling Algorithms for Mean-Variance Bandits [97.43678751629189]
We develop Thompson Sampling-style algorithms for mean-variance MAB.
We also provide comprehensive regret analyses for Gaussian and Bernoulli bandits.
Our algorithms significantly outperform existing LCB-based algorithms for all risk tolerances.
arXiv Detail & Related papers (2020-02-01T15:33:50Z)
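Several of the papers above study Thompson sampling under a risk measure. A minimal sketch of the idea, combining Gaussian Thompson sampling with the entropic risk criterion, is shown below; this is a simplified toy with known, shared arm variance (under which the ER ordering coincides with the mean ordering), not the ERTS algorithm from the paper, and all names and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: K Gaussian arms with known shared variance sigma2.
K, T, gamma, sigma2 = 3, 2000, 0.5, 1.0
true_means = np.array([0.2, 0.5, 0.8])

# Gaussian posterior over each arm's mean (wide N(0, 100) prior).
post_mean = np.zeros(K)
post_var = np.full(K, 100.0)
counts = np.zeros(K)

for t in range(T):
    # Sample a plausible mean for each arm from its posterior.
    theta = rng.normal(post_mean, np.sqrt(post_var))
    # Score each arm by the entropic risk of N(theta, sigma2):
    # theta - gamma * sigma2 / 2 (a constant shift here, since sigma2 is shared).
    scores = theta - gamma * sigma2 / 2
    a = int(np.argmax(scores))
    # Pull the chosen arm and observe a reward.
    r = rng.normal(true_means[a], np.sqrt(sigma2))
    # Conjugate Gaussian posterior update for arm a.
    precision = 1.0 / post_var[a] + 1.0 / sigma2
    post_mean[a] = (post_mean[a] / post_var[a] + r / sigma2) / precision
    post_var[a] = 1.0 / precision
    counts[a] += 1
```

With unequal or unknown arm variances, as in the paper's setting, the ER scores would also depend on each arm's (estimated) variance, so the risk-averse choice can differ from the reward-maximising one.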
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.