Risk averse non-stationary multi-armed bandits
- URL: http://arxiv.org/abs/2109.13977v1
- Date: Tue, 28 Sep 2021 18:34:54 GMT
- Title: Risk averse non-stationary multi-armed bandits
- Authors: Leo Benac and Frédéric Godin
- Abstract summary: This paper tackles the risk-averse multi-armed bandit problem when the incurred losses are non-stationary.
Two estimation methods are proposed for this objective function in the presence of non-stationary losses.
Such estimates can then be embedded into classic arm selection methods such as epsilon-greedy policies.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper tackles the risk-averse multi-armed bandit problem when the
incurred losses are non-stationary. The conditional value-at-risk (CVaR) is used as the
objective function. Two estimation methods are proposed for this objective
function in the presence of non-stationary losses, one relying on a weighted
empirical distribution of losses and another on the dual representation of the
CVaR. Such estimates can then be embedded into classic arm selection methods
such as epsilon-greedy policies. Simulation experiments assess the performance
of the arm selection algorithms based on the two novel estimation approaches,
and such policies are shown to outperform naive benchmarks not taking
non-stationarity into account.
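To make the two estimation routes concrete, the following is a minimal Python sketch (not the authors' code) of a discounted weighted empirical CVaR estimate and a Rockafellar-Uryasev dual-representation estimate plugged into an epsilon-greedy arm selection rule. The exponential discount factor gamma, the CVaR level alpha, and all function names are illustrative assumptions rather than the paper's exact construction.

```python
import numpy as np

def discount_weights(n, gamma=0.99):
    """Exponentially decaying weights: recent losses count more, so the
    estimate can track non-stationary loss distributions (gamma is an
    illustrative choice, not the paper's calibration)."""
    w = gamma ** np.arange(n - 1, -1, -1)
    return w / w.sum()

def cvar_weighted_empirical(losses, weights, alpha=0.95):
    """CVaR_alpha of the weighted empirical loss distribution: locate the
    weighted alpha-quantile (VaR) and average the tail via
    CVaR = VaR + E_w[(L - VaR)_+] / (1 - alpha)."""
    losses, weights = np.asarray(losses, float), np.asarray(weights, float)
    weights = weights / weights.sum()
    order = np.argsort(losses)
    cum_w = np.cumsum(weights[order])
    var_alpha = losses[order][np.searchsorted(cum_w, alpha)]
    return var_alpha + weights @ np.maximum(losses - var_alpha, 0.0) / (1.0 - alpha)

def cvar_dual(losses, weights, alpha=0.95):
    """CVaR_alpha via the dual (Rockafellar-Uryasev) representation
    CVaR_alpha(L) = min_c { c + E_w[(L - c)_+] / (1 - alpha) }.
    The objective is piecewise linear in c with kinks at the observed
    losses, so scanning the observations attains the minimum."""
    losses, weights = np.asarray(losses, float), np.asarray(weights, float)
    weights = weights / weights.sum()
    obj = lambda c: c + weights @ np.maximum(losses - c, 0.0) / (1.0 - alpha)
    return min(obj(c) for c in losses)

def epsilon_greedy_arm(loss_histories, gamma=0.99, alpha=0.95, eps=0.1,
                       estimator=cvar_weighted_empirical, rng=None):
    """Epsilon-greedy on CVaR estimates: explore with probability eps,
    otherwise pull the arm with the lowest estimated CVaR of its
    discounted loss history."""
    rng = rng or np.random.default_rng()
    for arm, history in enumerate(loss_histories):
        if len(history) == 0:
            return arm                      # initialise every arm once
    if rng.random() < eps:
        return int(rng.integers(len(loss_histories)))
    scores = [estimator(h, discount_weights(len(h), gamma), alpha)
              for h in loss_histories]
    return int(np.argmin(scores))
```

For a fixed weight vector the two estimators agree up to numerics, since the dual objective is minimized at the weighted alpha-quantile; they are written separately here only to mirror the two estimation routes named in the abstract.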
Related papers
- Data-Adaptive Tradeoffs among Multiple Risks in Distribution-Free Prediction [55.77015419028725]
We develop methods that permit valid control of risk when threshold and tradeoff parameters are chosen adaptively.
Our methodology supports monotone and nearly-monotone risks, but otherwise makes no distributional assumptions.
arXiv Detail & Related papers (2024-03-28T17:28:06Z)
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- On the Connection between $L_p$ and Risk Consistency and its Implications on Regularized Kernel Methods [0.0]
The first aim of this paper is to establish the close connection between risk consistency and $L_p$-consistency for a considerably wider class of loss functions.
The attempt to transfer this connection to shifted loss functions surprisingly reveals that this shift does not reduce the assumptions needed on the underlying probability measure to the same extent as it does for many other results.
arXiv Detail & Related papers (2023-03-27T13:51:56Z)
- Risk-aware linear bandits with convex loss [0.0]
We propose an optimistic UCB algorithm to learn optimal risk-aware actions, with regret guarantees similar to those of generalized linear bandits.
This approach requires solving a convex problem at each round of the algorithm, which we relax by allowing only an approximate solution obtained by online gradient descent.
arXiv Detail & Related papers (2022-09-15T09:09:53Z)
- Risk Consistent Multi-Class Learning from Label Proportions [64.0125322353281]
This study addresses a multiclass learning from label proportions (MCLLP) setting in which training instances are provided in bags.
Most existing MCLLP methods impose bag-wise constraints on the prediction of instances or assign them pseudo-labels.
A risk-consistent method is proposed for instance classification using the empirical risk minimization framework.
arXiv Detail & Related papers (2022-03-24T03:49:04Z)
- Risk-Averse No-Regret Learning in Online Convex Games [19.4481913405231]
We consider an online game with risk-averse agents whose goal is to learn optimal decisions that minimize the risk of incurring significantly high costs.
Since the distributions of the cost functions depend on the actions of all agents that are generally unobservable, the Conditional Value at Risk (CVaR) values of the costs are difficult to compute.
We propose a new online risk-averse learning algorithm that relies on one-point zeroth-order estimation of the CVaR gradients computed using CVaR values (a minimal sketch of such a one-point estimator is given after this list).
arXiv Detail & Related papers (2022-03-16T21:36:47Z)
- Off-Policy Risk Assessment in Contextual Bandits [32.97618081988295]
We introduce the class of Lipschitz risk functionals, which subsumes many common functionals.
For Lipschitz risk functionals, the error in off-policy estimation is bounded by the error in off-policy estimation of the cumulative distribution function (CDF) of rewards.
We propose Off-Policy Risk Assessment (OPRA), an algorithm that estimates the target policy's CDF of rewards and generates a plug-in estimate of the risk.
arXiv Detail & Related papers (2021-04-18T23:27:40Z)
- Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average reward setting with the variance risk criteria.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z)
- Risk-Constrained Thompson Sampling for CVaR Bandits [82.47796318548306]
We consider a popular risk measure in quantitative finance known as the Conditional Value at Risk (CVaR).
We explore the performance of a Thompson Sampling-based algorithm CVaR-TS under this risk measure.
arXiv Detail & Related papers (2020-11-16T15:53:22Z)
- Learning Bounds for Risk-sensitive Learning [86.50262971918276]
In risk-sensitive learning, one aims to find a hypothesis that minimizes a risk-averse (or risk-seeking) measure of loss.
We study the generalization properties of risk-sensitive learning schemes whose optimand is described via optimized certainty equivalents.
arXiv Detail & Related papers (2020-06-15T05:25:02Z)
- Cautious Reinforcement Learning via Distributional Risk in the Dual Domain [45.17200683056563]
We study the estimation of risk-sensitive policies in reinforcement learning problems defined by a Markov Decision Process (MDP) whose state and action spaces are countably finite.
We propose a new definition of risk, which we call caution, as a penalty function added to the dual objective of the linear programming (LP) formulation of reinforcement learning.
arXiv Detail & Related papers (2020-02-27T23:18:04Z)
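As noted in the Risk-Averse No-Regret Learning entry above, that work relies on one-point zeroth-order estimation of CVaR gradients. Below is a minimal sketch of the generic single-query estimator such methods build on, assuming access to a cvar_oracle that returns a (noisy) CVaR value at a perturbed action; the smoothing radius delta and the oracle interface are illustrative assumptions, not that paper's exact scheme.

```python
import numpy as np

def one_point_cvar_gradient(action, cvar_oracle, delta=0.1, rng=None):
    """Single-query (one-point) zeroth-order gradient estimate:
    draw a uniform direction u on the unit sphere, query the CVaR value
    once at the perturbed action, and rescale by d / delta.  In
    expectation this equals the gradient of a delta-smoothed surrogate
    of the queried function."""
    rng = rng or np.random.default_rng()
    action = np.asarray(action, dtype=float)
    d = len(action)
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)                               # uniform point on the sphere
    value = cvar_oracle(action + delta * u)              # single zeroth-order query
    return (d / delta) * value * u
```

A risk-averse online learner would then take a projected gradient step against this estimate at each round, so one CVaR evaluation per round suffices.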
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.