A Risk-Averse Framework for Non-Stationary Stochastic Multi-Armed
Bandits
- URL: http://arxiv.org/abs/2310.19821v1
- Date: Tue, 24 Oct 2023 19:29:13 GMT
- Title: A Risk-Averse Framework for Non-Stationary Stochastic Multi-Armed
Bandits
- Authors: Reda Alami, Mohammed Mahfoud, Mastane Achab
- Abstract summary: In areas of high volatility like healthcare or finance, a naive reward-maximization approach often does not accurately capture the complexity of the learning problem.
We propose a framework of adaptive risk-aware strategies that operate in non-stationary environments.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In a typical stochastic multi-armed bandit problem, the objective is often to
maximize the expected sum of rewards over some time horizon $T$. While the
choice of a strategy that accomplishes this is optimal when no additional
information is available, it is no longer the case once additional
environment-specific knowledge is provided. In particular, in areas of high volatility such as
healthcare or finance, a naive reward maximization approach often does not
accurately capture the complexity of the learning problem and results in
unreliable solutions. To tackle problems of this nature, we propose a framework
of adaptive risk-aware strategies that operate in non-stationary environments.
Our framework incorporates various risk measures prevalent in the literature to
map multiple families of multi-armed bandit algorithms into a risk-sensitive
setting. In addition, we equip the resulting algorithms with the Restarted
Bayesian Online Change-Point Detection (R-BOCPD) algorithm and impose a
(tunable) forced exploration strategy to detect local (per-arm) switches. We
provide finite-time theoretical guarantees and an asymptotic regret bound of
order $\tilde O(\sqrt{K_T T})$ up to time horizon $T$ with $K_T$ the total
number of change-points. In practice, our framework compares favorably to the
state-of-the-art in both synthetic and real-world environments and manages to
perform efficiently with respect to both risk-sensitivity and non-stationarity.
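
The abstract gives no pseudocode, so the following is a minimal Python sketch of the kind of strategy it describes: arm indices built on a risk measure (here, the empirical CVaR of the worst rewards) rather than the mean, a tunable forced-exploration rate, and per-arm restarts triggered by a change detector. The windowed mean-shift test below is an illustrative stand-in for the Restarted Bayesian Online Change-Point Detector (R-BOCPD) used in the paper; all class names, thresholds, and the UCB-style bonus are assumptions, not the authors' implementation.

```python
"""Minimal sketch of a risk-averse, restart-based bandit loop (illustrative only)."""
import numpy as np


def empirical_cvar(samples, alpha=0.25):
    """Empirical CVaR_alpha: the average of the alpha-fraction worst rewards."""
    samples = np.sort(np.asarray(samples))
    k = max(1, int(np.ceil(alpha * len(samples))))
    return samples[:k].mean()


class RiskAverseRestartBandit:
    def __init__(self, n_arms, alpha=0.25, forced_explore=0.05,
                 detect_window=30, detect_threshold=0.35, seed=0):
        self.n_arms = n_arms
        self.alpha = alpha                    # CVaR level (risk measure)
        self.forced_explore = forced_explore  # tunable forced-exploration rate
        self.detect_window = detect_window    # window for the toy change detector
        self.detect_threshold = detect_threshold
        self.rng = np.random.default_rng(seed)
        self.history = [[] for _ in range(n_arms)]  # rewards since last restart
        self.t = 0

    def select_arm(self):
        self.t += 1
        # Forced exploration: occasionally pick a uniformly random arm so that
        # change-points on rarely played arms can still be detected.
        if self.rng.random() < self.forced_explore:
            return int(self.rng.integers(self.n_arms))
        # Play each arm once before trusting the indices.
        for a in range(self.n_arms):
            if not self.history[a]:
                return a
        # Optimistic index built on the risk measure instead of the mean.
        indices = [
            empirical_cvar(self.history[a], self.alpha)
            + np.sqrt(2.0 * np.log(self.t) / len(self.history[a]))
            for a in range(self.n_arms)
        ]
        return int(np.argmax(indices))

    def update(self, arm, reward):
        self.history[arm].append(reward)
        # Toy per-arm change detector: restart the arm's statistics when the
        # recent window drifts away from older samples (stand-in for R-BOCPD).
        h = self.history[arm]
        if len(h) >= 2 * self.detect_window:
            recent = np.mean(h[-self.detect_window:])
            older = np.mean(h[:-self.detect_window])
            if abs(recent - older) > self.detect_threshold:
                self.history[arm] = h[-self.detect_window:]


if __name__ == "__main__":
    # Two Gaussian arms whose means swap halfway through the horizon.
    bandit = RiskAverseRestartBandit(n_arms=2)
    rng = np.random.default_rng(1)
    for t in range(2000):
        means = (0.7, 0.3) if t < 1000 else (0.3, 0.7)
        arm = bandit.select_arm()
        bandit.update(arm, float(rng.normal(means[arm], 0.5)))
    print("samples kept per arm:", [len(h) for h in bandit.history])
```

Maximizing a lower-tail CVaR index makes the strategy prefer arms whose worst outcomes remain acceptable, while the forced exploration keeps rarely played arms observed often enough for the change detector to notice per-arm switches.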
Related papers
- Planning and Learning in Risk-Aware Restless Multi-Arm Bandit Problem [4.178382980763478]
In restless multi-arm bandits, a central agent is tasked with optimally distributing limited resources across several bandits (arms).
In this work, we generalize the traditional restless multi-arm bandit problem with a risk-neutral objective by incorporating risk-awareness.
We establish indexability conditions for the case of a risk-aware objective and provide a solution based on Whittle index.
arXiv Detail & Related papers (2024-10-30T13:59:30Z)
- Risk-sensitive Markov Decision Process and Learning under General Utility Functions [3.6260136172126667]
Reinforcement Learning (RL) has gained substantial attention across diverse application domains and theoretical investigations.
We propose a modified value algorithm that employs an epsilon-covering over the space of cumulative reward.
In the absence of a simulator, our algorithm, designed with an upper-confidence-bound exploration approach, identifies a near-optimal policy.
arXiv Detail & Related papers (2023-11-22T18:50:06Z)
- Capsa: A Unified Framework for Quantifying Risk in Deep Neural Networks [142.67349734180445]
Existing algorithms that provide risk-awareness to deep neural networks are complex and ad-hoc.
Here we present capsa, a framework for extending models with risk-awareness.
arXiv Detail & Related papers (2023-08-01T02:07:47Z)
- Stability-penalty-adaptive follow-the-regularized-leader: Sparsity, game-dependency, and best-of-both-worlds [46.30750729936261]
Follow-the-regularized-leader (FTRL) has recently emerged as one of the most promising approaches for obtaining various types of adaptivity in bandit problems.
We establish several algorithms with three types of adaptivity: sparsity, game-dependency, and best-of-both-worlds (BOBW).
arXiv Detail & Related papers (2023-05-26T23:20:48Z)
- A Unifying Framework for Online Optimization with Long-Term Constraints [62.35194099438855]
We study online learning problems in which a decision maker has to take a sequence of decisions subject to $m$ long-term constraints.
The goal is to maximize their total reward, while at the same time achieving small cumulative violation across the $T$ rounds.
We present the first best-of-both-worlds-type algorithm for this general class of problems, with no-regret guarantees both when rewards and constraints are selected according to an unknown model and when they are selected at each round by an adversary.
arXiv Detail & Related papers (2022-09-15T16:59:19Z)
- Safe Online Bid Optimization with Return-On-Investment and Budget Constraints subject to Uncertainty [87.81197574939355]
We study the nature of both the optimization and learning problems.
We provide an algorithm, namely GCB, guaranteeing sublinear regret at the cost of a potentially linear number of constraints violations.
More interestingly, we provide an algorithm, namely GCB_safe(psi,phi), guaranteeing both sublinear pseudo-regret and safety w.h.p. at the cost of accepting tolerances psi and phi.
arXiv Detail & Related papers (2022-01-18T17:24:20Z)
- On the Convergence and Optimality of Policy Gradient for Markov Coherent Risk [32.97618081988295]
We present a tight upper bound on the suboptimality of the learned policy, characterizing its dependence on the nonlinearity of the objective and the degree of risk aversion.
We propose a practical implementation of PG that uses state distribution reweighting to overcome previous limitations.
arXiv Detail & Related papers (2021-03-04T04:11:09Z)
- Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average reward setting with the variance risk criteria.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z)
- Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation [49.69139684065241]
Contextual multi-armed bandits (MAB) achieve cutting-edge performance on a variety of problems.
In this paper, we propose a hierarchical adaptive contextual bandit method (HATCH) to conduct the policy learning of contextual bandits with a budget constraint.
arXiv Detail & Related papers (2020-04-02T17:04:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.