Asymptotic Randomised Control with applications to bandits
- URL: http://arxiv.org/abs/2010.07252v2
- Date: Sat, 3 Sep 2022 21:54:38 GMT
- Title: Asymptotic Randomised Control with applications to bandits
- Authors: Samuel N. Cohen and Tanut Treetanthiploet
- Abstract summary: We consider a general multi-armed bandit problem with correlated elements as a relaxed control problem.
By introducing an entropy regularisation, we obtain a smooth approximation to the value function.
This yields a novel semi-index approximation of the optimal decision process.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a general multi-armed bandit problem with correlated (and simple
contextual and restless) elements, as a relaxed control problem. By introducing
an entropy regularisation, we obtain a smooth asymptotic approximation to the
value function. This yields a novel semi-index approximation of the optimal
decision process. This semi-index can be interpreted as explicitly balancing an
exploration-exploitation trade-off as in the optimistic (UCB) principle where
the learning premium explicitly describes asymmetry of information available in
the environment and non-linearity in the reward function. Performance of the
resulting Asymptotic Randomised Control (ARC) algorithm compares favourably
well with other approaches to correlated multi-armed bandits.
Related papers
- Optimal variance-reduced stochastic approximation in Banach spaces [114.8734960258221]
We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space.
We establish non-asymptotic bounds for both the operator defect and the estimation error.
arXiv Detail & Related papers (2022-01-21T02:46:57Z) - From Optimality to Robustness: Dirichlet Sampling Strategies in
Stochastic Bandits [0.0]
We study a generic Dirichlet Sampling (DS) algorithm, based on pairwise comparisons of empirical indices computed with re-sampling of the arms' observations.
We show that different variants of this strategy achieve provably optimal regret guarantees when the distributions are bounded and logarithmic regret for semi-bounded distributions with a mild quantile condition.
arXiv Detail & Related papers (2021-11-18T14:34:21Z) - Mean-based Best Arm Identification in Stochastic Bandits under Reward
Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on the successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, for the gap-based algorithm, the sample complexity is optimal up to constant factors, while for the successive elimination, it is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z) - A Unifying Theory of Thompson Sampling for Continuous Risk-Averse
Bandits [91.3755431537592]
This paper unifies the analysis of risk-averse Thompson sampling algorithms for the multi-armed bandit problem.
Using the contraction principle in the theory of large deviations, we prove novel concentration bounds for continuous risk functionals.
We show that a wide class of risk functionals as well as "nice" functions of them satisfy the continuity condition.
arXiv Detail & Related papers (2021-08-25T17:09:01Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with dependence on confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - An Asymptotically Optimal Primal-Dual Incremental Algorithm for
Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z) - ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [71.13558000599839]
We study the problem of solving strongly convex and smooth unconstrained optimization problems using first-order algorithms.
We devise a novel, referred to as Recursive One-Over-T SGD, based on an easily implementable, averaging of past gradients.
We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an sense.
arXiv Detail & Related papers (2020-08-28T14:46:56Z) - Distributed Stochastic Nonconvex Optimization and Learning based on
Successive Convex Approximation [26.11677569331688]
We introduce a novel framework for the distributed algorithmic minimization of the sum of the sum of the agents in a network.
We show that the proposed method can be applied to distributed neural networks.
arXiv Detail & Related papers (2020-04-30T15:36:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.