Filtered Poisson Process Bandit on a Continuum
- URL: http://arxiv.org/abs/2007.09966v1
- Date: Mon, 20 Jul 2020 09:39:17 GMT
- Title: Filtered Poisson Process Bandit on a Continuum
- Authors: James A. Grant and Roberto Szechtman
- Abstract summary: We consider a version of the continuum armed bandit where an action induces a filtered realisation of a non-homogeneous Poisson process.
We propose an upper confidence bound algorithm for this problem utilising data-adaptive discretisation of the action space.
- Score: 1.4180331276028662
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a version of the continuum armed bandit where an action induces a
filtered realisation of a non-homogeneous Poisson process. Point data in the
filtered sample are then revealed to the decision-maker, whose reward is the
total number of revealed points. Using knowledge of the function governing the
filtering, but without knowledge of the Poisson intensity function, the
decision-maker seeks to maximise the expected number of revealed points over T
rounds. We propose an upper confidence bound algorithm for this problem
utilising data-adaptive discretisation of the action space. This approach
enjoys O(T^(2/3)) regret under a Lipschitz assumption on the reward function.
We provide lower bounds on the regret of any algorithm for the problem, via new
lower bounds for related finite-armed bandits, and show that the orders of the
upper and lower bounds match up to a logarithmic factor.
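To make the recipe concrete, here is a minimal, hypothetical sketch of the discretise-then-UCB idea on [0, 1] with Poisson count rewards. The mean-reward function, the fixed uniform grid, and the confidence width are all illustrative stand-ins; the paper's algorithm instead uses data-adaptive discretisation and Poisson-specific confidence bounds.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic Lipschitz mean reward: expected number of revealed points for
# action x in [0, 1]. A stand-in for the filtered Poisson process model.
def expected_reward(x):
    return 2.0 + np.sin(3 * x)

def play(x):
    return rng.poisson(expected_reward(x))  # observed count for one round

T = 10_000
K = int(np.ceil(T ** (1 / 3)))       # ~T^{1/3} cells gives O(T^{2/3}) regret under Lipschitzness
centres = (np.arange(K) + 0.5) / K   # fixed uniform grid (the paper's grid is data-adaptive)
counts, sums = np.zeros(K), np.zeros(K)

total = 0.0
for t in range(1, T + 1):
    if t <= K:                        # initialisation: play each cell once
        k = t - 1
    else:                             # UCB index; this width is a generic choice
        means = sums / counts
        k = int(np.argmax(means + np.sqrt(2 * (means + 1) * np.log(T) / counts)))
    r = play(centres[k])
    counts[k] += 1
    sums[k] += r
    total += r

print(f"average reward {total / T:.3f} vs best cell mean {expected_reward(centres).max():.3f}")
```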
Related papers
- Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems.
Such problems are encountered in medicine, physics, and machine learning.
We provide convergence guarantees for the proposed algorithm under both types of noise.
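For orientation only, a generic two-point zero-order gradient estimator inside an SGD loop is sketched below; the quadratic objective, noise level, and step sizes are arbitrary stand-ins, and the paper's acceleration and high-order smoothness machinery are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

def f_noisy(x):
    # Noisy zero-order oracle for a smooth convex quadratic (illustrative).
    return 0.5 * np.dot(x, x) + rng.normal(scale=1e-3)

d, h, lr = 10, 0.1, 0.05
x = np.ones(d)
for _ in range(2000):
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)            # random direction on the unit sphere
    # Two-point estimate of the gradient from function values alone.
    g = d * (f_noisy(x + h * u) - f_noisy(x - h * u)) / (2 * h) * u
    x -= lr * g
print(f"||x|| after optimisation: {np.linalg.norm(x):.4f}")
```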
arXiv Detail & Related papers (2024-11-21T10:26:17Z) - Convergent Bregman Plug-and-Play Image Restoration for Poisson Inverse Problems [8.673558396669806]
Plug-and-Play (PnP) methods are efficient iterative algorithms for solving ill-posed image inverse problems.
We propose two algorithms based on the Bregman score denoiser for solving Poisson inverse problems.
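A heavily simplified sketch of a plug-and-play iteration for a Poisson observation model follows; the moving-average denoiser and the multiplicative positivity-preserving step are placeholders for the paper's learned Bregman score denoiser and its Bregman (Burg-entropy) geometry.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 1-D Poisson denoising: observe y ~ Poisson(x_true), recover x_true > 0.
x_true = 5 + 4 * np.sin(np.linspace(0, 3 * np.pi, 200))
y = rng.poisson(x_true).astype(float)

def denoise(x, w=7):
    # Placeholder denoiser (moving average); the paper plugs in a learned one.
    return np.convolve(x, np.ones(w) / w, mode="same")

x, step = np.maximum(y, 1e-3), 0.5
for _ in range(50):
    grad = 1.0 - y / x                         # gradient of the Poisson neg. log-likelihood
    x = np.maximum(x - step * x * grad, 1e-3)  # multiplicative step keeps x positive
    x = denoise(x)                             # plug-and-play regularisation step
print(f"relative error: {np.linalg.norm(x - x_true) / np.linalg.norm(x_true):.3f}")
```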
arXiv Detail & Related papers (2023-06-06T07:36:47Z) - On Dynamic Regret and Constraint Violations in Constrained Online Convex Optimization [17.412117389855222]
We propose an algorithm that follows projected gradient descent over a suitably chosen set around the current action.
We show that both the dynamic regret and the constraint violation are order-wise bounded by the path-length, the sum of the distances between consecutive optimal actions.
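For intuition, a generic projected online gradient descent loop is sketched below, with linear losses and a unit-ball feasible set as stand-ins; the paper's algorithm instead projects onto a suitably chosen set around the current action.

```python
import numpy as np

rng = np.random.default_rng(3)

def project_ball(x, radius=1.0):
    # Euclidean projection onto the ball of the given radius.
    n = np.linalg.norm(x)
    return x if n <= radius else x * (radius / n)

d, T, lr = 5, 1000, 0.1
x, losses = np.zeros(d), []
for t in range(T):
    theta = rng.standard_normal(d)     # adversary reveals f_t(x) = <theta, x>
    losses.append(theta @ x)
    x = project_ball(x - lr * theta)   # gradient step, then projection
print(f"average loss: {np.mean(losses):.3f}")
```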
arXiv Detail & Related papers (2023-01-24T04:22:13Z) - Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits [88.21288104408556]
We study the regret of Thompson sampling (TS) algorithms for exponential family bandits, where the reward distribution is from a one-dimensional exponential family.
We propose a Thompson sampling algorithm, termed Expulli, which uses a novel sampling distribution to avoid under-estimation of the optimal arm.
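As a point of reference, vanilla Beta-Bernoulli Thompson sampling is sketched below; Expulli modifies the line marked with a comment, swapping in a different sampling distribution so that the optimal arm is not under-estimated.

```python
import numpy as np

rng = np.random.default_rng(4)
p = np.array([0.4, 0.5, 0.6])          # Bernoulli arms (a one-dim. exponential family)
alpha, beta = np.ones(3), np.ones(3)   # Beta(1, 1) priors
pulls = np.zeros(3, dtype=int)
for t in range(5000):
    theta = rng.beta(alpha, beta)      # <-- Expulli replaces this sampling distribution
    a = int(np.argmax(theta))
    r = rng.binomial(1, p[a])
    alpha[a] += r
    beta[a] += 1 - r
    pulls[a] += 1
print("pulls per arm:", pulls)         # mass should concentrate on the 0.6 arm
```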
arXiv Detail & Related papers (2022-06-07T18:08:21Z) - A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits [91.3755431537592]
This paper unifies the analysis of risk-averse Thompson sampling algorithms for the multi-armed bandit problem.
Using the contraction principle in the theory of large deviations, we prove novel concentration bounds for continuous risk functionals.
We show that a wide class of risk functionals as well as "nice" functions of them satisfy the continuity condition.
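One concrete risk functional of this kind is the conditional value-at-risk (CVaR); a minimal empirical estimator is sketched below, with the distribution and risk level chosen arbitrarily.

```python
import numpy as np

def empirical_cvar(samples, level=0.05):
    # CVaR at the given level: the mean of the worst `level`-fraction of
    # outcomes (lower tail), one of the continuous risk functionals covered.
    q = np.quantile(samples, level)
    return samples[samples <= q].mean()

rng = np.random.default_rng(5)
x = rng.normal(loc=1.0, scale=2.0, size=100_000)
print(f"mean: {x.mean():.3f}, CVaR_0.05: {empirical_cvar(x):.3f}")
```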
arXiv Detail & Related papers (2021-08-25T17:09:01Z) - On Limited-Memory Subsampling Strategies for Bandits [0.0]
We show that a simple deterministic subsampling rule, proposed in the recent work of Baudry et al. (2020), is optimal in one-dimensional exponential families.
We prove that these guarantees also hold when the algorithm's memory is limited to a polylogarithmic function of the time horizon.
We propose a variant of the algorithm in which only the most recent observations are used for subsampling, achieving optimal regret guarantees under the assumption of a known number of abrupt changes.
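A toy rendition of the limited-memory mechanism follows: capped per-arm buffers, with challengers compared against the leader on equal-size blocks of the most recent observations. The duelling rule here is a loose paraphrase of subsampling duelling algorithms, not the exact procedure analysed in the paper.

```python
from collections import deque

import numpy as np

rng = np.random.default_rng(6)
means = [0.4, 0.6]                        # Bernoulli arms
m = 128                                   # memory cap per arm (polylog(T) in the paper)
hist = [deque(maxlen=m) for _ in means]
for a, mu in enumerate(means):            # one warm-start pull per arm
    hist[a].append(rng.binomial(1, mu))

for t in range(5000):
    leader = int(np.argmax([len(h) for h in hist]))
    winners = []
    for a, h in enumerate(hist):
        if a == leader:
            continue
        block = list(hist[leader])[-len(h):]   # leader's most recent block, same size
        if np.mean(list(h)) >= np.mean(block):
            winners.append(a)
    for a in (winners or [leader]):       # pull challengers that win, else the leader
        hist[a].append(rng.binomial(1, means[a]))
print("stored-sample means:", [round(float(np.mean(list(h))), 3) for h in hist])
```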
arXiv Detail & Related papers (2021-06-21T09:11:22Z) - Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient [62.24615324523435]
This paper provides a statistical analysis of high-dimensional batch Reinforcement Learning (RL) using sparse linear function approximation.
When there is a large number of candidate features, our result sheds light on the fact that sparsity-aware methods can make batch RL more sample efficient.
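The statistical mechanism in question is sparse estimation with many candidate features. A self-contained Lasso solved by ISTA on synthetic regression data (standing in for value-function targets) illustrates why sparsity awareness saves samples.

```python
import numpy as np

rng = np.random.default_rng(7)
n, d, s = 200, 500, 5                  # few samples, many features, s of them relevant
X = rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[:s] = rng.standard_normal(s) + np.sign(rng.standard_normal(s))
y = X @ w_true + 0.1 * rng.standard_normal(n)

# ISTA for the Lasso: gradient step on the least-squares loss, then soft-threshold.
lam = 0.1
L = np.linalg.norm(X, 2) ** 2 / n      # Lipschitz constant of the gradient
w = np.zeros(d)
for _ in range(500):
    w -= X.T @ (X @ w - y) / (n * L)
    w = np.sign(w) * np.maximum(np.abs(w) - lam / L, 0.0)
print("nonzeros recovered:", int((np.abs(w) > 1e-3).sum()), "of", s)
```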
arXiv Detail & Related papers (2020-11-08T16:48:02Z) - Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring [91.22679787578438]
We present a novel Thompson-sampling-based algorithm for partial monitoring.
We prove that the new algorithm achieves the logarithmic problem-dependent expected pseudo-regret $\mathrm{O}(\log T)$ for a linearized variant of the problem with local observability.
arXiv Detail & Related papers (2020-06-17T05:48:33Z) - Exploiting Higher Order Smoothness in Derivative-free Optimization and Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
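A sketch of the kernel-weighted two-point estimator that this line of work builds on appears below; the kernel K(r) = 3r shown is a valid choice for smoothness order up to two (it satisfies E[K(r)] = 0 and E[rK(r)] = 1 for r uniform on [-1, 1]), and higher-order kernels reduce the bias further. Objective and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)

def f_noisy(x):
    return 0.5 * np.dot(x, x) + rng.normal(scale=1e-3)  # noisy zero-order oracle

def grad_estimate(x, h, d):
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)            # uniform direction on the sphere
    r = rng.uniform(-1, 1)            # random radius
    K = 3.0 * r                       # kernel valid for smoothness order <= 2
    return (d / (2 * h)) * (f_noisy(x + h * r * u) - f_noisy(x - h * r * u)) * K * u

d, h, lr = 10, 0.1, 0.05
x = np.ones(d)
for _ in range(3000):
    x -= lr * grad_estimate(x, h, d)  # randomized approximation of projected GD (unconstrained here)
print(f"||x|| after optimisation: {np.linalg.norm(x):.3f}")
```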
arXiv Detail & Related papers (2020-06-14T10:42:23Z) - Regret and Belief Complexity Trade-off in Gaussian Process Bandits via Information Thresholding [42.669970064867556]
We show how to characterize the trade-off between regret bounds of GP bandit algorithms and complexity of the posterior distributions.
We observe state-of-the-art accuracy and complexity trade-offs for GP bandit algorithms applied to global optimization.
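For reference, a numpy-only GP-UCB loop on a one-dimensional grid is sketched below. Every observation is retained, so each posterior update costs grow with the number of stored points; that growing belief complexity is exactly what the information-thresholding scheme trades against regret. The kernel, noise level, and exploration weight are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(9)

def k(a, b, ls=0.2):
    # Squared-exponential kernel on scalars.
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

f = lambda x: np.sin(6 * x)            # unknown objective on [0, 1]
grid = np.linspace(0, 1, 200)
X, y, noise, beta = [], [], 0.1, 2.0
for t in range(30):
    if X:
        Xa, ya = np.array(X), np.array(y)
        Kinv = np.linalg.inv(k(Xa, Xa) + noise ** 2 * np.eye(len(Xa)))
        ks = k(grid, Xa)
        mu = ks @ Kinv @ ya                                   # posterior mean
        var = 1.0 - np.einsum("ij,jk,ik->i", ks, Kinv, ks)    # posterior variance
        x_next = grid[int(np.argmax(mu + beta * np.sqrt(np.maximum(var, 0.0))))]
    else:
        x_next = 0.5
    X.append(x_next)
    y.append(f(x_next) + noise * rng.standard_normal())
print(f"best query so far: x = {X[int(np.argmax(y))]:.3f}")
```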
arXiv Detail & Related papers (2020-03-23T21:05:15Z)