Invariant Lipschitz Bandits: A Side Observation Approach
- URL: http://arxiv.org/abs/2212.07524v3
- Date: Mon, 28 Aug 2023 10:06:41 GMT
- Title: Invariant Lipschitz Bandits: A Side Observation Approach
- Authors: Nam Phuong Tran, Long Tran-Thanh
- Abstract summary: We study the invariant Lipschitz bandit setting, where the reward function and the set of arms are preserved under a group of transformations.
We introduce an algorithm named textttUniformMesh-N, which naturally integrates side observations using group orbits.
We prove an improved regret upper bound, which depends on the cardinality of the group, given that the group is finite.
- Score: 18.688474183114085
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Symmetry arises in many optimization and decision-making problems, and has
attracted considerable attention from the optimization community: By utilizing
the existence of such symmetries, the process of searching for optimal
solutions can be improved significantly. Despite its success in (offline)
optimization, the utilization of symmetries has not been well examined within
the online optimization settings, especially in the bandit literature. As such,
in this paper we study the invariant Lipschitz bandit setting, a subclass of
the Lipschitz bandits where the reward function and the set of arms are
preserved under a group of transformations. We introduce an algorithm named
\texttt{UniformMesh-N}, which naturally integrates side observations using
group orbits into the \texttt{UniformMesh} algorithm
(\cite{Kleinberg2005_UniformMesh}), which uniformly discretizes the set of
arms. Using the side-observation approach, we prove an improved regret upper
bound, which depends on the cardinality of the group, given that the group is
finite. We also prove a matching regret's lower bound for the invariant
Lipschitz bandit class (up to logarithmic factors). We hope that our work will
ignite further investigation of symmetry in bandit theory and sequential
decision-making theory in general.
Related papers
- Structured Regularization for Constrained Optimization on the SPD Manifold [1.1126342180866644]
We introduce a class of structured regularizers, based on symmetric gauge functions, which allow for solving constrained optimization on the SPD manifold with faster unconstrained methods.
We show that our structured regularizers can be chosen to preserve or induce desirable structure, in particular convexity and "difference of convex" structure.
arXiv Detail & Related papers (2024-10-12T22:11:22Z) - Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novelgreedy bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $tin [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z) - A gradient estimator via L1-randomization for online zero-order
optimization with two point feedback [93.57603470949266]
We present a novel gradient estimator based on two function evaluation and randomization.
We consider two types of assumptions on the noise of the zero-order oracle: canceling noise and adversarial noise.
We provide an anytime and completely data-driven algorithm, which is adaptive to all parameters of the problem.
arXiv Detail & Related papers (2022-05-27T11:23:57Z) - Sparse Quadratic Optimisation over the Stiefel Manifold with Application
to Permutation Synchronisation [71.27989298860481]
We address the non- optimisation problem of finding a matrix on the Stiefel manifold that maximises a quadratic objective function.
We propose a simple yet effective sparsity-promoting algorithm for finding the dominant eigenspace matrix.
arXiv Detail & Related papers (2021-09-30T19:17:35Z) - Nonlinear matrix recovery using optimization on the Grassmann manifold [18.655422834567577]
We investigate the problem of recovering a partially observed high-rank clustering matrix whose columns obey a nonlinear structure such as a union of subspaces.
We show that the alternating limit converges to a unique point using the Kurdyka-Lojasi property.
arXiv Detail & Related papers (2021-09-13T16:13:13Z) - Optimal Gradient-based Algorithms for Non-concave Bandit Optimization [76.57464214864756]
This work considers a large family of bandit problems where the unknown underlying reward function is non-concave.
Our algorithms are based on a unified zeroth-order optimization paradigm that applies in great generality.
We show that the standard optimistic algorithms are sub-optimal by dimension factors.
arXiv Detail & Related papers (2021-07-09T16:04:24Z) - The Last-Iterate Convergence Rate of Optimistic Mirror Descent in
Stochastic Variational Inequalities [29.0058976973771]
We show an intricate relation between the algorithm's rate of convergence and the local geometry induced by the method's underlying Bregman function.
We show that this exponent determines both the optimal step-size policy of the algorithm and the optimal rates attained.
arXiv Detail & Related papers (2021-07-05T09:54:47Z) - Adaptive Discretization for Adversarial Lipschitz Bandits [85.39106976861702]
Lipschitz bandits is a prominent version of multi-armed bandits that studies large, structured action spaces.
A central theme here is the adaptive discretization of the action space, which gradually zooms in'' on the more promising regions.
We provide the first algorithm for adaptive discretization in the adversarial version, and derive instance-dependent regret bounds.
arXiv Detail & Related papers (2020-06-22T16:06:25Z) - Optimistic Policy Optimization with Bandit Feedback [70.75568142146493]
We propose an optimistic trust region policy optimization (TRPO) algorithm for which we establish $tilde O(sqrtS2 A H4 K)$ regret for previous rewards.
To the best of our knowledge, the two results are the first sub-linear regret bounds obtained for policy optimization algorithms with unknown transitions and bandit feedback.
arXiv Detail & Related papers (2020-02-19T15:41:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.