Optimal Gradient-based Algorithms for Non-concave Bandit Optimization
- URL: http://arxiv.org/abs/2107.04518v1
- Date: Fri, 9 Jul 2021 16:04:24 GMT
- Title: Optimal Gradient-based Algorithms for Non-concave Bandit Optimization
- Authors: Baihe Huang, Kaixuan Huang, Sham M. Kakade, Jason D. Lee, Qi Lei,
Runzhe Wang, Jiaqi Yang
- Abstract summary: This work considers a large family of bandit problems where the unknown underlying reward function is non-concave.
Our algorithms are based on a unified zeroth-order optimization paradigm that applies in great generality.
We show that the standard optimistic algorithms are sub-optimal by dimension factors.
- Score: 76.57464214864756
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Bandit problems with linear or concave reward have been extensively studied,
but relatively few works have studied bandits with non-concave reward. This
work considers a large family of bandit problems where the unknown underlying
reward function is non-concave, including the low-rank generalized linear
bandit problems and two-layer neural network with polynomial activation bandit
problem. For the low-rank generalized linear bandit problem, we provide a
minimax-optimal algorithm in the dimension, refuting both conjectures in
[LMT21, JWWN19]. Our algorithms are based on a unified zeroth-order
optimization paradigm that applies in great generality and attains optimal
rates in several structured polynomial settings (in the dimension). We further
demonstrate the applicability of our algorithms in RL in the generative model
setting, resulting in improved sample complexity over prior approaches.
Finally, we show that the standard optimistic algorithms (e.g., UCB) are
sub-optimal by dimension factors. In the neural net setting (with polynomial
activation functions) with noiseless reward, we provide a bandit algorithm with
sample complexity equal to the intrinsic algebraic dimension. Again, we show
that optimistic approaches have worse sample complexity, polynomial in the
extrinsic dimension (which could be exponentially worse in the polynomial
degree).
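The paper's unifying tool is zeroth-order (query-only) optimization of the unknown reward. As a rough illustration of the general idea, and not the authors' exact estimator, the sketch below runs gradient ascent driven by the classical two-point zeroth-order gradient estimate on a toy rank-1 quadratic reward; the hidden direction, noise level, smoothing radius, and step size are all illustrative assumptions.

```python
import numpy as np

def two_point_grad_estimate(f, x, delta, rng):
    """Classical two-point zeroth-order gradient estimator: query the
    unknown reward at x +/- delta*u for a random unit direction u and
    form a finite-difference estimate of the (smoothed) gradient."""
    u = rng.standard_normal(x.shape)
    u /= np.linalg.norm(u)
    return (x.size / (2.0 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u

def zeroth_order_ascent(f, x0, steps=2000, delta=0.05, lr=0.01, seed=0):
    """Gradient ascent using only bandit (query) access to the reward;
    each iteration spends two queries."""
    rng = np.random.default_rng(seed)
    x = x0.astype(float)
    for _ in range(steps):
        x = x + lr * two_point_grad_estimate(f, x, delta, rng)
    return x

# Toy non-concave reward: a rank-1 quadratic (v^T x)^2 with Gaussian noise,
# a hypothetical stand-in for a low-rank generalized linear bandit reward.
rng = np.random.default_rng(1)
v = rng.standard_normal(10)
v /= np.linalg.norm(v)
reward = lambda x: (v @ x) ** 2 + 0.01 * rng.standard_normal()
x_hat = zeroth_order_ascent(reward, rng.standard_normal(10))
print("alignment with hidden direction:", abs(v @ x_hat) / np.linalg.norm(x_hat))
```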
Related papers
- Indexed Minimum Empirical Divergence-Based Algorithms for Linear Bandits [55.938644481736446]
Indexed Minimum Empirical Divergence (IMED) is a highly effective approach to the multi-armed bandit problem.
It has been observed to empirically outperform UCB-based algorithms and Thompson Sampling.
We present novel linear versions of the IMED algorithm, which we call the family of LinIMED algorithms.
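For intuition, the IMED rule pulls the arm minimizing an index that trades off the empirical divergence from the best arm against an exploration bonus. Below is a minimal sketch in the Gaussian unit-variance special case (a simplifying assumption; the LinIMED variants for linear bandits are more involved).

```python
import math
import numpy as np

def imed_choose(counts, means):
    """IMED index, Gaussian unit-variance form: pull the arm minimizing
    N_a * (mu_max - mu_a)^2 / 2 + log(N_a); unpulled arms go first."""
    if 0 in counts:
        return counts.index(0)
    mu_max = max(means)
    index = [n * (mu_max - m) ** 2 / 2.0 + math.log(n)
             for n, m in zip(counts, means)]
    return int(np.argmin(index))

# Toy run on three hypothetical Gaussian arms.
rng = np.random.default_rng(0)
true_means = [0.2, 0.5, 0.45]
counts, sums = [0, 0, 0], [0.0, 0.0, 0.0]
for _ in range(5000):
    a = imed_choose(counts, [s / n if n else 0.0 for s, n in zip(sums, counts)])
    counts[a] += 1
    sums[a] += true_means[a] + rng.standard_normal()
print("pull counts:", counts)  # the best arm should dominate
```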
arXiv Detail & Related papers (2024-05-24T04:11:58Z)
- Second Order Methods for Bandit Optimization and Control [34.51425758864638]
We show that our algorithm achieves optimal (in terms of the horizon) regret bounds for a large class of convex functions that we call $\kappa$-convex.
We also investigate the adaptation of our second-order bandit algorithm to online convex optimization with memory.
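For background, the classical full-information second-order method in online convex optimization is the Online Newton Step; the sketch below shows that textbook update only for intuition, since a bandit algorithm such as the paper's must instead build gradient information from reward queries.

```python
import numpy as np

def online_newton_step(grad_oracles, d, gamma=1.0, eps=1.0):
    """Textbook Online Newton Step (full information): maintain
    A_t = eps*I + sum_s g_s g_s^T and set
    x_{t+1} = x_t - (1/gamma) A_t^{-1} g_t (projection omitted).
    A bandit version must estimate g_t from queries instead."""
    x = np.zeros(d)
    A = eps * np.eye(d)
    for grad in grad_oracles:
        g = grad(x)                      # gradient of round-t loss at x
        A += np.outer(g, g)
        x = x - (1.0 / gamma) * np.linalg.solve(A, g)
    return x

# Toy usage: quadratic losses f_t(x) = ||x - c_t||^2 with jittered centers;
# the final iterate drifts toward the common center (1, -1).
rng = np.random.default_rng(0)
centers = [np.array([1.0, -1.0]) + 0.01 * rng.standard_normal(2) for _ in range(500)]
print(online_newton_step([lambda x, c=c: 2.0 * (x - c) for c in centers], d=2))
```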
arXiv Detail & Related papers (2024-02-14T04:03:38Z)
- Minimax Optimization with Smooth Algorithmic Adversaries [59.47122537182611]
We propose a new algorithm for the min-player against smooth algorithms deployed by an adversary.
Our algorithm is guaranteed to make monotonic progress (thus having no limit cycles) and to find an appropriate stationary point in a polynomial number of iterations.
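The idea of playing against the adversary's algorithm, rather than a worst-case response, can be sketched with a simplified GD-max style loop in which the min-player descends along the partial gradient evaluated at the adversary's gradient-ascent response; this is a toy illustration, not the paper's exact procedure.

```python
def ascent_response(grad_y, x, y0, k=20, lr=0.1):
    """The adversary's smooth algorithm: k gradient ascent steps on y."""
    y = y0
    for _ in range(k):
        y = y + lr * grad_y(x, y)
    return y

def descend_against_algorithm(grad_x, grad_y, x0, y0, rounds=200, lr=0.05):
    """Min-player gradient descent against the adversary's algorithm:
    recompute the adversary's gradient-ascent response each round and
    step x along the partial gradient at that response (a simplified
    GD-max style loop)."""
    x = x0
    for _ in range(rounds):
        y = ascent_response(grad_y, x, y0)
        x = x - lr * grad_x(x, y)
    return x

# Toy saddle problem f(x, y) = x^2 + 2xy - y^2: strongly concave in y,
# best response y* = x, and min_x max_y f is attained at x = 0.
grad_x = lambda x, y: 2 * x + 2 * y
grad_y = lambda x, y: 2 * x - 2 * y
print(descend_against_algorithm(grad_x, grad_y, x0=3.0, y0=1.0))  # -> near 0.0
```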
arXiv Detail & Related papers (2021-06-02T22:03:36Z)
- A Simple Unified Framework for High Dimensional Bandit Problems [33.139925285802825]
We present a general analysis framework for the regret upper bound of our algorithm.
We show that our algorithm can be applied to different high dimensional bandit problems.
arXiv Detail & Related papers (2021-02-18T21:35:32Z)
- An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z)
- An Empirical Process Approach to the Union Bound: Practical Algorithms for Combinatorial and Linear Bandits [34.06611065493047]
This paper proposes near-optimal algorithms for the pure-exploration linear bandit problem in the fixed confidence and fixed budget settings.
We provide an algorithm whose sample complexity scales with the geometry of the instance and avoids an explicit union bound over the number of arms.
We also propose the first algorithm for linear bandits in the fixed budget setting.
arXiv Detail & Related papers (2020-06-21T00:56:33Z)
- Model Selection in Contextual Stochastic Bandit Problems [51.94632035240787]
We develop a meta-algorithm that selects between base algorithms.
We show through a lower bound that even when one of the base algorithms has $O(\sqrt{T})$ regret, in general it is impossible to get better than $\Omega(\sqrt{T})$ regret.
arXiv Detail & Related papers (2020-03-03T18:46:34Z)
- Optimal Randomized First-Order Methods for Least-Squares Problems [56.05635751529922]
This class of algorithms encompasses several randomized methods among the fastest solvers for least-squares problems.
We focus on two classical embeddings, namely, Gaussian projections and subsampled Hadamard transforms.
Our resulting algorithm yields the best complexity known for solving least-squares problems with no condition number dependence.
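For background, the simplest randomized-embedding baseline is Gaussian sketch-and-solve; the paper instead analyzes iterative first-order methods, and a subsampled Hadamard transform would replace the Gaussian matrix for speed. A minimal sketch:

```python
import numpy as np

def gaussian_sketch_solve(A, b, m, seed=0):
    """Sketch-and-solve for min_x ||Ax - b||_2: compress the n-row
    system to m rows with a Gaussian embedding S (entries N(0, 1/m)),
    then solve the small problem exactly. A plain baseline, not the
    paper's iterative algorithm."""
    S = np.random.default_rng(seed).standard_normal((m, A.shape[0])) / np.sqrt(m)
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x

# Toy tall system (n >> d): the sketched residual is close to optimal.
rng = np.random.default_rng(1)
A = rng.standard_normal((5000, 20))
b = rng.standard_normal(5000)
x_sk = gaussian_sketch_solve(A, b, m=200)
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print("residuals:", np.linalg.norm(A @ x_sk - b), np.linalg.norm(A @ x_ls - b))
```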
arXiv Detail & Related papers (2020-02-21T17:45:32Z)
- Bandit algorithms to emulate human decision making using probabilistic distortions [20.422725678982726]
We formulate two multi-armed bandit problems with distorted probabilities on the reward distributions.
We consider these problems in both the regret minimization and the best arm identification frameworks for multi-armed bandits.
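Below is a minimal sketch of how a probability distortion reweights an empirical reward distribution, using the Tversky-Kahneman weighting function as an illustrative choice (not necessarily the paper's exact estimator).

```python
import numpy as np

def distortion_w(p, gamma=0.61):
    """Tversky-Kahneman probability weighting: overweights small
    probabilities, underweights moderate-to-large ones."""
    return p**gamma / (p**gamma + (1 - p)**gamma) ** (1.0 / gamma)

def distorted_value(samples, gamma=0.61):
    """Empirical distorted expectation of a nonnegative reward:
    V = sum_i x_(i) * [w((n-i+1)/n) - w((n-i)/n)] over the sorted
    sample x_(1) <= ... <= x_(n); with w(p) = p this is the mean."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    tail_hi = (n - np.arange(n)) / n       # empirical P(X >= x_(i))
    tail_lo = (n - np.arange(n) - 1) / n   # empirical P(X > x_(i))
    return float(np.sum((distortion_w(tail_hi, gamma)
                         - distortion_w(tail_lo, gamma)) * x))

# A rare large reward is valued higher under distortion than on average.
rng = np.random.default_rng(0)
samples = rng.choice([0.0, 10.0], size=10000, p=[0.95, 0.05])
print("mean:", samples.mean(), "distorted:", distorted_value(samples))
```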
arXiv Detail & Related papers (2016-11-30T17:37:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.