Optimal Exploration is no harder than Thompson Sampling
- URL: http://arxiv.org/abs/2310.06069v2
- Date: Tue, 24 Oct 2023 20:13:58 GMT
- Title: Optimal Exploration is no harder than Thompson Sampling
- Authors: Zhaoqi Li, Kevin Jamieson, Lalit Jain
- Abstract summary: A pure exploration linear bandit problem aims to return $\arg\max_{z\in \mathcal{Z}} z^{\top}\theta_{\ast}$ with high probability through noisy measurements of $x^{\top}\theta_{\ast}$ with $x\in \mathcal{X}\subset \mathbb{R}^d$.
This complexity is at odds with the popular and simple Thompson Sampling algorithm for regret minimization, which just requires access to a posterior sampling and argmax oracle, and does not need to enumerate $\mathcal{Z}$.
- Score: 14.726673043806391
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given a set of arms $\mathcal{Z}\subset \mathbb{R}^d$ and an unknown
parameter vector $\theta_\ast\in\mathbb{R}^d$, the pure exploration linear
bandit problem aims to return $\arg\max_{z\in \mathcal{Z}}
z^{\top}\theta_{\ast}$, with high probability through noisy measurements of
$x^{\top}\theta_{\ast}$ with $x\in \mathcal{X}\subset \mathbb{R}^d$. Existing
(asymptotically) optimal methods require either a) potentially costly
projections for each arm $z\in \mathcal{Z}$ or b) explicitly maintaining a
subset of $\mathcal{Z}$ under consideration at each time. This complexity is at
odds with the popular and simple Thompson Sampling algorithm for regret
minimization, which just requires access to a posterior sampling and argmax
oracle, and does not need to enumerate $\mathcal{Z}$ at any point.
Unfortunately, Thompson sampling is known to be sub-optimal for pure
exploration. In this work, we pose a natural question: is there an algorithm
that can explore optimally and only needs the same computational primitives as
Thompson Sampling? We answer the question in the affirmative. We provide an
algorithm that leverages only sampling and argmax oracles and achieves an
exponential convergence rate, with the exponent being the optimal among all
possible allocations asymptotically. In addition, we show that our algorithm
can be easily implemented and performs as well empirically as existing
asymptotically optimal methods.
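For intuition, here is a minimal Python/NumPy sketch of the two computational primitives the abstract refers to: a posterior sampling oracle and an argmax oracle over $\mathcal{Z}$, wired into a toy exploration loop. The loop itself, the top-two style measurement choice, and the final recommendation rule are illustrative assumptions, not the algorithm proposed in the paper; the point is only that $\mathcal{Z}$ is accessed exclusively through argmax-oracle calls.

```python
import numpy as np

def posterior_sample(mu, Sigma, rng):
    """Sampling oracle: draw one theta from the Gaussian posterior N(mu, Sigma)."""
    return rng.multivariate_normal(mu, Sigma)

def argmax_oracle(Z, theta):
    """Argmax oracle: return the arm z in Z maximizing z^T theta."""
    return Z[np.argmax(Z @ theta)]

def explore(X, Z, get_reward, n_rounds, noise_var=1.0, seed=0):
    """Toy pure-exploration loop built only from the two oracles above.

    X: (n, d) measurement vectors, Z: (m, d) arms, get_reward(x): noisy
    observation of x^T theta_*.  This is NOT the paper's algorithm; it only
    illustrates that Z is touched exclusively through argmax-oracle calls.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    V = np.eye(d)                      # regularized design matrix
    b = np.zeros(d)                    # running sum of x * reward
    for _ in range(n_rounds):
        mu = np.linalg.solve(V, b)     # ridge estimate, used as posterior mean
        Sigma = noise_var * np.linalg.inv(V)
        # Two independent posterior samples yield two candidate best arms
        # (a top-two style heuristic; the paper's sampling rule differs).
        z1 = argmax_oracle(Z, posterior_sample(mu, Sigma, rng))
        z2 = argmax_oracle(Z, posterior_sample(mu, Sigma, rng))
        # Measure along the direction that separates the two candidates.
        direction = z1 - z2 if np.any(z1 != z2) else z1
        x = X[np.argmax(np.abs(X @ direction))]
        r = get_reward(x)
        V += np.outer(x, x)
        b += x * r
    # Recommend the empirical best arm with one final argmax-oracle call.
    return argmax_oracle(Z, np.linalg.solve(V, b))
```

Any oracle-compatible sampling rule can be substituted for the top-two heuristic shown here; per the abstract, the paper gives an algorithm built from these same primitives whose convergence exponent is asymptotically optimal.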
Related papers
- Dueling Optimization with a Monotone Adversary [35.850072415395395]
We study the problem of dueling optimization with a monotone adversary, which is a generalization of (noiseless) dueling convex optimization.
The goal is to design an online algorithm to find a minimizer $\mathbf{x}^*$ for a function $f\colon \mathcal{X} \to \mathbb{R}$, where $\mathcal{X} \subseteq \mathbb{R}^d$.
arXiv Detail & Related papers (2023-11-18T23:55:59Z) - Thompson Sampling for Real-Valued Combinatorial Pure Exploration of
Multi-Armed Bandit [65.268245109828]
We study the real-valued combinatorial pure exploration of the multi-armed bandit (R-CPE-MAB) problem.
We introduce an algorithm named the Generalized Thompson Sampling Explore (GenTS-Explore) algorithm, which is the first algorithm that can work even when the size of the action set is exponentially large in $d$.
arXiv Detail & Related papers (2023-08-20T11:56:02Z) - Near-Optimal Bounds for Learning Gaussian Halfspaces with Random
Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $\Omega(d^{1/2}/(\max\{p, \epsilon\})^2)$.
Our lower bound suggests that this quadratic dependence on $1/epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C\,\mathrm{opt} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - On the complexity of All $\varepsilon$-Best Arms Identification [2.1485350418225244]
We consider the problem of identifying all the $\varepsilon$-optimal arms in a finite multi-armed bandit with Gaussian rewards.
We propose a Track-and-Stop algorithm that identifies the set of $\varepsilon$-good arms with high probability and enjoys optimality (as $\delta$ goes to zero) in terms of the expected sample complexity.
arXiv Detail & Related papers (2022-02-13T10:54:52Z) - The Fine-Grained Hardness of Sparse Linear Regression [12.83354999540079]
We show that there are no better-than-brute-force algorithms for the problem.
We also show the impossibility of better-than-brute-force algorithms when the prediction error is measured.
arXiv Detail & Related papers (2021-06-06T14:19:43Z) - Thresholded Lasso Bandit [70.17389393497125]
Thresholded Lasso bandit is an algorithm that estimates the vector defining the reward function as well as its sparse support.
We establish non-asymptotic regret upper bounds scaling as $\mathcal{O}(\log d + \sqrt{T})$ in general, and as $\mathcal{O}(\log d + \log T)$ under the so-called margin condition.
arXiv Detail & Related papers (2020-10-22T19:14:37Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions of $\frac{1}{\lambda\epsilon}$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z) - MOTS: Minimax Optimal Thompson Sampling [89.2370817955411]
It has remained an open problem whether Thompson sampling can match the minimax lower bound $\Omega(\sqrt{KT})$ for $K$-armed bandit problems.
We propose a variant of Thompson sampling called MOTS that adaptively clips the sampling instance of the chosen arm at each time step (a toy sketch of this clipping step appears at the end of this list).
We prove that this simple variant of Thompson sampling achieves the minimax optimal regret bound $O(\sqrt{KT})$ for finite time horizon $T$, as well as the optimal regret bound for Gaussian rewards when $T$ approaches infinity.
arXiv Detail & Related papers (2020-03-03T21:24:39Z)
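Below is a minimal sketch of the clipping step described in the MOTS entry above, assuming $K$ Gaussian arms with unit variance; the confidence-style cap `tau` and the constant `alpha` are placeholder assumptions, not the exact threshold derived in the MOTS paper.

```python
import numpy as np

def clipped_thompson_samples(means, counts, horizon, rng, alpha=4.0):
    """Draw one Thompson sample per arm and clip it from above (illustration only).

    means[i], counts[i]: empirical mean and pull count of arm i.  The cap used
    below is an assumed confidence-style threshold; MOTS derives its own.
    """
    means = np.asarray(means, dtype=float)
    counts = np.asarray(counts, dtype=float)
    samples = rng.normal(means, 1.0 / np.sqrt(counts))          # usual Thompson samples
    log_plus = np.log(np.maximum(horizon / (len(means) * counts), np.e))
    tau = means + np.sqrt(alpha * log_plus / counts)             # assumed clipping cap
    return np.minimum(samples, tau)

# At each step, the arm played would be np.argmax(clipped_thompson_samples(...)).
```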