Generalized Regret Analysis of Thompson Sampling using Fractional
Posteriors
- URL: http://arxiv.org/abs/2309.06349v1
- Date: Tue, 12 Sep 2023 16:15:33 GMT
- Title: Generalized Regret Analysis of Thompson Sampling using Fractional
Posteriors
- Authors: Prateek Jaiswal, Debdeep Pati, Anirban Bhattacharya, Bani K. Mallick
- Abstract summary: Thompson sampling (TS) is one of the most popular and earliest algorithms to solve multi-armed bandit problems.
We consider a variant of TS, named $\alpha$-TS, where we use a fractional or $\alpha$-posterior instead of the standard posterior distribution.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Thompson sampling (TS) is one of the most popular and earliest algorithms to
solve stochastic multi-armed bandit problems. We consider a variant of TS,
named $\alpha$-TS, where we use a fractional or $\alpha$-posterior
($\alpha\in(0,1)$) instead of the standard posterior distribution. To compute
an $\alpha$-posterior, the likelihood in the definition of the standard
posterior is tempered with a factor $\alpha$. For $\alpha$-TS we obtain both
instance-dependent $\mathcal{O}\left(\sum_{k \neq i^*}
\Delta_k\left(\frac{\log(T)}{C(\alpha)\Delta_k^2} + \frac{1}{2} \right)\right)$
and instance-independent $\mathcal{O}(\sqrt{KT\log K})$ frequentist regret
bounds under very mild conditions on the prior and reward distributions, where
$\Delta_k$ is the gap between the true mean rewards of the $k^{th}$ and the
best arms, and $C(\alpha)$ is a known constant. Both the sub-Gaussian and
exponential family models satisfy our general conditions on the reward
distribution. Our conditions on the prior distribution just require its density
to be positive, continuous, and bounded. We also establish another
instance-dependent regret upper bound that matches (up to constants) that of
the improved UCB algorithm [Auer and Ortner, 2010]. Our regret analysis carefully combines
recent theoretical developments in the non-asymptotic concentration analysis
and Bernstein-von Mises type results for the $\alpha$-posterior distribution.
Moreover, our analysis does not require additional structural properties such
as closed-form posteriors or conjugate priors.
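To make the tempering concrete: with data $x_{1:n}$, prior density $\pi$, and likelihood $L_n(\theta) = \prod_{i=1}^{n} p(x_i \mid \theta)$, the $\alpha$-posterior is $\pi_\alpha(\theta \mid x_{1:n}) \propto L_n(\theta)^{\alpha}\, \pi(\theta)$, and $\alpha$-TS draws one sample per arm from its tempered posterior and plays the arm with the largest draw. The sketch below is a minimal illustration under assumptions not made in the paper: unit-variance Gaussian rewards and standard normal priors, chosen only because they give the $\alpha$-posterior in closed form (the analysis above requires neither conjugacy nor closed-form posteriors).
```python
import numpy as np

# Minimal sketch of alpha-TS for K Gaussian arms with known noise scale.
# Assumptions (not from the paper): unit-variance Gaussian rewards and a
# N(0, 1) prior on each arm's mean, so the alpha-posterior is closed-form.

def alpha_ts(true_means, T, alpha=0.5, sigma=1.0, rng=None):
    rng = np.random.default_rng(rng)
    K = len(true_means)
    n = np.zeros(K)          # pull counts per arm
    s = np.zeros(K)          # running reward sums per arm
    regret = 0.0
    best = np.max(true_means)
    for _ in range(T):
        # Tempering the Gaussian likelihood by alpha scales its precision,
        # so with a N(0, 1) prior the alpha-posterior of arm k is Gaussian:
        #   precision = 1 + alpha * n_k / sigma^2
        #   mean      = (alpha * s_k / sigma^2) / precision
        prec = 1.0 + alpha * n / sigma**2
        mean = (alpha * s / sigma**2) / prec
        theta = rng.normal(mean, 1.0 / np.sqrt(prec))  # one draw per arm
        k = int(np.argmax(theta))                      # play the best draw
        x = rng.normal(true_means[k], sigma)           # observe a reward
        n[k] += 1
        s[k] += x
        regret += best - true_means[k]
    return regret

print(alpha_ts([0.2, 0.5, 0.9], T=5000, alpha=0.8, rng=0))
```
In this Gaussian case, tempering by $\alpha$ is equivalent to inflating the reward variance to $\sigma^2/\alpha$, which widens the posterior and hence increases exploration as $\alpha$ decreases.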
Related papers
- Distribution of lowest eigenvalue in $k$-body bosonic random matrix ensembles [0.8999666725996978]
We numerically study the distribution of the lowest eigenvalue of finite many-boson systems with $k$-body interactions.
The first four moments of the distribution of lowest eigenvalues have been analyzed as a function of the $q$ parameter.
arXiv Detail & Related papers (2024-04-30T20:44:31Z) - Minimax Optimality of Score-based Diffusion Models: Beyond the Density Lower Bound Assumptions [11.222970035173372]
A kernel-based score estimator achieves an optimal mean square error of $\widetilde{O}\left(n^{-1} t^{-\frac{d+2}{2}}\left(t^{\frac{d}{2}} \vee 1\right)\right)$.
We show that this estimator yields an $\widetilde{O}\left(n^{-1/2} t^{-\frac{d}{4}}\right)$ upper bound for the total variation error of the distribution of the sample generated by the diffusion model, under a mere sub-Gaussian assumption.
arXiv Detail & Related papers (2024-02-23T20:51:31Z) - Fast UCB-type algorithms for stochastic bandits with heavy and super
heavy symmetric noise [45.60098988395789]
We propose a new method for constructing UCB-type algorithms for multi-armed bandits (a sketch of the baseline UCB rule appears after this list).
We show that in the case of symmetric noise in the reward, we can achieve an $O(\log T\sqrt{KT\log T})$ regret bound instead of $O\left(\ldots\right)$.
arXiv Detail & Related papers (2024-02-10T22:38:21Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - Asymptotically Optimal Pure Exploration for Infinite-Armed Bandits [4.811176167998627]
We study pure exploration with infinitely many bandit arms generated i.i.d. from an unknown distribution.
Our goal is to efficiently select a single high-quality arm whose average reward is, with probability $1-\delta$, within $\varepsilon$ of being among the top $\eta$-fraction of arms.
arXiv Detail & Related papers (2023-06-03T04:00:47Z) - Convergence of a Normal Map-based Prox-SGD Method under the KL
Inequality [0.0]
We present a novel normal map-based algorithm ($\mathsf{norM}\text{-}\mathsf{SGD}$) and study its convergence under the Kurdyka-Łojasiewicz (KL) inequality.
arXiv Detail & Related papers (2023-05-10T01:12:11Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C \cdot \mathrm{OPT} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Optimal Mean Estimation without a Variance [103.26777953032537]
We study the problem of heavy-tailed mean estimation in settings where the variance of the data-generating distribution does not exist.
We design an estimator which attains the smallest possible confidence interval as a function of $n, d, \delta$.
arXiv Detail & Related papers (2020-11-24T22:39:21Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number, and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is sub-Gaussian.
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2}) + \epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z) - Curse of Dimensionality on Randomized Smoothing for Certifiable
Robustness [151.67113334248464]
We show that extending the smoothing technique to defend against other attack models can be challenging.
We present experimental results on CIFAR to validate our theory.
arXiv Detail & Related papers (2020-02-08T22:02:14Z)
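For context on the "Fast UCB-type algorithms" entry above: UCB-type methods play the arm maximizing an optimistic index (a mean estimate plus an exploration bonus), and robust variants for heavy-tailed noise typically swap the empirical mean for a robust estimate. Below is a minimal sketch of the classical UCB1 baseline (Auer et al., 2002), assuming unit-variance Gaussian rewards; it is not the algorithm proposed in that paper.
```python
import numpy as np

# Textbook UCB1 baseline for K arms: pull the arm with the largest
# "empirical mean + sqrt(2 log t / n_k)" index after one pull of each arm.
def ucb1(true_means, T, rng=None):
    rng = np.random.default_rng(rng)
    K = len(true_means)
    n = np.zeros(K)          # pull counts per arm
    s = np.zeros(K)          # running reward sums per arm
    for t in range(T):
        if t < K:
            k = t            # play each arm once to initialize
        else:
            bonus = np.sqrt(2.0 * np.log(t) / n)
            k = int(np.argmax(s / n + bonus))
        x = rng.normal(true_means[k], 1.0)   # sub-Gaussian rewards assumed
        n[k] += 1
        s[k] += x
    return s, n

print(ucb1([0.2, 0.5, 0.9], T=2000, rng=0)[1])  # pull counts per arm
```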