Improved Regret of Linear Ensemble Sampling
- URL: http://arxiv.org/abs/2411.03932v1
- Date: Wed, 06 Nov 2024 14:09:11 GMT
- Title: Improved Regret of Linear Ensemble Sampling
- Authors: Harin Lee, Min-hwan Oh
- Abstract summary: We prove that with an ensemble size logarithmic in $T$, linear ensemble sampling can achieve a frequentist regret bound of $\tilde{\mathcal{O}}(d^{3/2}\sqrt{T})$.
Our contributions advance the theoretical foundation of ensemble sampling, bringing its regret bounds in line with the best known bounds for other randomized exploration algorithms.
- Score: 9.410437324336275
- Abstract: In this work, we close the fundamental gap of theory and practice by providing an improved regret bound for linear ensemble sampling. We prove that with an ensemble size logarithmic in $T$, linear ensemble sampling can achieve a frequentist regret bound of $\tilde{\mathcal{O}}(d^{3/2}\sqrt{T})$, matching state-of-the-art results for randomized linear bandit algorithms, where $d$ and $T$ are the dimension of the parameter and the time horizon respectively. Our approach introduces a general regret analysis framework for linear bandit algorithms. Additionally, we reveal a significant relationship between linear ensemble sampling and Linear Perturbed-History Exploration (LinPHE), showing that LinPHE is a special case of linear ensemble sampling when the ensemble size equals $T$. This insight allows us to derive a new regret bound of $\tilde{\mathcal{O}}(d^{3/2}\sqrt{T})$ for LinPHE, independent of the number of arms. Our contributions advance the theoretical foundation of ensemble sampling, bringing its regret bounds in line with the best known bounds for other randomized exploration algorithms.
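To ground the algorithm being analyzed, here is a minimal sketch of linear ensemble sampling on a finite-arm linear bandit: the agent keeps $m$ ridge-regression estimates that share one Gram matrix but regress on independently perturbed rewards, and each round acts greedily with respect to a uniformly sampled ensemble member. The arm set, noise scale, and perturbation scheme below are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

d, T = 5, 2000                       # parameter dimension, horizon
m = 4 * int(np.ceil(np.log(T)))      # ensemble size logarithmic in T (constant assumed)
lam, sigma = 1.0, 0.1                # ridge parameter, noise scale

theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)
arms = rng.normal(size=(50, d))
arms /= np.linalg.norm(arms, axis=1, keepdims=True)

V = lam * np.eye(d)                  # Gram matrix shared by all ensemble members
B = sigma * rng.normal(size=(m, d))  # per-member perturbed regression targets

regret, best = 0.0, np.max(arms @ theta_star)
for t in range(T):
    j = rng.integers(m)                        # sample one member uniformly
    theta_j = np.linalg.solve(V, B[j])         # its ridge estimate
    x = arms[np.argmax(arms @ theta_j)]        # act greedily for that member
    r = x @ theta_star + sigma * rng.standard_normal()
    regret += best - x @ theta_star
    V += np.outer(x, x)
    # every member sees the reward plus its own fresh perturbation
    B += (r + sigma * rng.normal(size=(m, 1))) * x
```

Under the paper's result, an ensemble size of this logarithmic order suffices for $\tilde{\mathcal{O}}(d^{3/2}\sqrt{T})$ regret; the abstract's LinPHE connection corresponds to the extreme case where the ensemble size equals $T$.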
Related papers
- Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound on the cost function or an upper bound on the expected length of the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Optimal Thresholding Linear Bandit [21.2523248114561]
We study a novel pure exploration problem: the $\epsilon$-Thresholding Bandit Problem (TBP) with fixed confidence in linear bandits.
We extend an algorithm designed for Best Arm Identification in the linear case to TBP and show that it is optimal in terms of sample complexity.
arXiv Detail & Related papers (2024-02-11T21:25:14Z) - A New Random Reshuffling Method for Nonsmooth Nonconvex Finite-sum Optimization [6.314057999212246]
Random reshuffling techniques are used in large-scale applications, such as the training of neural networks.
In this paper, we show that the random reshuffling-type iterates generated by norm-PRR converge to stationary points.
Finally, we derive last-iterate convergence rates for the proposed approach.
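norm-PRR itself couples random reshuffling with a normal-map/proximal step for the nonsmooth term; the sketch below shows only the basic random-reshuffling loop on a smooth least-squares surrogate, to illustrate the sampling-without-replacement pattern the analysis concerns (the problem instance and step size are assumptions).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative smooth finite-sum problem: min_x (1/n) * sum_i (a_i @ x - b_i)**2
n, d = 200, 10
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.01 * rng.normal(size=n)

x = np.zeros(d)
step = 0.005
for epoch in range(30):
    for i in rng.permutation(n):  # reshuffle once per epoch, then one full pass
        grad_i = 2.0 * (A[i] @ x - b[i]) * A[i]
        x -= step * grad_i
```

In contrast to with-replacement SGD, each component gradient is visited exactly once per epoch, which is the structure the convergence analysis has to handle.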
arXiv Detail & Related papers (2023-12-02T07:12:00Z) - Ensemble sampling for linear bandits: small ensembles suffice [75.4286948336835]
We provide the first useful and rigorous analysis of ensemble sampling for the linear bandit setting.
Ours is the first result in any structured setting not to require the size of the ensemble to scale linearly with $T$ -- while obtaining near $\sqrt{T}$ order regret.
arXiv Detail & Related papers (2023-11-14T18:41:28Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Optimal Scalarizations for Sublinear Hypervolume Regret [2.703970154550017]
We show that hypervolume scalarizations with uniformly random weights achieve an optimal sublinear hypervolume regret bound of $O(T^{-1/k})$.
For the setting of multiobjective linear bandits, we derive a novel non-Euclidean analysis to get regret bounds of $\tilde{O}(dT^{-1/2} + T^{-1/k})$, removing unnecessary $\text{poly}(k)$ dependencies.
arXiv Detail & Related papers (2023-07-06T20:49:42Z) - Optimal Online Generalized Linear Regression with Stochastic Noise and
Its Application to Heteroscedastic Bandits [88.6139446295537]
We study the problem of online generalized linear regression in the setting of a generalized linear model with possibly unbounded additive noise.
We provide a sharp analysis of the classical follow-the-regularized-leader (FTRL) algorithm to cope with the label noise.
We propose an algorithm based on FTRL to achieve the first variance-aware regret bound.
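For squared loss with a quadratic regularizer, the FTRL iterate has a closed form: it is the online ridge-regression solution $\theta_t = (\lambda I + \sum_{s<t} x_s x_s^\top)^{-1} \sum_{s<t} y_s x_s$. A minimal sketch of this recursion (dimensions and the noise model are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

d, T, lam = 4, 500, 1.0
theta_star = rng.normal(size=d)

V = lam * np.eye(d)  # lam * I + sum_s x_s x_s^T
u = np.zeros(d)      # sum_s y_s x_s
loss = 0.0

for t in range(T):
    x = rng.normal(size=d)
    theta = np.linalg.solve(V, u)               # FTRL / online ridge iterate
    y_hat = x @ theta                           # prediction before y is revealed
    y = x @ theta_star + rng.standard_normal()  # possibly unbounded additive noise
    loss += (y_hat - y) ** 2
    V += np.outer(x, x)
    u += y * x
```

The variance-aware refinement in the paper would additionally reweight observations by (estimates of) their noise levels; that step is omitted here.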
arXiv Detail & Related papers (2022-02-28T08:25:26Z) - Linear-Sample Learning of Low-Rank Distributions [56.59844655107251]
We show that learning $k \times k$, rank-$r$ matrices to normalized $L_1$ distance requires $\Omega(\frac{kr}{\epsilon^2})$ samples.
We propose an algorithm that uses $\mathcal{O}(\frac{kr}{\epsilon^2}\log^2\frac{r}{\epsilon})$ samples, a number linear in the high dimension $k$ and nearly linear in the typically low rank $r$.
arXiv Detail & Related papers (2020-09-30T19:10:32Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is sub-Gaussian.
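To make setting (ii) concrete, here is a minimal sketch of sampling from the generative model $Y = \langle X, w^* \rangle + \epsilon$ with sub-Gaussian covariates of identity second moment; the outlier-injection step is an illustrative assumption about the kind of contamination a robust estimator must withstand, not the paper's corruption model.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 1000, 20
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))          # sub-Gaussian X with identity second moment
y = X @ w_star + rng.normal(size=n)  # sub-Gaussian additive noise eps
# hypothetical contamination: a small fraction of gross outliers
bad = rng.choice(n, size=n // 20, replace=False)
y[bad] += 100.0
```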
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Tight Nonparametric Convergence Rates for Stochastic Gradient Descent
under the Noiseless Linear Model [0.0]
We analyze the convergence of single-pass, fixed step-size stochastic gradient descent on the least-squares risk under this model (a sketch follows this entry).
As a special case, we analyze an online algorithm for estimating a real function on the unit interval from the noiseless observation of its value at randomly sampled points.
arXiv Detail & Related papers (2020-06-15T08:25:50Z)
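A minimal sketch of the single-pass, fixed step-size SGD scheme from the entry above, estimating a function on the unit interval from noiseless evaluations at uniformly sampled points; the random-feature parameterization is an illustrative stand-in for the paper's nonparametric setting.

```python
import numpy as np

rng = np.random.default_rng(0)

d, T, step = 50, 10_000, 0.5
k = np.arange(1, d + 1)
theta_star = rng.normal(size=d) / k  # coefficients of a smooth target function

def phi(x):
    # sine features on [0, 1]; decaying scale keeps the problem well conditioned
    return np.sin(np.pi * k * x) / k

theta = np.zeros(d)
for t in range(T):            # single pass: every sample is used exactly once
    x = rng.uniform()         # evaluation point drawn uniformly on [0, 1]
    f = phi(x)
    y = f @ theta_star        # noiseless observation of the target function
    theta -= step * (f @ theta - y) * f  # fixed step-size SGD on the squared loss
```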
This list is automatically generated from the titles and abstracts of the papers on this site.