Does Sparsity Help in Learning Misspecified Linear Bandits?
- URL: http://arxiv.org/abs/2303.16998v1
- Date: Wed, 29 Mar 2023 19:58:39 GMT
- Title: Does Sparsity Help in Learning Misspecified Linear Bandits?
- Authors: Jialin Dong and Lin F. Yang
- Abstract summary: We show that algorithms can obtain $O(\varepsilon)$-optimal actions by querying $O(\varepsilon^{-s}d^s)$ actions.
We also show that our upper bound on sample complexity is nearly tight if one demands an error $O(s^{\delta}\varepsilon)$ for $0<\delta<1$.
- Score: 32.920577630673804
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, the study of linear misspecified bandits has generated intriguing
implications for the hardness of learning in bandits and reinforcement learning
(RL). In particular, Du et al. (2020) show that even if a learner is given
linear features in $\mathbb{R}^d$ that approximate the rewards in a bandit or
RL with a uniform error of $\varepsilon$, searching for an
$O(\varepsilon)$-optimal action requires pulling at least $\Omega(\exp(d))$
queries. Furthermore, Lattimore et al. (2020) show that a degraded
$O(\varepsilon\sqrt{d})$-optimal solution can be learned within
$\operatorname{poly}(d/\varepsilon)$ queries. Yet it is unknown whether a
structural assumption on the ground-truth parameter, such as sparsity, could
break the $\varepsilon\sqrt{d}$ barrier. In this paper, we address this
question by showing that algorithms can obtain $O(\varepsilon)$-optimal actions
by querying $O(\varepsilon^{-s}d^s)$ actions, where $s$ is the sparsity
parameter, removing the $\exp(d)$-dependence. We then establish
information-theoretical lower bounds, i.e., $\Omega(\exp(s))$, to show that our
upper bound on sample complexity is nearly tight if one demands an error $
O(s^{\delta}\varepsilon)$ for $0<\delta<1$. For $\delta\geq 1$, we further show
that $\operatorname{poly}(s/\varepsilon)$ queries are possible when the linear
features are "good" and even in general settings. These results provide a
nearly complete picture of how sparsity can help in misspecified bandit
learning and provide a deeper understanding of when linear features are
"useful" for bandit and reinforcement learning with misspecification.
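For concreteness, the setting described in the abstract can be written out as follows; the notation here is a standard rendering of the misspecified sparse linear bandit model, not quoted from the paper:

```latex
% Misspecified sparse linear bandit (illustrative notation): each action a
% in the action set \mathcal{A} has a known feature vector \phi(a) \in \mathbb{R}^d,
% and the mean reward is \varepsilon-close to a linear function of an
% s-sparse ground-truth parameter \theta^*:
\[
  r(a) \;=\; \langle \theta^*, \phi(a) \rangle + \eta(a),
  \qquad |\eta(a)| \le \varepsilon \ \ \forall a \in \mathcal{A},
  \qquad \|\theta^*\|_0 \le s.
\]
% Upper bound: an O(\varepsilon)-optimal action can be found with
% O(\varepsilon^{-s} d^{s}) queries, removing the \Omega(\exp(d)) dependence
% that holds without the sparsity assumption.
```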
Related papers
- Misspecified $Q$-Learning with Sparse Linear Function Approximation: Tight Bounds on Approximation Error [25.777423855881878]
We give a novel elimination-based algorithm showing that one can obtain an $O(H\epsilon)$-optimal policy.
We complement our upper bound with an $\widetilde{\Omega}(H\epsilon)$-optimality lower bound, giving a complete picture of this problem.
arXiv Detail & Related papers (2024-07-18T15:58:04Z)
- Fast Rates for Bandit PAC Multiclass Classification [73.17969992976501]
We study multiclass PAC learning with bandit feedback, where inputs are classified into one of $K$ possible labels and feedback is limited to whether or not the predicted labels are correct.
Our main contribution is in designing a novel learning algorithm for the agnostic $(\varepsilon,\delta)$-PAC version of the problem.
arXiv Detail & Related papers (2024-06-18T08:54:04Z)
- Coresets for Multiple $\ell_p$ Regression [47.790126028106734]
We construct coresets of size $\tilde O(\varepsilon^{-2}d)$ for $p<2$ and $\tilde O(\varepsilon^{-p}d^{p/2})$ for $p>2$.
For $1\le p<2$, every matrix has a subset of $\tilde O(\varepsilon^{-1}k)$ rows which spans a $(\varepsilon^{-1}k)$-approximately optimal $k$-dimensional subspace for $\ell_p$ subspace approximation.
arXiv Detail & Related papers (2024-06-04T15:50:42Z)
- Optimal Sketching Bounds for Sparse Linear Regression [116.30196615349226]
We study oblivious sketching for $k$-sparse linear regression under various loss functions, such as an $\ell_p$ norm or a broad class of hinge-like loss functions.
We show that for sparse $\ell_2$ regression, there is an oblivious distribution over sketches with $\Theta(k\log(d/k)/\varepsilon^2)$ rows, which is tight up to a constant factor.
We also show that sketching with $O(\mu^2 k\log(\mu n d/\varepsilon)/\varepsilon^2)$ rows is possible.
arXiv Detail & Related papers (2023-04-05T07:24:19Z)
- On Optimal Learning Under Targeted Data Poisoning [48.907813854832206]
In this work we aim to characterize the smallest achievable error $\epsilon=\epsilon(\eta)$ by the learner in the presence of such an adversary.
Remarkably, we show that the upper bound can be attained by a deterministic learner.
arXiv Detail & Related papers (2022-10-06T06:49:48Z)
- Tractability from overparametrization: The example of the negative perceptron [9.077560551700163]
We analyze a linear programming algorithm to characterize the corresponding threshold $\delta_{\text{lin}}(\kappa)$.
We observe a gap between this threshold and $\delta_{\text{lin}}(\kappa)$, raising the question of the behavior of other algorithms.
arXiv Detail & Related papers (2021-10-28T01:00:13Z)
- The Price of Tolerance in Distribution Testing [31.10049510641336]
We show the sample complexity to be $\frac{\sqrt{n}}{\varepsilon_2^2} + \frac{n}{\log n}\cdot\max\left\{\frac{\varepsilon_1}{\varepsilon_2^2}, \frac{\varepsilon_1^2}{\varepsilon_2^4}\right\}$, providing a smooth tradeoff between the two previously known cases.
We also provide a similar characterization for the problem of tolerant equivalence testing, where both $p$ and $q$ are unknown.
arXiv Detail & Related papers (2021-06-25T03:59:42Z)
- Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
We study online learning with bandit feedback (i.e., the learner has access only to a zeroth-order oracle), where the cost/reward functions admit a "pseudo-1d" structure.
We show a lower bound of $\min(\sqrt{dT},\, T^{3/4})$ for the regret of any algorithm, where $T$ is the number of rounds.
We propose a new algorithm sbcalg that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.
arXiv Detail & Related papers (2021-02-15T08:16:51Z)
- Impact of Representation Learning in Linear Bandits [83.17684841392754]
We study how representation learning can improve the efficiency of bandit problems.
We present a new algorithm which achieves $\widetilde{O}(T\sqrt{kN} + \sqrt{dkNT})$ regret, where $N$ is the number of rounds we play for each bandit.
arXiv Detail & Related papers (2020-10-13T16:35:30Z)
- Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, a discount factor $\gamma \in (0,1)$, and an approximation threshold $\epsilon > 0$, we provide a model-free algorithm to learn an $\epsilon$-optimal policy.
For small enough $\epsilon$, we show an algorithm with improved sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z)
- Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed Payoffs [35.988644745703645]
We analyze linear bandits with heavy-tailed payoffs, where the payoffs admit finite $(1+\epsilon)$-th moments.
We propose two novel algorithms which enjoy a sublinear regret bound of $\widetilde{O}(d^{\frac{1}{2}}T^{\frac{1}{1+\epsilon}})$.
arXiv Detail & Related papers (2020-04-28T13:01:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences of its use.