Structured Linear Contextual Bandits: A Sharp and Geometric Smoothed
Analysis
- URL: http://arxiv.org/abs/2002.11332v1
- Date: Wed, 26 Feb 2020 07:29:24 GMT
- Title: Structured Linear Contextual Bandits: A Sharp and Geometric Smoothed
Analysis
- Authors: Vidyashankar Sivakumar, Zhiwei Steven Wu, Arindam Banerjee
- Abstract summary: We consider a smoothed setting for structured linear contextual bandits where the adversarial contexts are perturbed by Gaussian noise.
We show there is implicit exploration in the smoothed setting where a simple greedy algorithm works.
- Score: 48.25118610209545
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bandit learning algorithms typically involve the balance of exploration and
exploitation. However, in many practical applications, worst-case scenarios
needing systematic exploration are seldom encountered. In this work, we
consider a smoothed setting for structured linear contextual bandits where the
adversarial contexts are perturbed by Gaussian noise and the unknown parameter
$\theta^*$ has structure, e.g., sparsity, group sparsity, low rank, etc. We
propose simple greedy algorithms for both the single- and multi-parameter
(i.e., different parameter for each context) settings and provide a unified
regret analysis for $\theta^*$ with any assumed structure. The regret bounds
are expressed in terms of geometric quantities such as Gaussian widths
associated with the structure of $\theta^*$. We also obtain sharper regret
bounds compared to earlier work for the unstructured $\theta^*$ setting as a
consequence of our improved analysis. We show there is implicit exploration in
the smoothed setting where a simple greedy algorithm works.
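The abstract's central claim is that Gaussian perturbation of the contexts supplies enough implicit exploration for a purely greedy policy. The idea can be sketched as follows; this is a minimal illustration, not the paper's algorithm, and assumes a single-parameter setting with a ridge-regression estimate and illustrative hyperparameter names.

```python
import numpy as np

def greedy_smoothed_bandit(theta_star, n_rounds=500, n_arms=5, d=4,
                           noise_sd=0.1, perturb_sd=0.5, reg=1.0, seed=0):
    """Greedy linear contextual bandit under smoothed (perturbed) contexts.

    Each round, adversarially chosen base contexts are perturbed by
    Gaussian noise; the agent plays the arm that looks best under the
    current ridge estimate, with no explicit exploration bonus.
    (Illustrative sketch; parameter names are assumptions, not the paper's.)
    """
    rng = np.random.default_rng(seed)
    A = reg * np.eye(d)   # regularized Gram matrix
    b = np.zeros(d)       # sum of reward-weighted played contexts
    regret = 0.0
    for _ in range(n_rounds):
        base = rng.uniform(-1, 1, size=(n_arms, d))               # "adversarial" contexts
        X = base + perturb_sd * rng.standard_normal((n_arms, d))  # Gaussian smoothing
        theta_hat = np.linalg.solve(A, b)                         # ridge estimate
        arm = int(np.argmax(X @ theta_hat))                       # purely greedy choice
        reward = X[arm] @ theta_star + noise_sd * rng.standard_normal()
        A += np.outer(X[arm], X[arm])
        b += reward * X[arm]
        regret += np.max(X @ theta_star) - X[arm] @ theta_star    # instantaneous regret
    return regret, np.linalg.solve(A, b)
```

The perturbation is what makes this work: even though the arm choice is greedy, the Gaussian noise in the contexts keeps the minimum eigenvalue of the Gram matrix growing, so the estimate of $\theta^*$ keeps improving without any forced exploration.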
Related papers
- Efficient Simple Regret Algorithms for Stochastic Contextual Bandits [32.5817931126341]
We study contextual logistic bandits under the simple regret objective.
We propose the first algorithm that achieves simple regret $\tilde{\mathcal{O}}(d/\sqrt{T})$.
We also introduce a new variant of Thompson Sampling tailored to the simple-regret setting.
arXiv Detail & Related papers (2026-01-29T02:09:13Z) - An Equivalence Between Static and Dynamic Regret Minimization [10.812831455376218]
We show that for linear losses, dynamic regret minimization is equivalent to static regret minimization in an extended decision space.
We provide an algorithm guaranteeing dynamic regret of the form $R_T(u_1,\dots,u_T) \le \tilde{O}(\cdots)$.
arXiv Detail & Related papers (2024-06-03T17:54:58Z) - Lasso Bandit with Compatibility Condition on Optimal Arm [10.216425987201333]
We consider a sparse linear bandit problem where only a sparse subset of context features affects the expected reward function.
We propose an algorithm that adapts the forced-sampling technique and prove that the proposed algorithm achieves $O(\text{polylog}(dT))$ regret.
arXiv Detail & Related papers (2024-06-02T18:11:47Z) - Noise misleads rotation invariant algorithms on sparse targets [0.0]
We show that the class of rotation invariant algorithms are suboptimal for learning sparse linear problems.
We prove this via a lower bound for the Bayes optimal algorithm on a rotationally symmetrized problem.
We then prove much smaller upper bounds on the same problem for simple non-rotation-invariant algorithms.
arXiv Detail & Related papers (2024-03-05T06:25:19Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM).
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $\tilde{O}\big(d\sqrt{\sum_{t=1}^{T}\sigma_t^2} + d\big)$.
Our regret bound aligns with the intuitive expectation: in scenarios where the comparison is deterministic, the algorithm only suffers an $\tilde{O}(d)$ regret.
arXiv Detail & Related papers (2023-10-02T08:15:52Z) - Borda Regret Minimization for Generalized Linear Dueling Bandits [65.09919504862496]
We study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score.
We propose a rich class of generalized linear dueling bandit models, which cover many existing models.
Our algorithm achieves an $\tilde{O}(d^{2/3} T^{2/3})$ regret, which is also optimal.
arXiv Detail & Related papers (2023-03-15T17:59:27Z) - Revisiting Weighted Strategy for Non-stationary Parametric Bandits [82.1942459195896]
This paper revisits the weighted strategy for non-stationary parametric bandits.
We propose a refined analysis framework, which produces a simpler weight-based algorithm.
Our new framework can be used to improve regret bounds of other parametric bandits.
arXiv Detail & Related papers (2023-03-05T15:11:14Z) - Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear
Contextual Bandits and Markov Decision Processes [59.61248760134937]
We propose an efficient algorithm to achieve a regret of $\tilde{O}(\sqrt{T}+\zeta)$.
The proposed algorithm relies on the recently developed uncertainty-weighted least-squares regression from linear contextual bandit.
We generalize our algorithm to the episodic MDP setting and are the first to achieve an additive dependence on the corruption level $\zeta$.
arXiv Detail & Related papers (2022-12-12T15:04:56Z) - Regret Lower Bound and Optimal Algorithm for High-Dimensional Contextual
Linear Bandit [10.604939762790517]
We prove a minimax lower bound, $\mathcal{O}\big((\log d)^{\frac{\alpha+1}{2}} T^{\frac{1-\alpha}{2}} + \log T\big)$, for the cumulative regret.
Second, we propose a simple and computationally efficient algorithm inspired by the general Upper Confidence Bound (UCB) strategy.
arXiv Detail & Related papers (2021-09-23T19:35:38Z) - A Smoothed Analysis of Online Lasso for the Sparse Linear Contextual
Bandit Problem [25.5073029104128]
We prove that the simple online Lasso supports sparse linear contextual bandits with regret bound $\mathcal{O}(\sqrt{kT\log d})$.
Special structures in our results explicitly characterize how the perturbation affects exploration length.
arXiv Detail & Related papers (2020-07-16T18:48:45Z) - Stochastic Linear Bandits Robust to Adversarial Attacks [117.665995707568]
We provide two variants of a Robust Phased Elimination algorithm, one that knows $C$ and one that does not.
We show that both variants attain near-optimal regret in the non-corrupted case $C = 0$, while incurring additional corruption-dependent additive terms, respectively.
In a contextual setting, we show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.
arXiv Detail & Related papers (2020-07-07T09:00:57Z) - Explicit Best Arm Identification in Linear Bandits Using No-Regret
Learners [17.224805430291177]
We study the problem of best arm identification in linearly parameterised multi-armed bandits.
We propose an explicitly implementable and provably order-optimal sample-complexity algorithm to solve this problem.
arXiv Detail & Related papers (2020-06-13T05:00:01Z) - GeoDA: a geometric framework for black-box adversarial attacks [79.52980486689287]
We propose a framework to generate adversarial examples in one of the most challenging black-box settings.
Our framework is based on the observation that the decision boundary of deep networks usually has a small mean curvature in the vicinity of data samples.
arXiv Detail & Related papers (2020-03-13T20:03:01Z) - Regret Minimization in Stochastic Contextual Dueling Bandits [40.17224226373741]
We consider the problem of $K$-armed dueling bandit in the contextual setting.
We present two algorithms for the setup with respective regret guarantees.
arXiv Detail & Related papers (2020-02-20T06:36:19Z)
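Several of the entries above (the sparse-bandit and online-Lasso papers in particular) combine a sparsity-aware estimator with greedy arm selection. The Lasso step can be sketched with a plain ISTA (proximal gradient) solver; this is an illustrative sketch with assumed parameter names, not the estimator from any specific paper above.

```python
import numpy as np

def soft_threshold(v, lam):
    """Proximal operator of the L1 norm: shrink each coordinate toward zero."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Plain ISTA solver for (1/2n)||y - Xw||^2 + lam*||w||_1.

    In a sparse contextual bandit loop, X would hold the contexts played
    so far and y the observed rewards; the returned sparse estimate of
    theta* then drives a greedy arm choice in the next round.
    """
    n, d = X.shape
    w = np.zeros(d)
    # Step size 1/L, where L is the Lipschitz constant of the gradient.
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n + 1e-12)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w
```

The L1 shrinkage zeroes out coordinates outside the true support, which is what lets the regret scale with the sparsity level $k$ and only logarithmically with the ambient dimension $d$.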
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.