A Smoothed Analysis of Online Lasso for the Sparse Linear Contextual
Bandit Problem
- URL: http://arxiv.org/abs/2007.08561v1
- Date: Thu, 16 Jul 2020 18:48:45 GMT
- Authors: Zhiyuan Liu, Huazheng Wang, Bo Waggoner, Youjian (Eugene) Liu, Lijun
Chen
- Abstract summary: We prove that the simple online Lasso supports sparse linear contextual bandit with regret bound $\mathcal{O}(\sqrt{kT\log d})$.
Special structures in our results explicitly characterize how the perturbation affects exploration length.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the sparse linear contextual bandit problem where the
parameter $\theta$ is sparse. To relieve the sampling inefficiency, we utilize
the "perturbed adversary" model, where the context is generated adversarially but with
small random non-adaptive perturbations. We prove that the simple online Lasso
supports sparse linear contextual bandit with regret bound
$\mathcal{O}(\sqrt{kT\log d})$ even when $d \gg T$, where $k$ and $d$ are the
effective and ambient dimensions, respectively. Compared to the recent
work of Sivakumar et al. (2020), our analysis does not rely on preconditioning,
adaptive perturbation (which violates the i.i.d. perturbation setting), or
truncation of the error set. Moreover, special structures in our results
explicitly characterize how the perturbation affects the exploration length,
guiding the design of the perturbation and revealing the fundamental
performance limits of the perturbation method.
Numerical experiments are provided to complement the theoretical analysis.
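As a rough illustration of the setup described above, the sketch below plays a greedy arm against an online Lasso estimate under adversarial contexts with small i.i.d. Gaussian perturbations. The regularization schedule, step sizes, and context distribution are illustrative assumptions for this sketch, not the paper's exact algorithm; the Lasso update uses a few ISTA (proximal gradient) steps.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, K, T = 50, 3, 5, 300            # ambient dim, sparsity, arms, horizon
sigma = 0.5                           # perturbation scale (assumed)
theta = np.zeros(d)
theta[:k] = 1.0                       # k-sparse true parameter

def soft_threshold(x, lam):
    # Proximal operator of the l1 norm.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

theta_hat = np.zeros(d)
X, y = [], []
total_regret = 0.0
for t in range(1, T + 1):
    base = rng.uniform(-1, 1, size=(K, d))             # "adversarial" contexts
    ctx = base + sigma * rng.standard_normal((K, d))   # non-adaptive perturbation
    a = int(np.argmax(ctx @ theta_hat))                # greedy choice, no explicit exploration
    reward = ctx[a] @ theta + 0.1 * rng.standard_normal()
    total_regret += np.max(ctx @ theta) - ctx[a] @ theta
    X.append(ctx[a])
    y.append(reward)

    # A few ISTA steps on the running Lasso objective
    #   (1/2t) * ||A theta - b||^2 + lam * ||theta||_1
    A, b = np.array(X), np.array(y)
    lam = 0.1 * np.sqrt(np.log(d) / t)                 # ~sqrt(log d / t) schedule (assumed)
    lipschitz = np.linalg.norm(A, 2) ** 2 / t + 1e-9
    step = 1.0 / lipschitz
    for _ in range(5):
        grad = A.T @ (A @ theta_hat - b) / t
        theta_hat = soft_threshold(theta_hat - step * grad, step * lam)
```

The random perturbation of the contexts is what substitutes for explicit exploration here: even a purely greedy policy sees sufficiently diverse contexts for the Lasso estimate to improve.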
Related papers
- Lasso Bandit with Compatibility Condition on Optimal Arm [10.216425987201333]
We consider a sparse linear bandit problem where only a sparse subset of context features affects the expected reward function.
We propose an algorithm that adapts the forced-sampling technique and prove that the proposed algorithm achieves $O(\mathrm{poly}\log(dT))$ regret.
arXiv Detail & Related papers (2024-06-02T18:11:47Z) - FLIPHAT: Joint Differential Privacy for High Dimensional Sparse Linear Bandits [8.908421753758475]
High dimensional sparse linear bandits serve as an efficient model for sequential decision-making problems.
Motivated by data privacy concerns, we study the joint differentially private high dimensional sparse linear bandits.
We show that FLIPHAT achieves optimal regret in terms of privacy parameters.
arXiv Detail & Related papers (2024-05-22T22:19:12Z) - Exploration via linearly perturbed loss minimisation [4.856378369489158]
We introduce exploration via linear loss perturbations (EVILL) for structured bandit problems.
We show that, for the case of generalised linear bandits, EVILL reduces to perturbed history exploration (PHE), a method where exploration is done by training on randomly perturbed rewards.
We propose data-dependent perturbations not present in previous PHE-type methods that allow EVILL to match the performance of Thompson-sampling-style parameter-perturbation methods.
arXiv Detail & Related papers (2023-11-13T18:54:43Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Online Nonsubmodular Minimization with Delayed Costs: From Full
Information to Bandit Feedback [98.7678704343537]
We focus on a class of nonsubmodular functions with special structure, and prove regret guarantees for several variants of the online and approximate online bandit gradient descent algorithms.
We derive bounds for the agent's regret in the full information and bandit feedback setting, even if the delay between choosing a decision and receiving the incurred cost is unbounded.
arXiv Detail & Related papers (2022-05-15T08:27:12Z) - LSDAT: Low-Rank and Sparse Decomposition for Decision-based Adversarial
Attack [74.5144793386864]
LSDAT crafts perturbations in the low-dimensional subspace formed by the sparse component of the input sample and that of an adversarial sample.
LSD works directly in the image pixel domain to guarantee that non-$\ell$ constraints, such as sparsity, are satisfied.
arXiv Detail & Related papers (2021-03-19T13:10:47Z) - Nearly Dimension-Independent Sparse Linear Bandit over Small Action
Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose doubly growing epochs and estimating the parameter using the best subset selection method.
arXiv Detail & Related papers (2020-09-04T04:10:39Z) - Dynamic Regret of Convex and Smooth Functions [93.71361250701075]
We investigate online convex optimization in non-stationary environments.
We choose the dynamic regret as the performance measure.
We show that it is possible to further enhance the dynamic regret by exploiting the smoothness condition.
arXiv Detail & Related papers (2020-07-07T14:10:57Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.