Sparsity-Agnostic Lasso Bandit
- URL: http://arxiv.org/abs/2007.08477v2
- Date: Wed, 28 Apr 2021 06:04:34 GMT
- Title: Sparsity-Agnostic Lasso Bandit
- Authors: Min-hwan Oh, Garud Iyengar, Assaf Zeevi
- Abstract summary: We consider a contextual bandit problem where the dimension $d$ of the feature vectors is potentially large.
All existing algorithms for sparse bandits require a priori knowledge of the value of the sparsity index $s_0$.
We propose an algorithm that does not require prior knowledge of the sparsity index $s_0$ and establish tight regret bounds on its performance under mild conditions.
- Score: 27.383079108028074
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We consider a stochastic contextual bandit problem where the dimension $d$ of
the feature vectors is potentially large; however, only a sparse subset of
features of cardinality $s_0 \ll d$ affect the reward function. Essentially all
existing algorithms for sparse bandits require a priori knowledge of the value
of the sparsity index $s_0$. This knowledge is almost never available in
practice, and misspecification of this parameter can lead to severe
deterioration in the performance of existing methods. The main contribution of
this paper is to propose an algorithm that does not require prior knowledge of
the sparsity index $s_0$ and establish tight regret bounds on its performance
under mild conditions. We also comprehensively evaluate our proposed algorithm
numerically and show that it consistently outperforms existing methods, even
when the correct sparsity index is revealed to them but is kept hidden from our
algorithm.
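To make the abstract's recipe concrete, the following is a minimal sketch of a sparsity-agnostic Lasso bandit loop in Python: the Lasso regularization level is scheduled using only the round index $t$ and the dimension $d$ (never the unknown sparsity $s_0$), and arms are chosen greedily under the current estimate. The toy environment, constants, and exact schedule are illustrative assumptions, not the authors' algorithm verbatim.

```python
# Hedged sketch of a sparsity-agnostic Lasso bandit loop (illustrative only).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
d, K, T, s0 = 100, 20, 1000, 5      # s0 is used only to build the toy environment
beta = np.zeros(d)
beta[:s0] = rng.normal(size=s0)     # sparse true parameter, unknown to the learner

X_hist, y_hist = [], []
beta_hat = np.zeros(d)
for t in range(1, T + 1):
    contexts = rng.normal(size=(K, d)) / np.sqrt(d)   # K feature vectors this round
    a = int(np.argmax(contexts @ beta_hat))           # purely greedy arm selection
    reward = float(contexts[a] @ beta) + 0.1 * rng.standard_normal()
    X_hist.append(contexts[a])
    y_hist.append(reward)

    # Regularization shrinks like sqrt(log(t*d)/t); the paper's exact constants
    # differ, but crucially the schedule never involves the sparsity s_0.
    lam = 0.05 * np.sqrt(np.log(t * d) / t)
    beta_hat = Lasso(alpha=lam, fit_intercept=False, max_iter=5000).fit(
        np.array(X_hist), np.array(y_hist)).coef_
```

The point mirrored from the abstract is that nothing in the learner's update depends on $s_0$; only the synthetic environment construction uses it.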
Related papers
- Lasso Bandit with Compatibility Condition on Optimal Arm [10.216425987201333]
We consider a sparse linear bandit problem where only a sparse subset of context features affects the expected reward function.
We propose an algorithm that adapts the forced-sampling technique and prove that the proposed algorithm achieves $O(\mathrm{polylog}(dT))$ regret; a sketch of forced sampling follows this entry.
arXiv Detail & Related papers (2024-06-02T18:11:47Z)
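As referenced in the entry above, this is a hedged sketch of a generic forced-sampling rule: on a predetermined, geometrically sparse schedule of rounds the learner pulls a uniformly random arm to guarantee exploration, and acts greedily otherwise. The schedule and helper names are illustrative, not the cited paper's exact construction.

```python
# Illustrative forced-sampling helpers (not the cited paper's construction).
import numpy as np

def forced_sampling_rounds(T, base=2):
    """Geometrically spaced rounds 1, 2, 4, 8, ... up to horizon T."""
    t, rounds = 1, set()
    while t <= T:
        rounds.add(t)
        t *= base
    return rounds

def choose_arm(t, contexts, beta_hat, forced, rng):
    """Pick uniformly at random on forced rounds, greedily otherwise."""
    if t in forced:                               # forced exploration round
        return int(rng.integers(len(contexts)))
    return int(np.argmax(contexts @ beta_hat))    # greedy on all other rounds
```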
- Indexed Minimum Empirical Divergence-Based Algorithms for Linear Bandits [55.938644481736446]
Indexed Minimum Empirical Divergence (IMED) is a highly effective approach to the multi-armed bandit problem.
It has been observed to empirically outperform UCB-based algorithms and Thompson Sampling.
We present novel linear versions of the IMED algorithm, which we call the family of LinIMED algorithms.
arXiv Detail & Related papers (2024-05-24T04:11:58Z)
- Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [52.872628573907434]
We propose algorithms that utilize the variance of the reward distribution as well as the total-variation budget $B_K$, and show that they can achieve tighter regret upper bounds.
We introduce two novel algorithms: Restarted Weighted$\text{OFUL}^{+}$ and Restarted $\text{SAVE}^{+}$.
Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary linear bandits under different settings.
arXiv Detail & Related papers (2024-03-15T23:36:55Z)
- Stochastic Direct Search Method for Blind Resource Allocation [6.574808513848414]
We study direct search (also known as pattern search) methods for linearly constrained and derivative-free optimization.
We show that direct search methods achieve finite regret in the deterministic and unconstrained case.
We propose a simple extension of direct search that achieves a regret upper bound of the order of $T^{2/3}$.
arXiv Detail & Related papers (2022-10-11T07:40:45Z)
- Misspecified Gaussian Process Bandit Optimization [59.30399661155574]
Kernelized bandit algorithms have shown strong empirical and theoretical performance for this problem.
We introduce a misspecified kernelized bandit setting where the unknown function can be $\epsilon$-uniformly approximated by a function with a bounded norm in some Reproducing Kernel Hilbert Space (RKHS).
We show that our algorithm achieves optimal dependence on $\epsilon$ with no prior knowledge of misspecification.
arXiv Detail & Related papers (2021-11-09T09:00:02Z)
- Adapting to Misspecification in Contextual Bandits [82.55565343668246]
We introduce a new family of oracle-efficient algorithms for $\varepsilon$-misspecified contextual bandits.
We obtain the first algorithm that achieves the optimal $O(d\sqrt{T} + \varepsilon\sqrt{d}T)$ regret bound for unknown misspecification level.
arXiv Detail & Related papers (2021-07-12T21:30:41Z)
- Lenient Regret and Good-Action Identification in Gaussian Process Bandits [43.03669155559218]
We study the problem of Gaussian process (GP) bandits under relaxed optimization criteria stating that any function value above a certain threshold is "good enough".
On the practical side, we consider the problem of finding a single "good action" according to a known pre-specified threshold, and introduce several good-action identification algorithms that exploit knowledge of the threshold.
arXiv Detail & Related papers (2021-02-11T01:16:58Z)
- Thresholded Lasso Bandit [70.17389393497125]
Thresholded Lasso bandit is an algorithm that estimates the vector defining the reward function as well as its sparse support.
We establish non-asymptotic regret upper bounds scaling as $\mathcal{O}(\log d + \sqrt{T})$ in general, and as $\mathcal{O}(\log d + \log T)$ under the so-called margin condition; a sketch of the thresholding step follows this entry.
arXiv Detail & Related papers (2020-10-22T19:14:37Z)
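As referenced in the Thresholded Lasso Bandit entry above, this is a hedged sketch of the support-thresholding idea: fit a Lasso estimate, keep the coordinates whose magnitude clears a threshold, then refit ordinary least squares on the estimated support. The threshold rule and constants are illustrative assumptions rather than the paper's exact choices.

```python
# Illustrative thresholded-Lasso support estimation (not the paper's code).
import numpy as np
from sklearn.linear_model import Lasso

def thresholded_lasso(X, y, lam, threshold):
    """One pass of Lasso estimation, support thresholding, and OLS refit."""
    beta_lasso = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
    support = np.flatnonzero(np.abs(beta_lasso) > threshold)  # estimated support
    beta_hat = np.zeros(X.shape[1])
    if support.size:
        # Unregularized least-squares refit restricted to the support.
        beta_hat[support] = np.linalg.lstsq(X[:, support], y, rcond=None)[0]
    return support, beta_hat
```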
- Stochastic Linear Bandits Robust to Adversarial Attacks [117.665995707568]
We provide two variants of a Robust Phased Elimination algorithm, one that knows the corruption budget $C$ and one that does not.
We show that both variants attain near-optimal regret in the non-corrupted case $C = 0$, while each incurs an additional additive corruption-dependent term.
In a contextual setting, we show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.
arXiv Detail & Related papers (2020-07-07T09:00:57Z)
- Entropy Regularized Power k-Means Clustering [21.013169939337583]
We propose a scalable majorization-minimization algorithm that enjoys closed-form updates and convergence guarantees.
Our method retains the computational complexity of $k$-means and power $k$-means, but yields significant improvements over both.
arXiv Detail & Related papers (2020-01-10T14:05:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.