Local Anti-Concentration Class: Logarithmic Regret for Greedy Linear Contextual Bandit
- URL: http://arxiv.org/abs/2411.12878v1
- Date: Tue, 19 Nov 2024 21:53:06 GMT
- Title: Local Anti-Concentration Class: Logarithmic Regret for Greedy Linear Contextual Bandit
- Authors: Seok-Jin Kim, Min-hwan Oh,
- Abstract summary: We study the performance guarantees of exploration-free greedy algorithms for the linear contextual bandit problem.
We introduce a novel condition, named the textitLocal Anti-Concentration (LAC) condition, which enables a greedy bandit algorithm to achieve provable efficiency.
- Score: 8.087699764574788
- License:
- Abstract: We study the performance guarantees of exploration-free greedy algorithms for the linear contextual bandit problem. We introduce a novel condition, named the \textit{Local Anti-Concentration} (LAC) condition, which enables a greedy bandit algorithm to achieve provable efficiency. We show that the LAC condition is satisfied by a broad class of distributions, including Gaussian, exponential, uniform, Cauchy, and Student's~$t$ distributions, along with other exponential family distributions and their truncated variants. This significantly expands the class of distributions under which greedy algorithms can perform efficiently. Under our proposed LAC condition, we prove that the cumulative expected regret of the greedy algorithm for the linear contextual bandit is bounded by $O(\operatorname{poly} \log T)$. Our results establish the widest range of distributions known to date that allow a sublinear regret bound for greedy algorithms, further achieving a sharp poly-logarithmic regret.
Related papers
- Lasso Bandit with Compatibility Condition on Optimal Arm [10.216425987201333]
We consider a sparse linear bandit problem where only a sparse subset of context features affects the expected reward function.
We propose an algorithm that adapts the forced-sampling technique and prove that the proposed algorithm achieves $O(textpolylog dT)$ regret.
arXiv Detail & Related papers (2024-06-02T18:11:47Z) - Indexed Minimum Empirical Divergence-Based Algorithms for Linear Bandits [55.938644481736446]
Indexed Minimum Empirical Divergence (IMED) is a highly effective approach to the multi-armed bandit problem.
It has been observed to empirically outperform UCB-based algorithms and Thompson Sampling.
We present novel linear versions of the IMED algorithm, which we call the family of LinIMED algorithms.
arXiv Detail & Related papers (2024-05-24T04:11:58Z) - Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [52.872628573907434]
We propose algorithms that utilize the variance of the reward distribution as well as the $B_K$, and show that they can achieve tighter regret upper bounds.
We introduce two novel algorithms: Restarted Weighted$textOFUL+$ and Restarted $textSAVE+$.
Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary linear bandits under different settings.
arXiv Detail & Related papers (2024-03-15T23:36:55Z) - Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement
Learning: Adaptivity and Computational Efficiency [90.40062452292091]
We present the first computationally efficient algorithm for linear bandits with heteroscedastic noise.
Our algorithm is adaptive to the unknown variance of noise and achieves an $tildeO(d sqrtsum_k = 1K sigma_k2 + d)$ regret.
We also propose a variance-adaptive algorithm for linear mixture Markov decision processes (MDPs) in reinforcement learning.
arXiv Detail & Related papers (2023-02-21T00:17:24Z) - Differentially-Private Hierarchical Clustering with Provable
Approximation Guarantees [79.59010418610625]
We study differentially private approximation algorithms for hierarchical clustering.
We show strong lower bounds for the problem: that any $epsilon$-DP algorithm must exhibit $O(|V|2/ epsilon)$-additive error for an input dataset.
We propose a private $1+o(1)$ approximation algorithm which also recovers the blocks exactly.
arXiv Detail & Related papers (2023-01-31T19:14:30Z) - Squeeze All: Novel Estimator and Self-Normalized Bound for Linear
Contextual Bandits [18.971564419292893]
We propose a linear contextual bandit algorithm with $O(sqrtdTlog T)$ regret bound, where $d$ is the dimension of contexts and $T$ is the time horizon.
Our proposed algorithm is equipped with a novel estimator in which exploration is embedded through explicit randomization.
arXiv Detail & Related papers (2022-06-11T02:43:17Z) - Stochastic Linear Bandits Robust to Adversarial Attacks [117.665995707568]
We provide two variants of a Robust Phased Elimination algorithm, one that knows $C$ and one that does not.
We show that both variants attain near-optimal regret in the non-corrupted case $C = 0$, while incurring additional additive terms respectively.
In a contextual setting, we show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.
arXiv Detail & Related papers (2020-07-07T09:00:57Z) - Bandit algorithms: Letting go of logarithmic regret for statistical
robustness [0.0]
We study regret in a multi-armed bandit setting and establish a fundamental trade-off between the regret suffered under an algorithm, and its statistical robustness.
We show that bandit learning algorithms with logarithmic regret are always inconsistent and that consistent learning algorithms always suffer a super-logarithmic regret.
arXiv Detail & Related papers (2020-06-22T07:18:47Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB)
arXiv Detail & Related papers (2020-06-17T22:32:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.