Improved Optimistic Algorithms for Logistic Bandits
- URL: http://arxiv.org/abs/2002.07530v2
- Date: Mon, 8 Jun 2020 07:36:22 GMT
- Title: Improved Optimistic Algorithms for Logistic Bandits
- Authors: Louis Faury, Marc Abeille, Clément Calauzènes, Olivier Fercoq
- Abstract summary: We propose a new optimistic algorithm based on a finer examination of the non-linearities of the reward function.
We show that it enjoys a $\tilde{\mathcal{O}}(\sqrt{T})$ regret with no dependence on $\kappa$, except for a second-order term.
- Score: 16.140301473601454
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The generalized linear bandit framework has attracted a lot of attention in
recent years: it extends the well-understood linear setting and makes it possible to
model richer reward structures. It notably covers the logistic model, widely
used when rewards are binary. For logistic bandits, the frequentist regret
guarantees of existing algorithms are $\tilde{\mathcal{O}}(\kappa \sqrt{T})$,
where $\kappa$ is a problem-dependent constant. Unfortunately, $\kappa$ can be
arbitrarily large as it scales exponentially with the size of the decision set.
This may lead to significantly loose regret bounds and poor empirical
performance. In this work, we study the logistic bandit with a focus on the
prohibitive dependencies introduced by $\kappa$. We propose a new optimistic
algorithm based on a finer examination of the non-linearities of the reward
function. We show that it enjoys a $\tilde{\mathcal{O}}(\sqrt{T})$ regret with
no dependence on $\kappa$, except for a second-order term. Our analysis is based
on a new tail inequality for self-normalized martingales, which is of independent
interest.
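To make the role of $\kappa$ concrete, below is a minimal, illustrative Python sketch (not the paper's algorithm), assuming the standard logistic-bandit model with sigmoid link $\mu(z)=1/(1+e^{-z})$ and the commonly used definition $\kappa = \sup_{x \in \mathcal{X}} 1/\dot\mu(x^\top\theta_*)$; the paper's formal definition may additionally take a supremum over a parameter set. The sketch shows how binary rewards are generated and why $\kappa$ grows roughly exponentially with the radius of the decision set.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_slope(z):
    # Derivative of the sigmoid link: mu'(z) = mu(z) * (1 - mu(z))
    s = sigmoid(z)
    return s * (1.0 - s)

def kappa(arms, theta_star):
    # Worst-case inverse slope of the link over the decision set
    # (illustrative definition; the paper's constants/parameter set may differ).
    return 1.0 / sigmoid_slope(arms @ theta_star).min()

rng = np.random.default_rng(0)
d = 5
theta_star = np.ones(d) / np.sqrt(d)            # ||theta_star|| = 1

for S in (1, 2, 4, 8):                          # radius of the decision set
    arms = rng.normal(size=(1000, d))
    arms = S * arms / np.linalg.norm(arms, axis=1, keepdims=True)   # ||x|| = S
    print(f"S = {S}: kappa ~ {kappa(arms, theta_star):.1f}")
    # kappa grows roughly like exp(S): the blow-up the paper removes from
    # the leading sqrt(T) term of the regret.

# Binary reward for a played arm x_t, as in the logistic bandit model:
x_t = arms[0]
r_t = rng.binomial(1, sigmoid(x_t @ theta_star))
```

Since $\kappa$ multiplies the $\sqrt{T}$ term in the prior $\tilde{\mathcal{O}}(\kappa\sqrt{T})$ guarantees, even moderately large decision sets can make those bounds vacuous, which is the gap the proposed algorithm addresses.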
Related papers
- Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [52.872628573907434]
We propose algorithms that utilize the variance of the reward distribution as well as $B_K$, and show that they achieve tighter regret upper bounds.
We introduce two novel algorithms: Restarted Weighted-OFUL$^+$ and Restarted SAVE$^+$.
Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms improve upon previous state-of-the-art results on non-stationary linear bandits under different settings.
arXiv Detail & Related papers (2024-03-15T23:36:55Z)
- Improved Regret Bounds of (Multinomial) Logistic Bandits via Regret-to-Confidence-Set Conversion [40.159846982245746]
We construct a convex confidence set based only on the existence of an online learning algorithm with a regret guarantee.
Using R2CS, we obtain a strict improvement in the regret bound w.r.t. $S$ in logistic bandits while retaining computational feasibility.
We extend this analysis to multinomial logistic bandits and obtain similar improvements in the regret, showing the efficacy of R2CS.
arXiv Detail & Related papers (2023-10-28T01:27:52Z)
- Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs [62.52390282012508]
We study high-probability regret bounds for adversarial $K$-armed bandits with time-varying feedback graphs over $T$ rounds.
We develop an algorithm that achieves the optimal regret $\widetilde{\mathcal{O}}\big((\sum_{t=1}^{T}\alpha_t)^{1/2}+\max_{t\in[T]}\alpha_t\big)$ with high probability.
We also develop the first algorithm that achieves the optimal high-probability regret bound for weakly observable graphs.
arXiv Detail & Related papers (2022-10-04T04:36:15Z)
- Leveraging Initial Hints for Free in Stochastic Linear Bandits [38.58449930961883]
We study the setting of optimizing with bandit feedback with additional prior knowledge provided to a learner in the form of an initial hint of the optimal action.
We present a novel algorithm for linear bandits that uses this hint to improve its regret to $\tilde{O}(\sqrt{T})$ when the hint is accurate.
Perhaps surprisingly, our work shows that leveraging a hint yields provable gains without sacrificing worst-case performance.
arXiv Detail & Related papers (2022-03-08T18:48:55Z)
- An Experimental Design Approach for Regret Minimization in Logistic Bandits [26.674062544226636]
The main challenge of logistic bandits is reducing the dependence on a potentially large problem-dependent constant $\kappa$.
We propose a new warmup sampling algorithm that can dramatically reduce the lower order term in the regret in general.
arXiv Detail & Related papers (2022-02-04T21:56:40Z)
- Instance-Wise Minimax-Optimal Algorithms for Logistic Bandits [9.833844886421694]
Logistic bandits have attracted substantial attention by providing an uncluttered yet challenging framework for understanding the impact of non-linearity in parametrized bandits.
We introduce a novel algorithm for which we provide a refined analysis of the effect of non-linearity.
arXiv Detail & Related papers (2020-10-23T20:07:31Z)
- Thresholded Lasso Bandit [70.17389393497125]
Thresholded Lasso bandit is an algorithm that estimates the vector defining the reward function as well as its sparse support.
We establish non-asymptotic regret upper bounds scaling as $\mathcal{O}(\log d + \sqrt{T})$ in general, and as $\mathcal{O}(\log d + \log T)$ under the so-called margin condition.
arXiv Detail & Related papers (2020-10-22T19:14:37Z)
- Impact of Representation Learning in Linear Bandits [83.17684841392754]
We study how representation learning can improve the efficiency of bandit problems.
We present a new algorithm which achieves $\widetilde{O}(T\sqrt{kN} + \sqrt{dkNT})$ regret, where $N$ is the number of rounds we play for each bandit.
arXiv Detail & Related papers (2020-10-13T16:35:30Z)
- Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon [88.75843804630772]
Episodic reinforcement learning generalizes contextual bandits.
Long planning horizons and unknown state-dependent transitions pose little additional difficulty in terms of sample complexity.
We show MVP enjoys an $\mathcal{O}\left(\sqrt{SAK} + S^2A\right)$ regret, approaching the $\Omega\left(\sqrt{SAK} + S^2A\right)$ lower bound of contextual bandits up to logarithmic terms.
arXiv Detail & Related papers (2020-09-28T17:52:32Z)
- Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.