Instance-Wise Minimax-Optimal Algorithms for Logistic Bandits
- URL: http://arxiv.org/abs/2010.12642v2
- Date: Tue, 9 Mar 2021 11:09:05 GMT
- Title: Instance-Wise Minimax-Optimal Algorithms for Logistic Bandits
- Authors: Marc Abeille, Louis Faury and Clément Calauzènes
- Abstract summary: Logistic Bandits have attracted substantial attention by providing an uncluttered yet challenging framework for understanding the impact of non-linearity in parametrized bandits.
We introduce a novel algorithm for which we provide a refined analysis of the effect of non-linearity.
- Score: 9.833844886421694
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Logistic Bandits have recently attracted substantial attention by providing
an uncluttered yet challenging framework for understanding the impact of
non-linearity in parametrized bandits. It was shown by Faury et al. (2020) that
the learning-theoretic difficulties of Logistic Bandits can be embodied by a
problem-dependent constant $\kappa$, sometimes prohibitively large, which
characterizes the magnitude of the reward's non-linearity. In this paper we
introduce a novel algorithm for which we provide a refined analysis. This
allows for a better characterization of the effect of non-linearity and yields
improved problem-dependent guarantees. In the most favorable cases this leads
to a regret upper bound scaling as $\tilde{\mathcal{O}}(d\sqrt{T/\kappa})$, which
dramatically improves over the $\tilde{\mathcal{O}}(d\sqrt{T}+\kappa)$
state-of-the-art guarantees. We prove that this rate is minimax-optimal by
deriving an $\Omega(d\sqrt{T/\kappa})$ problem-dependent lower bound. Our
analysis identifies two regimes (permanent and transitory) of the regret, which
ultimately reconciles Faury et al. (2020) with the Bayesian approach of
Dong et al. (2019). In contrast to previous works, we find that in the
permanent regime non-linearity can dramatically ease the
exploration-exploitation trade-off. While it also impacts the length of the
transitory phase in a problem-dependent fashion, we show that this impact is
mild in most reasonable configurations.
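As a concrete illustration of the constant $\kappa$ discussed above, here is a minimal numpy sketch that evaluates it on a hypothetical instance, assuming the convention of Faury et al. (2020) in which $\kappa$ bounds the inverse slope of the logistic link over the arm set; the arm set, the parameter, and the simplification to a fixed $\theta_*$ are illustrative only, not the paper's setup.

```python
import numpy as np

def mu_dot(z):
    """Derivative of the logistic link mu(z) = 1/(1+exp(-z)):
    mu'(z) = mu(z) * (1 - mu(z))."""
    mu = 1.0 / (1.0 + np.exp(-z))
    return mu * (1.0 - mu)

def kappa(arms, theta):
    """kappa = max over arms of 1 / mu'(a^T theta): the flatter the
    sigmoid at some arm, the larger kappa. Simplified here to a
    fixed parameter for illustration."""
    return 1.0 / mu_dot(arms @ theta).min()

# Hypothetical instance: unit-norm arms in dimension 2.
rng = np.random.default_rng(0)
arms = rng.normal(size=(100, 2))
arms /= np.linalg.norm(arms, axis=1, keepdims=True)
theta_star = np.array([4.0, 0.0])  # larger norm => stronger saturation => larger kappa
print(f"kappa ~ {kappa(arms, theta_star):.1f}")
```

The paper's headline result is that the same $\kappa$ that entered earlier guarantees additively ($\tilde{\mathcal{O}}(d\sqrt{T}+\kappa)$) appears as a $1/\sqrt{\kappa}$ factor in the new permanent-regime bound, i.e. strong saturation can eventually ease exploration.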
Related papers
- Improved Bound for Robust Causal Bandits with Linear Models [16.60875994745622]
This paper investigates the robustness of causal bandits in the face of temporal model fluctuations.
The proposed algorithm achieves nearly optimal $\tilde{\mathcal{O}}(\sqrt{T})$ regret when the model-deviation budget $C$ is $o(\sqrt{T})$.
arXiv Detail & Related papers (2024-05-13T14:41:28Z) - Settling the Sample Complexity of Online Reinforcement Learning [92.02082223856479]
We show how to achieve minimax-optimal regret without incurring any burn-in cost.
We extend our theory to unveil the influences of problem-dependent quantities like the optimal value/cost and certain variances.
arXiv Detail & Related papers (2023-07-25T15:42:11Z) - Revisiting Weighted Strategy for Non-stationary Parametric Bandits [82.1942459195896]
This paper revisits the weighted strategy for non-stationary parametric bandits.
We propose a refined analysis framework, which produces a simpler weight-based algorithm.
Our new framework can be used to improve regret bounds of other parametric bandits.
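The "weighted strategy" referenced here is typically instantiated as discounted (exponentially weighted) ridge regression, which down-weights stale observations so the estimate can track a drifting parameter. A minimal sketch under that assumption; the discount factor and regularizer below are placeholders, not the paper's tuned values.

```python
import numpy as np

def discounted_ridge(contexts, rewards, gamma=0.95, lam=1.0):
    """Exponentially weighted ridge regression: observation t gets
    weight gamma^(T-1-t), so the newest sample has weight 1 and old
    samples fade out. gamma and lam are illustrative placeholders."""
    T, d = contexts.shape
    w = gamma ** np.arange(T - 1, -1, -1.0)  # newest sample -> weight 1
    A = lam * np.eye(d) + (contexts * w[:, None]).T @ contexts
    b = contexts.T @ (w * rewards)
    return np.linalg.solve(A, b)
```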
arXiv Detail & Related papers (2023-03-05T15:11:14Z) - Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement
Learning: Adaptivity and Computational Efficiency [90.40062452292091]
We present the first computationally efficient algorithm for linear bandits with heteroscedastic noise.
Our algorithm is adaptive to the unknown variance of the noise and achieves an $\tilde{O}(d\sqrt{\sum_{k=1}^{K}\sigma_k^2}+d)$ regret.
We also propose a variance-adaptive algorithm for linear mixture Markov decision processes (MDPs) in reinforcement learning.
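Variance adaptivity in linear bandits is usually built on variance-weighted ridge regression, in which each round is rescaled by its noise level so that low-noise observations dominate the estimate. A minimal sketch of that core device, with illustrative names; the paper's actual estimator (and its handling of unknown variances) may differ.

```python
import numpy as np

def variance_weighted_ridge(contexts, rewards, sigmas, lam=1.0):
    """Weighted ridge regression where round k is scaled by 1/sigma_k:
    A = lam*I + sum_k x_k x_k^T / sigma_k^2,
    b = sum_k r_k x_k / sigma_k^2."""
    w = 1.0 / sigmas            # per-round precision weights
    X = contexts * w[:, None]
    y = rewards * w
    d = contexts.shape[1]
    A = lam * np.eye(d) + X.T @ X
    return np.linalg.solve(A, X.T @ y)
```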
arXiv Detail & Related papers (2023-02-21T00:17:24Z) - Causal Bandits for Linear Structural Equation Models [58.2875460517691]
This paper studies the problem of designing an optimal sequence of interventions in a causal graphical model.
It is assumed that the graph's structure is known and has $N$ nodes.
Two algorithms are proposed for the frequentist (UCB-based) and Bayesian settings.
arXiv Detail & Related papers (2022-08-26T16:21:31Z) - Exploration in Linear Bandits with Rich Action Sets and its Implications
for Inference [23.364534479492715]
We show that the minimum eigenvalue of the expected design matrix grows as $\Omega(\sqrt{n})$ whenever the expected cumulative regret of the algorithm is $O(\sqrt{n})$.
We apply our result to two practical scenarios: model selection and clustering in linear bandits.
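The quantity in question is the smallest eigenvalue of the design matrix $V_n = \sum_{t \le n} a_t a_t^\top$; a tiny numeric check of how it grows with $n$ (uniform random unit actions here are purely illustrative and grow it linearly; the paper's contribution is the $\Omega(\sqrt{n})$ guarantee for low-regret algorithms, which need not sample uniformly).

```python
import numpy as np

# Minimum eigenvalue of the design matrix V_n = sum_t a_t a_t^T,
# the quantity whose growth the paper relates to the regret rate.
rng = np.random.default_rng(1)
for n in (100, 400, 1600):
    actions = rng.normal(size=(n, 3))
    actions /= np.linalg.norm(actions, axis=1, keepdims=True)
    V = actions.T @ actions
    print(n, np.linalg.eigvalsh(V).min())
```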
arXiv Detail & Related papers (2022-07-23T20:25:07Z) - Almost Optimal Batch-Regret Tradeoff for Batch Linear Contextual Bandits [45.43968161616453]
We study the optimal batch-regret tradeoff for batch linear contextual bandits.
We prove its regret guarantee, which features a two-phase expression as the time horizon $T$ grows.
We also prove a new matrix inequality concentration with dependence on their dynamic upper bounds.
arXiv Detail & Related papers (2021-10-15T12:32:33Z) - Non-stationary Linear Bandits Revisited [26.082923174615495]
Non-stationary linear bandits are a variant of linear bandits with a time-varying underlying regression parameter.
We prove an $\widetilde{O}(T^{3/4}(1+P_T)^{1/4})$ dynamic regret for these algorithms, slightly worse than the previously anticipated rate.
arXiv Detail & Related papers (2021-03-09T10:07:17Z) - Nearly Dimension-Independent Sparse Linear Bandit over Small Action
Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose doubly growing epochs and estimating the parameter using the best subset selection method.
arXiv Detail & Related papers (2020-09-04T04:10:39Z) - Improved Optimistic Algorithms for Logistic Bandits [16.140301473601454]
We propose a new optimistic algorithm based on a finer examination of the non-linearities of the reward function.
We show that it enjoys an $\tilde{\mathcal{O}}(\sqrt{T})$ regret with no dependence on $\kappa$, except in a second-order term.
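A generic optimistic loop for logistic bandits, in the spirit of (but not identical to) the algorithm referenced above: fit a penalized logistic MLE, then play the arm with the largest upper confidence bound on the mean reward. The confidence width beta and penalty lam are stand-ins for the paper's calibrated values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_mle(X, y, lam=1.0, iters=25):
    """Penalized logistic MLE via Newton's method. Returns the estimate
    and the Hessian at it, which shapes the confidence ellipsoid."""
    d = X.shape[1]
    theta = np.zeros(d)
    for _ in range(iters):
        p = sigmoid(X @ theta)
        H = (X * (p * (1.0 - p))[:, None]).T @ X + lam * np.eye(d)
        theta -= np.linalg.solve(H, X.T @ (p - y) + lam * theta)
    p = sigmoid(X @ theta)
    H = (X * (p * (1.0 - p))[:, None]).T @ X + lam * np.eye(d)
    return theta, H

def optimistic_arm(arms, X, y, beta=1.0, lam=1.0):
    """Play the arm maximizing a UCB on sigmoid(a^T theta); beta is a
    placeholder for a properly calibrated confidence width."""
    theta, H = logistic_mle(X, y, lam)
    H_inv = np.linalg.inv(H)
    bonus = beta * np.sqrt(np.einsum("ij,jk,ik->i", arms, H_inv, arms))
    return int(np.argmax(sigmoid(arms @ theta) + bonus))
```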
arXiv Detail & Related papers (2020-02-18T12:52:32Z) - Contextual Linear Bandits under Noisy Features: Towards Bayesian Oracles [65.9694455739978]
We study contextual linear bandit problems under feature uncertainty, where the features are noisy and have missing entries.
Our analysis reveals that the optimal hypothesis can significantly deviate from the underlying realizability function, depending on the noise characteristics.
This implies that classical approaches cannot guarantee a non-trivial regret bound.
arXiv Detail & Related papers (2017-03-03T21:39:56Z)