Instance-Wise Minimax-Optimal Algorithms for Logistic Bandits
- URL: http://arxiv.org/abs/2010.12642v2
- Date: Tue, 9 Mar 2021 11:09:05 GMT
- Title: Instance-Wise Minimax-Optimal Algorithms for Logistic Bandits
- Authors: Marc Abeille, Louis Faury and Clément Calauzènes
- Abstract summary: Logistic Bandits have attracted substantial attention by providing an uncluttered yet challenging framework for understanding the impact of non-linearity in parametrized bandits.
We introduce a novel algorithm for which we provide a refined analysis of the effect of non-linearity.
- Score: 9.833844886421694
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Logistic Bandits have recently attracted substantial attention by providing
an uncluttered yet challenging framework for understanding the impact of
non-linearity in parametrized bandits. It was shown by Faury et al. (2020) that
the learning-theoretic difficulties of Logistic Bandits can be embodied by a
problem-dependent constant $\kappa$, sometimes prohibitively large, which
characterizes the magnitude of the reward's non-linearity. In this paper we
introduce a novel algorithm for which we provide a refined analysis. This
allows for a better characterization of the effect of non-linearity and yields
improved problem-dependent guarantees. In the most favorable cases this leads
to a regret upper bound scaling as $\tilde{\mathcal{O}}(d\sqrt{T/\kappa})$, which
dramatically improves over the $\tilde{\mathcal{O}}(d\sqrt{T}+\kappa)$
state-of-the-art guarantees. We prove that this rate is minimax-optimal by
deriving an $\Omega(d\sqrt{T/\kappa})$ problem-dependent lower bound. Our
analysis identifies two regimes (permanent and transitory) of the regret, which
ultimately reconciles Faury et al. (2020) with the Bayesian approach of
Dong et al. (2019). In contrast to previous works, we find that in the
permanent regime non-linearity can dramatically ease the
exploration-exploitation trade-off. While it also impacts the length of the
transitory phase in a problem-dependent fashion, we show that this impact is
mild in most reasonable configurations.
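As a concrete illustration of the constant $\kappa$ discussed above, here is a minimal numpy sketch that evaluates it on a hypothetical instance, assuming the convention of Faury et al. (2020) in which $\kappa$ bounds the inverse slope of the logistic link over the arm set; the arm set, the parameter, and the simplification to a fixed $\theta_*$ are illustrative only, not the paper's setup.

```python
import numpy as np

def mu_dot(z):
    """Derivative of the logistic link mu(z) = 1/(1+exp(-z)):
    mu'(z) = mu(z) * (1 - mu(z))."""
    mu = 1.0 / (1.0 + np.exp(-z))
    return mu * (1.0 - mu)

def kappa(arms, theta):
    """kappa = max over arms of 1 / mu'(a^T theta): the flatter the
    sigmoid at some arm, the larger kappa. Simplified here to a
    fixed parameter for illustration."""
    return 1.0 / mu_dot(arms @ theta).min()

# Hypothetical instance: unit-norm arms in dimension 2.
rng = np.random.default_rng(0)
arms = rng.normal(size=(100, 2))
arms /= np.linalg.norm(arms, axis=1, keepdims=True)
theta_star = np.array([4.0, 0.0])  # larger norm => stronger saturation => larger kappa
print(f"kappa ~ {kappa(arms, theta_star):.1f}")
```

The paper's headline result is that the same $\kappa$ that entered earlier guarantees additively ($\tilde{\mathcal{O}}(d\sqrt{T}+\kappa)$) appears as a $1/\sqrt{\kappa}$ factor in the new permanent-regime bound, i.e. strong saturation can eventually ease exploration.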
Related papers
- Improved Bound for Robust Causal Bandits with Linear Models [16.60875994745622]
This paper investigates the robustness of causal bandits in the face of temporal model fluctuations.
The proposed algorithm achieves nearly optimal $\tilde{\mathcal{O}}(\sqrt{T})$ regret when the model-deviation budget $C$ is $o(\sqrt{T})$.
arXiv Detail & Related papers (2024-05-13T14:41:28Z) - Settling the Sample Complexity of Online Reinforcement Learning [92.02082223856479]
We show how to achieve minimax-optimal regret without incurring any burn-in cost.
We extend our theory to unveil the influences of problem-dependent quantities like the optimal value/cost and certain variances.
arXiv Detail & Related papers (2023-07-25T15:42:11Z) - Revisiting Weighted Strategy for Non-stationary Parametric Bandits [82.1942459195896]
This paper revisits the weighted strategy for non-stationary parametric bandits.
We propose a refined analysis framework, which produces a simpler weight-based algorithm.
Our new framework can be used to improve regret bounds of other parametric bandits.
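The "weighted strategy" referenced here is typically instantiated as discounted (exponentially weighted) ridge regression, which down-weights stale observations so the estimate can track a drifting parameter. A minimal sketch under that assumption; the discount factor and regularizer below are placeholders, not the paper's tuned values.

```python
import numpy as np

def discounted_ridge(contexts, rewards, gamma=0.95, lam=1.0):
    """Exponentially weighted ridge regression: observation t gets
    weight gamma^(T-1-t), so the newest sample has weight 1 and old
    samples fade out. gamma and lam are illustrative placeholders."""
    T, d = contexts.shape
    w = gamma ** np.arange(T - 1, -1, -1.0)  # newest sample -> weight 1
    A = lam * np.eye(d) + (contexts * w[:, None]).T @ contexts
    b = contexts.T @ (w * rewards)
    return np.linalg.solve(A, b)
```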
arXiv Detail & Related papers (2023-03-05T15:11:14Z) - Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement
Learning: Adaptivity and Computational Efficiency [90.40062452292091]
We present the first computationally efficient algorithm for linear bandits with heteroscedastic noise.
Our algorithm is adaptive to the unknown variance of the noise and achieves an $\tilde{O}(d\sqrt{\sum_{k=1}^{K}\sigma_k^2}+d)$ regret.
We also propose a variance-adaptive algorithm for linear mixture Markov decision processes (MDPs) in reinforcement learning.
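Variance adaptivity in linear bandits is usually built on variance-weighted ridge regression, in which each round is rescaled by its noise level so that low-noise observations dominate the estimate. A minimal sketch of that core device, with illustrative names; the paper's actual estimator (and its handling of unknown variances) may differ.

```python
import numpy as np

def variance_weighted_ridge(contexts, rewards, sigmas, lam=1.0):
    """Weighted ridge regression where round k is scaled by 1/sigma_k:
    A = lam*I + sum_k x_k x_k^T / sigma_k^2,
    b = sum_k r_k x_k / sigma_k^2."""
    w = 1.0 / sigmas            # per-round precision weights
    X = contexts * w[:, None]
    y = rewards * w
    d = contexts.shape[1]
    A = lam * np.eye(d) + X.T @ X
    return np.linalg.solve(A, X.T @ y)
```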
arXiv Detail & Related papers (2023-02-21T00:17:24Z) - Causal Bandits for Linear Structural Equation Models [58.2875460517691]
This paper studies the problem of designing an optimal sequence of interventions in a causal graphical model.
It is assumed that the graph's structure is known and has $N$ nodes.
Two algorithms are proposed for the frequentist (UCB-based) and Bayesian settings.
arXiv Detail & Related papers (2022-08-26T16:21:31Z) - Exploration in Linear Bandits with Rich Action Sets and its Implications
for Inference [23.364534479492715]
We show that the minimum eigenvalue of the expected design matrix grows as $\Omega(\sqrt{n})$ whenever the expected cumulative regret of the algorithm is $O(\sqrt{n})$.
We apply our result to two practical scenarios: model selection and clustering in linear bandits.
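The quantity in question is the smallest eigenvalue of the design matrix $V_n = \sum_{t \le n} a_t a_t^\top$; a tiny numeric check of how it grows with $n$ (uniform random unit actions here are purely illustrative and grow it linearly; the paper's contribution is the $\Omega(\sqrt{n})$ guarantee for low-regret algorithms, which need not sample uniformly).

```python
import numpy as np

# Minimum eigenvalue of the design matrix V_n = sum_t a_t a_t^T,
# the quantity whose growth the paper relates to the regret rate.
rng = np.random.default_rng(1)
for n in (100, 400, 1600):
    actions = rng.normal(size=(n, 3))
    actions /= np.linalg.norm(actions, axis=1, keepdims=True)
    V = actions.T @ actions
    print(n, np.linalg.eigvalsh(V).min())
```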
arXiv Detail & Related papers (2022-07-23T20:25:07Z) - Almost Optimal Batch-Regret Tradeoff for Batch Linear Contextual Bandits [45.43968161616453]
We study the optimal batch-regret tradeoff for batch linear contextual bandits.
We prove its regret guarantee, which features a two-phase expression as the time horizon $T$ grows.
We also prove a new matrix inequality concentration with dependence on their dynamic upper bounds.
arXiv Detail & Related papers (2021-10-15T12:32:33Z) - Non-stationary Linear Bandits Revisited [26.082923174615495]
Non-stationary linear bandits are a variant of linear bandits with a time-varying underlying regression parameter.
We prove an $\widetilde{O}(T^{3/4}(1+P_T)^{1/4})$ dynamic regret for these algorithms, slightly worse than the previously anticipated rate.
arXiv Detail & Related papers (2021-03-09T10:07:17Z) - Nearly Dimension-Independent Sparse Linear Bandit over Small Action
Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose doubly growing epochs and estimating the parameter using the best subset selection method.
arXiv Detail & Related papers (2020-09-04T04:10:39Z) - Improved Optimistic Algorithms for Logistic Bandits [16.140301473601454]
We propose a new optimistic algorithm based on a finer examination of the non-linearities of the reward function.
We show that it enjoys an $\tilde{\mathcal{O}}(\sqrt{T})$ regret with no dependence on $\kappa$, except in a second-order term.
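A generic optimistic loop for logistic bandits, in the spirit of (but not identical to) the algorithm referenced above: fit a penalized logistic MLE, then play the arm with the largest upper confidence bound on the mean reward. The confidence width beta and penalty lam are stand-ins for the paper's calibrated values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_mle(X, y, lam=1.0, iters=25):
    """Penalized logistic MLE via Newton's method. Returns the estimate
    and the Hessian at it, which shapes the confidence ellipsoid."""
    d = X.shape[1]
    theta = np.zeros(d)
    for _ in range(iters):
        p = sigmoid(X @ theta)
        H = (X * (p * (1.0 - p))[:, None]).T @ X + lam * np.eye(d)
        theta -= np.linalg.solve(H, X.T @ (p - y) + lam * theta)
    p = sigmoid(X @ theta)
    H = (X * (p * (1.0 - p))[:, None]).T @ X + lam * np.eye(d)
    return theta, H

def optimistic_arm(arms, X, y, beta=1.0, lam=1.0):
    """Play the arm maximizing a UCB on sigmoid(a^T theta); beta is a
    placeholder for a properly calibrated confidence width."""
    theta, H = logistic_mle(X, y, lam)
    H_inv = np.linalg.inv(H)
    bonus = beta * np.sqrt(np.einsum("ij,jk,ik->i", arms, H_inv, arms))
    return int(np.argmax(sigmoid(arms @ theta) + bonus))
```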
arXiv Detail & Related papers (2020-02-18T12:52:32Z) - Contextual Linear Bandits under Noisy Features: Towards Bayesian Oracles [65.9694455739978]
We study contextual linear bandit problems under feature uncertainty, where the features are noisy and have missing entries.
Our analysis reveals that the optimal hypothesis can significantly deviate from the underlying realizability function, depending on the noise characteristics.
This implies that classical approaches cannot guarantee a non-trivial regret bound.
arXiv Detail & Related papers (2017-03-03T21:39:56Z)