Double Doubly Robust Thompson Sampling for Generalized Linear Contextual
Bandits
- URL: http://arxiv.org/abs/2209.06983v1
- Date: Thu, 15 Sep 2022 00:20:38 GMT
- Title: Double Doubly Robust Thompson Sampling for Generalized Linear Contextual
Bandits
- Authors: Wonyoung Kim, Kyungbok Lee, Myunghee Cho Paik
- Abstract summary: We propose a novel contextual bandit algorithm for generalized linear rewards with an $\tilde{O}(\sqrt{\kappa^{-1} \phi T})$ regret over $T$ rounds.
We also provide an $O(\kappa^{-1} \phi \log (NT) \log T)$ regret bound for $N$ arms under a probabilistic margin condition.
- Score: 8.508198765617198
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel contextual bandit algorithm for generalized linear rewards
with an $\tilde{O}(\sqrt{\kappa^{-1} \phi T})$ regret over $T$ rounds where
$\phi$ is the minimum eigenvalue of the covariance of contexts and $\kappa$ is
a lower bound of the variance of rewards. In several practical cases where
$\phi=O(d)$, our result is the first regret bound for generalized linear model
(GLM) bandits with the order $\sqrt{d}$ without relying on the approach of Auer
[2002]. We achieve this bound using a novel estimator called the double
doubly robust (DDR) estimator, a subclass of the doubly robust (DR) estimator
with a tighter error bound. The approach of Auer [2002] achieves independence
by discarding the observed rewards, whereas our algorithm achieves independence
while using all contexts, via our DDR estimator. We also provide an
$O(\kappa^{-1} \phi \log (NT) \log T)$ regret bound for $N$ arms under a
probabilistic margin condition. Regret bounds under the margin condition are
given by Bastani and Bayati [2020] and Bastani et al. [2021] under the setting
that contexts are common to all arms but coefficients are arm-specific. When
contexts are different for all arms but coefficients are common, ours is the
first regret bound under the margin condition for linear models or GLMs. We
conduct empirical studies using synthetic data and real examples, demonstrating
the effectiveness of our algorithm.
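The DDR estimator itself is this paper's contribution, but it builds on the standard doubly robust (DR) imputation step from the missing-data literature: impute a model-based mean reward for every arm's context, then correct the pulled arm's imputation by inverse-propensity weighting. Below is a minimal sketch of that generic DR step for a logistic GLM; the function name, the logistic link, and the interface are illustrative assumptions, and the paper's DDR refinement that yields the tighter error bound is not reproduced here.

```python
import numpy as np

def dr_pseudo_rewards(contexts, chosen, reward, theta_hat, probs):
    """Generic doubly robust pseudo-rewards for one bandit round (sketch).

    contexts  : (N, d) array, one context vector per arm
    chosen    : index of the arm actually pulled
    reward    : observed reward for the pulled arm
    theta_hat : current GLM coefficient estimate, shape (d,)
    probs     : (N,) selection probabilities of the randomized policy

    Every arm receives its model-imputed mean reward; the pulled arm's
    residual is added back with an inverse-propensity weight.  The result
    is unbiased for the true mean rewards whenever the propensities are
    correct, even if the imputation model is not (the "doubly robust"
    property), and all N contexts enter the estimate, not only the
    pulled arm's.
    """
    mu = 1.0 / (1.0 + np.exp(-(contexts @ theta_hat)))  # logistic link, assumed
    pseudo = mu.copy()
    pseudo[chosen] += (reward - mu[chosen]) / probs[chosen]
    return pseudo

# Example round: 5 arms in R^3, arm 2 pulled with probability 0.4.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y_tilde = dr_pseudo_rewards(X, chosen=2, reward=1.0,
                            theta_hat=np.zeros(3),
                            probs=np.array([0.15, 0.15, 0.4, 0.15, 0.15]))
```

Regressing these pseudo-rewards on all $N$ contexts, rather than only on the contexts of the pulled arms, is what the abstract means by achieving independence "considering all contexts", and it is why the regret scales with $\phi$, the minimum eigenvalue of the context covariance.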
Related papers
- Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems [61.85150061213987]
We study the generalized low-rank matrix bandit problem, proposed in Lu et al. (2021) under the Generalized Linear Model (GLM) framework.
To overcome the computational infeasibility and theoretical restrictions of existing algorithms, we first propose the G-ESTT framework.
We show that G-ESTT can achieve a $\tilde{O}(\sqrt{(d_1+d_2)^{3/2} M r^{3/2} T})$ regret bound, while G-ESTS can achieve the $\tilde{O}$
arXiv Detail & Related papers (2024-01-14T14:14:19Z)
- Best-of-Both-Worlds Linear Contextual Bandits [45.378265414553226]
This study investigates the problem of $K$-armed linear contextual bandits, an instance of the multi-armed bandit problem, under adversarial corruption.
We develop a strategy that is effective in both stochastic and adversarial environments, with theoretical guarantees.
We refer to our strategy as the Best-of-Both-Worlds (BoBW) RealFTRL, due to its theoretical guarantees in both regimes.
arXiv Detail & Related papers (2023-12-27T09:32:18Z)
- Borda Regret Minimization for Generalized Linear Dueling Bandits [65.09919504862496]
We study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score.
We propose a rich class of generalized linear dueling bandit models, which cover many existing models.
Our algorithm achieves an $\tilde{O}(d^{2/3} T^{2/3})$ regret, which is also optimal.
arXiv Detail & Related papers (2023-03-15T17:59:27Z)
- Causal Bandits for Linear Structural Equation Models [58.2875460517691]
This paper studies the problem of designing an optimal sequence of interventions in a causal graphical model.
It is assumed that the graph's structure is known and has $N$ nodes.
Two algorithms are proposed for the frequentist (UCB-based) and Bayesian settings.
arXiv Detail & Related papers (2022-08-26T16:21:31Z)
- Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs [7.125769932993104]
We consider the contextual multi-armed bandit problem for linear payoffs under a risk-averse criterion.
At each round, contexts are revealed for each arm, and the decision maker chooses one arm to pull and receives the corresponding reward.
We apply the Thompson Sampling algorithm for the disjoint model (a generic sketch of disjoint-model Thompson sampling appears after this list), and provide a comprehensive regret analysis for a variant of the proposed algorithm.
arXiv Detail & Related papers (2022-06-24T18:48:35Z)
- Doubly robust Thompson sampling for linear payoffs [12.375561840897742]
We propose a novel multi-armed contextual bandit algorithm called Doubly Robust (DR) Thompson Sampling.
The proposed algorithm is designed to allow a novel additive regret decomposition, leading to an improved regret bound of order $\tilde{O}(\phi^{-2}\sqrt{T})$.
arXiv Detail & Related papers (2021-02-01T23:31:10Z)
- Stochastic Linear Bandits Robust to Adversarial Attacks [117.665995707568]
We provide two variants of a Robust Phased Elimination algorithm, one that knows the corruption budget $C$ and one that does not.
We show that both variants attain near-optimal regret in the non-corrupted case $C = 0$, while each incurs an additional additive regret term under corruption.
In a contextual setting, we show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.
arXiv Detail & Related papers (2020-07-07T09:00:57Z)
- Sharp Statistical Guarantees for Adversarially Robust Gaussian Classification [54.22421582955454]
We provide the first optimal minimax guarantees for the excess risk of adversarially robust classification.
Results are stated in terms of the Adversarial Signal-to-Noise Ratio (AdvSNR), which generalizes a similar notion for standard linear classification to the adversarial setting.
arXiv Detail & Related papers (2020-06-29T21:06:52Z)
- Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)
- Budget-Constrained Bandits over General Cost and Reward Distributions [32.63624728528415]
We consider a budget-constrained bandit problem where each arm pull incurs a random cost, and yields a random reward in return.
We show that if moments of order $(2+\gamma)$ for some $\gamma > 0$ exist for all cost-reward pairs, $O(\log B)$ regret is achievable for a budget $B>0$.
arXiv Detail & Related papers (2020-02-29T23:50:08Z)
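As referenced from the risk-averse contextual bandit entry above, here is a minimal sketch of Thompson sampling with disjoint linear models, i.e., one linear model per arm with a Gaussian posterior over its coefficients. The variance scale `v2`, the ridge initialization, and all names are illustrative assumptions rather than any cited paper's exact algorithm.

```python
import numpy as np

def ts_disjoint_choose(contexts, A, b, v2, rng):
    """Pick an arm by Thompson sampling, one linear model per arm (sketch).

    contexts : dict arm -> (d,) context vector for this round
    A, b     : dict arm -> (d, d) Gram matrix and (d,) response vector
    v2       : posterior variance scale (tuning parameter, assumed here)
    """
    scores = {}
    for arm, x in contexts.items():
        A_inv = np.linalg.inv(A[arm])
        mu = A_inv @ b[arm]                              # per-arm ridge estimate
        theta = rng.multivariate_normal(mu, v2 * A_inv)  # posterior draw
        scores[arm] = x @ theta
    return max(scores, key=scores.get)

def ts_disjoint_update(arm, x, reward, A, b):
    """Rank-one update of the pulled arm's sufficient statistics."""
    A[arm] += np.outer(x, x)
    b[arm] += reward * x

# Example: 3 arms, d = 2, ridge initialization A = I.
rng = np.random.default_rng(1)
d = 2
A = {a: np.eye(d) for a in range(3)}
b = {a: np.zeros(d) for a in range(3)}
ctx = {a: rng.normal(size=d) for a in range(3)}
pulled = ts_disjoint_choose(ctx, A, b, v2=1.0, rng=rng)
ts_disjoint_update(pulled, ctx[pulled], reward=1.0, A=A, b=b)
```

Because only the pulled arm's statistics are updated, each arm's posterior depends solely on the rounds in which that arm was chosen; the DR and DDR constructions sketched earlier exist precisely to relax this restriction by using all contexts.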