Randomized Exploration in Generalized Linear Bandits
- URL: http://arxiv.org/abs/1906.08947v3
- Date: Mon, 10 Jul 2023 22:50:52 GMT
- Title: Randomized Exploration in Generalized Linear Bandits
- Authors: Branislav Kveton, Manzil Zaheer, Csaba Szepesvari, Lihong Li, Mohammad Ghavamzadeh, and Craig Boutilier
- Abstract summary: We study two randomized algorithms for generalized linear bandits.
The first, GLM-TSL, samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution.
The second, GLM-FPL, fits a GLM to a randomly perturbed history of past rewards.
- Score: 56.05007606177762
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study two randomized algorithms for generalized linear bandits. The first,
GLM-TSL, samples a generalized linear model (GLM) from the Laplace
approximation to the posterior distribution. The second, GLM-FPL, fits a GLM to
a randomly perturbed history of past rewards. We analyze both algorithms and
derive $\tilde{O}(d \sqrt{n \log K})$ upper bounds on their $n$-round regret,
where $d$ is the number of features and $K$ is the number of arms. The former
improves on prior work while the latter is the first for Gaussian noise
perturbations in non-linear models. We empirically evaluate both GLM-TSL and
GLM-FPL in logistic bandits, and apply GLM-FPL to neural network bandits. Our
work showcases the role of randomization, beyond posterior sampling, in
exploration.
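The GLM-FPL recipe in the abstract (fit a GLM to a Gaussian-perturbed copy of the reward history, then act greedily) can be sketched in a few lines. The following is a minimal illustration for a logistic bandit, not the authors' implementation; the gradient-descent fit and the hyperparameters (perturbation scale `a`, learning rate, regularizer) are illustrative assumptions.

```python
import math
import random

def sigmoid(z):
    # Clip the logit to avoid overflow in exp.
    return 1.0 / (1.0 + math.exp(-max(min(z, 30.0), -30.0)))

def fit_logistic_glm(history, d, lr=0.1, steps=300, lam=1.0):
    """Ridge-regularized logistic fit by batch gradient descent.
    history: list of (feature_vector, target) pairs."""
    w = [0.0] * d
    for _ in range(steps):
        grad = [lam * wi for wi in w]
        for x, y in history:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i in range(d):
                grad[i] += (p - y) * x[i]
        n = max(len(history), 1)
        w = [wi - lr * gi / n for wi, gi in zip(w, grad)]
    return w

def glm_fpl_choose(arms, history, d, a=0.5):
    """One GLM-FPL round (sketch): perturb each past reward with
    N(0, a^2) noise, refit the GLM, and pull the greedy arm."""
    perturbed = [(x, r + random.gauss(0.0, a)) for x, r in history]
    w = fit_logistic_glm(perturbed, d)
    return max(range(len(arms)),
               key=lambda k: sigmoid(sum(wi * xi
                                         for wi, xi in zip(w, arms[k]))))
```

With `a = 0` this collapses to the greedy MLE policy; the Gaussian reward perturbations are what supply the exploration the paper analyzes. GLM-TSL differs only in where the randomness enters: it perturbs the parameter (a sample from the Laplace approximation to the posterior) rather than the rewards.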
Related papers
- Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems [61.85150061213987]
We study the generalized low-rank matrix bandit problem, proposed by Lu et al. (2021) under the Generalized Linear Model (GLM) framework.
To overcome the computational infeasibility and theoretical restrictions of existing algorithms, we first propose the G-ESTT framework.
We show that G-ESTT can achieve a $\tilde{O}(\sqrt{(d_1+d_2)^{3/2} M r^{3/2} T})$ regret bound, while G-ESTS can achieve a $\tilde{O}$ ...
arXiv Detail & Related papers (2024-01-14T14:14:19Z)
- Langevin Monte Carlo for Contextual Bandits [72.00524614312002]
Langevin Monte Carlo Thompson Sampling (LMC-TS) is proposed to directly sample from the posterior distribution in contextual bandits.
We prove that the proposed algorithm achieves the same sublinear regret bound as the best Thompson sampling algorithms for a special case of contextual bandits.
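The unadjusted Langevin dynamics that LMC-TS builds on is simple to state: follow the gradient of the log-posterior and inject scaled Gaussian noise each step. Below is a generic sketch; the target, step size, and chain length are illustrative assumptions, not values from the paper.

```python
import math
import random

def langevin_step(w, grad_log_post, eta):
    """One unadjusted Langevin step: gradient ascent on the
    log-posterior plus sqrt(2*eta)-scaled Gaussian noise."""
    g = grad_log_post(w)
    return [wi + eta * gi + math.sqrt(2.0 * eta) * random.gauss(0.0, 1.0)
            for wi, gi in zip(w, g)]

def lmc_sample(grad_log_post, d, eta=0.05, steps=500):
    """Run the chain and return the final iterate as an approximate
    posterior sample (the role it plays inside Thompson sampling)."""
    w = [0.0] * d
    for _ in range(steps):
        w = langevin_step(w, grad_log_post, eta)
    return w
```

For a Gaussian posterior $N(\mu, I)$ the drift is `grad_log_post(w) = mu - w`, and the chain's stationary distribution approaches the posterior as `eta` shrinks; Thompson sampling then pulls the arm that is greedy under the sampled parameter.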
arXiv Detail & Related papers (2022-06-22T17:58:23Z)
- Combinatorial Causal Bandits [25.012065471684025]
In causal bandits, the learning agent chooses at most $K$ variables in each round to intervene, with the goal of minimizing expected regret on the target variable $Y$.
We study under the context of binary generalized linear models (BGLMs) with a succinct parametric representation of the causal models.
We present the algorithm BGLM-OFU for Markovian BGLMs based on the maximum likelihood estimation method, and show that it achieves $O(\sqrt{T} \log T)$ regret, where $T$ is the time horizon.
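BGLM-OFU itself is specific to causal graphs, but the optimism-in-the-face-of-uncertainty template it follows is generic: compute an MLE, inflate each arm's estimate by a confidence width, and pull the most optimistic arm. A hypothetical linear-case sketch; the diagonal uncertainty model and the width formula are simplifying assumptions, not the paper's construction.

```python
import math

def ofu_choose(arms, theta_hat, cov_inv_diag, alpha=1.0):
    """Pick the arm maximizing estimate + confidence width (OFU/UCB).
    cov_inv_diag is a per-coordinate proxy for parameter uncertainty."""
    def ucb(x):
        mean = sum(t * xi for t, xi in zip(theta_hat, x))
        width = alpha * math.sqrt(sum(v * xi * xi
                                      for v, xi in zip(cov_inv_diag, x)))
        return mean + width
    return max(range(len(arms)), key=lambda k: ucb(arms[k]))
```

An arm with a slightly lower point estimate but much higher uncertainty wins the optimistic comparison, which is exactly the exploration mechanism OFU-style analyses bound.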
arXiv Detail & Related papers (2022-06-04T14:14:58Z)
- Optimal Online Generalized Linear Regression with Stochastic Noise and Its Application to Heteroscedastic Bandits [88.6139446295537]
We study the problem of online generalized linear regression in the setting of a generalized linear model with possibly unbounded additive noise.
We provide a sharp analysis of the classical follow-the-regularized-leader (FTRL) algorithm to cope with the label noise.
We propose an algorithm based on FTRL to achieve the first variance-aware regret bound.
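For squared loss with an $\ell_2$ regularizer, FTRL admits a closed form: the ridge solution over the full history. A one-dimensional sketch of that special case; the variance-aware weighting the paper actually analyzes is omitted here for brevity.

```python
class FTRLRidge1D:
    """Follow-the-regularized-leader for 1-D online least squares:
    w_t = argmin_w sum_s (w*x_s - y_s)^2 / 2 + lam * w^2 / 2,
    whose minimizer is sxy / (lam + sxx)."""
    def __init__(self, lam=1.0):
        self.lam = lam
        self.sxx = 0.0  # running sum of x^2
        self.sxy = 0.0  # running sum of x*y

    def predict(self, x):
        w = self.sxy / (self.lam + self.sxx)
        return w * x

    def update(self, x, y):
        self.sxx += x * x
        self.sxy += x * y
```

Because the cumulative loss is quadratic, "follow the regularized leader" needs no iterative optimization at all: the two running sums are a sufficient statistic for the leader at every round.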
arXiv Detail & Related papers (2022-02-28T08:25:26Z)
- Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression, where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[X X^\top]$ has bounded condition number, and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is ...
arXiv Detail & Related papers (2020-07-16T06:44:44Z)
- The Generalized Lasso with Nonlinear Observations and Generative Priors [63.541900026673055]
We make the assumption of sub-Gaussian measurements, which is satisfied by a wide range of measurement models.
We show that our result can be extended to the uniform recovery guarantee under the assumption of a so-called local embedding property.
arXiv Detail & Related papers (2020-06-22T16:43:35Z)
- Perturbed-History Exploration in Stochastic Linear Bandits [35.70267786499955]
We propose a new online algorithm for cumulative regret in a linear bandit.
The algorithm pulls the arm with the highest estimated reward in a linear model trained on its perturbed history.
arXiv Detail & Related papers (2019-03-21T17:45:11Z)
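The perturbed-history recipe above is the linear-model analogue of GLM-FPL: perturb the observed rewards, refit least squares, and act greedily. A one-dimensional sketch; note the paper perturbs the history with pseudo-rewards, whereas this illustration uses Gaussian pseudo-noise with a tunable scale `a` for simplicity.

```python
import random

def phe_choose(arms, history, a=1.0, lam=1.0):
    """Perturbed-history exploration for a 1-D linear bandit (sketch):
    refit ridge least squares on noise-perturbed rewards, then pull
    the greedy arm. history: (feature, reward) pairs; arms: scalars."""
    sxx = lam
    sxy = 0.0
    for x, r in history:
        z = random.gauss(0.0, a)  # pseudo-noise added to each reward
        sxx += x * x
        sxy += x * (r + z)
    theta = sxy / sxx  # ridge estimate on the perturbed history
    return max(range(len(arms)), key=lambda k: theta * arms[k])
```

As with GLM-FPL, setting `a = 0` recovers the greedy least-squares policy; resampling the perturbations every round is what randomizes the arm choice.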
This list is automatically generated from the titles and abstracts of the papers in this site.