Neural Contextual Bandits with Deep Representation and Shallow
Exploration
- URL: http://arxiv.org/abs/2012.01780v1
- Date: Thu, 3 Dec 2020 09:17:55 GMT
- Authors: Pan Xu and Zheng Wen and Handong Zhao and Quanquan Gu
- Abstract summary: We propose a novel learning algorithm that transforms the raw feature vector using the last hidden layer of a deep ReLU neural network.
Compared with existing neural contextual bandit algorithms, our approach is computationally much more efficient since it only needs to explore in the last layer of the deep neural network.
- Score: 105.8099566651448
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a general class of contextual bandits, where each context-action
pair is associated with a raw feature vector, but the reward generating
function is unknown. We propose a novel learning algorithm that transforms the
raw feature vector using the last hidden layer of a deep ReLU neural network
(deep representation learning), and uses an upper confidence bound (UCB)
approach to explore in the last linear layer (shallow exploration). We prove
that under standard assumptions, our proposed algorithm achieves
$\tilde{O}(\sqrt{T})$ finite-time regret, where $T$ is the learning time
horizon. Compared with existing neural contextual bandit algorithms, our
approach is computationally much more efficient since it only needs to explore
in the last layer of the deep neural network.
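As a rough illustration of the "deep representation, shallow exploration" recipe described above, the sketch below runs LinUCB on the last-hidden-layer features of a ReLU network. This is a minimal sketch, not the authors' implementation: the fixed random layer `W1` stands in for the trained network, and the reward model, dimensions, and constants are invented for the demo (the actual algorithm also trains the network weights on observed rewards).
```python
import numpy as np

# Minimal sketch of "deep representation, shallow exploration": LinUCB is run
# on the last-hidden-layer features of a ReLU network. The fixed random layer
# W1 stands in for the trained network; the real algorithm also trains the
# network weights on the observed rewards.

rng = np.random.default_rng(0)
d_raw, d_hid, n_arms, T = 8, 16, 5, 500
W1 = rng.normal(size=(d_hid, d_raw)) / np.sqrt(d_raw)  # stand-in hidden layer

def phi(x):
    """Deep representation: last-hidden-layer features of the network."""
    return np.maximum(W1 @ x, 0.0)

lam, alpha = 1.0, 1.0                # ridge regularizer, UCB exploration width
A = lam * np.eye(d_hid)              # A_t = lam*I + sum_s phi_s phi_s^T
b = np.zeros(d_hid)                  # b_t = sum_s r_s phi_s
theta_star = rng.normal(size=d_hid)  # synthetic linear reward for the demo

for t in range(T):
    contexts = rng.normal(size=(n_arms, d_raw))   # raw feature vectors
    feats = np.array([phi(x) for x in contexts])
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b                         # ridge estimate (exploitation)
    bonus = np.sqrt(np.einsum("ij,jk,ik->i", feats, A_inv, feats))
    arm = int(np.argmax(feats @ theta_hat + alpha * bonus))  # shallow UCB
    reward = feats[arm] @ theta_star + 0.1 * rng.normal()
    A += np.outer(feats[arm], feats[arm])         # update only the last layer
    b += reward * feats[arm]
```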
Related papers
- Neural Exploitation and Exploration of Contextual Bandits [51.25537742455235] (2023-05-05)
We study the use of neural networks for both exploitation and exploration in contextual multi-armed bandits.
EE-Net is a novel neural-based exploitation and exploration strategy.
We show that EE-Net outperforms related linear and neural contextual bandit baselines on real-world datasets.
- Learning Contextual Bandits Through Perturbed Rewards [107.6210145983805] (2022-01-24)
We show that a $\tilde{O}(\tilde{d}\sqrt{T})$ regret upper bound is still achievable under standard regularity conditions, where $\tilde{d}$ is an effective dimension.
We perturb the rewards when updating the neural network to eliminate the need for explicit exploration.
- EE-Net: Exploitation-Exploration Neural Networks in Contextual Bandits [52.98326168071513] (2021-10-07)
"EE-Net" is a neural-based bandit approach with a novel exploration strategy.
We show that EE-Net achieves $\mathcal{O}(\sqrt{T\log T})$ regret, which is tighter than the bounds of existing state-of-the-art neural bandit algorithms.
- Neural Contextual Bandits without Regret [47.73483756447701] (2021-07-07)
We propose algorithms for contextual bandits harnessing neural networks to approximate the unknown reward function.
We show that our approach converges to the optimal policy at a $\tilde{\mathcal{O}}(T^{-1/(2d)})$ rate, where $d$ is the dimension of the context.
- Neural Thompson Sampling [94.82847209157494] (2020-10-02)
We propose a new algorithm, called Neural Thompson Sampling, which adapts deep neural networks for both exploration and exploitation.
At the core of our algorithm is a novel posterior distribution of the reward, whose mean is the neural network approximator and whose variance is built upon the neural tangent features of the corresponding neural network (a rough sketch of this sampling rule follows this list).
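As referenced above, here is a minimal numpy sketch of the Neural Thompson Sampling scoring rule: each arm's reward is sampled from a Gaussian whose mean is the network output and whose variance is a quadratic form of the gradient features $g = \partial f/\partial\theta$ (the neural tangent features). The tiny two-layer network, the constants, and the synthetic contexts are placeholders, not the paper's actual setup or training loop.
```python
import numpy as np

# Minimal sketch of the Neural Thompson Sampling scoring rule: each arm's
# reward is sampled from a Gaussian whose mean is the network output and whose
# variance is a quadratic form of the gradient g = df/dtheta (the neural
# tangent features). Network, constants, and contexts are placeholders.

rng = np.random.default_rng(1)
d, m = 8, 16                            # context dimension, hidden width
W1 = rng.normal(size=(m, d)) / np.sqrt(d)
w2 = rng.normal(size=m) / np.sqrt(m)
p = m * d + m                           # total parameter count
lam, nu = 1.0, 0.5                      # regularizer and posterior scaling
A = lam * np.eye(p)                     # gram matrix of tangent features

def f_and_grad(x):
    """Network output and its gradient w.r.t. all parameters."""
    h = W1 @ x
    act = np.maximum(h, 0.0)
    out = w2 @ act
    g_W1 = np.outer(w2 * (h > 0), x)    # backprop through the ReLU
    return out, np.concatenate([g_W1.ravel(), act])

def sample_scores(contexts):
    """One Thompson sample of the reward for each candidate arm."""
    A_inv = np.linalg.inv(A)
    scores = []
    for x in contexts:
        mean, g = f_and_grad(x)
        var = nu**2 * (g @ A_inv @ g)   # NTK-based posterior variance
        scores.append(rng.normal(mean, np.sqrt(var)))
    return np.array(scores)

contexts = rng.normal(size=(5, d))      # five candidate arms
arm = int(np.argmax(sample_scores(contexts)))
# After observing the reward, A would gain the outer product of the chosen
# arm's tangent feature, and the network would be retrained (omitted here).
```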
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.