Neural Contextual Bandits without Regret
- URL: http://arxiv.org/abs/2107.03144v1
- Date: Wed, 7 Jul 2021 11:11:34 GMT
- Title: Neural Contextual Bandits without Regret
- Authors: Parnian Kassraie, Andreas Krause
- Abstract summary: We propose algorithms for contextual bandits harnessing neural networks to approximate the unknown reward function.
We show that our approach converges to the optimal policy at a $tildemathcalO(T-1/2d)$ rate, where $d$ is the dimension of the context.
- Score: 47.73483756447701
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Contextual bandits are a rich model for sequential decision making given side
information, with important applications, e.g., in recommender systems. We
propose novel algorithms for contextual bandits harnessing neural networks to
approximate the unknown reward function. We resolve the open problem of proving
sublinear regret bounds in this setting for general context sequences,
considering both fully-connected and convolutional networks. To this end, we
first analyze NTK-UCB, a kernelized bandit optimization algorithm employing the
Neural Tangent Kernel (NTK), and bound its regret in terms of the NTK maximum
information gain $\gamma_T$, a complexity parameter capturing the difficulty of
learning. Our bounds on $\gamma_T$ for the NTK may be of independent interest.
We then introduce our neural network based algorithm NN-UCB, and show that its
regret closely tracks that of NTK-UCB. Under broad non-parametric assumptions
about the reward function, our approach converges to the optimal policy at a
$\tilde{\mathcal{O}}(T^{-1/2d})$ rate, where $d$ is the dimension of the
context.
Related papers
- Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel which we call textitbias-generalized NTK
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z) - Robust Training and Verification of Implicit Neural Networks: A
Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that the embedded network can be used to provide an $ell_infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
arXiv Detail & Related papers (2022-08-08T03:13:24Z) - Learning Contextual Bandits Through Perturbed Rewards [107.6210145983805]
We show that a $tildeO(tildedsqrtT)$ regret upper bound is still achievable under standard regularity conditions.
We perturb the rewards when updating the neural network to eliminate the need of explicit exploration.
arXiv Detail & Related papers (2022-01-24T19:10:22Z) - Neural Optimization Kernel: Towards Robust Deep Learning [13.147925376013129]
Recent studies show a connection between neural networks (NN) and kernel methods.
This paper proposes a novel kernel family named Kernel (NOK)
We show that over parameterized deep NN (NOK) can increase the expressive power to reduce empirical risk and reduce the bound generalization at the same time.
arXiv Detail & Related papers (2021-06-11T00:34:55Z) - Neural Contextual Bandits with Deep Representation and Shallow
Exploration [105.8099566651448]
We propose a novel learning algorithm that transforms the raw feature vector using the last hidden layer of a deep ReLU neural network.
Compared with existing neural contextual bandit algorithms, our approach is computationally much more efficient since it only needs to explore in the last layer of the deep neural network.
arXiv Detail & Related papers (2020-12-03T09:17:55Z) - Neural Thompson Sampling [94.82847209157494]
We propose a new algorithm, called Neural Thompson Sampling, which adapts deep neural networks for both exploration and exploitation.
At the core of our algorithm is a novel posterior distribution of the reward, where its mean is the neural network approximator, and its variance is built upon the neural tangent features of the corresponding neural network.
arXiv Detail & Related papers (2020-10-02T07:44:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.