EE-Net: Exploitation-Exploration Neural Networks in Contextual Bandits
- URL: http://arxiv.org/abs/2110.03177v1
- Date: Thu, 7 Oct 2021 04:12:36 GMT
- Title: EE-Net: Exploitation-Exploration Neural Networks in Contextual Bandits
- Authors: Yikun Ban, Yuchen Yan, Arindam Banerjee, Jingrui He
- Abstract summary: "EE-Net" is a neural-based bandit approach with a novel exploration strategy.
We show that EE-Net achieves $\mathcal{O}(\sqrt{T\log T})$ regret, which is tighter than existing state-of-the-art neural bandit algorithms.
- Score: 52.98326168071513
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Contextual multi-armed bandits have been studied for decades and adapted to
various applications such as online advertising and personalized
recommendation. To solve the exploitation-exploration tradeoff in bandits,
there are three main techniques: epsilon-greedy, Thompson Sampling (TS), and
Upper Confidence Bound (UCB). In recent literature, linear contextual bandit
algorithms have adopted ridge regression to estimate the reward function,
combining it with TS or UCB strategies for exploration. However, this line of
work explicitly assumes that the reward is a linear function of the arm
vectors, which may not hold in real-world datasets. To overcome this challenge, a
series of neural-based bandit algorithms have been proposed, where a neural
network is assigned to learn the underlying reward function and TS or UCB are
adapted for exploration. In this paper, we propose "EE-Net", a neural-based
bandit approach with a novel exploration strategy. In addition to utilizing a
neural network (Exploitation network) to learn the reward function, EE-Net
adopts another neural network (Exploration network) to adaptively learn the
potential gain relative to the currently estimated reward. Then, a decision-maker
is constructed to combine the outputs from the Exploitation and Exploration
networks. We prove that EE-Net achieves $\mathcal{O}(\sqrt{T\log T})$ regret,
which is tighter than the $\mathcal{O}(\sqrt{T}\log T)$ bounds of existing
state-of-the-art neural bandit algorithms (both UCB-based and TS-based). Through
extensive experiments on four real-world datasets, we show that EE-Net
outperforms existing linear and neural bandit approaches.
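A minimal PyTorch sketch of this design, based only on the abstract: the exploitation network f1 regresses observed rewards, the exploration network f2 regresses the residual gain r - f1(x), and an additive decision-maker combines the two. The network sizes, learning rates, and additive combiner are assumptions; in the full paper the exploration network's input is derived from the exploitation network's gradient rather than the raw context used here.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Small feed-forward scorer used for both EE-Net components."""
    def __init__(self, in_dim, hidden=100):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, x):
        return self.net(x)

d = 20                      # context dimension (placeholder)
f1 = MLP(d)                 # Exploitation network: estimates the reward
f2 = MLP(d)                 # Exploration network: estimates potential gain
opt1 = torch.optim.SGD(f1.parameters(), lr=1e-2)
opt2 = torch.optim.SGD(f2.parameters(), lr=1e-2)

def select_arm(arms):
    """Decision-maker: combine the two networks' outputs. A simple
    additive combiner is assumed here; the paper also allows a learned
    (nonlinear) decision-maker."""
    scores = (f1(arms) + f2(arms)).squeeze(-1)
    return torch.argmax(scores).item()

def update(x, r):
    """After observing reward r for context x, train f1 on the reward
    and f2 on the residual r - f1(x), i.e., the realized gain over the
    current exploitation estimate."""
    pred = f1(x)
    loss1 = ((pred - r) ** 2).mean()
    opt1.zero_grad(); loss1.backward(); opt1.step()
    gain = r - pred.detach()            # target for the exploration net
    loss2 = ((f2(x) - gain) ** 2).mean()
    opt2.zero_grad(); loss2.backward(); opt2.step()
```

Because f2 is trained on realized gains, an arm whose reward has been underestimated receives a positive exploration score, which plays the role of a learned rather than closed-form confidence bonus.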
Related papers
- Neural Exploitation and Exploration of Contextual Bandits [51.25537742455235]
We study the use of neural networks for exploitation and exploration in contextual multi-armed bandits.
EE-Net is a novel neural-based exploitation and exploration strategy.
We show that EE-Net outperforms related linear and neural contextual bandit baselines on real-world datasets.
arXiv Detail & Related papers (2023-05-05T18:34:49Z)
- Robust Training and Verification of Implicit Neural Networks: A Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that the embedded network can be used to provide an $\ell_\infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with that of models trained via existing approaches in the literature.
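As a rough stand-in for that construction, interval arithmetic on a toy implicit layer $z = \mathrm{ReLU}(Wz + Ux + b)$ yields an $\ell_\infty$ box over-approximation; the paper's embedded network instead carries contraction-based convergence guarantees, so the fixed iteration count and assumed well-posedness below are simplifications.

```python
import numpy as np

def box_fixed_point(W, U, b, x_lo, x_hi, iters=50):
    """Propagate an l_inf input box [x_lo, x_hi] through the implicit
    layer z = relu(W z + U x + b) using interval arithmetic. Returns
    element-wise bounds (z_lo, z_hi) that over-approximate the reachable
    set, assuming the interval iteration converges (which the paper
    guarantees via a contraction condition on W)."""
    n = W.shape[0]
    z_lo, z_hi = np.zeros(n), np.zeros(n)
    Wp, Wm = np.maximum(W, 0), np.minimum(W, 0)   # split weights by sign
    Up, Um = np.maximum(U, 0), np.minimum(U, 0)
    for _ in range(iters):
        lo = Wp @ z_lo + Wm @ z_hi + Up @ x_lo + Um @ x_hi + b
        hi = Wp @ z_hi + Wm @ z_lo + Up @ x_hi + Um @ x_lo + b
        z_lo, z_hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)
    return z_lo, z_hi
```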
arXiv Detail & Related papers (2022-08-08T03:13:24Z)
- Federated Neural Bandit [46.64825970508973]
In this paper, we introduce the federated neural-upper confidence bound (FN-UCB) algorithm for neural contextual bandits.
To better exploit the federated setting, we adopt a weighted combination of two UCBs: $\text{UCB}^{a}$ allows every agent to additionally use the observations from the other agents to accelerate exploration.
We prove sub-linear upper bounds on both the cumulative regret and the number of communication rounds of FN-UCB, and use empirical experiments to demonstrate its competitive performance.
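A simplified, linear sketch of that weighted combination; raw contexts stand in for the neural tangent features of FN-UCB, and the weight w and confidence scale beta are placeholder parameters.

```python
import numpy as np

def ucb_score(A, b, x, beta):
    """Standard linear UCB: point estimate plus beta times the
    confidence width from the ridge-regression statistics (A, b)."""
    theta = np.linalg.solve(A, b)
    width = np.sqrt(x @ np.linalg.solve(A, x))
    return x @ theta + beta * width

def fn_ucb_like_score(x, local, aggregated, w=0.5, beta=1.0):
    """Weighted combination of a UCB built from the aggregated
    (federated) statistics of all agents and one built from this
    agent's own statistics. 'local' and 'aggregated' are (A, b) pairs;
    using raw contexts instead of neural tangent features is a
    simplification of FN-UCB."""
    ucb_a = ucb_score(*aggregated, x, beta)  # uses everyone's observations
    ucb_b = ucb_score(*local, x, beta)       # uses only this agent's data
    return w * ucb_a + (1 - w) * ucb_b
```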
arXiv Detail & Related papers (2022-05-28T02:58:37Z)
- Learning Contextual Bandits Through Perturbed Rewards [107.6210145983805]
We show that a $\tilde{O}(\tilde{d}\sqrt{T})$ regret upper bound is still achievable under standard regularity conditions.
We perturb the rewards when updating the neural network to eliminate the need for explicit exploration.
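A minimal sketch of a perturbed-reward update: fresh Gaussian noise is added to the historical rewards at each training pass, so acting greedily on the fitted network is already randomized. The noise scale sigma and the squared-loss objective are assumptions, not the paper's exact prescription.

```python
import torch

def perturbed_update(model, opt, contexts, rewards, sigma=0.5):
    """One training pass in which each historical reward is perturbed
    by fresh Gaussian noise; the randomness in the fitted model then
    drives exploration without an explicit bonus term. sigma is a
    tunable noise scale (an assumption)."""
    noise = sigma * torch.randn(len(rewards))
    target = rewards + noise
    pred = model(contexts).squeeze(-1)
    loss = ((pred - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```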
arXiv Detail & Related papers (2022-01-24T19:10:22Z)
- Convolutional Neural Bandit: Provable Algorithm for Visual-aware Advertising [41.30283330958433]
Contextual multi-armed bandits have shown success in advertising applications, solving the exploration-exploitation dilemma that exists in the recommendation procedure.
Inspired by visual-aware advertising, in this paper we propose a contextual bandit algorithm built on convolutional neural networks.
arXiv Detail & Related papers (2021-07-02T03:02:29Z)
- Neural Contextual Bandits with Deep Representation and Shallow Exploration [105.8099566651448]
We propose a novel learning algorithm that transforms the raw feature vector using the last hidden layer of a deep ReLU neural network.
Compared with existing neural contextual bandit algorithms, our approach is computationally much more efficient since it only needs to explore in the last layer of the deep neural network.
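A sketch of this "deep representation, shallow exploration" recipe: LinUCB statistics are maintained only over the last hidden layer's output, so the per-round matrix operations are m x m for width m rather than scaling with the full parameter count. The architecture and beta are placeholders.

```python
import torch
import torch.nn as nn

class DeepRep(nn.Module):
    """Deep ReLU network; phi(x), the last hidden layer's output, is the
    learned feature map on which a linear UCB is computed."""
    def __init__(self, d, m=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(d, m), nn.ReLU(),
                                  nn.Linear(m, m), nn.ReLU())
        self.head = nn.Linear(m, 1)   # trained jointly for reward prediction

    def phi(self, x):
        return self.body(x)

def shallow_ucb_score(net, A, b, x, beta=1.0):
    """LinUCB on phi(x): exploration lives only in the last layer, so
    the confidence ellipsoid is in R^m rather than the full weight
    space. A and b are ridge statistics over past (phi(x), r) pairs,
    updated elsewhere as A += z z^T and b += r * z."""
    with torch.no_grad():
        z = net.phi(x)
    A_inv = torch.linalg.inv(A)
    theta = A_inv @ b
    return (z @ theta + beta * torch.sqrt(z @ A_inv @ z)).item()
```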
arXiv Detail & Related papers (2020-12-03T09:17:55Z)
- Neural Thompson Sampling [94.82847209157494]
We propose a new algorithm, called Neural Thompson Sampling, which adapts deep neural networks for both exploration and exploitation.
At the core of our algorithm is a novel posterior distribution of the reward, where its mean is the neural network approximator, and its variance is built upon the neural tangent features of the corresponding neural network.
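A toy version of that posterior: the mean is the network output and the variance uses the gradient g = grad_theta f(x) as the tangent feature. Maintaining the exact inverse covariance U^{-1} over all parameters is impractical at scale (a diagonal approximation is common); nu and the update of U are left as placeholders here.

```python
import torch

def neural_ts_sample(model, x, U_inv, nu=1.0):
    """Sample an estimated reward from N(f(x; theta), nu^2 * g^T U^{-1} g),
    where f is the network and g = grad_theta f(x) plays the role of the
    neural tangent features. U would be updated as U += g g^T after each
    round (omitted here)."""
    out = model(x).squeeze()
    grads = torch.autograd.grad(out, list(model.parameters()))
    g = torch.cat([gr.reshape(-1) for gr in grads])
    var = nu ** 2 * (g @ U_inv @ g)
    return (out + var.sqrt() * torch.randn(())).item()
```

The arm with the largest sampled score is pulled; optimism arises from samples that exceed the mean, with no explicit bonus term.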
arXiv Detail & Related papers (2020-10-02T07:44:09Z)