Stochastic Bandits with ReLU Neural Networks
- URL: http://arxiv.org/abs/2405.07331v1
- Date: Sun, 12 May 2024 16:54:57 GMT
- Title: Stochastic Bandits with ReLU Neural Networks
- Authors: Kan Xu, Hamsa Bastani, Surbhi Goel, Osbert Bastani
- Abstract summary: We show that a $\tilde{O}(\sqrt{T})$ regret guarantee is achievable by considering bandits with one-layer ReLU neural networks.
We propose an OFU-ReLU algorithm that can achieve this upper bound.
- Score: 40.41457480347015
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the stochastic bandit problem with ReLU neural network structure. We show that a $\tilde{O}(\sqrt{T})$ regret guarantee is achievable by considering bandits with one-layer ReLU neural networks; to the best of our knowledge, our work is the first to achieve such a guarantee. In this specific setting, we propose an OFU-ReLU algorithm that can achieve this upper bound. The algorithm first explores randomly until it reaches a linear regime, and then implements a UCB-type linear bandit algorithm to balance exploration and exploitation. Our key insight is that we can exploit the piecewise linear structure of ReLU activations and convert the problem into a linear bandit in a transformed feature space, once we learn the parameters of ReLU relatively accurately during the exploration stage. To remove dependence on model parameters, we design an OFU-ReLU+ algorithm based on a batching strategy, which can provide the same theoretical guarantee.
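To make the two-stage procedure concrete, below is a minimal numpy sketch under strong simplifications: the reward comes from a single ReLU unit rather than a one-layer network, the arm set is finite and fixed, and the constants (exploration length T0, confidence width alpha) are ad hoc. It illustrates the explore-then-linear-UCB idea from the abstract, not the paper's actual OFU-ReLU implementation.
```python
import numpy as np

rng = np.random.default_rng(0)

# Toy environment: reward = ReLU(theta_star . x) + Gaussian noise.
d, K, T, T0 = 5, 20, 3000, 500          # dim, arms, horizon, exploration rounds
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)
arms = rng.normal(size=(K, d))
arms /= np.linalg.norm(arms, axis=1, keepdims=True)

def pull(k):
    return max(arms[k] @ theta_star, 0.0) + 0.1 * rng.normal()

# Stage 1: explore uniformly, then estimate theta by refitting least squares
# on the samples the current estimate marks as active (theta_hat . x > 0).
idx = rng.integers(K, size=T0)
X, y = arms[idx], np.array([pull(k) for k in idx])
theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
for _ in range(5):
    active = X @ theta_hat > 0
    theta_hat = np.linalg.lstsq(X[active], y[active], rcond=None)[0]

# Stage 2: once theta_hat is accurate, ReLU(theta_star . x) is linear on the
# halfspace {x : theta_hat . x > 0}, so the gated features
# phi(x) = x * 1[theta_hat . x > 0] reduce the problem to a linear bandit.
phi = arms * (arms @ theta_hat > 0)[:, None]
A, b, alpha = np.eye(d), np.zeros(d), 1.0
for t in range(T0, T):
    Ainv = np.linalg.inv(A)
    ucb = phi @ (Ainv @ b) + alpha * np.sqrt(np.einsum('kd,de,ke->k', phi, Ainv, phi))
    k = int(np.argmax(ucb))
    r = pull(k)
    A += np.outer(phi[k], phi[k])
    b += r * phi[k]

print(f"cosine(theta_hat, theta_star) = "
      f"{theta_hat @ theta_star / np.linalg.norm(theta_hat):.3f}")
```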
Related papers
- Leaky ReLUs That Differ in Forward and Backward Pass Facilitate Activation Maximization in Deep Neural Networks [0.022344294014777957]
Activation Maximization (AM) strives to generate an optimal input, revealing features that trigger high responses in trained deep neural networks.
We show that AM fails to produce an optimal input for simple functions containing ReLUs or Leaky ReLUs.
We propose a solution based on using Leaky ReLUs with a high negative slope in the backward pass while keeping the original, usually zero, slope in the forward pass.
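The described fix can be sketched as a custom autograd function; the class name and the slope value 10.0 are illustrative choices, not the paper's:
```python
import torch

class ForwardReLUBackwardLeaky(torch.autograd.Function):
    """ReLU in the forward pass, Leaky-ReLU gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x, negative_slope):
        ctx.save_for_backward(x)
        ctx.negative_slope = negative_slope
        return x.clamp(min=0.0)                 # output is an ordinary ReLU

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # A high negative slope lets gradient flow through "dead" units, so
        # activation maximization does not stall at zero-gradient inputs.
        slope = torch.where(x > 0, torch.ones_like(x),
                            torch.full_like(x, ctx.negative_slope))
        return grad_out * slope, None           # no gradient for the slope arg

x = torch.tensor([-1.0, 2.0], requires_grad=True)
ForwardReLUBackwardLeaky.apply(x, 10.0).sum().backward()
print(x.grad)                                   # tensor([10., 1.])
```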
arXiv Detail & Related papers (2024-10-22T12:38:39Z)
- Anti-Concentrated Confidence Bonuses for Scalable Exploration [57.91943847134011]
Intrinsic rewards play a central role in handling the exploration-exploitation trade-off.
We introduce anti-concentrated confidence bounds for efficiently approximating the elliptical bonus.
We develop a practical variant for deep reinforcement learning that is competitive with contemporary intrinsic rewards on Atari benchmarks.
arXiv Detail & Related papers (2021-10-21T15:25:15Z)
- Robust lEarned Shrinkage-Thresholding (REST): Robust unrolling for sparse recovery [87.28082715343896]
We consider deep neural networks for solving inverse problems that are robust to forward model mis-specifications.
We design a new robust deep neural network architecture by applying algorithm unfolding techniques to a robust version of the underlying recovery problem.
The proposed REST network is shown to outperform state-of-the-art model-based and data-driven algorithms in both compressive sensing and radar imaging problems.
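As a rough illustration of algorithm unfolding (plain ISTA unrolled into layers, without REST's robustness modifications or learned weights):
```python
import numpy as np

def soft(z, tau):
    """Soft-thresholding, the proximal step of ISTA."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def unrolled_ista(y, A, n_layers, tau=0.05):
    """ISTA unrolled into a fixed number of 'layers'. In a learned-unrolling
    network, W1/W2 would be trainable per-layer weights instead of being
    derived from A."""
    L = np.linalg.norm(A, 2) ** 2           # Lipschitz constant of the gradient
    W1, W2 = A.T / L, np.eye(A.shape[1]) - A.T @ A / L
    x = np.zeros(A.shape[1])
    for _ in range(n_layers):               # each iteration = one network layer
        x = soft(W1 @ y + W2 @ x, tau / L)
    return x

rng = np.random.default_rng(4)
A = rng.normal(size=(30, 100)) / np.sqrt(30)
x_true = np.zeros(100)
x_true[rng.choice(100, 5, replace=False)] = 1.0
y = A @ x_true + 0.01 * rng.normal(size=30)
x_hat = unrolled_ista(y, A, n_layers=200)
print("largest entries at:", sorted(np.argsort(-np.abs(x_hat))[:5]))
print("true support:      ", sorted(np.flatnonzero(x_true)))
```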
arXiv Detail & Related papers (2021-10-20T06:15:45Z)
- Neural Contextual Bandits without Regret [47.73483756447701]
We propose algorithms for contextual bandits harnessing neural networks to approximate the unknown reward function.
We show that our approach converges to the optimal policy at a $\tilde{\mathcal{O}}(T^{-1/(2d)})$ rate, where $d$ is the dimension of the context.
arXiv Detail & Related papers (2021-07-07T11:11:34Z)
- Upper Confidence Bounds for Combining Stochastic Bandits [52.10197476419621]
We provide a simple method to combine bandit algorithms.
Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem.
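A toy sketch of this idea, using vanilla UCB1 over the base algorithms (the paper's meta-UCB uses more careful confidence bounds; all names here are illustrative):
```python
import numpy as np

rng = np.random.default_rng(1)

class EpsGreedy:
    """A toy base bandit: epsilon-greedy over K arms."""
    def __init__(self, K, eps):
        self.eps, self.n, self.s = eps, np.zeros(K), np.zeros(K)
    def select(self):
        if rng.random() < self.eps:
            return int(rng.integers(len(self.n)))
        return int(np.argmax(self.s / np.maximum(self.n, 1)))
    def update(self, k, r):
        self.n[k] += 1
        self.s[k] += r

def meta_ucb(base_algs, pull, T):
    """UCB1 over base algorithms: each algorithm is one meta-arm."""
    N = len(base_algs)
    cnt, tot = np.zeros(N), np.zeros(N)
    for t in range(1, T + 1):
        if t <= N:
            i = t - 1                       # initialize every meta-arm once
        else:
            i = int(np.argmax(tot / cnt + np.sqrt(2 * np.log(t) / cnt)))
        k = base_algs[i].select()           # delegate the arm choice
        r = pull(k)
        base_algs[i].update(k, r)           # only the chosen learner updates
        cnt[i] += 1
        tot[i] += r
    return tot.sum()

means = np.array([0.2, 0.5, 0.8])
pull = lambda k: float(rng.random() < means[k])   # Bernoulli rewards
algs = [EpsGreedy(3, eps) for eps in (0.01, 0.1, 0.3)]
print("total reward:", meta_ucb(algs, pull, 5000))
```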
arXiv Detail & Related papers (2020-12-24T05:36:29Z)
- Neural Contextual Bandits with Deep Representation and Shallow Exploration [105.8099566651448]
We propose a novel learning algorithm that transforms the raw feature vector using the last hidden layer of a deep ReLU neural network.
Compared with existing neural contextual bandit algorithms, our approach is computationally much more efficient since it only needs to explore in the last layer of the deep neural network.
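A minimal sketch of this "deep representation, shallow exploration" pattern, assuming a fixed random network body in place of a trained one and a reward that is exactly linear in the last-layer features:
```python
import numpy as np

rng = np.random.default_rng(2)
d, m, K, T = 10, 32, 20, 2000               # input dim, hidden width, arms, rounds

# Stand-in for a pretrained deep body: its last hidden layer defines the
# transformed features, and exploration happens only in this m-dim space.
W1 = rng.normal(size=(m, d)) / np.sqrt(d)
feat = lambda C: np.maximum(C @ W1.T, 0.0)  # rows of C -> last-layer features

theta = rng.normal(size=m) / np.sqrt(m)     # unknown "output layer" of rewards
A, b, alpha = np.eye(m), np.zeros(m), 0.5   # LinUCB state over features only
for t in range(T):
    ctx = rng.normal(size=(K, d))           # fresh contexts every round
    Phi = feat(ctx)
    Ainv = np.linalg.inv(A)
    ucb = Phi @ (Ainv @ b) + alpha * np.sqrt(np.einsum('km,mn,kn->k', Phi, Ainv, Phi))
    k = int(np.argmax(ucb))
    r = Phi[k] @ theta + 0.05 * rng.normal()        # realizable reward model
    A += np.outer(Phi[k], Phi[k])           # only last-layer statistics updated
    b += r * Phi[k]

print("last-layer weight error:", np.linalg.norm(np.linalg.solve(A, b) - theta))
```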
arXiv Detail & Related papers (2020-12-03T09:17:55Z)
- Experimental Design for Regret Minimization in Linear Bandits [19.8309784360219]
We propose a novel design-based algorithm to minimize regret in online linear and combinatorial bandits.
We provide state-of-the-art finite-time regret guarantees and show that our algorithm can be applied in both the bandit and semi-bandit feedback regimes.
arXiv Detail & Related papers (2020-11-01T17:59:19Z)
- Neural Thompson Sampling [94.82847209157494]
We propose a new algorithm, called Neural Thompson Sampling, which adapts deep neural networks for both exploration and exploitation.
At the core of our algorithm is a novel posterior distribution of the reward, whose mean is the neural network approximator and whose variance is built upon the neural tangent features of the corresponding neural network.
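A simplified sketch of that posterior: here the tangent features are gradients with respect to the output layer only and training is a single SGD-style step per round, whereas the paper uses gradients over all network parameters:
```python
import numpy as np

rng = np.random.default_rng(3)
d, m, K, T, nu, lam = 8, 16, 30, 1000, 0.5, 1.0

W = rng.normal(size=(m, d)) / np.sqrt(d)    # hidden layer, kept fixed here
v = rng.normal(size=m) / np.sqrt(m)         # output layer, updated online
f = lambda x: v @ np.maximum(W @ x, 0.0)    # network prediction = posterior mean
tangent = lambda x: np.maximum(W @ x, 0.0)  # d f / d v: tangent features

theta = rng.normal(size=m) / np.sqrt(m)
contexts = rng.normal(size=(K, d))
pull = lambda k: tangent(contexts[k]) @ theta + 0.05 * rng.normal()

A = lam * np.eye(m)                         # design matrix of tangent features
total = 0.0
for t in range(T):
    Ainv = np.linalg.inv(A)
    samples = []
    for k in range(K):
        g = tangent(contexts[k])
        var = nu**2 * (g @ Ainv @ g)        # posterior variance from features
        samples.append(rng.normal(f(contexts[k]), np.sqrt(var)))
    k = int(np.argmax(samples))             # act greedily on sampled rewards
    r = pull(k)
    total += r
    g = tangent(contexts[k])
    A += np.outer(g, g)
    v += 0.1 * (r - f(contexts[k])) * g / (1.0 + g @ g)   # one SGD-style step

print("average reward:", total / T,
      " best arm mean:", max(tangent(x) @ theta for x in contexts))
```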
arXiv Detail & Related papers (2020-10-02T07:44:09Z)
- Provable Training of a ReLU Gate with an Iterative Non-Gradient Algorithm [0.7614628596146599]
We show provable guarantees on the training of a single ReLU gate in hitherto unexplored regimes.
We show a first-of-its-kind approximate recovery of the true label-generating parameters under an (online) data-poisoning attack on the true labels.
Our guarantee is shown to be nearly optimal in the worst case, and the accuracy of recovering the true weight degrades gracefully as the probability and magnitude of the attack increase.
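One classical non-gradient iterative scheme for a single ReLU gate is a GLM-tron-style update, sketched below; this is an illustration in the same spirit, not necessarily the paper's exact algorithm:
```python
import numpy as np

def glmtron_relu(X, y, n_iters=500, eta=0.5):
    """Iterative non-gradient update for a single ReLU gate:
    w <- w + (eta/n) * sum_i (y_i - relu(w . x_i)) x_i.
    No derivative of the activation appears, so this is not gradient
    descent on the squared loss."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        resid = y - np.maximum(X @ w, 0.0)
        w += eta * (X.T @ resid) / n
    return w

rng = np.random.default_rng(5)
n, d = 2000, 10
w_true = rng.normal(size=d)
w_true /= np.linalg.norm(w_true)
X = rng.normal(size=(n, d))
y = np.maximum(X @ w_true, 0.0) + 0.05 * rng.normal(size=n)
print("recovery error:", np.linalg.norm(glmtron_relu(X, y) - w_true))
```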
arXiv Detail & Related papers (2020-05-08T17:59:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.