Contextual Bandits with Online Neural Regression
- URL: http://arxiv.org/abs/2312.07145v1
- Date: Tue, 12 Dec 2023 10:28:51 GMT
- Title: Contextual Bandits with Online Neural Regression
- Authors: Rohan Deb, Yikun Ban, Shiliang Zuo, Jingrui He, Arindam Banerjee
- Abstract summary: We investigate the use of neural networks for online regression and associated Neural Contextual Bandits (NeuCBs).
Using existing results for wide networks, one can readily show a $\mathcal{O}(\sqrt{T})$ regret for online regression with square loss.
We show a $\mathcal{O}(\log T)$ regret for online regression with both squared loss and KL loss, and subsequently convert these respectively to $\tilde{\mathcal{O}}(\sqrt{KT})$ and $\tilde{\mathcal{O}}(\sqrt{KL^*} + K)$ regret for NeuCBs, where $L^*$ is the loss of the best policy.
- Score: 46.82558739203106
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recent works have shown a reduction from contextual bandits to online
regression under a realizability assumption [Foster and Rakhlin, 2020, Foster
and Krishnamurthy, 2021]. In this work, we investigate the use of neural
networks for such online regression and associated Neural Contextual Bandits
(NeuCBs). Using existing results for wide networks, one can readily show a
${\mathcal{O}}(\sqrt{T})$ regret for online regression with square loss, which
via the reduction implies a ${\mathcal{O}}(\sqrt{K} T^{3/4})$ regret for
NeuCBs. Departing from this standard approach, we first show a
$\mathcal{O}(\log T)$ regret for online regression with almost convex losses
that satisfy the QG (Quadratic Growth) condition, a generalization of the PL
(Polyak-Łojasiewicz) condition, and that have a unique minimum. Although not
directly applicable to wide networks since they do not have unique minima, we
show that adding a suitable small random perturbation to the network
predictions surprisingly makes the loss satisfy QG with a unique minimum. Based on
such a perturbed prediction, we show a ${\mathcal{O}}(\log T)$ regret for
online regression with both squared loss and KL loss, and subsequently convert
these respectively to $\tilde{\mathcal{O}}(\sqrt{KT})$ and
$\tilde{\mathcal{O}}(\sqrt{KL^*} + K)$ regret for NeuCB, where $L^*$ is the
loss of the best policy. Separately, we also show that existing regret bounds
for NeuCBs are $\Omega(T)$ or assume i.i.d. contexts, unlike this work.
Finally, our experimental results on various datasets demonstrate that our
algorithms, especially the one based on KL loss, consistently outperform
existing algorithms.
Related papers
- Contextual Bandit Optimization with Pre-Trained Neural Networks [0.0]
We investigate how pre-training can help us in the regime of smaller models.
We show sublinear regret of E2TC when the dimension of the last layer and number of actions $K$ are much smaller than the horizon $T$.
In the weak training regime, when only the last layer is learned, the problem reduces to a misspecified linear bandit.
arXiv Detail & Related papers (2025-01-09T10:21:19Z) - Conservative Contextual Bandits: Beyond Linear Representations [29.5947486908322]
Conservative Contextual Bandits (CCBs) address safety in sequential decision making by requiring that an agent's policy, along with minimizing regret, also satisfies a safety constraint.
We develop two algorithms, $\mathtt{C-SquareCB}$ and $\mathtt{C-FastCB}$, using Inverse Gap Weighting (IGW) based exploration and an online regression oracle.
We show that the safety constraint is satisfied with high probability and that the regret of $\mathtt{C-SquareCB}$ is sub-linear in the horizon $T$.
arXiv Detail & Related papers (2024-12-09T02:57:27Z) - Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes [29.466981306355066]
We show that gradient descent with a fixed learning rate $\eta$ can only find local minima that represent smooth functions.
We also prove a nearly-optimal MSE bound of $\widetilde{O}(n^{-4/5})$ within the strict interior of the support of the $n$ data points.
arXiv Detail & Related papers (2024-06-10T22:57:27Z) - A Unified Scheme of ResNet and Softmax [8.556540804058203]
We provide a theoretical analysis of the regression problem $\| \langle \exp(Ax) + Ax, \mathbf{1}_n \rangle^{-1} ( \exp(Ax) + Ax ) - b \|_2^2$.
This regression problem is a unified scheme that combines softmax regression and ResNet, which has never been done before.
arXiv Detail & Related papers (2023-09-23T21:41:01Z) - Settling the Sample Complexity of Online Reinforcement Learning [92.02082223856479]
We show how to achieve minimax-optimal regret without incurring any burn-in cost.
We extend our theory to unveil the influences of problem-dependent quantities like the optimal value/cost and certain variances.
arXiv Detail & Related papers (2023-07-25T15:42:11Z) - Optimal Sketching Bounds for Sparse Linear Regression [116.30196615349226]
We study oblivious sketching for $k$-sparse linear regression under various loss functions such as an $\ell_p$ norm, or from a broad class of hinge-like loss functions.
We show that for sparse $\ell_2$ norm regression, there is an oblivious distribution over sketches with $\Theta(k\log(d/k)/\varepsilon^2)$ rows, which is tight up to a constant factor.
We also show that sketching $O(\mu^2 k\log(\mu n d/\varepsilon)/\varepsilon^2)$ rows suffices.
arXiv Detail & Related papers (2023-04-05T07:24:19Z) - Generalization and Stability of Interpolating Neural Networks with Minimal Width [37.908159361149835]
We investigate the generalization and optimization of shallow neural-networks trained by gradient descent in the interpolating regime.
We prove convergence of the training loss with $m=\Omega(\log^4(n))$ neurons and $T\approx n$ iterations.
With $m=\Omega(\log^4(n))$ neurons and $T\approx n$, we bound the test loss by $\tilde{O}(1/n)$.
arXiv Detail & Related papers (2023-02-18T05:06:15Z) - When Expressivity Meets Trainability: Fewer than $n$ Neurons Can Work [59.29606307518154]
We show that as long as the width $m \geq 2n/d$ (where $d$ is the input dimension), its expressivity is strong, i.e., there exists at least one global minimizer with zero training loss.
We also consider a constrained optimization formulation where the feasible region is the nice local region, and prove that every KKT point is a nearly global minimizer.
arXiv Detail & Related papers (2022-10-21T14:41:26Z) - Online nonparametric regression with Sobolev kernels [99.12817345416846]
We derive regret upper bounds on the classes of Sobolev spaces $W_p^{\beta}(\mathcal{X})$, $p\geq 2$, $\beta>\frac{d}{p}$.
The upper bounds are supported by a minimax regret analysis, which reveals that in the cases $\beta>\frac{d}{2}$ or $p=\infty$ these rates are (essentially) optimal.
arXiv Detail & Related papers (2021-02-06T15:05:14Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.