An Empirical Study of Neural Kernel Bandits
- URL: http://arxiv.org/abs/2111.03543v1
- Date: Fri, 5 Nov 2021 15:06:05 GMT
- Title: An Empirical Study of Neural Kernel Bandits
- Authors: Michal Lisicki, Arash Afkanpour, Graham W. Taylor
- Abstract summary: Research on neural kernels (NK) has recently established a correspondence between deep networks and GPs that take into account all the parameters of a NN.
We show that NK bandits achieve state-of-the-art performance on highly non-linear structured data.
- Score: 17.92492092034792
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural bandits have enabled practitioners to operate efficiently on problems
with non-linear reward functions. While in general contextual bandits commonly
utilize Gaussian process (GP) predictive distributions for decision making, the
most successful neural variants use only the last layer parameters in the
derivation. Research on neural kernels (NK) has recently established a
correspondence between deep networks and GPs that take into account all the
parameters of a NN and can be trained more efficiently than most Bayesian NNs.
We propose to directly apply NK-induced distributions to guide an upper
confidence bound or Thompson sampling-based policy. We show that NK bandits
achieve state-of-the-art performance on highly non-linear structured data.
Furthermore, we analyze practical considerations such as training frequency and
model partitioning. We believe our work will help better understand the impact
of utilizing NKs in applied settings.
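The abstract describes the method at a high level only; as a rough illustration of the idea (an NK-induced GP posterior driving a UCB or Thompson sampling policy), here is a minimal NumPy sketch. The kernel matrices `K_train`, `K_cross`, and `K_test_diag` are assumed to come from a neural kernel (e.g., NNGP or NTK matrices computed with a library such as neural-tangents); the function names, the noise level, and the exploration coefficient `beta` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def nk_gp_posterior(K_train, K_cross, K_test_diag, y_train, noise=1e-2):
    """GP posterior mean/variance from precomputed neural-kernel blocks.

    K_train:     (n, n) kernel over observed context-action pairs
    K_cross:     (n, m) kernel between observed points and candidate arms
    K_test_diag: (m,)   kernel diagonal over the candidate arms
    """
    L = np.linalg.cholesky(K_train + noise * np.eye(len(y_train)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_cross.T @ alpha
    v = np.linalg.solve(L, K_cross)                      # L^{-1} K_cross
    var = np.maximum(K_test_diag - np.sum(v**2, axis=0), 1e-12)
    return mean, var

def select_arm(mean, var, policy="ucb", beta=2.0, rng=None):
    """Choose an arm from the NK-induced posterior via UCB or Thompson sampling."""
    if policy == "ucb":
        scores = mean + beta * np.sqrt(var)
    else:                                                # Thompson sampling
        rng = rng or np.random.default_rng()
        scores = rng.normal(mean, np.sqrt(var))          # one posterior draw per arm
    return int(np.argmax(scores))
```

In a full bandit loop one would append each observed context-reward pair, recompute (or incrementally update) the kernel blocks, and, in the spirit of the paper's training-frequency analysis, possibly refresh the kernel only every few rounds.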
Related papers
- Efficient kernel surrogates for neural network-based regression [0.8030359871216615]
We study the performance of the Conjugate Kernel (CK), an efficient approximation to the Neural Tangent Kernel (NTK).
We show that CK performance is only marginally worse than that of the NTK and, in certain cases, is superior.
In addition to providing a theoretical grounding for using CKs instead of NTKs, our framework suggests a recipe for improving DNN accuracy inexpensively.
arXiv Detail & Related papers (2023-10-28T06:41:47Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Guided Deep Kernel Learning [42.53025115287688]
We present a novel approach for learning deep kernels by utilizing infinite-width neural networks.
Our approach harnesses the reliable uncertainty estimation of the NNGPs to adapt the DKL target confidence when it encounters novel data points.
arXiv Detail & Related papers (2023-02-19T13:37:34Z)
- Sample-Then-Optimize Batch Neural Thompson Sampling [50.800944138278474]
We introduce two algorithms for black-box optimization based on the Thompson sampling (TS) policy.
To choose an input query, we only need to train an NN and then choose the query by maximizing the trained NN.
Our algorithms sidestep the need to invert the large parameter matrix yet still preserve the validity of the TS policy.
arXiv Detail & Related papers (2022-10-13T09:01:58Z)
- Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a Polynomial Net Study [55.12108376616355]
The study of the NTK has been devoted to typical neural network architectures but is incomplete for neural networks with Hadamard products (NNs-Hp).
In this work, we derive the finite-width NTK formulation for a special class of NNs-Hp, i.e., polynomial neural networks.
We prove their equivalence to the kernel regression predictor with the associated NTK, which expands the application scope of NTK.
arXiv Detail & Related papers (2022-09-16T06:36:06Z)
- Comparative Analysis of Interval Reachability for Robust Implicit and Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
arXiv Detail & Related papers (2022-04-01T03:31:27Z)
- Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
Kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z)
- Reinforcement Learning via Gaussian Processes with Neural Network Dual Kernels [0.0]
We show that neural network dual kernels can be efficiently applied to regression and reinforcement learning problems.
We demonstrate, using the well-understood mountain-car problem, that GPs empowered with dual kernels perform at least as well as those using the conventional radial basis function kernel.
arXiv Detail & Related papers (2020-04-10T18:36:21Z)
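For intuition about the "dual kernel" in the last related paper above, here is a hedged sketch (not code from either paper): the closed-form infinite-width covariance of a one-hidden-layer ReLU network, E[ReLU(wᵀx)·ReLU(wᵀx′)] under standard normal weights, which is proportional to Cho & Saul's first-order arc-cosine kernel, plugged into plain GP regression. The toy data and the jitter constant are arbitrary illustrations.

```python
import numpy as np

def relu_dual_kernel(X1, X2):
    """Closed-form infinite-width ReLU covariance E[ReLU(w^T x) ReLU(w^T x')]
    for w ~ N(0, I); proportional to the first-order arc-cosine kernel."""
    n1 = np.linalg.norm(X1, axis=1)
    n2 = np.linalg.norm(X2, axis=1)
    cos = np.clip((X1 @ X2.T) / np.outer(n1, n2), -1.0, 1.0)
    theta = np.arccos(cos)
    return np.outer(n1, n2) / (2 * np.pi) * (np.sin(theta) + (np.pi - theta) * np.cos(theta))

# Toy GP regression with the dual kernel (purely illustrative).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 3))
y_train = np.sin(X_train[:, 0])
X_test = rng.normal(size=(5, 3))

K = relu_dual_kernel(X_train, X_train) + 1e-3 * np.eye(len(X_train))  # jitter for stability
K_star = relu_dual_kernel(X_train, X_test)
post_mean = K_star.T @ np.linalg.solve(K, y_train)  # GP posterior mean at the test points
```

A kernel of this kind (or its NTK counterpart) is exactly the sort of object an NK bandit would plug into the posterior sketched above.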