Guided Deep Kernel Learning
- URL: http://arxiv.org/abs/2302.09574v2
- Date: Sun, 14 May 2023 14:24:07 GMT
- Title: Guided Deep Kernel Learning
- Authors: Idan Achituve, Gal Chechik, Ethan Fetaya
- Abstract summary: We present a novel approach for learning deep kernels by utilizing infinite-width neural networks.
Our approach harnesses the reliable uncertainty estimation of the NNGPs to adapt the DKL target confidence when it encounters novel data points.
- Score: 42.53025115287688
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Combining Gaussian processes with the expressive power of deep neural
networks is commonly done nowadays through deep kernel learning (DKL).
Unfortunately, due to the kernel optimization process, this often results in
losing their Bayesian benefits. In this study, we present a novel approach for
learning deep kernels by utilizing infinite-width neural networks. We propose
to use the Neural Network Gaussian Process (NNGP) model as a guide to the DKL
model in the optimization process. Our approach harnesses the reliable
uncertainty estimation of the NNGPs to adapt the DKL target confidence when it
encounters novel data points. As a result, we get the best of both worlds: we
leverage the Bayesian behavior of the NNGP, namely its robustness to
overfitting and accurate uncertainty estimation, while maintaining the
generalization abilities, scalability, and flexibility of deep kernels.
Empirically, we show, on multiple benchmark datasets of varying sizes and
dimensionality, that our method is robust to overfitting, has good predictive
performance, and provides reliable uncertainty estimations.
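Since the abstract describes the guidance mechanism only at a high level, here is a minimal, hypothetical sketch of one way an NNGP could guide a deep-kernel GP during training. Everything in it is an assumption made for illustration: the toy data, the MLP feature extractor with an RBF kernel on its features, the closed-form one-hidden-layer ReLU NNGP, and especially the guidance penalty (a weighted pull of the learned kernel toward the NNGP kernel, with weights taken from the NNGP's posterior variance) are stand-ins, not the authors' published objective.

```python
# Hypothetical sketch: NNGP-guided deep kernel learning on a toy 1-D problem.
# The guidance term below is an illustrative assumption, not the paper's loss.
import torch
import torch.nn as nn

torch.manual_seed(0)
N = 60

def rbf(Z1, Z2, ls, scale):
    # RBF kernel on (learned) feature representations.
    d2 = (((Z1.unsqueeze(1) - Z2.unsqueeze(0)) / ls) ** 2).sum(-1)
    return scale * torch.exp(-0.5 * d2)

def nngp_relu(X1, X2, sw=1.0, sb=0.1):
    # Closed-form NNGP kernel of an infinite-width one-hidden-layer ReLU network.
    d = X1.shape[1]
    k12 = sb**2 + sw**2 * (X1 @ X2.T) / d
    k11 = sb**2 + sw**2 * (X1 * X1).sum(1, keepdim=True) / d
    k22 = sb**2 + sw**2 * (X2 * X2).sum(1, keepdim=True) / d
    nrm = torch.sqrt(k11 * k22.T)
    theta = torch.arccos(torch.clamp(k12 / nrm, -1.0, 1.0))
    return sb**2 + sw**2 * nrm * (torch.sin(theta) + (torch.pi - theta) * torch.cos(theta)) / (2 * torch.pi)

# Toy regression data.
X = torch.linspace(-3, 3, N).unsqueeze(-1)
y = torch.sin(2 * X).squeeze(-1) + 0.1 * torch.randn(N)

# Fixed NNGP "guide": its kernel matrix and its posterior variance at the training inputs.
Kg = nngp_relu(X, X)
post_var = torch.diag(Kg - Kg @ torch.linalg.solve(Kg + 0.01 * torch.eye(N), Kg)).clamp_min(0.0)
w = post_var / post_var.mean()      # per-point guidance weight: larger where the NNGP is less certain

# Deep kernel: MLP feature extractor + RBF kernel on its features (standard DKL ingredients).
feat = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 2))
log_ls, log_scale, log_noise = (nn.Parameter(torch.zeros(())) for _ in range(3))
opt = torch.optim.Adam(list(feat.parameters()) + [log_ls, log_scale, log_noise], lr=1e-2)

for step in range(500):
    opt.zero_grad()
    Z = feat(X)
    K = rbf(Z, Z, log_ls.exp(), log_scale.exp())
    L = torch.linalg.cholesky(K + (log_noise.exp() + 1e-3) * torch.eye(N))
    alpha = torch.cholesky_solve(y.unsqueeze(-1), L)
    nlml = 0.5 * (y.unsqueeze(-1) * alpha).sum() + torch.log(torch.diag(L)).sum()  # GP NLML, constant dropped
    guide = (w.unsqueeze(-1) * w * (K - Kg) ** 2).mean()  # hypothetical guidance: stay close to the NNGP kernel
    (nlml + guide).backward()
    opt.step()
```

The penalty is only meant to convey the division of labour the abstract describes: the NNGP's uncertainty decides where the deep kernel should be kept close to the well-behaved infinite-width prior, while the marginal likelihood term preserves DKL's flexibility elsewhere.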
Related papers
- Efficient kernel surrogates for neural network-based regression [0.8030359871216615]
We study the performance of the Conjugate Kernel (CK), an efficient approximation to the Neural Tangent Kernel (NTK).
We show that the CK performance is only marginally worse than that of the NTK and, in certain cases, is shown to be superior.
In addition to providing a theoretical grounding for using CKs instead of NTKs, our framework suggests a recipe for improving DNN accuracy inexpensively.
arXiv Detail & Related papers (2023-10-28T06:41:47Z)
- Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs [63.768739279562105]
We show that for a particular choice of mask weights that do not depend on the learning targets, this kernel is equivalent to the NTK of the gated ReLU network on the training data.
A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set.
arXiv Detail & Related papers (2023-09-26T17:42:52Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- Vecchia Gaussian Process Ensembles on Internal Representations of Deep Neural Networks [0.0]
For regression tasks, standard Gaussian processes (GPs) provide natural uncertainty quantification, while deep neural networks (DNNs) excel at representation learning.
We propose to combine these two approaches in a hybrid method consisting of an ensemble of GPs built on the outputs of hidden layers of a DNN; a minimal sketch of this construction appears after this list.
arXiv Detail & Related papers (2023-05-26T16:19:26Z)
- Efficient Bayes Inference in Neural Networks through Adaptive Importance Sampling [19.518237361775533]
In BNNs, a complete posterior distribution of the unknown weight and bias parameters of the network is produced during the training stage.
This feature is useful in countless machine learning applications.
It is particularly appealing in areas where decision-making has a crucial impact, such as medical healthcare or autonomous driving.
arXiv Detail & Related papers (2022-10-03T14:59:23Z)
- Comparative Analysis of Interval Reachability for Robust Implicit and Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
arXiv Detail & Related papers (2022-04-01T03:31:27Z)
- Meta-Learning Hypothesis Spaces for Sequential Decision-making [79.73213540203389]
We propose to meta-learn a kernel from offline data (Meta-KeL).
Under mild conditions, we guarantee that our estimated RKHS yields valid confidence sets.
We also empirically evaluate the effectiveness of our approach on a Bayesian optimization task.
arXiv Detail & Related papers (2022-02-01T17:46:51Z)
- Deep Neural Networks as Point Estimates for Deep Gaussian Processes [44.585609003513625]
We propose a sparse variational approximation for DGPs for which the approximate posterior mean has the same mathematical structure as a Deep Neural Network (DNN).
We make the forward pass through a DGP equivalent to a ReLU DNN by finding an interdomain transformation that represents the GP posterior mean as a sum of ReLU basis functions.
Experiments demonstrate improved accuracy and faster training compared to current DGP methods, while retaining favourable predictive uncertainties.
arXiv Detail & Related papers (2021-05-10T16:55:17Z)
- The Promises and Pitfalls of Deep Kernel Learning [13.487684503022063]
We identify pathological behavior, including overfitting, on a simple toy example.
We explore this pathology, explaining its origins and considering how it applies to real datasets.
We find that a fully Bayesian treatment of deep kernel learning can rectify this overfitting and obtain the desired performance improvements.
arXiv Detail & Related papers (2021-02-24T07:56:49Z)
- Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
Kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z)
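As referenced in the Vecchia Gaussian Process Ensembles entry above, the following is a minimal sketch of the hybrid construction it summarises: fit one GP per hidden-layer representation of a DNN and average the per-layer predictive distributions. For brevity it uses exact GPs with a fixed RBF kernel instead of Vecchia approximations, and the (untrained) MLP, data, and hyperparameters are illustrative assumptions rather than the paper's implementation.

```python
# Hypothetical sketch: an ensemble of GPs fit on the hidden-layer representations
# of a DNN (exact GPs stand in for the paper's Vecchia approximations).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy regression data; the small MLP below is untrained and only supplies representations.
X = torch.linspace(-3, 3, 80).unsqueeze(-1)
y = torch.sin(2 * X).squeeze(-1) + 0.1 * torch.randn(80)
Xs = torch.linspace(-4, 4, 200).unsqueeze(-1)   # test inputs

layers = nn.ModuleList([nn.Sequential(nn.Linear(1, 16), nn.Tanh()),
                        nn.Sequential(nn.Linear(16, 16), nn.Tanh())])

def hidden_reps(x):
    # Collect the activation of every hidden layer.
    reps, h = [], x
    for layer in layers:
        h = layer(h)
        reps.append(h)
    return reps

def gp_predict(Z, y, Zs, ls=1.0, noise=0.05):
    # Exact GP regression with a unit-variance RBF kernel on a given representation.
    k = lambda A, B: torch.exp(-0.5 * torch.cdist(A, B) ** 2 / ls**2)
    L = torch.linalg.cholesky(k(Z, Z) + noise * torch.eye(len(Z)))
    alpha = torch.cholesky_solve(y.unsqueeze(-1), L)
    Ks = k(Zs, Z)
    mean = (Ks @ alpha).squeeze(-1)
    var = 1.0 - (Ks * torch.cholesky_solve(Ks.T, L).T).sum(-1) + noise
    return mean, var

with torch.no_grad():
    train_reps, test_reps = hidden_reps(X), hidden_reps(Xs)
    preds = [gp_predict(Zt, y, Zs_) for Zt, Zs_ in zip(train_reps, test_reps)]

# Ensemble by moment-matching the equally weighted mixture of per-layer Gaussians.
mean = torch.stack([m for m, _ in preds]).mean(0)
var = torch.stack([v + m ** 2 for m, v in preds]).mean(0) - mean ** 2
```

Moment-matching the equally weighted mixture of per-layer Gaussians is just one simple ensembling choice; any other combination rule could be substituted.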
This list is automatically generated from the titles and abstracts of the papers on this site.