A theory of representation learning gives a deep generalisation of
kernel methods
- URL: http://arxiv.org/abs/2108.13097v6
- Date: Thu, 25 May 2023 07:46:00 GMT
- Title: A theory of representation learning gives a deep generalisation of
kernel methods
- Authors: Adam X. Yang, Maxime Robeyns, Edward Milsom, Ben Anson, Nandi Schoots,
Laurence Aitchison
- Abstract summary: We develop a new infinite width limit, the Bayesian representation learning limit.
We show that it exhibits representation learning mirroring that in finite-width models.
Next, we introduce the possibility of using this limit and objective as a flexible, deep generalisation of kernel methods.
- Score: 22.260038428890383
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The successes of modern deep machine learning methods are founded on their
ability to transform inputs across multiple layers to build good high-level
representations. It is therefore critical to understand this process of
representation learning. However, standard theoretical approaches (formally
NNGPs) involving infinite width limits eliminate representation learning. We
therefore develop a new infinite width limit, the Bayesian representation
learning limit, that exhibits representation learning mirroring that in
finite-width models, yet at the same time, retains some of the simplicity of
standard infinite-width limits. In particular, we show that Deep Gaussian
processes (DGPs) in the Bayesian representation learning limit have exactly
multivariate Gaussian posteriors, and the posterior covariances can be obtained
by optimizing an interpretable objective combining a log-likelihood to improve
performance with a series of KL-divergences which keep the posteriors close to
the prior. We confirm these results experimentally in wide but finite DGPs.
Next, we introduce the possibility of using this limit and objective as a
flexible, deep generalisation of kernel methods, that we call deep kernel
machines (DKMs). Like most naive kernel methods, DKMs scale cubically in the
number of datapoints. We therefore use methods from the Gaussian process
inducing point literature to develop a sparse DKM that scales linearly in the
number of datapoints. Finally, we extend these approaches to NNs (which have
non-Gaussian posteriors) in the Appendices.
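  The objective described in the abstract can be sketched schematically as follows (illustrative notation, not taken verbatim from the paper): writing G_1, ..., G_L for the layerwise Gram matrices of a depth-L DGP, with G_0 the input Gram matrix and K(·) the kernel applied to the previous layer's representation,

```latex
\mathcal{L}(G_1,\dots,G_L)
  = \log P\!\left(Y \mid G_L\right)
  - \sum_{\ell=1}^{L} \nu_\ell\,
    \mathrm{KL}\!\Big(\mathcal{N}\big(\mathbf{0},\,G_\ell\big)
    \,\Big\|\,\mathcal{N}\big(\mathbf{0},\,K(G_{\ell-1})\big)\Big)
```

  The first term is the log-likelihood that improves performance; each KL term keeps the layer's Gaussian posterior close to the prior induced by the layer below, with weights ν_ℓ treated here as given constants. The cubic cost mentioned for DKMs comes from factorising the full N x N Gram matrix; the sketch below shows, under generic Nystrom-style assumptions rather than the paper's specific sparse DKM construction, how inducing points bring this down to linear in N.

```python
import numpy as np

def rbf_kernel(X, Z, lengthscale=1.0):
    """Squared-exponential kernel matrix between the rows of X and the rows of Z."""
    sqdist = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sqdist / lengthscale ** 2)

rng = np.random.default_rng(0)
N, M, D = 5000, 64, 8                          # many datapoints, few inducing points
X = rng.normal(size=(N, D))
Z = X[rng.choice(N, size=M, replace=False)]    # inducing inputs (here: a data subset)

# A full kernel method forms the N x N Gram matrix and pays O(N^3) to factorise it.
# With inducing points only the M x M and N x M blocks are formed, so the dominant
# cost is O(N M^2): linear in N for fixed M.
Kmm = rbf_kernel(Z, Z) + 1e-6 * np.eye(M)      # jitter for numerical stability
Knm = rbf_kernel(X, Z)
L = np.linalg.cholesky(Kmm)                    # only an M x M factorisation
A = np.linalg.solve(L, Knm.T)                  # M x N
nystrom_diag = (A ** 2).sum(0)                 # diagonal of the Nystrom approximation
```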
Related papers
- Contraction rates for conjugate gradient and Lanczos approximate posteriors in Gaussian process regression [0.0]
We analyze a class of recently proposed approximation algorithms from the field of probabilistic numerics.
We combine results from the numerical analysis literature with state-of-the-art concentration results for spectra of kernel matrices to obtain minimax contraction rates.
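  For orientation, the algorithms in this class replace the exact O(n^3) solve in GP regression with an iterative Krylov method. A minimal conjugate-gradient sketch of the posterior mean is below (a generic illustration of the solver class, not the paper's estimator or its Lanczos variant):

```python
import numpy as np
from scipy.sparse.linalg import cg

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)

K = np.exp(-0.5 * (X - X.T) ** 2)            # RBF Gram matrix on the training inputs
A = K + 0.1 * np.eye(len(X))                 # add the noise variance

# Conjugate gradients only needs matrix-vector products with A, so the weights
# alpha = A^{-1} y are obtained without an exact Cholesky factorisation.
alpha, info = cg(A, y, maxiter=500)          # info == 0 signals convergence

Xs = np.linspace(-3, 3, 100)[:, None]
Ks = np.exp(-0.5 * (Xs - X.T) ** 2)          # test/train cross-covariance
posterior_mean = Ks @ alpha
```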
arXiv Detail & Related papers (2024-06-18T14:50:42Z) - The Convex Landscape of Neural Networks: Characterizing Global Optima
and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network (DNN) training is a non-convex optimization problem.
In this paper we examine the use of convex neural network recovery models.
We show that the stationary points of the non-convex objective can be characterized as global optima of subsampled convex programs.
arXiv Detail & Related papers (2023-12-19T23:04:56Z) - Controlling the Inductive Bias of Wide Neural Networks by Modifying the Kernel's Spectrum [18.10812063219831]
We introduce Modified Spectrum Kernels (MSKs) to approximate kernels with desired eigenvalues.
We propose a preconditioned gradient descent method, which alters the trajectory of gradient descent.
Our method is both computationally efficient and simple to implement.
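  A toy version of editing a kernel's spectrum is sketched below (hypothetical example: eigendecompose a Gram matrix and swap in a target eigenvalue decay; the paper's MSK construction and the associated preconditioner are more involved):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
K = np.exp(-0.5 * ((X[:, None] - X[None]) ** 2).sum(-1))     # RBF Gram matrix

# Keep the eigenvectors but replace the eigenvalues with a desired (here,
# polynomially decaying) spectrum; eigh returns eigenvalues in ascending order,
# so the target spectrum is sorted the same way.
eigvals, eigvecs = np.linalg.eigh(K)
target = np.sort(1.0 / np.arange(1, len(eigvals) + 1) ** 2)
K_modified = (eigvecs * target) @ eigvecs.T

# K_modified is still symmetric positive semi-definite, but its eigenvalue decay --
# and hence the inductive bias of the associated kernel predictor -- has changed.
```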
arXiv Detail & Related papers (2023-07-26T22:39:47Z) - Higher-order topological kernels via quantum computation [68.8204255655161]
Topological data analysis (TDA) has emerged as a powerful tool for extracting meaningful insights from complex data.
We propose a quantum approach to defining Betti kernels, which is based on constructing Betti curves with increasing order.
arXiv Detail & Related papers (2023-07-14T14:48:52Z) - A Unified Algebraic Perspective on Lipschitz Neural Networks [88.14073994459586]
This paper introduces a novel perspective unifying various types of 1-Lipschitz neural networks.
We show that many existing techniques can be derived and generalized via finding analytical solutions of a common semidefinite programming (SDP) condition.
Our approach, called SDP-based Lipschitz Layers (SLL), allows us to design non-trivial yet efficient generalizations of convex potential layers.
arXiv Detail & Related papers (2023-03-06T14:31:09Z) - A Generalized EigenGame with Extensions to Multiview Representation
Learning [0.28647133890966997]
Generalized Eigenvalue Problems (GEPs) encompass a range of interesting dimensionality reduction methods.
We develop an approach to solving GEPs in which all constraints are softly enforced by Lagrange multipliers.
We show that our approaches share much of the theoretical grounding of the previous Hebbian and game theoretic approaches for the linear case.
We demonstrate the effectiveness of our method for solving GEPs in the setting of canonical multiview datasets.
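  For reference, the generalised eigenvalue problem A v = λ B v that such methods target can be solved directly in the dense case; the sketch below sets up canonical correlation analysis on two synthetic views as a GEP (an illustrative baseline, not the paper's Lagrange-multiplier or game-theoretic solver):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))                                        # view 1
Y = X @ rng.normal(size=(10, 8)) + 0.5 * rng.normal(size=(1000, 8))    # view 2

# CCA as a GEP: A collects the cross-covariances, B the (regularised)
# within-view covariances.
Cxx = np.cov(X, rowvar=False) + 1e-3 * np.eye(10)
Cyy = np.cov(Y, rowvar=False) + 1e-3 * np.eye(8)
Cxy = (X - X.mean(0)).T @ (Y - Y.mean(0)) / (len(X) - 1)

A = np.block([[np.zeros((10, 10)), Cxy], [Cxy.T, np.zeros((8, 8))]])
B = np.block([[Cxx, np.zeros((10, 8))], [np.zeros((8, 10)), Cyy]])

eigvals, eigvecs = eigh(A, B)            # dense generalised symmetric eigensolver
top_canonical_correlations = eigvals[::-1][:3]
```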
arXiv Detail & Related papers (2022-11-21T10:11:13Z) - Gaussian Processes and Statistical Decision-making in Non-Euclidean
Spaces [96.53463532832939]
We develop techniques for broadening the applicability of Gaussian processes.
We introduce a wide class of efficient approximations built from this viewpoint.
We develop a collection of Gaussian process models over non-Euclidean spaces.
arXiv Detail & Related papers (2022-02-22T01:42:57Z) - Conditional Deep Gaussian Processes: empirical Bayes hyperdata learning [6.599344783327054]
We propose a conditional Deep Gaussian Process (DGP) in which the intermediate GPs in hierarchical composition are supported by the hyperdata.
We show the equivalence with the deep kernel learning in the limit of dense hyperdata in latent space.
Preliminary extrapolation results demonstrate the expressive power of the proposed model compared with GP kernel composition, DGP variational inference, and deep kernel learning.
arXiv Detail & Related papers (2021-10-01T17:50:48Z) - Scalable Variational Gaussian Processes via Harmonic Kernel
Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z) - Bayesian Deep Ensembles via the Neural Tangent Kernel [49.569912265882124]
We explore the link between deep ensembles and Gaussian processes (GPs) through the lens of the Neural Tangent Kernel (NTK).
We introduce a simple modification to standard deep ensembles training, through addition of a computationally-tractable, randomised and untrainable function to each ensemble member.
We prove that our Bayesian deep ensembles make more conservative predictions than standard deep ensembles in the infinite width limit.
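  The modification described here can be sketched roughly as follows (PyTorch, in the spirit of randomised prior functions; the specific form of the added function in the paper is derived from the NTK and differs from this toy version):

```python
import copy
import torch
import torch.nn as nn

def make_ensemble_member(d_in=10, width=128, d_out=1):
    # Trainable network for this ensemble member.
    net = nn.Sequential(nn.Linear(d_in, width), nn.ReLU(), nn.Linear(width, d_out))
    # A frozen, independently re-initialised copy: the randomised, untrainable
    # function added to the member's output.
    prior = copy.deepcopy(net)
    for m in prior.modules():
        if isinstance(m, nn.Linear):
            m.reset_parameters()              # re-randomise so it differs from `net`
    for p in prior.parameters():
        p.requires_grad_(False)

    def forward(x):
        return net(x) + prior(x)              # only `net`'s parameters are trained
    return forward, list(net.parameters())

member, params = make_ensemble_member()
opt = torch.optim.Adam(params, lr=1e-3)
x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = (member(x) - y).pow(2).mean()          # placeholder regression loss
loss.backward()
opt.step()
```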
arXiv Detail & Related papers (2020-07-11T22:10:52Z) - Intrinsic Gaussian Processes on Manifolds and Their Accelerations by
Symmetry [9.773237080061815]
Existing methods primarily focus on low dimensional constrained domains for heat kernel estimation.
Our research proposes an intrinsic approach for constructing GPs on general manifolds.
Our methodology estimates the heat kernel by simulating Brownian motion sample paths using the exponential map.
arXiv Detail & Related papers (2020-06-25T09:17:40Z)
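  A toy version of that simulation idea on the unit sphere is sketched below (illustrative assumptions: S^2 with its standard exponential map and a crude Monte Carlo density estimate; the paper treats more general manifolds and symmetry-based accelerations):

```python
import numpy as np

def exp_map_sphere(x, v):
    """Exponential map on the unit sphere: move from x along tangent vector v."""
    norm_v = np.linalg.norm(v, axis=-1, keepdims=True)
    norm_v = np.where(norm_v == 0.0, 1e-12, norm_v)
    return np.cos(norm_v) * x + np.sin(norm_v) * v / norm_v

def brownian_endpoints(x0, t=0.5, n_steps=200, n_paths=20000, seed=0):
    """Simulate Brownian motion on the sphere for time t and return the endpoints."""
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    x = np.tile(x0, (n_paths, 1))
    for _ in range(n_steps):
        g = rng.normal(scale=np.sqrt(dt), size=x.shape)
        v = g - (g * x).sum(-1, keepdims=True) * x      # project onto tangent plane
        x = exp_map_sphere(x, v)
        x /= np.linalg.norm(x, axis=-1, keepdims=True)  # guard against numerical drift
    return x

# Monte Carlo heat-kernel estimate p_t(x0, y): fraction of endpoints landing in a
# small geodesic ball around y, divided by the ball's area.
x0 = np.array([0.0, 0.0, 1.0])
y = np.array([0.0, 1.0, 0.0])
ends = brownian_endpoints(x0)
r = 0.2
geo_dist = np.arccos(np.clip(ends @ y, -1.0, 1.0))
cap_area = 2.0 * np.pi * (1.0 - np.cos(r))
heat_kernel_estimate = (geo_dist < r).mean() / cap_area
```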
This list is automatically generated from the titles and abstracts of the papers in this site.