Every Model Learned by Gradient Descent Is Approximately a Kernel Machine
- URL: http://arxiv.org/abs/2012.00152v1
- Date: Mon, 30 Nov 2020 23:02:47 GMT
- Title: Every Model Learned by Gradient Descent Is Approximately a Kernel Machine
- Authors: Pedro Domingos
- Abstract summary: Deep learning's successes are often attributed to its ability to automatically discover new representations of the data.
We show, however, that deep networks learned by the standard gradient descent algorithm are mathematically approximately equivalent to kernel machines.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning's successes are often attributed to its ability to
automatically discover new representations of the data, rather than relying on
handcrafted features like other learning methods. We show, however, that deep
networks learned by the standard gradient descent algorithm are in fact
mathematically approximately equivalent to kernel machines, a learning method
that simply memorizes the data and uses it directly for prediction via a
similarity function (the kernel). This greatly enhances the interpretability of
deep network weights, by elucidating that they are effectively a superposition
of the training examples. The network architecture incorporates knowledge of
the target function into the kernel. This improved understanding should lead to
better learning algorithms.
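The abstract's equivalence can be made concrete. In the paper, the trained prediction takes the form y ≈ Σ_i a_i K(x, x_i) + b, where the sum runs over training examples, K is the path kernel (the tangent kernel ∇_w f_w(x)·∇_w f_w(x_i) averaged along the gradient-descent trajectory), the a_i depend on the loss derivatives along that trajectory, and b is the initial model's prediction. The sketch below is not code from the paper: the JAX toy model, data, and hyperparameters are invented for illustration. It checks the discrete analogue numerically: each gradient step changes the prediction at a query point by roughly -lr · Σ_i L'_i · K_t(x, x_i), and accumulating those terms over training approximately reconstructs the final prediction.

```python
# Minimal numerical sketch of the kernel-machine view (illustrative only).
import jax
import jax.numpy as jnp

def f(w, x):
    # Tiny nonlinear model chosen only for illustration: f(x) = a*tanh(b*x + c).
    return w[0] * jnp.tanh(w[1] * x + w[2])

grad_f = jax.grad(f, argnums=0)  # gradient of the model output w.r.t. parameters

# Made-up toy training data and a held-out query point.
xs = jnp.array([-1.0, 0.0, 0.5, 2.0])
ys = jnp.array([-0.8, 0.1, 0.4, 0.9])
x_query = 1.3

w = jnp.array([0.5, -0.3, 0.2])  # initial parameters
lr, steps = 1e-3, 2000

y0 = f(w, x_query)   # untrained prediction: the additive constant b
kernel_sum = 0.0     # accumulated "superposition of training examples"

for _ in range(steps):
    g_query = grad_f(w, x_query)                         # grad_w f(w, x)
    g_train = jnp.stack([grad_f(w, xi) for xi in xs])    # grad_w f(w, x_i)
    residuals = jnp.array([f(w, xi) for xi in xs]) - ys  # L' for L = 0.5*sum(err^2)
    k_vals = g_train @ g_query                           # tangent kernel K_t(x, x_i)
    kernel_sum += -lr * jnp.sum(residuals * k_vals)      # first-order change at x
    w = w - lr * (residuals @ g_train)                   # plain gradient descent

print("trained prediction      :", float(f(w, x_query)))
print("kernel-machine estimate :", float(y0 + kernel_sum))
```

The two printed values should agree closely, and the gap shrinks as the learning rate is reduced, since the per-step tangent-kernel approximation becomes exact only in the gradient-flow limit that the paper's theorem assumes.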
Related papers
- Stochastic Kernel Regularisation Improves Generalisation in Deep Kernel Machines [23.09717258810923]
Recent work developed convolutional deep kernel machines, achieving 92.7% test accuracy on CIFAR-10.
We introduce several modifications to improve the convolutional deep kernel machine's generalisation.
The resulting model achieves 94.5% test accuracy on CIFAR-10.
arXiv Detail & Related papers (2024-10-08T16:15:53Z)
- Neural Network Pruning by Gradient Descent [7.427858344638741]
We introduce a novel and straightforward neural network pruning framework that incorporates the Gumbel-Softmax technique.
We demonstrate its exceptional compression capability, maintaining high accuracy on the MNIST dataset with only 0.15% of the original network parameters.
We believe our method opens a promising new avenue for deep learning pruning and the creation of interpretable machine learning systems.
arXiv Detail & Related papers (2023-11-21T11:12:03Z)
- Deep networks for system identification: a Survey [56.34005280792013]
System identification learns mathematical descriptions of dynamic systems from input-output data.
The main aim of the identified model is to predict new data from previous observations.
We discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks.
arXiv Detail & Related papers (2023-01-30T12:38:31Z)
- Adaptive Convolutional Dictionary Network for CT Metal Artifact Reduction [62.691996239590125]
We propose an adaptive convolutional dictionary network (ACDNet) for metal artifact reduction.
Our ACDNet can automatically learn the prior for artifact-free CT images via training data and adaptively adjust the representation kernels for each input CT image.
Our method inherits the clear interpretability of model-based methods and maintains the powerful representation ability of learning-based methods.
arXiv Detail & Related papers (2022-05-16T06:49:36Z)
- Inducing Gaussian Process Networks [80.40892394020797]
We propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points.
The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains.
We report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods.
arXiv Detail & Related papers (2022-04-21T05:27:09Z)
- Neural Networks as Kernel Learners: The Silent Alignment Effect [86.44610122423994]
Neural networks in the lazy training regime converge to kernel machines.
We show that this can indeed happen due to a phenomenon we term silent alignment.
We also demonstrate that non-whitened data can weaken the silent alignment effect.
arXiv Detail & Related papers (2021-10-29T18:22:46Z)
- Learning Structures for Deep Neural Networks [99.8331363309895]
We propose to adopt the efficient coding principle, rooted in information theory and developed in computational neuroscience.
We show that sparse coding can effectively maximize the entropy of the output signals.
Our experiments on a public image classification dataset demonstrate that using the structure learned from scratch by our proposed algorithm, one can achieve a classification accuracy comparable to the best expert-designed structure.
arXiv Detail & Related papers (2021-05-27T12:27:24Z)
- The Connection Between Approximation, Depth Separation and Learnability in Neural Networks [70.55686685872008]
We study the connection between learnability and approximation capacity.
We show that learnability with deep networks of a target function depends on the ability of simpler classes to approximate the target.
arXiv Detail & Related papers (2021-01-31T11:32:30Z)
- Malicious Network Traffic Detection via Deep Learning: An Information Theoretic View [0.0]
We study how homeomorphism affects the learned representation of a malware traffic dataset.
Our results suggest that although the details of learned representations and the specific coordinate system defined over the manifold of all parameters differ slightly, the functional approximations are the same.
arXiv Detail & Related papers (2020-09-16T15:37:44Z)