Every Model Learned by Gradient Descent Is Approximately a Kernel Machine
- URL: http://arxiv.org/abs/2012.00152v1
- Date: Mon, 30 Nov 2020 23:02:47 GMT
- Title: Every Model Learned by Gradient Descent Is Approximately a Kernel Machine
- Authors: Pedro Domingos
- Abstract summary: Deep learning's successes are often attributed to its ability to automatically discover new representations of the data.
We show, however, that deep networks learned by the standard gradient descent algorithm are mathematically approximately equivalent to kernel machines.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning's successes are often attributed to its ability to
automatically discover new representations of the data, rather than relying on
handcrafted features like other learning methods. We show, however, that deep
networks learned by the standard gradient descent algorithm are in fact
mathematically approximately equivalent to kernel machines, a learning method
that simply memorizes the data and uses it directly for prediction via a
similarity function (the kernel). This greatly enhances the interpretability of
deep network weights, by elucidating that they are effectively a superposition
of the training examples. The network architecture incorporates knowledge of
the target function into the kernel. This improved understanding should lead to
better learning algorithms.
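The abstract's equivalence can be made concrete. In the paper, the trained prediction takes the form y ≈ Σ_i a_i K(x, x_i) + b, where the sum runs over training examples, K is the path kernel (the tangent kernel ∇_w f_w(x)·∇_w f_w(x_i) averaged along the gradient-descent trajectory), the a_i depend on the loss derivatives along that trajectory, and b is the initial model's prediction. The sketch below is not code from the paper: the JAX toy model, data, and hyperparameters are invented for illustration. It checks the discrete analogue numerically: each gradient step changes the prediction at a query point by roughly -lr · Σ_i L'_i · K_t(x, x_i), and accumulating those terms over training approximately reconstructs the final prediction.

```python
# Minimal numerical sketch of the kernel-machine view (illustrative only).
import jax
import jax.numpy as jnp

def f(w, x):
    # Tiny nonlinear model chosen only for illustration: f(x) = a*tanh(b*x + c).
    return w[0] * jnp.tanh(w[1] * x + w[2])

grad_f = jax.grad(f, argnums=0)  # gradient of the model output w.r.t. parameters

# Made-up toy training data and a held-out query point.
xs = jnp.array([-1.0, 0.0, 0.5, 2.0])
ys = jnp.array([-0.8, 0.1, 0.4, 0.9])
x_query = 1.3

w = jnp.array([0.5, -0.3, 0.2])  # initial parameters
lr, steps = 1e-3, 2000

y0 = f(w, x_query)   # untrained prediction: the additive constant b
kernel_sum = 0.0     # accumulated "superposition of training examples"

for _ in range(steps):
    g_query = grad_f(w, x_query)                         # grad_w f(w, x)
    g_train = jnp.stack([grad_f(w, xi) for xi in xs])    # grad_w f(w, x_i)
    residuals = jnp.array([f(w, xi) for xi in xs]) - ys  # L' for L = 0.5*sum(err^2)
    k_vals = g_train @ g_query                           # tangent kernel K_t(x, x_i)
    kernel_sum += -lr * jnp.sum(residuals * k_vals)      # first-order change at x
    w = w - lr * (residuals @ g_train)                   # plain gradient descent

print("trained prediction      :", float(f(w, x_query)))
print("kernel-machine estimate :", float(y0 + kernel_sum))
```

The two printed values should agree closely, and the gap shrinks as the learning rate is reduced, since the per-step tangent-kernel approximation becomes exact only in the gradient-flow limit that the paper's theorem assumes.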
Related papers
- Stochastic Kernel Regularisation Improves Generalisation in Deep Kernel Machines [23.09717258810923]
Recent work developed convolutional deep kernel machines, achieving 92.7% test accuracy on CIFAR-10.
We introduce several modifications to improve the convolutional deep kernel machine's generalisation.
The resulting model achieves 94.5% test accuracy on CIFAR-10.
arXiv Detail & Related papers (2024-10-08T16:15:53Z)
- Neural Network Pruning by Gradient Descent [7.427858344638741]
We introduce a novel and straightforward neural network pruning framework that incorporates the Gumbel-Softmax technique.
We demonstrate its exceptional compression capability, maintaining high accuracy on the MNIST dataset with only 0.15% of the original network parameters.
We believe our method opens a promising new avenue for deep learning pruning and the creation of interpretable machine learning systems.
arXiv Detail & Related papers (2023-11-21T11:12:03Z)
- Deep networks for system identification: a Survey [56.34005280792013]
System identification learns mathematical descriptions of dynamic systems from input-output data.
The main aim of the identified model is to predict new data from previous observations.
We discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks.
arXiv Detail & Related papers (2023-01-30T12:38:31Z)
- Adaptive Convolutional Dictionary Network for CT Metal Artifact Reduction [62.691996239590125]
We propose an adaptive convolutional dictionary network (ACDNet) for metal artifact reduction.
Our ACDNet can automatically learn the prior for artifact-free CT images via training data and adaptively adjust the representation kernels for each input CT image.
Our method inherits the clear interpretability of model-based methods and maintains the powerful representation ability of learning-based methods.
arXiv Detail & Related papers (2022-05-16T06:49:36Z)
- Inducing Gaussian Process Networks [80.40892394020797]
We propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points.
The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains.
We report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods.
arXiv Detail & Related papers (2022-04-21T05:27:09Z)
- Neural Networks as Kernel Learners: The Silent Alignment Effect [86.44610122423994]
Neural networks in the lazy training regime converge to kernel machines.
We show that this can indeed happen due to a phenomenon we term silent alignment.
We also demonstrate that non-whitened data can weaken the silent alignment effect.
arXiv Detail & Related papers (2021-10-29T18:22:46Z)
- Learning Structures for Deep Neural Networks [99.8331363309895]
We propose to adopt the efficient coding principle, rooted in information theory and developed in computational neuroscience.
We show that sparse coding can effectively maximize the entropy of the output signals.
Our experiments on a public image classification dataset demonstrate that using the structure learned from scratch by our proposed algorithm, one can achieve a classification accuracy comparable to the best expert-designed structure.
arXiv Detail & Related papers (2021-05-27T12:27:24Z)
- The Connection Between Approximation, Depth Separation and Learnability in Neural Networks [70.55686685872008]
We study the connection between learnability and approximation capacity.
We show that learnability with deep networks of a target function depends on the ability of simpler classes to approximate the target.
arXiv Detail & Related papers (2021-01-31T11:32:30Z)
- Malicious Network Traffic Detection via Deep Learning: An Information Theoretic View [0.0]
We study how homeomorphism affects the learned representation of a malware traffic dataset.
Our results suggest that although the details of learned representations and the specific coordinate system defined over the manifold of all parameters differ slightly, the functional approximations are the same.
arXiv Detail & Related papers (2020-09-16T15:37:44Z)