Kernelized Classification in Deep Networks
- URL: http://arxiv.org/abs/2012.09607v2
- Date: Thu, 18 Mar 2021 21:41:28 GMT
- Title: Kernelized Classification in Deep Networks
- Authors: Sadeep Jayasumana, Srikumar Ramalingam, Sanjiv Kumar
- Abstract summary: We propose a kernelized classification layer for deep networks.
We advocate a nonlinear classification layer by using the kernel trick on the softmax cross-entropy loss function during training.
We show the usefulness of the proposed nonlinear classification layer on several datasets and tasks.
- Score: 49.47339560731506
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We propose a kernelized classification layer for deep networks. Although
conventional deep networks introduce an abundance of nonlinearity for
representation (feature) learning, they almost universally use a linear
classifier on the learned feature vectors. We advocate a nonlinear
classification layer by using the kernel trick on the softmax cross-entropy
loss function during training and the scorer function during testing. However,
the choice of the kernel remains a challenge. To tackle this, we theoretically
show the possibility of optimizing over all possible positive definite kernels
applicable to our problem setting. This theory is then used to devise a new
kernelized classification layer that learns the optimal kernel function for a
given problem automatically within the deep network itself. We show the
usefulness of the proposed nonlinear classification layer on several datasets
and tasks.
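To make the idea concrete, here is a minimal sketch (an illustration under stated assumptions, not the paper's exact construction): the usual linear logits W^T z are replaced by kernel evaluations k(z, w_c) between the learned feature vector z and trainable per-class parameter vectors w_c, and these nonlinear scores are trained with the standard softmax cross-entropy loss. A fixed RBF kernel is used here purely for illustration; the paper instead learns the kernel function within the network.
```python
# Minimal sketch of a kernelized classification layer (assumed RBF kernel,
# trainable per-class prototypes); not the paper's learned-kernel layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class KernelizedClassifier(nn.Module):
    def __init__(self, feature_dim: int, num_classes: int, gamma: float = 1.0):
        super().__init__()
        # One trainable "prototype" per class, playing the role of W's columns.
        self.prototypes = nn.Parameter(torch.randn(num_classes, feature_dim))
        self.gamma = gamma  # RBF bandwidth (hypothetical fixed choice)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # Squared Euclidean distances between each feature and each prototype.
        dists = torch.cdist(features, self.prototypes) ** 2  # (batch, classes)
        # RBF kernel scores serve as nonlinear logits for cross-entropy.
        return torch.exp(-self.gamma * dists)

# Usage: the kernel scores feed the standard softmax cross-entropy loss.
features = torch.randn(8, 128)          # backbone output (batch, feature_dim)
labels = torch.randint(0, 10, (8,))
clf = KernelizedClassifier(128, 10)
loss = F.cross_entropy(clf(features), labels)
loss.backward()
```
The only change relative to a conventional head is that the class scores are kernel evaluations rather than inner products; the rest of the training pipeline is unchanged.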
Related papers
- Local Kernel Renormalization as a mechanism for feature learning in
overparametrized Convolutional Neural Networks [0.0]
Empirical evidence shows that fully-connected neural networks in the infinite-width limit eventually outperform their finite-width counterparts.
State-of-the-art architectures with convolutional layers achieve optimal performance in the finite-width regime.
We show that the generalization performance of a finite-width FC network can be obtained by an infinite-width network, with a suitable choice of the Gaussian priors.
arXiv Detail & Related papers (2023-07-21T17:22:04Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Utilizing Excess Resources in Training Neural Networks [41.07083436560303]
We implement a linear cascade of filtering layers in a kernel filtering fashion, which prevents the trained architecture from becoming unnecessarily deep.
This also allows our approach to be used with almost any network architecture and lets the filtering layers be combined into a single layer at test time (see the sketch after the related-papers list).
We demonstrate the advantage of KFLO on various network models and datasets in supervised learning.
arXiv Detail & Related papers (2022-07-12T13:48:40Z) - Wide and Deep Neural Networks Achieve Optimality for Classification [23.738242876364865]
We identify and construct an explicit set of neural network classifiers that achieve optimality.
In particular, we provide explicit activation functions that can be used to construct networks that achieve optimality.
Our results highlight the benefit of using deep networks for classification tasks, in contrast to regression tasks, where excessive depth is harmful.
arXiv Detail & Related papers (2022-04-29T14:27:42Z) - Neural Networks as Kernel Learners: The Silent Alignment Effect [86.44610122423994]
Neural networks in the lazy training regime converge to kernel machines.
We show that networks can also behave as kernel learners beyond the lazy regime, due to a phenomenon we term silent alignment.
We also demonstrate that non-whitened data can weaken the silent alignment effect.
arXiv Detail & Related papers (2021-10-29T18:22:46Z) - NFT-K: Non-Fungible Tangent Kernels [23.93508901712177]
We develop a new network as a combination of multiple neural tangent kernels, one to model each layer of the deep neural network individually.
We demonstrate the interpretability of this model on two datasets, showing that the multiple kernels model elucidates the interplay between the layers and predictions.
arXiv Detail & Related papers (2021-10-11T00:35:47Z) - Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction for the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions achieving comparable error bounds, both in theory and in practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z) - Generalized Leverage Score Sampling for Neural Networks [82.95180314408205]
Leverage score sampling is a powerful technique that originates from theoretical computer science.
In this work, we generalize the results in [Avron, Kapralov, Musco, Musco, Velingker and Zandieh 17] to a broader class of kernels.
arXiv Detail & Related papers (2020-09-21T14:46:01Z) - Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
Kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z)
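As referenced in the KFLO summary above, the test-time folding of a linear cascade can be sketched as follows, under the simplifying assumption that the extra filtering layers are plain bias-free linear maps (the paper's kernel filtering layers are more general). A cascade y = W_n ... W_1 x is equivalent to a single linear layer with weight W_n ... W_1, so inference cost does not grow with the number of filtering layers.
```python
# Sketch of folding a cascade of bias-free linear layers into one layer at
# test time; a simplified stand-in for KFLO's filtering-layer combination.
import torch
import torch.nn as nn

def fold_linear_cascade(layers: list[nn.Linear]) -> nn.Linear:
    """Collapse bias-free linear layers into a single equivalent layer."""
    weight = layers[0].weight
    for layer in layers[1:]:
        weight = layer.weight @ weight  # compose in application order
    folded = nn.Linear(weight.shape[1], weight.shape[0], bias=False)
    folded.weight = nn.Parameter(weight.detach().clone())
    return folded

# The folded layer reproduces the cascade's output at inference time.
cascade = [nn.Linear(64, 64, bias=False) for _ in range(3)]
x = torch.randn(4, 64)
y_cascade = x
for layer in cascade:
    y_cascade = layer(y_cascade)
y_folded = fold_linear_cascade(cascade)(x)
assert torch.allclose(y_cascade, y_folded, atol=1e-5)
```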
This list is automatically generated from the titles and abstracts of the papers in this site.