From deep to Shallow: Equivalent Forms of Deep Networks in Reproducing
Kernel Krein Space and Indefinite Support Vector Machines
- URL: http://arxiv.org/abs/2007.07459v2
- Date: Tue, 8 Sep 2020 07:27:16 GMT
- Title: From deep to Shallow: Equivalent Forms of Deep Networks in Reproducing
Kernel Krein Space and Indefinite Support Vector Machines
- Authors: Alistair Shilton, Sunil Gupta, Santu Rana, Svetha Venkatesh
- Abstract summary: We take a deep network and convert it to an equivalent (indefinite) kernel machine.
We then investigate the implications of this transformation for capacity control and uniform convergence.
Finally, we analyse the sparsity properties of the flat representation, showing that the flat weights are (effectively) Lp-"norm" regularised with 0 < p < 1.
- Score: 63.011641517977644
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we explore a connection between deep networks and learning in
reproducing kernel Krein space. Our approach is based on the concept of
push-forward - that is, taking a fixed non-linear transform on a linear
projection and converting it to a linear projection on the output of a fixed
non-linear transform, pushing the weights forward through the non-linearity.
Applying this repeatedly from the input to the output of a deep network, the
weights can be progressively "pushed" to the output layer, resulting in a flat
network that has the form of a fixed non-linear map (whose form is determined
by the structure of the deep network) followed by a linear projection
determined by the weight matrices - that is, we take a deep network and convert
it to an equivalent (indefinite) kernel machine. We then investigate the
implications of this transformation for capacity control and uniform
convergence, and provide a Rademacher complexity bound on the deep network in
terms of Rademacher complexity in reproducing kernel Krein space. Finally, we
analyse the sparsity properties of the flat representation, showing that the
flat weights are (effectively) Lp-"norm" regularised with 0<p<1 (bridge
regression).
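As a toy illustration of the push-forward described above, the following sketch flattens a one-hidden-layer network with a quadratic activation into the "fixed non-linear map followed by a linear projection" form. This is a minimal sketch under an assumed quadratic activation (chosen because it flattens in closed form), not the paper's general Krein-space construction; all sizes and names are illustrative.

```python
# Minimal sketch: push the weights of a one-hidden-layer network with a
# quadratic activation s(t) = t^2 forward through the non-linearity, giving a
# fixed non-linear feature map followed by a linear projection.
import numpy as np

rng = np.random.default_rng(0)
d, m = 5, 7                          # input dimension, hidden width
W = rng.standard_normal((m, d))      # hidden-layer weights
v = rng.standard_normal(m)           # output-layer weights

def deep_net(x):
    """Deep form: linear map, fixed non-linearity, linear map."""
    return v @ (W @ x) ** 2

def phi(x):
    """Fixed non-linear map, determined by the architecture alone."""
    return np.outer(x, x).ravel()    # quadratic (outer-product) features

# Push the weights forward: v^T (W x)^2 = < sum_j v_j w_j w_j^T , x x^T >_F
M = sum(v[j] * np.outer(W[j], W[j]) for j in range(m))
flat_weights = M.ravel()             # linear projection acting on phi(x)

def flat_net(x):
    """Flat form: fixed non-linear map followed by a linear projection."""
    return flat_weights @ phi(x)

x = rng.standard_normal(d)
print(deep_net(x), flat_net(x))      # the two forms agree up to float error
```

In this toy case the fixed map is just the quadratic (outer-product) feature map; in the paper the fixed map is determined by the full structure of the deep network and the associated kernel is in general indefinite, hence the reproducing kernel Krein space setting.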
Related papers
- Feature Learning and Generalization in Deep Networks with Orthogonal Weights [1.7956122940209063]
Deep neural networks with weights drawn from independent Gaussian distributions can be tuned to criticality.
These networks still exhibit fluctuations that grow linearly with the depth of the network.
We show analytically that rectangular networks with tanh activations and weights drawn from the ensemble of orthogonal matrices have preactivation fluctuations that are independent of depth (a rough numerical probe appears after this list).
arXiv Detail & Related papers (2023-10-11T18:00:02Z)
- From Complexity to Clarity: Analytical Expressions of Deep Neural Network Weights via Clifford's Geometric Algebra and Convexity [54.01594785269913]
We show that optimal weights of deep ReLU neural networks are given by the wedge product of training samples when trained with standard regularized loss.
The training problem reduces to convex optimization over wedge product features, which encode the geometric structure of the training dataset.
arXiv Detail & Related papers (2023-09-28T15:19:30Z)
- Bayesian Interpolation with Deep Linear Networks [92.1721532941863]
Characterizing how neural network depth, width, and dataset size jointly impact model quality is a central problem in deep learning theory.
We show that linear networks make provably optimal predictions at infinite depth.
We also show that with data-agnostic priors, Bayesian model evidence in wide linear networks is maximized at infinite depth.
arXiv Detail & Related papers (2022-12-29T20:57:46Z)
- Redundancy in Deep Linear Neural Networks [0.0]
Conventional wisdom states that deep linear neural networks benefit from expressiveness and optimization advantages over a single linear layer.
This paper suggests that, in practice, the training process of deep linear fully-connected networks using conventional optimizers is convex in the same manner as a single linear fully-connected layer (see the collapsing sketch after this list).
arXiv Detail & Related papers (2022-06-09T13:21:00Z)
- Training invariances and the low-rank phenomenon: beyond linear networks [44.02161831977037]
We show that when one trains a deep linear network with logistic or exponential loss on linearly separable data, the weights converge to rank-1 matrices.
This is the first time a low-rank phenomenon is proven rigorously for nonlinear ReLU-activated feedforward networks.
Our proof relies on a specific decomposition of the network into a multilinear function and another ReLU network whose weights are constant under a certain parameter directional convergence.
arXiv Detail & Related papers (2022-01-28T07:31:19Z)
- Deep orthogonal linear networks are shallow [9.434391240650266]
We show that training the weights with gradient descent is equivalent to training the whole factorization by gradient descent.
This means that there is no effect of overparametrization and implicit bias at all in this setting.
arXiv Detail & Related papers (2020-11-27T16:57:19Z)
- Neural Subdivision [58.97214948753937]
This paper introduces Neural Subdivision, a novel framework for data-driven coarse-to-fine geometry modeling.
We optimize for the same set of network weights across all local mesh patches, thus providing an architecture that is not constrained to a specific input mesh, fixed genus, or category.
We demonstrate that even when trained on a single high-resolution mesh our method generates reasonable subdivisions for novel shapes.
arXiv Detail & Related papers (2020-05-04T20:03:21Z)
- Eigendecomposition-Free Training of Deep Networks for Linear Least-Square Problems [107.3868459697569]
We introduce an eigendecomposition-free approach to training a deep network.
We show that our approach is much more robust than explicit differentiation of the eigendecomposition.
Our method has better convergence properties and yields state-of-the-art results.
arXiv Detail & Related papers (2020-04-15T04:29:34Z)
- Revealing the Structure of Deep Neural Networks via Convex Duality [70.15611146583068]
We study regularized deep neural networks (DNNs) and introduce a convex analytic framework to characterize the structure of hidden layers.
We show that a set of optimal hidden layer weights for a norm regularized training problem can be explicitly found as the extreme points of a convex set.
We apply the same characterization to deep ReLU networks with whitened data and prove the same weight alignment holds.
arXiv Detail & Related papers (2020-02-22T21:13:44Z)
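As a rough numerical probe of the first related paper above ("Feature Learning and Generalization in Deep Networks with Orthogonal Weights"), the sketch below compares, across random draws of the weights, the fluctuations of the per-layer preactivation second moment for Gaussian versus orthogonal initialisation in deep tanh networks. The width, depth, and seed count are arbitrary choices, and the probe is not the paper's analysis, which is analytic and to leading order in the inverse width.

```python
# Rough probe: seed-to-seed fluctuations of q_l = mean(z_l^2), the per-layer
# preactivation second moment, for deep tanh networks with no biases at
# critical initialisation (sigma_w = 1), Gaussian vs orthogonal weights.
import numpy as np

def preactivation_moments(init, width=256, depth=48, seeds=30):
    x = np.random.default_rng(0).standard_normal(width)   # one fixed input
    q = np.zeros((seeds, depth))
    for s in range(seeds):
        rng = np.random.default_rng(1000 + s)
        h = x
        for l in range(depth):
            if init == "gaussian":
                W = rng.standard_normal((width, width)) / np.sqrt(width)
            else:                    # orthogonal draw via QR, with a sign fix
                Q, R = np.linalg.qr(rng.standard_normal((width, width)))
                W = Q * np.sign(np.diag(R))
            z = W @ h                # preactivation at layer l
            q[s, l] = np.mean(z ** 2)
            h = np.tanh(z)
    return q

for init in ("gaussian", "orthogonal"):
    q = preactivation_moments(init)
    rel = q.std(axis=0) / q.mean(axis=0)   # relative fluctuation across seeds
    # The paper's claim: this grows roughly linearly with depth for Gaussian
    # weights and is depth-independent for the orthogonal ensemble.
    print(init, np.round(rel[[3, 15, 31, 47]], 3))
```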
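The deep-linear entries above ("Redundancy in Deep Linear Neural Networks", "Deep orthogonal linear networks are shallow") build on the elementary fact that a stack of linear layers computes a single linear map. The sketch below, with arbitrary sizes, collapses a product of weight matrices into one matrix and checks that the two parameterisations define the same function; any difference between deep and shallow linear models can therefore only come from the training dynamics and implicit bias, not from expressiveness.

```python
# A product of linear layers is one linear layer: collapse the factors and
# verify the deep and flat parameterisations give identical predictions.
import numpy as np

rng = np.random.default_rng(1)
dims = [10, 32, 32, 3]               # input -> hidden -> hidden -> output
Ws = [rng.standard_normal((dims[i + 1], dims[i])) * 0.1
      for i in range(len(dims) - 1)]

def deep_linear(x, Ws):
    for W in Ws:
        x = W @ x
    return x

# Collapse W3 W2 W1 into a single matrix acting directly on the input.
W_flat = Ws[0]
for W in Ws[1:]:
    W_flat = W @ W_flat

X = rng.standard_normal((10, 100))   # a batch of inputs (one per column)
print(np.allclose(deep_linear(X, Ws), W_flat @ X))   # True: same function
```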
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.