Redundancy in Deep Linear Neural Networks
- URL: http://arxiv.org/abs/2206.04490v1
- Date: Thu, 9 Jun 2022 13:21:00 GMT
- Title: Redundancy in Deep Linear Neural Networks
- Authors: Oriel BenShmuel
- Abstract summary: Conventional wisdom states that deep linear neural networks benefit from expressiveness and optimization advantages over a single linear layer.
This paper suggests that, in practice, the training process of deep linear fully-connected networks using conventional optimizers is convex in the same manner as a single linear fully-connected layer.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conventional wisdom states that deep linear neural networks benefit from
expressiveness and optimization advantages over a single linear layer. This
paper suggests that, in practice, the training process of deep linear
fully-connected networks using conventional optimizers is convex in the same
manner as a single linear fully-connected layer. This paper aims to explain
this claim and demonstrate it. Even though convolutional networks are not
aligned with this description, this work aims to attain a new conceptual
understanding of fully-connected linear networks that might shed light on the
possible constraints of convolutional settings and non-linear architectures.
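As a minimal sketch of the expressiveness point behind the abstract (not the paper's own experiment, and saying nothing about training dynamics or convexity), the following checks numerically that a stack of bias-free linear layers computes the same function as one matrix equal to the product of the layer weights. The layer widths, the random seed, and the helper names (`matmul`, `random_matrix`, `deep_linear_forward`, `collapse`) are hypothetical choices made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def random_matrix(rows, cols, rng):
    """Matrix with entries drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def deep_linear_forward(weights, x):
    """Apply the layers in order, with no activation and no bias."""
    h = x
    for w in weights:
        h = matmul(w, h)
    return h


def collapse(weights):
    """Fold all layers into a single matrix W = W_n ... W_2 W_1."""
    total = weights[0]
    for w in weights[1:]:
        total = matmul(w, total)
    return total


rng = random.Random(0)
layer_sizes = [4, 8, 8, 3]  # hypothetical widths: input 4, two hidden layers of 8, output 3
weights = [random_matrix(layer_sizes[i + 1], layer_sizes[i], rng)
           for i in range(len(layer_sizes) - 1)]
x = random_matrix(layer_sizes[0], 5, rng)  # five input vectors stored as columns

deep_out = deep_linear_forward(weights, x)
single_out = matmul(collapse(weights), x)

# Largest elementwise difference between the deep and the collapsed output.
max_gap = max(abs(d - s)
              for deep_row, single_row in zip(deep_out, single_out)
              for d, s in zip(deep_row, single_row))
print("outputs agree up to rounding:", max_gap < 1e-9)
```

The two outputs differ only by floating-point rounding, because matrix multiplication is associative. Whether the training process of the deep parameterization behaves like that of the single layer is the separate claim the paper argues for.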
Related papers
- Combining Explicit and Implicit Regularization for Efficient Learning in Deep Networks [3.04585143845864]
In deep linear networks, gradient descent implicitly regularizes toward low-rank solutions on matrix completion/factorization tasks.
We propose an explicit penalty to mirror this implicit bias which only takes effect with certain adaptive gradient optimizers.
This combination can enable a degenerate single-layer network to achieve low-rank approximations with generalization error comparable to deep linear networks.
arXiv Detail & Related papers (2023-06-01T04:47:17Z)
- Learning Linear Embeddings for Non-Linear Network Dynamics with Koopman Message Passing [0.0]
We present a novel approach based on Koopman operator theory and message passing networks.
We find a linear representation for the dynamical system which is globally valid at any time step.
The linearisations found by our method produce predictions on a suite of network dynamics problems that are several orders of magnitude better than current state-of-the-art techniques.
arXiv Detail & Related papers (2023-05-15T23:00:25Z)
- Subquadratic Overparameterization for Shallow Neural Networks [60.721751363271146]
We provide an analytical framework that allows us to adopt standard neural training strategies.
We achieve the desiderata via Polyak-Lojasiewicz, smoothness, and standard assumptions.
arXiv Detail & Related papers (2021-11-02T20:24:01Z)
- The Principles of Deep Learning Theory [19.33681537640272]
This book develops an effective theory approach to understanding deep neural networks of practical relevance.
We explain how these effectively-deep networks learn nontrivial representations from training.
We show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks.
arXiv Detail & Related papers (2021-06-18T15:00:00Z)
- What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
arXiv Detail & Related papers (2021-06-12T13:05:11Z)
- ReduNet: A White-box Deep Network from the Principle of Maximizing Rate Reduction [32.489371527159236]
This work attempts to provide a plausible theoretical framework that aims to interpret modern deep (convolutional) networks from the principles of data compression and discriminative representation.
We show that for high-dimensional multi-class data, the optimal linear discriminative representation maximizes the coding rate difference between the whole dataset and the average of all the subsets.
We show that the basic iterative gradient ascent scheme for optimizing the rate reduction objective naturally leads to a multi-layer deep network, named ReduNet, that shares common characteristics of modern deep networks.
arXiv Detail & Related papers (2021-05-21T16:29:57Z)
- Rethinking Skip Connection with Layer Normalization in Transformers and ResNets [49.87919454950763]
Skip connection is a widely-used technique to improve the performance of deep neural networks.
In this work, we investigate how the scale factors in the effectiveness of the skip connection.
arXiv Detail & Related papers (2021-05-15T11:44:49Z)
- A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks with quadratic widths in the sample size and linear in their depth at a time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z)
- Deep Networks from the Principle of Rate Reduction [32.87280757001462]
This work attempts to interpret modern deep (convolutional) networks from the principles of rate reduction and (shift) invariant classification.
We show that the basic iterative gradient ascent scheme for optimizing the rate reduction of learned features naturally leads to a multi-layer deep network, one iteration per layer.
All components of this "white box" network have precise optimization, statistical, and geometric interpretation.
arXiv Detail & Related papers (2020-10-27T06:01:43Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and owns adaptability to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
- From deep to Shallow: Equivalent Forms of Deep Networks in Reproducing Kernel Krein Space and Indefinite Support Vector Machines [63.011641517977644]
We take a deep network and convert it to an equivalent (indefinite) kernel machine.
We then investigate the implications of this transformation for capacity control and uniform convergence.
Finally, we analyse the sparsity properties of the flat representation, showing that the flat weights are (effectively) Lp-"norm" regularised with 0 < p < 1.
arXiv Detail & Related papers (2020-07-15T03:21:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.