Convergence Analysis and Implicit Regularization of Feedback Alignment for Deep Linear Networks
- URL: http://arxiv.org/abs/2110.10815v1
- Date: Wed, 20 Oct 2021 22:57:03 GMT
- Title: Convergence Analysis and Implicit Regularization of Feedback Alignment for Deep Linear Networks
- Authors: Manuela Girotti and Ioannis Mitliagkas and Gauthier Gidel
- Abstract summary: We theoretically analyze the Feedback Alignment (FA) algorithm, an efficient alternative to backpropagation for training neural networks.
We provide convergence guarantees with rates for deep linear networks for both continuous and discrete dynamics.
- Score: 27.614609336582568
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We theoretically analyze the Feedback Alignment (FA) algorithm, an efficient
alternative to backpropagation for training neural networks. We provide
convergence guarantees with rates for deep linear networks for both continuous
and discrete dynamics. Additionally, we study incremental learning phenomena
for shallow linear networks. Interestingly, certain specific initializations
imply that negligible components are learned before the principal ones, thus
potentially negatively affecting the effectiveness of such a learning
algorithm; a phenomenon we classify as implicit anti-regularization. We also
provide initialization schemes where the components of the problem are
approximately learned by decreasing order of importance, thus providing a form
of implicit regularization.
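For intuition, here is a minimal NumPy sketch (not the authors' code) of Feedback Alignment on a deep linear network: the forward pass is the usual product of weight matrices, but the error signal is propagated backwards through fixed random matrices instead of the transposed weights. Layer sizes, step size, and the synthetic data are illustrative assumptions.

```python
# Minimal sketch of Feedback Alignment (FA) for a deep linear network trained
# on the squared loss. Layer sizes, step size, and data are illustrative
# assumptions, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
dims = [10, 8, 8, 4]                      # input -> hidden -> hidden -> output
L = len(dims) - 1
W = [0.1 * rng.standard_normal((dims[l + 1], dims[l])) for l in range(L)]
# Fixed random feedback matrices: B[l] replaces W[l].T in the backward pass.
B = [0.1 * rng.standard_normal((dims[l], dims[l + 1])) for l in range(L)]

X = rng.standard_normal((dims[0], 256))   # synthetic inputs (as columns)
W_star = rng.standard_normal((dims[-1], dims[0]))
Y = W_star @ X                            # synthetic linear targets

eta = 1e-3
for step in range(2000):
    # Forward pass: h[l] holds the activations of layer l (linear network).
    h = [X]
    for l in range(L):
        h.append(W[l] @ h[l])
    # Backward pass: propagate the error with the fixed matrices B, not W.T.
    delta = h[-1] - Y                     # gradient of 0.5 * ||h_L - Y||^2
    for l in reversed(range(L)):
        W[l] -= eta * delta @ h[l].T / X.shape[1]
        delta = B[l] @ delta              # backprop would use W[l].T @ delta

pred = np.linalg.multi_dot(W[::-1] + [X])  # end-to-end linear map applied to X
print("final loss:", 0.5 * np.mean(np.sum((pred - Y) ** 2, axis=0)))
```

The loss typically decreases as the forward weights gradually align with the fixed feedback matrices, which is the mechanism the convergence analysis makes precise.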
Related papers
- Component-based Sketching for Deep ReLU Nets [55.404661149594375]
We develop a sketching scheme based on deep net components for various tasks.
We transform deep net training into a linear empirical risk minimization problem.
We show that the proposed component-based sketching provides almost optimal rates in approximating saturated functions.
arXiv Detail & Related papers (2024-09-21T15:30:43Z)
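The reduction to linear empirical risk minimization mentioned above can be illustrated, in a generic random-features style rather than the paper's specific component-based construction, by freezing randomly drawn inner components and solving a ridge-regularized least-squares problem for the linear readout.

```python
# Illustrative only: deep-net training reduced to linear empirical risk
# minimization by freezing a randomly initialized ReLU feature map and solving
# least squares for the linear readout. This is a generic random-features
# sketch, not the component-based construction used in the paper.
import numpy as np

rng = np.random.default_rng(1)
n, d, m = 500, 20, 200                    # samples, input dim, frozen features
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

W_frozen = rng.standard_normal((d, m)) / np.sqrt(d)   # fixed "components"
Phi = np.maximum(X @ W_frozen, 0.0)                   # frozen ReLU features

# Linear ERM with ridge: closed-form solution instead of end-to-end training.
lam = 1e-2
theta = np.linalg.solve(Phi.T @ Phi + lam * np.eye(m), Phi.T @ y)
print("train MSE:", np.mean((Phi @ theta - y) ** 2))
```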
- Implicit Regularization via Spectral Neural Networks and Non-linear Matrix Sensing [2.171120568435925]
Spectral Neural Networks (SNNs) are particularly suitable for matrix learning problems.
We show that the SNN architecture is inherently much more amenable to theoretical analysis than vanilla neural nets.
We believe that the SNN architecture has the potential to be of wide applicability in a broad class of matrix learning scenarios.
arXiv Detail & Related papers (2024-02-27T15:28:01Z)
- Convergence Analysis for Learning Orthonormal Deep Linear Neural Networks [27.29463801531576]
We provide convergence analysis for training orthonormal deep linear neural networks.
Our results shed light on how increasing the number of hidden layers can impact the convergence speed.
arXiv Detail & Related papers (2023-11-24T18:46:54Z)
- Simple initialization and parametrization of sinusoidal networks via their kernel bandwidth [92.25666446274188]
Neural networks with sinusoidal activations have been proposed as an alternative to networks with traditional activation functions.
We first propose a simplified version of such sinusoidal neural networks, which allows both for easier practical implementation and simpler theoretical analysis.
We then analyze the behavior of these networks from the neural tangent kernel perspective and demonstrate that their kernel approximates a low-pass filter with an adjustable bandwidth.
arXiv Detail & Related papers (2022-11-26T07:41:48Z)
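As a rough illustration of the networks discussed above, the sketch below builds a small sinusoidal MLP whose first-layer frequency scale (called omega here, an assumed name) plays the role of the adjustable bandwidth: small omega yields smooth, low-frequency functions, while large omega admits rapid oscillations.

```python
# Minimal sinusoidal MLP sketch. The first-layer frequency scale `omega` acts
# as the bandwidth-style knob discussed in the entry above; its name and the
# exact initialization are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)

def sinusoidal_mlp(x, omega=10.0, hidden=64):
    """Two-layer network with sine activations, evaluated on 1-D inputs."""
    W1 = rng.standard_normal((hidden, 1))
    b1 = rng.uniform(-np.pi, np.pi, size=hidden)
    W2 = rng.standard_normal((1, hidden)) / np.sqrt(hidden)
    h = np.sin(omega * (W1 @ x[None, :]) + b1[:, None])   # sine features
    return (W2 @ h).ravel()

x = np.linspace(-1.0, 1.0, 200)
low_band = sinusoidal_mlp(x, omega=1.0)    # smooth, low-frequency functions
high_band = sinusoidal_mlp(x, omega=30.0)  # can express rapid oscillations
```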
- On the generalization of learning algorithms that do not converge [54.122745736433856]
Generalization analyses of deep learning typically assume that the training converges to a fixed point.
Recent results indicate that in practice, the weights of deep neural networks optimized with gradient descent often oscillate indefinitely.
arXiv Detail & Related papers (2022-08-16T21:22:34Z)
- Learning Dynamics and Generalization in Reinforcement Learning [59.530058000689884]
We show theoretically that temporal difference learning encourages agents to fit non-smooth components of the value function early in training.
We show that neural networks trained using temporal difference algorithms on dense reward tasks exhibit weaker generalization between states than randomly initialized networks and networks trained with policy gradient methods.
arXiv Detail & Related papers (2022-06-05T08:49:16Z)
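For readers unfamiliar with the algorithm referenced above, here is a minimal TD(0) value-estimation sketch on a toy chain environment; the environment, policy, and hyperparameters are illustrative assumptions, not the tasks studied in the paper.

```python
# Minimal TD(0) sketch on a toy deterministic chain: states 0..4, reward 1
# only on reaching the terminal state. Environment and hyperparameters are
# illustrative assumptions.
import numpy as np

n_states, gamma, alpha = 5, 0.9, 0.1
V = np.zeros(n_states + 1)                # value estimates, terminal at index 5

for episode in range(500):
    s = 0
    while s < n_states:
        s_next = s + 1                    # deterministic "move right" policy
        r = 1.0 if s_next == n_states else 0.0
        # TD(0) update: bootstrap from the current estimate of V[s_next].
        V[s] += alpha * (r + gamma * V[s_next] - V[s])
        s = s_next

print(np.round(V[:n_states], 3))          # approaches gamma**(n_states-1-s)
```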
- On the Explicit Role of Initialization on the Convergence and Implicit Bias of Overparametrized Linear Networks [1.0323063834827415]
We present a novel analysis of single-hidden-layer linear networks trained under gradient flow.
We show that the squared loss converges exponentially to its optimum.
We derive a novel non-asymptotic upper-bound on the distance between the trained network and the min-norm solution.
arXiv Detail & Related papers (2021-05-13T15:13:51Z)
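To make the setting above concrete, the following sketch runs plain gradient descent (a discretization of gradient flow) on the squared loss of a single-hidden-layer linear network; dimensions, step size, and data are illustrative assumptions.

```python
# Single-hidden-layer linear network f(x) = W2 @ W1 @ x trained with plain
# gradient descent, a discretization of the gradient flow analyzed above.
# Dimensions, step size, and data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(3)
d_in, d_hidden, d_out, n = 10, 20, 3, 200
X = rng.standard_normal((d_in, n))
Y = rng.standard_normal((d_out, d_in)) @ X          # realizable linear targets

W1 = 0.1 * rng.standard_normal((d_hidden, d_in))
W2 = 0.1 * rng.standard_normal((d_out, d_hidden))

eta = 5e-3
for step in range(5000):
    R = W2 @ W1 @ X - Y                             # residual
    grad_W2 = R @ (W1 @ X).T / n                    # d(loss)/dW2
    grad_W1 = W2.T @ R @ X.T / n                    # d(loss)/dW1
    W2 -= eta * grad_W2
    W1 -= eta * grad_W1

print("final squared loss:", 0.5 * np.mean(np.sum((W2 @ W1 @ X - Y) ** 2, axis=0)))
```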
- DL-Reg: A Deep Learning Regularization Technique using Linear Regression [4.1359299555083595]
This paper proposes a novel deep learning regularization method named DL-Reg.
It reduces the nonlinearity of deep networks to a certain extent by explicitly encouraging the network to behave as linearly as possible.
The performance of DL-Reg is evaluated by training state-of-the-art deep network models on several benchmark datasets.
arXiv Detail & Related papers (2020-10-31T21:53:24Z)
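A hedged sketch of the idea described above: fit a linear regression from the inputs to the network's own outputs and penalize the network's deviation from that linear fit. The exact DL-Reg penalty may differ; this is only an illustration of "behave as linearly as possible".

```python
# Illustration of a linearity-encouraging penalty in the spirit of DL-Reg:
# fit a linear regression from inputs to the network's outputs and penalize
# the network's deviation from that fit. The exact DL-Reg formulation may
# differ; treat this as a sketch.
import numpy as np

def linearity_penalty(X, F):
    """X: (n, d) inputs, F: (n, k) network outputs on those inputs."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])        # add bias column
    beta, *_ = np.linalg.lstsq(Xb, F, rcond=None)        # best linear fit
    return np.mean((F - Xb @ beta) ** 2)                  # deviation from it

# Usage sketch: total_loss = task_loss + lam * linearity_penalty(X_batch, net(X_batch))
```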
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
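The learnable-edge idea above can be sketched as a complete DAG of node operations where each edge carries a trainable scalar that scales how strongly one node's output feeds into later nodes. The aggregation rule and names below are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of learnable connectivity: a complete DAG over "node" operations,
# where each edge carries a trainable scalar weight controlling how much one
# node's output feeds into later nodes. Names and the aggregation rule are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(4)
n_nodes, d = 4, 16
edge_w = 0.1 * rng.standard_normal((n_nodes, n_nodes))  # edge_w[i, j]: node i -> node j (i < j)
node_op = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(n_nodes)]

def forward(x):
    """x: (d,) input; node j consumes a weighted sum of all earlier nodes."""
    outputs = [np.tanh(node_op[0] @ x)]
    for j in range(1, n_nodes):
        agg = sum(edge_w[i, j] * outputs[i] for i in range(j))
        outputs.append(np.tanh(node_op[j] @ (x + agg)))
    return outputs[-1]

y = forward(rng.standard_normal(d))
# Because edge_w enters the forward pass smoothly, the connectivity pattern
# can be learned by gradient descent along with the node operations.
```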
- On Connections between Regularizations for Improving DNN Robustness [67.28077776415724]
This paper analyzes regularization terms proposed recently for improving the adversarial robustness of deep neural networks (DNNs).
We study possible connections between several effective methods, including input-gradient regularization, Jacobian regularization, curvature regularization, and a cross-Lipschitz functional.
arXiv Detail & Related papers (2020-07-04T23:43:32Z)
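As a concrete example of one of the penalties listed above, input-gradient regularization adds the squared norm of the loss gradient with respect to the input to the training objective. The sketch uses a softmax-linear model so the input gradient has a closed form; with deep networks this term would be computed by automatic differentiation.

```python
# Sketch of input-gradient regularization, one of the penalties compared in
# the entry above: add the squared norm of d(loss)/d(input) to the objective.
# Shown for a softmax-linear model so the input gradient has a closed form.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def loss_and_input_grad(W, b, x, y_onehot):
    p = softmax(W @ x + b)                      # class probabilities
    loss = -np.log(p[y_onehot.argmax()] + 1e-12)
    grad_x = W.T @ (p - y_onehot)               # d(loss)/dx for this model
    return loss, grad_x

rng = np.random.default_rng(5)
W, b = rng.standard_normal((3, 8)), np.zeros(3)
x, y = rng.standard_normal(8), np.eye(3)[0]

lam = 0.1
loss, grad_x = loss_and_input_grad(W, b, x, y)
regularized = loss + lam * np.sum(grad_x ** 2)  # encourages a flatter loss in x
```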
- Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks [39.856439772974454]
We show that the width needed for efficient convergence to a global minimum is independent of the depth.
Our results suggest an explanation for the recent empirical successes found by initializing very deep non-linear networks according to the principle of dynamical isometry.
arXiv Detail & Related papers (2020-01-16T18:48:34Z)
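The orthogonal initialization referred to above can be sketched as drawing each weight matrix (semi-)orthogonal, for example from the QR decomposition of a Gaussian matrix, so that products of layers preserve signal norms; the width and depth below are illustrative assumptions.

```python
# Sketch of orthogonal initialization for a deep linear network: each weight
# matrix is (semi-)orthogonal, obtained from the QR decomposition of a
# Gaussian matrix. Widths and depth below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(6)

def orthogonal_init(rows, cols):
    A = rng.standard_normal((rows, cols))
    Q, R = np.linalg.qr(A)                      # Q has orthonormal columns
    return Q * np.sign(np.diag(R))              # fix column signs for uniqueness

depth, width = 8, 64
weights = [orthogonal_init(width, width) for _ in range(depth)]

# Products of orthogonal matrices preserve norms, so signals neither explode
# nor vanish with depth -- the property exploited in the convergence analysis.
x = rng.standard_normal(width)
for W in weights:
    x = W @ x
print(np.linalg.norm(x))                        # stays equal to the input norm
```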
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.