Low-Rank Learning by Design: the Role of Network Architecture and
Activation Linearity in Gradient Rank Collapse
- URL: http://arxiv.org/abs/2402.06751v1
- Date: Fri, 9 Feb 2024 19:28:02 GMT
- Title: Low-Rank Learning by Design: the Role of Network Architecture and
Activation Linearity in Gradient Rank Collapse
- Authors: Bradley T. Baker, Barak A. Pearlmutter, Robyn Miller, Vince D.
Calhoun, Sergey M. Plis
- Abstract summary: We study how architectural choices and the structure of the data affect gradient rank bounds in deep neural networks (DNNs).
Our theoretical analysis provides these bounds for training fully-connected, recurrent, and convolutional neural networks.
We also demonstrate, both theoretically and empirically, how design choices like activation function linearity, bottleneck layer introduction, convolutional stride, and sequence truncation influence these bounds.
- Score: 14.817633094318253
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Our understanding of learning dynamics of deep neural networks (DNNs) remains
incomplete. Recent research has begun to uncover the mathematical principles
underlying these networks, including the phenomenon of "Neural Collapse", where
linear classifiers within DNNs converge to specific geometrical structures
during late-stage training. However, the role of geometric constraints in
learning extends beyond this terminal phase. For instance, gradients in
fully-connected layers naturally develop a low-rank structure due to the
accumulation of rank-one outer products over a training batch. Despite the
attention given to methods that exploit this structure for memory saving or
regularization, the emergence of low-rank learning as an inherent aspect of
certain DNN architectures has been under-explored. In this paper, we conduct a
comprehensive study of gradient rank in DNNs, examining how architectural
choices and the structure of the data affect gradient rank bounds. Our theoretical
analysis provides these bounds for training fully-connected, recurrent, and
convolutional neural networks. We also demonstrate, both theoretically and
empirically, how design choices like activation function linearity, bottleneck
layer introduction, convolutional stride, and sequence truncation influence
these bounds. Our findings not only contribute to the understanding of learning
dynamics in DNNs, but also provide practical guidance for deep learning
engineers to make informed design decisions.
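The core mechanism behind these bounds, the accumulation of one rank-one outer product per training example in a fully-connected layer's weight gradient, can be checked numerically in a few lines. The sketch below is an illustration under assumed settings (the layer widths, batch size, squared-error loss, and purely linear activations are choices made here, not values from the paper): a three-layer linear network with a narrow bottleneck, where the gradient rank of each weight matrix is bounded by the batch size, the adjacent layer widths, and, because the activations are linear, the bottleneck width as well.

```python
import torch

torch.manual_seed(0)

# Illustrative sizes (assumptions, not values from the paper).
batch, d_in, d_hidden, d_bneck, d_out = 8, 64, 32, 4, 64

# Three fully-connected layers with purely linear activations;
# the middle layer is a narrow bottleneck of width d_bneck.
W1 = torch.randn(d_hidden, d_in, requires_grad=True)
W2 = torch.randn(d_bneck, d_hidden, requires_grad=True)
W3 = torch.randn(d_out, d_bneck, requires_grad=True)

x = torch.randn(batch, d_in)
target = torch.randn(batch, d_out)

h1 = x @ W1.T                      # (batch, d_hidden)
h2 = h1 @ W2.T                     # (batch, d_bneck) -- bottleneck activations
y = h2 @ W3.T                      # (batch, d_out)
loss = ((y - target) ** 2).mean()
loss.backward()

# W1.grad is a sum over the batch of rank-one outer products delta_i x_i^T.
# With linear activations, every backpropagated error delta_i also passes
# through the d_bneck-dimensional bottleneck, so
#   rank(W1.grad) <= min(batch, d_bneck, d_hidden, d_in) = 4 here.
print(torch.linalg.matrix_rank(W1.grad))  # <= 4: bottleneck-limited, not batch-limited
print(torch.linalg.matrix_rank(W3.grad))  # <= 4: limited by the bottleneck's width
```

Inserting a nonlinearity such as ReLU after the bottleneck would loosen how the bottleneck bound propagates backward to earlier layers; this is the kind of activation-linearity effect the abstract refers to.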
Related papers
- Deep neural networks architectures from the perspective of manifold
learning [0.0]
This paper is a comprehensive comparison and description of neural network architectures in terms of geometry and topology.
We focus on the internal representation of neural networks and on the dynamics of changes in the topology and geometry of a data manifold on different layers.
arXiv Detail & Related papers (2023-06-06T04:57:39Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z)
- Leveraging The Topological Consistencies of Learning in Deep Neural Networks [0.0]
We define a new class of topological features that accurately characterize the progress of learning while being quick to compute during running time.
Our proposed topological features are readily equipped for backpropagation, meaning that they can be incorporated in end-to-end training.
arXiv Detail & Related papers (2021-11-30T18:34:48Z)
- On the Application of Data-Driven Deep Neural Networks in Linear and Nonlinear Structural Dynamics [28.979990729816638]
The use of deep neural network (DNN) models as surrogates for linear and nonlinear structural dynamical systems is explored.
The focus is on the development of efficient network architectures using fully-connected, sparsely-connected, and convolutional network layers.
It is shown that the proposed DNNs can be used as effective and accurate surrogates for predicting linear and nonlinear dynamical responses under harmonic loadings.
arXiv Detail & Related papers (2021-11-03T13:22:19Z)
- Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks [1.0869257688521987]
Complex Network Theory (CNT) represents Deep Neural Networks (DNNs) as directed weighted graphs to study them as dynamical systems.
We introduce metrics for nodes/neurons and layers, namely Nodes Strength and Layers Fluctuation.
Our framework distills trends in the learning dynamics and separates low-accuracy from high-accuracy networks.
arXiv Detail & Related papers (2021-10-06T10:03:32Z)
- PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context.
We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z)
- Statistical Mechanics of Deep Linear Neural Networks: The Back-Propagating Renormalization Group [4.56877715768796]
We study the statistical mechanics of learning in Deep Linear Neural Networks (DLNNs) in which the input-output function of an individual unit is linear.
We solve exactly the network properties following supervised learning using an equilibrium Gibbs distribution in the weight space.
Our numerical simulations reveal that despite the nonlinearity, the predictions of our theory are largely shared by ReLU networks with modest depth.
arXiv Detail & Related papers (2020-12-07T20:08:31Z)
- Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z)
- Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
- An Ode to an ODE [78.97367880223254]
We present a new paradigm for Neural ODE algorithms, called ODEtoODE, where time-dependent parameters of the main flow evolve according to a matrix flow on the group O(d).
This nested system of two flows provides stability and effectiveness of training and provably solves the gradient vanishing-explosion problem.
arXiv Detail & Related papers (2020-06-19T22:05:19Z)