Rank Diminishing in Deep Neural Networks
- URL: http://arxiv.org/abs/2206.06072v1
- Date: Mon, 13 Jun 2022 12:03:32 GMT
- Title: Rank Diminishing in Deep Neural Networks
- Authors: Ruili Feng, Kecheng Zheng, Yukun Huang, Deli Zhao, Michael Jordan,
Zheng-Jun Zha
- Abstract summary: The rank of neural networks measures information flowing across layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains vague and unclear.
- Score: 71.03777954670323
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The rank of neural networks measures information flowing across layers. It is
an instance of a key structural condition that applies across broad domains of
machine learning. In particular, the assumption of low-rank feature
representations leads to algorithmic developments in many architectures. For
neural networks, however, the intrinsic mechanism that yields low-rank
structures remains vague and unclear. To fill this gap, we perform a rigorous
study on the behavior of network rank, focusing particularly on the notion of
rank deficiency. We theoretically establish a universal monotonic decreasing
property of network rank from the basic rules of differential and algebraic
composition, and uncover rank deficiency of network blocks and deep function
coupling. By virtue of our numerical tools, we provide the first empirical
analysis of the per-layer behavior of network rank in practical settings, i.e.,
ResNets, deep MLPs, and Transformers on ImageNet. These empirical results are
in direct accord with our theory. Furthermore, we reveal a novel phenomenon of
independence deficit caused by the rank deficiency of deep networks, where
classification confidence of a given category can be linearly decided by the
confidence of a handful of other categories. The theoretical results of this
work, together with the empirical findings, may advance understanding of the
inherent principles of deep neural networks.
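As a rough illustration of the two empirical quantities discussed in the abstract, the sketch below (not the authors' code; the random MLP, widths, tolerance, and class counts are hypothetical and chosen only for illustration) estimates the numerical rank of the feature matrix after each block of a toy ReLU network, and then checks a toy version of the independence deficit by regressing one class logit on a handful of others once the features pass through a low-rank bottleneck.

```python
# Minimal sketch (not the authors' code): all sizes, the tolerance, and the
# random network below are hypothetical and chosen only for illustration.
import numpy as np

rng = np.random.default_rng(0)

def numerical_rank(features, rel_tol=1e-3):
    """Number of singular values above rel_tol times the largest one."""
    s = np.linalg.svd(features, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

# Per-layer numerical rank of feature matrices in a toy deep ReLU MLP.
n_samples, width, depth = 512, 256, 12
h = rng.standard_normal((n_samples, width))
ranks = []
for _ in range(depth):
    w = rng.standard_normal((width, width)) / np.sqrt(width)
    h = np.maximum(h @ w, 0.0)                      # one ReLU block
    ranks.append(numerical_rank(h))
print("per-layer numerical ranks:", ranks)          # typically non-increasing

# Toy independence deficit: if the features have rank r < number of classes C,
# any class logit lies in the span of r other class logits.
bottleneck, num_classes = 8, 20
z = h @ rng.standard_normal((width, bottleneck))    # rank-limited features
logits = z @ rng.standard_normal((bottleneck, num_classes))
coef, *_ = np.linalg.lstsq(logits[:, 1:1 + bottleneck], logits[:, 0], rcond=None)
residual = logits[:, 0] - logits[:, 1:1 + bottleneck] @ coef
print("relative residual of class-0 logit:",
      np.linalg.norm(residual) / np.linalg.norm(logits[:, 0]))
```

In this toy setting the per-layer ranks are typically non-increasing with depth, and the relative residual of the regressed logit is close to zero whenever the feature rank falls below the number of classes, mirroring the qualitative claims of the abstract.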
Related papers
- Neural Scaling Laws of Deep ReLU and Deep Operator Network: A Theoretical Study [8.183509993010983]
We study the neural scaling laws for deep operator networks using the Chen and Chen style architecture.
We quantify the neural scaling laws by analyzing its approximation and generalization errors.
Our results offer a partial explanation of the neural scaling laws in operator learning and provide a theoretical foundation for their applications.
arXiv Detail & Related papers (2024-10-01T03:06:55Z)
- Understanding Deep Learning via Notions of Rank [5.439020425819001]
This thesis puts forth notions of rank as key for developing a theory of deep learning.
In particular, we establish that gradient-based training can induce an implicit regularization towards low rank for several neural network architectures.
Practical implications of our theory for designing explicit regularization schemes and data preprocessing algorithms are presented.
arXiv Detail & Related papers (2024-08-04T18:47:55Z)
- Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z)
- Feature Contamination: Neural Networks Learn Uncorrelated Features and Fail to Generalize [5.642322814965062]
Learning representations that generalize under distribution shifts is critical for building robust machine learning models.
We show that even allowing a neural network to explicitly fit the representations obtained from a teacher network that can generalize out-of-distribution is insufficient for the generalization of the student network.
arXiv Detail & Related papers (2024-06-05T15:04:27Z)
- Interpretable part-whole hierarchies and conceptual-semantic relationships in neural networks [4.153804257347222]
We present Agglomerator, a framework capable of providing a representation of part-whole hierarchies from visual cues.
We evaluate our method on common datasets, such as SmallNORB, MNIST, FashionMNIST, CIFAR-10, and CIFAR-100.
arXiv Detail & Related papers (2022-03-07T10:56:13Z)
- Analytic Insights into Structure and Rank of Neural Network Hessian Maps [32.90143789616052]
The Hessian of a neural network captures parameter interactions through second-order derivatives of the loss.
We develop theoretical tools to analyze the range of the Hessian map, providing us with a precise understanding of its rank deficiency.
This yields exact formulas and tight upper bounds for the Hessian rank of deep linear networks.
arXiv Detail & Related papers (2021-06-30T17:29:58Z)
- What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
arXiv Detail & Related papers (2021-06-12T13:05:11Z)
- Learning Structures for Deep Neural Networks [99.8331363309895]
We propose to adopt the efficient coding principle, rooted in information theory and developed in computational neuroscience.
We show that sparse coding can effectively maximize the entropy of the output signals.
Our experiments on a public image classification dataset demonstrate that using the structure learned from scratch by our proposed algorithm, one can achieve a classification accuracy comparable to the best expert-designed structure.
arXiv Detail & Related papers (2021-05-27T12:27:24Z)
- Provably Training Neural Network Classifiers under Fairness Constraints [70.64045590577318]
We show that overparametrized neural networks can meet the constraints.
A key ingredient in building a fair neural network classifier is establishing a no-regret analysis for neural networks.
arXiv Detail & Related papers (2020-12-30T18:46:50Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and owns adaptability to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)