Geometry of Critical Sets and Existence of Saddle Branches for Two-layer Neural Networks
- URL: http://arxiv.org/abs/2405.17501v1
- Date: Sun, 26 May 2024 02:32:28 GMT
- Title: Geometry of Critical Sets and Existence of Saddle Branches for Two-layer Neural Networks
- Authors: Leyang Zhang, Yaoyu Zhang, Tao Luo
- Abstract summary: This paper presents a comprehensive analysis of critical point sets in two-layer neural networks.
Given a critical point, we use these operators to uncover the whole underlying critical set representing the same output function.
We prove existence of saddle branches for any critical set whose output function can be represented by a narrower network.
- Score: 4.418361486640713
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This paper presents a comprehensive analysis of critical point sets in two-layer neural networks. To study such complex entities, we introduce the critical embedding operator and critical reduction operator as our tools. Given a critical point, we use these operators to uncover the whole underlying critical set representing the same output function, which exhibits a hierarchical structure. Furthermore, we prove existence of saddle branches for any critical set whose output function can be represented by a narrower network. Our results provide a solid foundation to the further study of optimization and training behavior of neural networks.
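The neuron-splitting idea behind such embedding operators can be sketched in a minimal form (the notation and the specific splitting map below are illustrative assumptions, not taken from the paper):

```latex
% Two-layer network with m hidden neurons (illustrative notation)
f_\theta(x) = \sum_{k=1}^{m} a_k \,\sigma(w_k^\top x),
\qquad \theta = (a_1, w_1, \dots, a_m, w_m),
\qquad \nabla_\theta L(\theta) = 0 \ \text{at a critical point.}

% Splitting neuron j: duplicate w_j and share its output weight,
% assigning \lambda a_j and (1-\lambda) a_j for any \lambda \in \mathbb{R}.
% The output function is unchanged:
\lambda a_j \,\sigma(w_j^\top x) + (1-\lambda)\, a_j \,\sigma(w_j^\top x)
  = a_j \,\sigma(w_j^\top x),
% so every \lambda yields another critical point of the wider network,
% tracing a one-parameter branch inside the critical set that
% represents the same output function.
```

This kind of output-preserving embedding is why a critical point of a narrower network induces a whole critical set, with a hierarchical structure, in a wider one.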
Related papers
- Understanding Generalization, Robustness, and Interpretability in Low-Capacity Neural Networks [0.0]
We introduce a framework to investigate capacity, sparsity, and robustness in low-capacity networks.
We show that trained networks are robust to extreme magnitude pruning (up to 95% sparsity).
This work provides a clear, empirical demonstration of the trade-offs governing simple neural networks.
arXiv Detail & Related papers (2025-07-22T06:43:03Z) - Intrinsic and Extrinsic Organized Attention: Softmax Invariance and Network Sparsity [1.837729564584369]
We examine the intrinsic (within the attention head) and extrinsic (amongst the attention heads) structure of the self-attention mechanism in transformers.
We use an existing methodology for hierarchical organization of tensors to examine network structure.
arXiv Detail & Related papers (2025-06-18T15:14:56Z) - Feature Averaging: An Implicit Bias of Gradient Descent Leading to Non-Robustness in Neural Networks [13.983863226803336]
We argue that "Feature Averaging" is one of the principal factors contributing to non-robustness of deep neural networks.
We provide a detailed theoretical analysis of the training dynamics of gradient descent in a two-layer ReLU network for a binary classification task.
We prove that, with the provision of more granular supervised information, a two-layer multi-class neural network is capable of learning individual features.
arXiv Detail & Related papers (2024-10-14T09:28:32Z) - Relational Composition in Neural Networks: A Survey and Call to Action [54.47858085003077]
Many neural nets appear to represent data as linear combinations of "feature vectors".
We argue that this success is incomplete without an understanding of relational composition.
arXiv Detail & Related papers (2024-07-19T20:50:57Z) - Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z) - Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks [49.808194368781095]
We show that three-layer neural networks have provably richer feature learning capabilities than two-layer networks.
This work makes progress towards understanding the provable benefit of three-layer neural networks over two-layer networks in the feature learning regime.
arXiv Detail & Related papers (2023-05-11T17:19:30Z) - Understanding Imbalanced Semantic Segmentation Through Neural Collapse [81.89121711426951]
We show that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes.
We introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure.
Our method ranks 1st and sets a new record on the ScanNet200 test leaderboard.
arXiv Detail & Related papers (2023-01-03T13:51:51Z) - Credit Assignment for Trained Neural Networks Based on Koopman Operator Theory [3.130109807128472]
The credit assignment problem of neural networks refers to evaluating the credit of each network component to the final outputs.
This paper presents an alternative perspective of linear dynamics on dealing with the credit assignment problem for trained neural networks.
Experiments conducted on typical neural networks demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-12-02T06:34:27Z) - Robust Training and Verification of Implicit Neural Networks: A Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that the embedded network can be used to provide an $\ell_\infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
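The box over-approximation idea can be sketched with plain interval arithmetic (this is a generic interval-bound-propagation sketch, not the paper's embedded-network construction; the layer sizes and weights are illustrative):

```python
import numpy as np

def affine_relu_box(W, b, lo, hi):
    """Propagate an l_inf box [lo, hi] through x -> relu(W x + b).

    Center/radius interval arithmetic yields sound element-wise
    bounds: an over-approximation of the true reachable set.
    """
    c = (lo + hi) / 2.0          # box center
    r = (hi - lo) / 2.0          # box radius (non-negative)
    out_c = W @ c + b            # image of the center
    out_r = np.abs(W) @ r        # radius grows by |W|
    # ReLU is monotone, so clamping the bounds stays sound.
    return np.maximum(out_c - out_r, 0.0), np.maximum(out_c + out_r, 0.0)

# Illustrative layer and input box
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 2))
b = rng.standard_normal(3)
lo, hi = np.array([-0.1, -0.1]), np.array([0.1, 0.1])

out_lo, out_hi = affine_relu_box(W, b, lo, hi)

# Any point in the input box maps inside [out_lo, out_hi]
x = rng.uniform(lo, hi)
y = np.maximum(W @ x + b, 0.0)
assert np.all(out_lo <= y + 1e-9) and np.all(y <= out_hi + 1e-9)
```

Composing such layer-wise boxes gives a certified region containing the network's outputs, which is the shape of guarantee used for robustness verification.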
arXiv Detail & Related papers (2022-08-08T03:13:24Z) - Rank Diminishing in Deep Neural Networks [71.03777954670323]
The rank of a neural network measures the information flowing across its layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains vague and unclear.
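The phenomenon is easy to observe numerically. A toy sketch (architecture and data are assumed for illustration, not taken from the paper) tracks the numerical rank of feature matrices through a random deep linear network:

```python
import numpy as np

def numerical_rank(M, tol=1e-6):
    """Count singular values above tol times the largest one."""
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

rng = np.random.default_rng(0)
feats = rng.standard_normal((50, 20))   # 50 inputs, 20 features
ranks = [numerical_rank(feats)]

# Push features through a 30-layer random deep linear network and
# track the numerical rank of the feature matrix at each layer.
for _ in range(30):
    W = rng.standard_normal((20, 20)) / np.sqrt(20)
    feats = feats @ W
    ranks.append(numerical_rank(feats))

print(ranks[0], "->", ranks[-1])   # numerical rank typically shrinks with depth
```

The singular values of products of random matrices spread out exponentially with depth, so the numerical rank of deep features tends to drop even without any training.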
arXiv Detail & Related papers (2022-06-13T12:03:32Z) - Functional Network: A Novel Framework for Interpretability of Deep Neural Networks [2.641939670320645]
We propose a novel framework for interpretability of deep neural networks, that is, the functional network.
In our experiments, the mechanisms of regularization methods, namely, batch normalization and dropout, are revealed.
arXiv Detail & Related papers (2022-05-24T01:17:36Z) - Global Convergence Analysis of Deep Linear Networks with A One-neuron Layer [18.06634056613645]
We consider optimizing deep linear networks which have a layer with one neuron under quadratic loss.
We describe the convergent point of trajectories with arbitrary starting point under gradient flow.
We show specific convergence rates of trajectories that converge to the global minimizer by stages.
arXiv Detail & Related papers (2022-01-08T04:44:59Z) - Analytic Insights into Structure and Rank of Neural Network Hessian Maps [32.90143789616052]
The Hessian of a neural network captures parameter interactions through second-order derivatives of the loss.
We develop theoretical tools to analyze the range of the Hessian map, providing us with a precise understanding of its rank deficiency.
This yields exact formulas and tight upper bounds for the Hessian rank of deep linear networks.
arXiv Detail & Related papers (2021-06-30T17:29:58Z) - A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.