Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis
- URL: http://arxiv.org/abs/2205.05662v1
- Date: Wed, 11 May 2022 17:43:54 GMT
- Title: Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis
- Authors: Wuyang Chen, Wei Huang, Xinyu Gong, Boris Hanin, Zhangyang Wang
- Abstract summary: We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
- Score: 94.64007376939735
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Advanced deep neural networks (DNNs), designed either by humans or by AutoML
algorithms, are growing increasingly complex. Diverse operations are connected
by complicated connectivity patterns, e.g., various types of skip connections.
Those topological compositions are empirically effective and observed to smooth
the loss landscape and facilitate the gradient flow in general. However, it
remains elusive to derive any principled understanding of their effects on the
DNN capacity or trainability, and to understand why or in which aspect one
specific connectivity pattern is better than another. In this work, we
theoretically characterize the impact of connectivity patterns on the
convergence of DNNs under gradient descent training in fine granularity. By
analyzing a wide network's Neural Network Gaussian Process (NNGP), we are able
to depict how the spectrum of an NNGP kernel propagates through a particular
connectivity pattern, and how that affects the bound of convergence rates. As
one practical implication of our results, we show that by a simple filtration
on "unpromising" connectivity patterns, we can trim down the number of models
to evaluate, and significantly accelerate the large-scale neural architecture
search without any overhead. Code will be released at
https://github.com/chenwydj/architecture_convergence.
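The abstract describes propagating the spectrum of an NNGP kernel through a connectivity pattern and then filtering out "unpromising" patterns. Below is a minimal sketch of that idea under our own simplifying assumptions, not the authors' released code or their exact bound. Each edge of a cell DAG is assumed to be "linear" (a dense layer), "relu" (ReLU followed by a dense layer with He-scaled weights, modeled by the degree-1 arc-cosine kernel), or "zero". Every non-zero edge is assumed to carry independently initialized weights, so cross-covariances between edges vanish and a node's kernel is the average of its incoming edge kernels. The smallest eigenvalue of the output kernel serves as a stand-in trainability score. The names `relu_nngp`, `propagate`, `filter_architectures`, and `keep_frac` are hypothetical and chosen only for illustration.

```python
import math

import numpy as np


def relu_nngp(K: np.ndarray) -> np.ndarray:
    """Kernel map of ReLU followed by a dense layer (weight variance 2, no bias).

    Degree-1 arc-cosine kernel; with sigma_w^2 = 2 the diagonal is preserved.
    """
    d = np.sqrt(np.clip(np.diag(K), 0.0, None))
    norms = np.outer(d, d)
    with np.errstate(divide="ignore", invalid="ignore"):
        cos = np.where(norms > 0, K / norms, 0.0)  # zero-variance rows map to 0
    cos = np.clip(cos, -1.0, 1.0)
    theta = np.arccos(cos)
    # 2 * E[relu(u) relu(v)] for (u, v) jointly Gaussian with covariance from K
    return norms * (np.sin(theta) + (np.pi - theta) * cos) / np.pi


def propagate(K0: np.ndarray, edges, num_nodes: int) -> np.ndarray:
    """Propagate the NNGP kernel through a cell DAG (hypothetical encoding).

    edges: list of (src, dst, op) with src < dst and op in {"linear", "relu", "zero"}.
    Node 0 carries K0; node num_nodes - 1 is the output.
    """
    kernels = [K0] + [None] * (num_nodes - 1)
    for j in range(1, num_nodes):
        contribs = []
        for src, dst, op in edges:
            if op not in ("linear", "relu", "zero"):
                raise ValueError(f"unknown op: {op}")
            if dst != j or op == "zero":
                continue  # "zero" edges carry no signal
            Ki = kernels[src]
            contribs.append(Ki if op == "linear" else relu_nngp(Ki))
        # Independent weights per edge -> cross terms vanish. Averaging (weight
        # variance scaled by 1 / number of incoming edges) keeps the diagonal fixed.
        kernels[j] = sum(contribs) / len(contribs) if contribs else np.zeros_like(K0)
    return kernels[-1]


def filter_architectures(X: np.ndarray, candidates, num_nodes: int, keep_frac: float = 0.5):
    """Score candidates by lambda_min of the output kernel; keep the top fraction."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm inputs
    K0 = Xn @ Xn.T
    # eigvalsh returns eigenvalues in ascending order, so [0] is the smallest
    scores = [float(np.linalg.eigvalsh(propagate(K0, e, num_nodes))[0]) for e in candidates]
    k = max(1, math.ceil(keep_frac * len(candidates)))
    kept = [int(i) for i in np.argsort(scores)[::-1][:k]]
    return kept, scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((8, 16))  # 8 toy inputs with 16 features
    candidates = [
        [(0, 1, "relu"), (1, 2, "relu")],                    # plain chain
        [(0, 1, "relu"), (1, 2, "relu"), (0, 2, "linear")],  # chain + linear shortcut
        [(0, 1, "relu"), (1, 2, "zero"), (0, 2, "zero")],    # output disconnected
    ]
    kept, scores = filter_architectures(X, candidates, num_nodes=3)
    print(kept, [round(s, 4) for s in scores])
```

In this toy run the disconnected candidate has an all-zero output kernel, scores 0, and is filtered out. Ranking the remaining candidates by the smallest eigenvalue is only a heuristic proxy and need not agree with the convergence bound derived in the paper.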
Related papers
- Topological Neural Networks go Persistent, Equivariant, and Continuous [6.314000948709255]
We introduce TopNets as a broad framework that subsumes and unifies various methods in the intersection of GNNs/TNNs and PH.
TopNets achieve strong performance across diverse tasks, including antibody design, molecular dynamics simulation, and drug property prediction.
(2024-06-05T11:56:54Z)
- Information-Theoretic Generalization Bounds for Deep Neural Networks [22.87479366196215]
Deep neural networks (DNNs) exhibit an exceptional capacity for generalization in practical applications.
This work aims to capture the effect and benefits of depth for supervised learning via information-theoretic generalization bounds.
(2024-04-04T03:20:35Z)
- From Alexnet to Transformers: Measuring the Non-linearity of Deep Neural Networks with Affine Optimal Transport [32.39176908225668]
We introduce the concept of the non-linearity signature of DNN, the first theoretically sound solution for measuring the non-linearity of deep neural networks.
We provide extensive experimental results that highlight the practical usefulness of the proposed non-linearity signature.
(2023-10-17T17:50:22Z)
- Wide Neural Networks as Gaussian Processes: Lessons from Deep Equilibrium Models [16.07760622196666]
We study the deep equilibrium model (DEQ), an infinite-depth neural network with shared weight matrices across layers.
Our analysis reveals that as the width of DEQ layers approaches infinity, it converges to a Gaussian process.
Remarkably, this convergence holds even when the limits of depth and width are interchanged.
(2023-10-16T19:00:43Z)
- Simple and Efficient Heterogeneous Graph Neural Network [55.56564522532328]
Heterogeneous graph neural networks (HGNNs) have a powerful capability to embed rich structural and semantic information of a heterogeneous graph into node representations.
Existing HGNNs inherit many mechanisms from graph neural networks (GNNs) over homogeneous graphs, especially the attention mechanism and the multi-layer structure.
This paper conducts an in-depth and detailed study of these mechanisms and proposes Simple and Efficient Heterogeneous Graph Neural Network (SeHGNN).
(2022-07-06T10:01:46Z)
- BScNets: Block Simplicial Complex Neural Networks [79.81654213581977]
Simplicial neural networks (SNN) have recently emerged as the newest direction in graph learning.
We present Block Simplicial Complex Neural Networks (BScNets) model for link prediction.
BScNets outperforms state-of-the-art models by a significant margin while maintaining low costs.
(2021-12-13T17:35:54Z)
- A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks whose width is quadratic in the sample size and linear in the depth, within a time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
(2021-01-12T00:40:45Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
(2020-08-19T04:53:31Z)
- Optimization and Generalization Analysis of Transduction through Gradient Boosting and Application to Multi-scale Graph Neural Networks [60.22494363676747]
It is known that current graph neural networks (GNNs) are difficult to make deep because of a problem known as over-smoothing.
Multi-scale GNNs are a promising approach for mitigating the over-smoothing problem.
We derive the optimization and generalization guarantees of transductive learning algorithms that include multi-scale GNNs.
(2020-06-15T17:06:17Z)