Related papers: Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis

Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis

URL: http://arxiv.org/abs/2205.05662v1
Date: Wed, 11 May 2022 17:43:54 GMT
Title: Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis
Authors: Wuyang Chen, Wei Huang, Xinyu Gong, Boris Hanin, Zhangyang Wang
Abstract summary: We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training. We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
Score: 94.64007376939735
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Advanced deep neural networks (DNNs), designed by either human or AutoML algorithms, are growing increasingly complex. Diverse operations are connected by complicated connectivity patterns, e.g., various types of skip connections. Those topological compositions are empirically effective and observed to smooth the loss landscape and facilitate the gradient flow in general. However, it remains elusive to derive any principled understanding of their effects on the DNN capacity or trainability, and to understand why or in which aspect one specific connectivity pattern is better than another. In this work, we theoretically characterize the impact of connectivity patterns on the convergence of DNNs under gradient descent training in fine granularity. By analyzing a wide network's Neural Network Gaussian Process (NNGP), we are able to depict how the spectrum of an NNGP kernel propagates through a particular connectivity pattern, and how that affects the bound of convergence rates. As one practical implication of our results, we show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate, and significantly accelerate the large-scale neural architecture search without any overhead. Codes will be released at https://github.com/chenwydj/architecture_convergence.

Related papers

NN-Former: Rethinking Graph Structure in Neural Architecture Representation [67.3378579108611]
Graph Neural Networks (GNNs) and transformers have shown promising performance in representing neural architectures.<n>We show that sibling nodes are pivotal while overlooked in previous research.<n>Our approach consistently achieves promising performance in both accuracy and latency prediction.
arXiv Detail & Related papers (2025-07-01T15:46:18Z)
Statistical physics analysis of graph neural networks: Approaching optimality in the contextual stochastic block model [0.0]
Graph neural networks (GNNs) are designed to process data associated with graphs. GNNs can encounter difficulties in gathering information from nodes far apart by iterated aggregation steps. We show how the architecture of the GCN has to scale with the depth to avoid oversmoothing.
arXiv Detail & Related papers (2025-03-03T09:55:10Z)
Unveiling Mode Connectivity in Graph Neural Networks [30.854487554682628]
This work presents the first investigation of mode connectivity in graph neural networks (GNNs) We uncover that GNNs exhibit distinct non-linear mode connectivity, diverging from patterns observed in fully-connected networks or CNNs. We establish a link between mode connectivity and generalization, proposing a generalization bound based on loss barriers and revealing its utility as a diagnostic tool.
arXiv Detail & Related papers (2025-02-18T07:46:10Z)
Topological Neural Networks go Persistent, Equivariant, and Continuous [6.314000948709255]
We introduce TopNets as a broad framework that subsumes and unifies various methods in the intersection of GNNs/TNNs and PH. TopNets achieve strong performance across diverse tasks, including antibody design, molecular dynamics simulation, and drug property prediction.
arXiv Detail & Related papers (2024-06-05T11:56:54Z)
Information-Theoretic Generalization Bounds for Deep Neural Networks [22.87479366196215]
Deep neural networks (DNNs) exhibit an exceptional capacity for generalization in practical applications. This work aims to capture the effect and benefits of depth for supervised learning via information-theoretic generalization bounds.
arXiv Detail & Related papers (2024-04-04T03:20:35Z)
From Alexnet to Transformers: Measuring the Non-linearity of Deep Neural Networks with Affine Optimal Transport [32.39176908225668]
We introduce the concept of the non-linearity signature of DNN, the first theoretically sound solution for measuring the non-linearity of deep neural networks. We provide extensive experimental results that highlight the practical usefulness of the proposed non-linearity signature.
arXiv Detail & Related papers (2023-10-17T17:50:22Z)
Wide Neural Networks as Gaussian Processes: Lessons from Deep Equilibrium Models [16.07760622196666]
We study the deep equilibrium model (DEQ), an infinite-depth neural network with shared weight matrices across layers. Our analysis reveals that as the width of DEQ layers approaches infinity, it converges to a Gaussian process. Remarkably, this convergence holds even when the limits of depth and width are interchanged.
arXiv Detail & Related papers (2023-10-16T19:00:43Z)
Simple and Efficient Heterogeneous Graph Neural Network [55.56564522532328]
Heterogeneous graph neural networks (HGNNs) have powerful capability to embed rich structural and semantic information of a heterogeneous graph into node representations. Existing HGNNs inherit many mechanisms from graph neural networks (GNNs) over homogeneous graphs, especially the attention mechanism and the multi-layer structure. This paper conducts an in-depth and detailed study of these mechanisms and proposes Simple and Efficient Heterogeneous Graph Neural Network (SeHGNN)
arXiv Detail & Related papers (2022-07-06T10:01:46Z)
BScNets: Block Simplicial Complex Neural Networks [79.81654213581977]
Simplicial neural networks (SNN) have recently emerged as the newest direction in graph learning. We present Block Simplicial Complex Neural Networks (BScNets) model for link prediction. BScNets outperforms state-of-the-art models by a significant margin while maintaining low costs.
arXiv Detail & Related papers (2021-12-13T17:35:54Z)
A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time. We show that convergence to a global minimum is guaranteed for networks with quadratic widths in the sample size and linear in their depth at a time logarithmic in both. Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z)
Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective to represent a network into a complete graph for analysis. By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner. This learning process is compatible with existing networks and owns adaptability to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
Optimization and Generalization Analysis of Transduction through Gradient Boosting and Application to Multi-scale Graph Neural Networks [60.22494363676747]
It is known that the current graph neural networks (GNNs) are difficult to make themselves deep due to the problem known as over-smoothing. Multi-scale GNNs are a promising approach for mitigating the over-smoothing problem. We derive the optimization and generalization guarantees of transductive learning algorithms that include multi-scale GNNs.
arXiv Detail & Related papers (2020-06-15T17:06:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.