Transition to Linearity of General Neural Networks with Directed Acyclic
Graph Architecture
- URL: http://arxiv.org/abs/2205.11786v2
- Date: Wed, 7 Jun 2023 22:20:05 GMT
- Title: Transition to Linearity of General Neural Networks with Directed Acyclic
Graph Architecture
- Authors: Libin Zhu, Chaoyue Liu, Mikhail Belkin
- Abstract summary: We show that feedforward neural networks corresponding to arbitrary directed acyclic graphs undergo transition to linearity as their "width" approaches infinity.
Our results identify the mathematical structure underlying transition to linearity and generalize a number of recent works aimed at characterizing transition to linearity or constancy of the Neural Tangent Kernel for standard architectures.
- Score: 20.44438519046223
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we show that feedforward neural networks corresponding to
arbitrary directed acyclic graphs undergo transition to linearity as their
"width" approaches infinity. The width of these general networks is
characterized by the minimum in-degree of their neurons, except for the input
and first layers. Our results identify the mathematical structure underlying
transition to linearity and generalize a number of recent works aimed at
characterizing transition to linearity or constancy of the Neural Tangent
Kernel for standard architectures.
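The claim above can be illustrated numerically. Below is a minimal sketch (our own illustration, not code from the paper), using JAX and a two-layer fully-connected tanh network, which is one special case of the DAG architectures the paper covers: it measures how far the network output deviates from its first-order Taylor expansion at initialization under a parameter displacement of fixed norm, a gap that transition to linearity predicts should shrink as the width grows. The function names (`init_params`, `linearization_gap`) and all hyperparameters are illustrative assumptions, not the paper's construction.

```python
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

def init_params(key, d_in, width):
    k1, k2 = jax.random.split(key)
    # Standard Gaussian weights; the 1/sqrt(fan-in) factors live in the forward pass.
    return {"W1": jax.random.normal(k1, (width, d_in)),
            "W2": jax.random.normal(k2, (width,))}

def f(params, x):
    # Two-layer tanh network in NTK-style parameterization; a fully-connected
    # chain is one special case of the DAG networks considered in the paper.
    h = jnp.tanh(params["W1"] @ x / jnp.sqrt(x.shape[0]))
    return params["W2"] @ h / jnp.sqrt(h.shape[0])

def linearization_gap(key, width, d_in=16, radius=1.0):
    kp, kx, kd = jax.random.split(key, 3)
    params = init_params(kp, d_in, width)
    x = jax.random.normal(kx, (d_in,))

    flat0, unravel = ravel_pytree(params)
    direction = jax.random.normal(kd, flat0.shape)
    delta = radius * direction / jnp.linalg.norm(direction)   # fixed-norm displacement

    g = lambda w: f(unravel(w), x)
    f0, df = jax.jvp(g, (flat0,), (delta,))       # value and directional derivative at init
    return jnp.abs(g(flat0 + delta) - (f0 + df))  # deviation from the tangent (linear) model

if __name__ == "__main__":
    for width in [64, 256, 1024, 4096]:
        keys = jax.random.split(jax.random.PRNGKey(width), 10)
        gaps = jnp.stack([linearization_gap(k, width) for k in keys])
        print(width, float(gaps.mean()))          # expected to shrink roughly like 1/sqrt(width)
```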
Related papers
- Curve Your Attention: Mixed-Curvature Transformers for Graph
Representation Learning [77.1421343649344]
We propose a generalization of Transformers towards operating entirely on the product of constant curvature spaces.
We also provide a kernelized approach to non-Euclidean attention, which enables our model to run with time and memory cost linear in the number of nodes and edges.
arXiv Detail & Related papers (2023-09-08T02:44:37Z) - A Derivation of Feedforward Neural Network Gradients Using Fréchet
Calculus [0.0]
We present a derivation of the gradients of feedforward neural networks using Fréchet calculus.
We show how our analysis generalizes to more general neural network architectures including, but not limited to, convolutional networks.
arXiv Detail & Related papers (2022-09-27T08:14:00Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - Transition to Linearity of Wide Neural Networks is an Emerging Property
of Assembling Weak Models [20.44438519046223]
Wide neural networks with a linear output layer have been shown to be near-linear and to have a near-constant neural tangent kernel (NTK).
We show that the linearity of wide neural networks is, in fact, an emerging property of assembling a large number of diverse "weak" sub-models, none of which dominate the assembly.
arXiv Detail & Related papers (2022-03-10T01:27:01Z) - Hyperbolic Graph Neural Networks: A Review of Methods and Applications [55.5502008501764]
Graph neural networks generalize conventional neural networks to graph-structured data.
The performance of Euclidean models in graph-related learning is still bounded and limited by the representation ability of Euclidean geometry.
Recently, hyperbolic space has gained increasing popularity in processing graph data with tree-like structure and power-law distribution.
arXiv Detail & Related papers (2022-02-28T15:08:48Z) - Sheaf Neural Networks [2.4366811507669124]
We present a generalization of graph convolutional networks by generalizing the diffusion operation underlying this class of networks.
We show that the resulting sheaf neural networks can outperform graph convolutional networks in domains where relations between nodes are asymmetric and signed.
arXiv Detail & Related papers (2020-12-08T01:19:48Z) - A Unifying View on Implicit Bias in Training Linear Neural Networks [31.65006970108761]
We study the implicit bias of gradient flow (i.e., gradient descent with infinitesimal step size) on linear neural network training.
We propose a tensor formulation of neural networks that includes fully-connected, diagonal, and convolutional networks as special cases.
arXiv Detail & Related papers (2020-10-06T06:08:35Z) - Data-Driven Learning of Geometric Scattering Networks [74.3283600072357]
We propose a new graph neural network (GNN) module based on relaxations of recently proposed geometric scattering transforms.
Our learnable geometric scattering (LEGS) module enables adaptive tuning of the wavelets to encourage band-pass features to emerge in learned representations.
arXiv Detail & Related papers (2020-10-06T01:20:27Z) - On the linearity of large non-linear models: when and why the tangent
kernel is constant [20.44438519046223]
We shed light on the remarkable phenomenon of transition to linearity of certain neural networks as their width approaches infinity.
We show that the transition to linearity of the model and, equivalently, constancy of the (neural) tangent kernel (NTK) result from the scaling properties of the norm of the Hessian matrix of the network (a rough empirical check of this is sketched after this list).
arXiv Detail & Related papers (2020-10-02T16:44:45Z) - Gauge Equivariant Mesh CNNs: Anisotropic convolutions on geometric
graphs [81.12344211998635]
A common approach to defining convolutions on meshes is to interpret them as graphs and apply graph convolutional networks (GCNs).
We propose Gauge Equivariant Mesh CNNs which generalize GCNs to apply anisotropic gauge equivariant kernels.
Our experiments validate the significantly improved expressivity of the proposed model over conventional GCNs and other methods.
arXiv Detail & Related papers (2020-03-11T17:21:15Z) - On the Convex Behavior of Deep Neural Networks in Relation to the
Layers' Width [99.24399270311069]
We observe that for wider networks, minimizing the loss with gradient descent maneuvers through surfaces of positive curvature at the start and end of training, and close to zero curvature in between.
In other words, it seems that during crucial parts of the training process, the Hessian in wide networks is dominated by the component G.
arXiv Detail & Related papers (2020-01-14T16:30:01Z)
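As referenced above, the entry on "On the linearity of large non-linear models" attributes NTK constancy to the scaling of the Hessian norm with width. The sketch below (again our own illustration under assumed names such as `ntk_entry` and `ntk_drift`, not code from that paper) computes one empirical tangent-kernel entry of the same toy two-layer tanh network at initialization and after a fixed-norm parameter displacement; the relative change is expected to shrink as the width grows.

```python
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

def f(params, x):
    # Same toy two-layer tanh network as in the earlier sketch.
    h = jnp.tanh(params["W1"] @ x / jnp.sqrt(x.shape[0]))
    return params["W2"] @ h / jnp.sqrt(h.shape[0])

def ntk_entry(flat, unravel, x1, x2):
    # Empirical tangent kernel entry: <df/dw(x1), df/dw(x2)> at parameters `flat`.
    grad_at = lambda x: jax.grad(lambda w: f(unravel(w), x))(flat)
    return jnp.dot(grad_at(x1), grad_at(x2))

def ntk_drift(key, width, d_in=16, radius=1.0):
    kp1, kp2, kx1, kx2, kd = jax.random.split(key, 5)
    params = {"W1": jax.random.normal(kp1, (width, d_in)),
              "W2": jax.random.normal(kp2, (width,))}
    x1 = jax.random.normal(kx1, (d_in,))
    x2 = jax.random.normal(kx2, (d_in,))

    flat0, unravel = ravel_pytree(params)
    direction = jax.random.normal(kd, flat0.shape)
    delta = radius * direction / jnp.linalg.norm(direction)  # fixed-norm parameter move

    k_init = ntk_entry(flat0, unravel, x1, x2)
    k_moved = ntk_entry(flat0 + delta, unravel, x1, x2)
    return jnp.abs(k_moved - k_init) / jnp.abs(k_init)       # relative change of the kernel entry

if __name__ == "__main__":
    for width in [64, 256, 1024, 4096]:
        drift = ntk_drift(jax.random.PRNGKey(0), width)
        print(width, float(drift))                           # expected to shrink with width
```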
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.