On Oversquashing in Graph Neural Networks Through the Lens of Dynamical Systems
- URL: http://arxiv.org/abs/2405.01009v2
- Date: Fri, 28 Feb 2025 10:37:29 GMT
- Title: On Oversquashing in Graph Neural Networks Through the Lens of Dynamical Systems
- Authors: Alessio Gravina, Moshe Eliasof, Claudio Gallicchio, Davide Bacciu, Carola-Bibiane Schönlieb
- Abstract summary: A common problem in Message-Passing Neural Networks is oversquashing -- the limited ability to facilitate effective information flow between distant nodes. This paper introduces a novel perspective to address oversquashing, leveraging dynamical systems properties of global and local non-dissipativity. We present SWAN, a uniquely parameterized GNN model with antisymmetry both in space and weight domains, as a means to obtain non-dissipativity.
- Score: 28.351050664151536
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A common problem in Message-Passing Neural Networks is oversquashing -- the limited ability to facilitate effective information flow between distant nodes. Oversquashing is attributed to the exponential decay in information transmission as node distances increase. This paper introduces a novel perspective to address oversquashing, leveraging dynamical systems properties of global and local non-dissipativity, which enable a constant rate of information flow. We present SWAN, a uniquely parameterized GNN model with antisymmetry both in space and weight domains, as a means to obtain non-dissipativity. Our theoretical analysis asserts that by implementing these properties, SWAN offers an enhanced ability to transmit information over extended distances. Empirical evaluations on synthetic and real-world benchmarks that emphasize long-range interactions validate the theoretical understanding of SWAN and its ability to mitigate oversquashing.
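To make the mechanism concrete, here is a minimal sketch of antisymmetry in the weight domain, assuming a PyTorch-style implementation; the layer name, the step size eps, the stabilization term gamma, the tanh activation, and the simple aggregation are illustrative choices, and SWAN's spatial antisymmetry is omitted, so this is not the authors' exact formulation.

```python
# Minimal sketch (not the official SWAN code): antisymmetric weight
# parameterization. W - W^T has purely imaginary eigenvalues, so the
# underlying node-state ODE neither explodes nor dissipates information.
import torch
import torch.nn as nn

class AntisymmetricLayer(nn.Module):  # hypothetical name
    def __init__(self, dim: int, eps: float = 0.1, gamma: float = 0.1):
        super().__init__()
        self.W = nn.Parameter(torch.randn(dim, dim) / dim**0.5)  # self-term weights
        self.V = nn.Parameter(torch.randn(dim, dim) / dim**0.5)  # message weights
        self.eps = eps      # explicit-Euler step size (assumed)
        self.gamma = gamma  # small diagonal shift for stability (assumed)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        eye = torch.eye(x.size(-1), device=x.device, dtype=x.dtype)
        w_anti = self.W - self.W.t() - self.gamma * eye  # antisymmetric up to -gamma*I
        msg = adj @ x @ self.V.t()                       # neighborhood aggregation
        return x + self.eps * torch.tanh(x @ w_anti.t() + msg)

# Usage: x holds node states, adj a (normalized) adjacency matrix.
x, adj = torch.randn(5, 8), torch.eye(5)
out = AntisymmetricLayer(8)(x, adj)
```

Because the antisymmetric part has a purely imaginary spectrum, the forward map is close to norm-preserving, which is the non-dissipative behavior the abstract refers to.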
Related papers
- Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $μ$P Parametrization [66.03821840425539]
In this paper, we investigate the training dynamics of $L$-layer neural networks under the tensor program framework, trained with stochastic gradient descent (SGD).
We show that SGD enables these networks to learn linearly independent features that substantially deviate from their initial values.
This rich feature space captures relevant data information and ensures that any convergent point of the training process is a global minimum.
arXiv Detail & Related papers (2025-03-12T17:33:13Z) - Spiking Meets Attention: Efficient Remote Sensing Image Super-Resolution with Attention Spiking Neural Networks [57.17129753411926]
Spiking neural networks (SNNs) are emerging as a promising alternative to traditional artificial neural networks (ANNs).
We propose SpikeSR, which achieves state-of-the-art performance across various remote sensing benchmarks such as AID, DOTA, and DIOR.
arXiv Detail & Related papers (2025-03-06T09:06:06Z) - Virtual Nodes Improve Long-term Traffic Prediction [9.125554921271338]
This study introduces a novel framework that incorporates virtual nodes: auxiliary nodes added to the graph and connected to existing nodes.
Our proposed model incorporates virtual nodes by constructing a semi-adaptive adjacency matrix.
Experimental results demonstrate that the inclusion of virtual nodes significantly enhances long-term prediction accuracy.
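As a hedged illustration of the basic construction (the function name is invented here; this fixes the virtual-node edges to 1 and does not reproduce the paper's semi-adaptive, partly learned adjacency):

```python
# Illustrative sketch only: append one virtual node connected to every
# existing node, enlarging the adjacency matrix by one row and column.
import torch

def add_virtual_node(adj: torch.Tensor) -> torch.Tensor:
    n = adj.size(0)
    aug = torch.zeros(n + 1, n + 1, dtype=adj.dtype)
    aug[:n, :n] = adj   # keep the original edges
    aug[n, :n] = 1.0    # virtual node reaches every existing node...
    aug[:n, n] = 1.0    # ...and every existing node reaches it
    return aug
```

Every pair of nodes is then at most two hops apart through the virtual node, which is what helps information travel over long horizons.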
arXiv Detail & Related papers (2025-01-17T09:09:01Z) - A Dynamical Systems-Inspired Pruning Strategy for Addressing Oversmoothing in Graph Neural Networks [18.185834696177654]
Oversmoothing in Graph Neural Networks (GNNs) poses a significant challenge as network depth increases.
We identify the root causes of oversmoothing and propose DYNAMO-GAT.
Our theoretical analysis reveals how DYNAMO-GAT disrupts the convergence to oversmoothed states.
arXiv Detail & Related papers (2024-12-10T07:07:06Z) - AI Flow at the Network Edge [58.31090055138711]
AI Flow is a framework that streamlines the inference process by jointly leveraging the heterogeneous resources available across devices, edge nodes, and cloud servers.
This article serves as a position paper for identifying the motivation, challenges, and principles of AI Flow.
arXiv Detail & Related papers (2024-11-19T12:51:17Z) - Spatially Constrained Transformer with Efficient Global Relation Modelling for Spatio-Temporal Prediction [2.016553603539141]
ST-SampleNet is a transformer-based architecture that combines CNNs with self-attention mechanisms to capture both local and global relations.
Our experimental variant achieves a 40% reduction in computational costs with only a marginal performance drop of approximately 1%.
arXiv Detail & Related papers (2024-11-11T10:03:59Z) - Advanced Financial Fraud Detection Using GNN-CL Model [13.5240775562349]
The innovative GNN-CL model proposed in this paper marks a breakthrough in the field of financial fraud detection.
It combines the advantages of graph neural networks (GNNs), convolutional neural networks (CNNs), and long short-term memory (LSTM) networks.
A key novelty of this paper is the use of multilayer perceptrons (MLPs) to estimate node similarity.
arXiv Detail & Related papers (2024-07-09T03:59:06Z) - Accelerating Scalable Graph Neural Network Inference with Node-Adaptive Propagation [80.227864832092]
Graph neural networks (GNNs) have exhibited exceptional efficacy in a diverse array of applications.
The sheer size of large-scale graphs presents a significant challenge to real-time inference with GNNs.
We propose an online propagation framework and two novel node-adaptive propagation methods.
arXiv Detail & Related papers (2023-10-17T05:03:00Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Exploring the Complexity of Deep Neural Networks through Functional Equivalence [1.3597551064547502]
We present a novel bound on the covering number for deep neural networks, which reveals that their complexity can be reduced by exploiting functional equivalence.
We demonstrate that functional equivalence benefits optimization: over-parameterized networks tend to be easier to train, since increasing network width leads to a diminishing volume of the effective parameter space.
arXiv Detail & Related papers (2023-05-19T04:01:27Z) - Field theory for optimal signal propagation in ResNets [1.053373860696675]
Residual networks have significantly better trainability and performance than feed-forward networks at large depth.
Previous works found that adding a scaling parameter for the residual branch further improves generalization performance.
We derive a systematic finite-size field theory for residual networks to study signal propagation and its dependence on the scaling for the residual branch.
arXiv Detail & Related papers (2023-05-12T18:14:21Z) - Over-parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning [23.47570704524471]
We consider optimisation of large and shallow neural networks via gradient flow, where the output of each hidden node is scaled by some positive parameter.
We prove that, for large neural networks, with high probability, gradient flow converges to a global minimum and, unlike in the NTK regime, can learn features.
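In symbols (a hedged reading of this summary; the notation is chosen here, not quoted from the paper), such a network can be written as

$$ f(x) = \sum_{i=1}^{m} \lambda_i \, a_i \, \sigma(w_i^\top x), \qquad \lambda_i > 0, $$

where allowing the positive scales $\lambda_i$ to differ across hidden nodes is what distinguishes this setting from the uniformly scaled NTK regime.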
arXiv Detail & Related papers (2023-02-02T10:40:06Z) - Optimized Symbolic Interval Propagation for Neural Network Verification [1.8047694351309207]
We present DPNeurifyFV, a novel branch-and-bound solver for ReLU networks with low-dimensional input spaces.
We evaluate our approach on the airborne collision avoidance networks ACAS Xu and demonstrate runtime improvements compared to state-of-the-art tools.
arXiv Detail & Related papers (2022-12-15T14:15:29Z) - Position-aware Structure Learning for Graph Topology-imbalance by Relieving Under-reaching and Over-squashing [67.83086131278904]
Topology-imbalance is a graph-specific imbalance problem caused by the uneven topology positions of labeled nodes.
We propose a novel position-aware graph structure learning framework named PASTEL.
Our key insight is to enhance the connectivity of nodes within the same class for more supervision information.
arXiv Detail & Related papers (2022-08-17T14:04:21Z) - Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z) - An Entropy-guided Reinforced Partial Convolutional Network for Zero-Shot Learning [77.72330187258498]
We propose a novel Entropy-guided Reinforced Partial Convolutional Network (ERPCNet).
ERPCNet extracts and aggregates localities based on semantic relevance and visual correlations without human-annotated regions.
It not only discovers global-cooperative localities dynamically but also converges faster for policy gradient optimization.
arXiv Detail & Related papers (2021-11-03T11:13:13Z) - Deep Neural Networks and PIDE discretizations [2.4063592468412276]
We propose neural networks that tackle the problems of stability and field-of-view of a Convolutional Neural Network (CNN).
We propose integral-based spatially nonlocal operators which are related to global weighted Laplacian, fractional Laplacian and fractional inverse Laplacian operators.
We test the effectiveness of the proposed neural architectures on benchmark image classification datasets and semantic segmentation tasks in autonomous driving.
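For reference, the fractional Laplacian mentioned above has the standard spectral definition (a textbook fact, not quoted from this paper)

$$ (-\Delta)^{s} u = \mathcal{F}^{-1}\big( |\xi|^{2s} \, \hat{u}(\xi) \big), \qquad 0 < s < 1, $$

a genuinely nonlocal operator, which is what widens the field of view beyond local convolutions.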
arXiv Detail & Related papers (2021-08-05T08:03:01Z) - A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks whose width is quadratic in the sample size and linear in the depth, in time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z) - Generalization bound of globally optimal non-convex neural network training: Transportation map estimation by infinite dimensional Langevin dynamics [50.83356836818667]
We introduce a new theoretical framework to analyze deep learning optimization with connection to its generalization error.
Existing frameworks such as mean field theory and neural tangent kernel theory for neural network optimization analysis typically require taking the limit of infinite network width to show global convergence.
arXiv Detail & Related papers (2020-07-11T18:19:50Z)