Are GATs Out of Balance?
- URL: http://arxiv.org/abs/2310.07235v2
- Date: Wed, 25 Oct 2023 15:49:30 GMT
- Title: Are GATs Out of Balance?
- Authors: Nimrah Mustafa, Aleksandar Bojchevski, Rebekka Burkholz
- Abstract summary: We study the Graph Attention Network (GAT) in which a node's neighborhood aggregation is weighted by parameterized attention coefficients.
Our main theorem serves as a stepping stone to studying the learning dynamics of positive homogeneous models with attention mechanisms.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While the expressive power and computational capabilities of graph neural
networks (GNNs) have been theoretically studied, their optimization and
learning dynamics, in general, remain largely unexplored. Our study examines
the Graph Attention Network (GAT), a popular GNN architecture in which a node's
neighborhood aggregation is weighted by parameterized attention coefficients.
We derive a conservation law of GAT gradient flow dynamics, which explains why
a large fraction of parameters in GATs with standard initialization barely
change during training. This effect is amplified in deeper GATs, which perform
significantly worse than their shallow counterparts. To alleviate this problem,
we devise an initialization scheme that balances the GAT network. Our approach
i) allows more effective propagation of gradients and in turn enables
trainability of deeper networks, and ii) attains a considerable speedup in
training and convergence time in comparison to the standard initialization. Our
main theorem serves as a stepping stone to studying the learning dynamics of
positive homogeneous models with attention mechanisms.
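For reference, a standard GAT layer (Veličković et al., 2018) aggregates a node's
neighborhood with learned attention coefficients:

$$\alpha_{ij} = \frac{\exp\big(\mathrm{LeakyReLU}(\mathbf{a}^\top [\mathbf{W}h_i \,\|\, \mathbf{W}h_j])\big)}{\sum_{k \in \mathcal{N}(i)} \exp\big(\mathrm{LeakyReLU}(\mathbf{a}^\top [\mathbf{W}h_i \,\|\, \mathbf{W}h_k])\big)}, \qquad h_i' = \sigma\Big(\sum_{j \in \mathcal{N}(i)} \alpha_{ij}\, \mathbf{W} h_j\Big).$$

For intuition on the kind of invariant the abstract refers to (this is the well-known
result for plain feed-forward networks with positively homogeneous activations such as
ReLU, not the paper's exact statement), gradient flow conserves the squared-norm
imbalance between adjacent layers:

$$\frac{d}{dt}\Big(\|W_{l+1}(t)\|_F^2 - \|W_{l}(t)\|_F^2\Big) = 0,$$

so layers that start unbalanced stay unbalanced throughout training. Per the abstract,
the paper derives an analogous law for GATs and uses it to explain why many parameters
barely move under standard initialization.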
Related papers
- Attentional Graph Neural Networks for Robust Massive Network Localization (arXiv, 2023-11-28)
Graph neural networks (GNNs) have emerged as a prominent tool for classification tasks in machine learning.
This paper integrates GNNs with an attention mechanism to tackle a challenging nonlinear regression problem: network localization.
We first introduce a novel network localization method based on a graph convolutional network (GCN), which exhibits exceptional precision even under severe non-line-of-sight (NLOS) conditions.
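As background for the entry above (this is the standard GCN formulation, not a detail
taken from that paper), a GCN layer propagates node features through a normalized
adjacency matrix:

$$H^{(l+1)} = \sigma\big(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\, H^{(l)} W^{(l)}\big),$$

where $\tilde{A} = A + I$ is the adjacency matrix with self-loops and $\tilde{D}$ its
degree matrix; in a localization setting, the node features would presumably encode
ranging measurements between nodes.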
- Label Deconvolution for Node Representation Learning on Large-scale Attributed Graphs against Learning Bias (arXiv, 2023-09-26)
We propose an efficient label regularization technique, namely Label Deconvolution (LD), to alleviate the learning bias by a novel and highly scalable approximation to the inverse mapping of GNNs.
Experiments demonstrate that LD significantly outperforms state-of-the-art methods on Open Graph Benchmark datasets.
- How neural networks learn to classify chaotic time series (arXiv, 2023-06-04)
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
- FairGAT: Fairness-aware Graph Attention Networks (arXiv, 2023-03-26)
Graph attention networks (GATs) have become one of the most widely utilized neural network structures for graph-based tasks.
The influence of the attention design in GATs on algorithmic bias has not been investigated.
FairGAT, a novel algorithm that leverages a fairness-aware attention design, is developed.
- Dynamics-aware Adversarial Attack of Adaptive Neural Networks (arXiv, 2022-10-15)
We investigate the dynamics-aware adversarial attack problem of adaptive neural networks.
We propose a Leaded Gradient Method (LGM) and show the significant effects of the lagged gradient.
Our LGM achieves impressive adversarial attack performance compared with dynamics-unaware attack methods.
- SGD with Large Step Sizes Learns Sparse Features (arXiv, 2022-10-11)
We showcase important features of the dynamics of stochastic gradient descent (SGD) in the training of neural networks.
We show that the longer large step sizes keep SGD high in the loss landscape, the better the implicit regularization can operate and find sparse representations.
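A minimal sketch of the training recipe this finding suggests (the function name,
constants, and switch point are illustrative assumptions, not taken from the paper):

```python
def step_size(t: int, eta_large: float = 0.5, eta_small: float = 0.01,
              t_switch: int = 10_000) -> float:
    """Piecewise-constant schedule: hold the step size large for a long
    initial phase -- keeping SGD bouncing high in the loss landscape so
    the implicit regularization can act -- then decay it to converge."""
    return eta_large if t < t_switch else eta_small

# Usage inside a generic SGD loop (theta and grad are parameter and
# gradient arrays): theta -= step_size(t) * grad
```

On this reading, how long eta_large is held (t_switch) is the knob a practitioner
would tune.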
- Towards Understanding Graph Neural Networks: An Algorithm Unrolling Perspective (arXiv, 2022-06-09)
We introduce a class of unrolled networks built on truncated optimization algorithms for graph signal denoising (GSD) problems.
The training process of a GNN model can be seen as solving a bilevel optimization problem with a GSD problem at the lower level.
UGDGNN, i.e., unrolled gradient descent GNN, an expressive model that inherits appealing theoretical properties, is proposed; the unrolling idea is sketched below.
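A minimal sketch, assuming the usual GSD objective
$\min_x \tfrac{1}{2}\|x - y\|_2^2 + \tfrac{\lambda}{2}\, x^\top L x$ with $L$ the graph
Laplacian; UGDGNN presumably learns per-layer coefficients, whereas this
fixed-coefficient version only shows why each unrolled step looks like a GNN layer:

```python
import numpy as np

def unrolled_gsd(y: np.ndarray, L: np.ndarray, num_layers: int = 8,
                 eta: float = 0.1, lam: float = 1.0) -> np.ndarray:
    """Unroll gradient descent on the graph signal denoising objective.
    Each iteration x <- x - eta*((x - y) + lam * L @ x) combines a graph
    filter (L @ x) with skip connections to x and y -- the same building
    blocks as a message-passing GNN layer."""
    x = y.copy()
    for _ in range(num_layers):
        x = x - eta * ((x - y) + lam * (L @ x))
    return x
```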
- Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis (arXiv, 2022-05-11)
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by simply filtering out "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
- Dynamics-aware Adversarial Attack of 3D Sparse Convolution Network (arXiv, 2021-12-17)
We investigate the dynamics-aware adversarial attack problem in deep neural networks.
Most existing adversarial attack algorithms are designed under a basic assumption -- the network architecture is fixed throughout the attack process.
We propose a Leaded Gradient Method (LGM) and show the significant effects of the lagged gradient.
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.