Are GATs Out of Balance?
- URL: http://arxiv.org/abs/2310.07235v2
- Date: Wed, 25 Oct 2023 15:49:30 GMT
- Title: Are GATs Out of Balance?
- Authors: Nimrah Mustafa, Aleksandar Bojchevski, Rebekka Burkholz
- Abstract summary: We study the Graph Attention Network (GAT) in which a node's neighborhood aggregation is weighted by parameterized attention coefficients.
Our main theorem serves as a stepping stone to studying the learning dynamics of positive homogeneous models with attention mechanisms.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While the expressive power and computational capabilities of graph neural
networks (GNNs) have been theoretically studied, their optimization and
learning dynamics, in general, remain largely unexplored. Our study examines
the Graph Attention Network (GAT), a popular GNN architecture in which a node's
neighborhood aggregation is weighted by parameterized attention coefficients.
We derive a conservation law of GAT gradient flow dynamics, which explains why
a large fraction of parameters in GATs with standard initialization barely
change during training. This effect is amplified in deeper GATs, which perform
significantly worse than their shallow counterparts. To alleviate this problem,
we devise an initialization scheme that balances the GAT network. Our approach
i) allows more effective propagation of gradients and in turn enables
trainability of deeper networks, and ii) attains a considerable speedup in
training and convergence time in comparison to the standard initialization. Our
main theorem serves as a stepping stone to studying the learning dynamics of
positive homogeneous models with attention mechanisms.
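For reference, a standard GAT layer (Veličković et al., 2018) aggregates a node's
neighborhood with learned attention coefficients:

$$\alpha_{ij} = \frac{\exp\big(\mathrm{LeakyReLU}(\mathbf{a}^\top [\mathbf{W}h_i \,\|\, \mathbf{W}h_j])\big)}{\sum_{k \in \mathcal{N}(i)} \exp\big(\mathrm{LeakyReLU}(\mathbf{a}^\top [\mathbf{W}h_i \,\|\, \mathbf{W}h_k])\big)}, \qquad h_i' = \sigma\Big(\sum_{j \in \mathcal{N}(i)} \alpha_{ij}\, \mathbf{W} h_j\Big).$$

For intuition on the kind of invariant the abstract refers to (this is the well-known
result for plain feed-forward networks with positively homogeneous activations such as
ReLU, not the paper's exact statement), gradient flow conserves the squared-norm
imbalance between adjacent layers:

$$\frac{d}{dt}\Big(\|W_{l+1}(t)\|_F^2 - \|W_{l}(t)\|_F^2\Big) = 0,$$

so layers that start unbalanced stay unbalanced throughout training. Per the abstract,
the paper derives an analogous law for GATs and uses it to explain why many parameters
barely move under standard initialization.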
Related papers
- Attentional Graph Neural Networks for Robust Massive Network Localization (arXiv, 2023-11-28)
Graph neural networks (GNNs) have emerged as a prominent tool for classification tasks in machine learning.
This paper integrates GNNs with an attention mechanism to tackle a challenging nonlinear regression problem: network localization.
We first introduce a novel network localization method based on a graph convolutional network (GCN), which exhibits exceptional precision even under severe non-line-of-sight (NLOS) conditions.
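As background for the entry above (this is the standard GCN formulation, not a detail
taken from that paper), a GCN layer propagates node features through a normalized
adjacency matrix:

$$H^{(l+1)} = \sigma\big(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\, H^{(l)} W^{(l)}\big),$$

where $\tilde{A} = A + I$ is the adjacency matrix with self-loops and $\tilde{D}$ its
degree matrix; in a localization setting, the node features would presumably encode
ranging measurements between nodes.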
- Label Deconvolution for Node Representation Learning on Large-scale Attributed Graphs against Learning Bias (arXiv, 2023-09-26)
We propose an efficient label regularization technique, namely Label Deconvolution (LD), to alleviate the learning bias by a novel and highly scalable approximation to the inverse mapping of GNNs.
Experiments demonstrate that LD significantly outperforms state-of-the-art methods on Open Graph Benchmark datasets.
- How neural networks learn to classify chaotic time series (arXiv, 2023-06-04)
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
- FairGAT: Fairness-aware Graph Attention Networks (arXiv, 2023-03-26)
Graph attention networks (GATs) have become one of the most widely utilized neural network structures for graph-based tasks.
The influence of the attention design in GATs on algorithmic bias has not been investigated.
FairGAT, a novel algorithm that leverages a fairness-aware attention design, is developed.
- Dynamics-aware Adversarial Attack of Adaptive Neural Networks (arXiv, 2022-10-15)
We investigate the dynamics-aware adversarial attack problem of adaptive neural networks.
We propose a Leaded Gradient Method (LGM) and show the significant effects of the lagged gradient.
Our LGM achieves impressive adversarial attack performance compared with dynamics-unaware attack methods.
- SGD with Large Step Sizes Learns Sparse Features (arXiv, 2022-10-11)
We showcase important features of the dynamics of stochastic gradient descent (SGD) in the training of neural networks.
We show that the longer large step sizes keep SGD high in the loss landscape, the better the implicit regularization can operate and find sparse representations.
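A minimal sketch of the training recipe this finding suggests (the function name,
constants, and switch point are illustrative assumptions, not taken from the paper):

```python
def step_size(t: int, eta_large: float = 0.5, eta_small: float = 0.01,
              t_switch: int = 10_000) -> float:
    """Piecewise-constant schedule: hold the step size large for a long
    initial phase -- keeping SGD bouncing high in the loss landscape so
    the implicit regularization can act -- then decay it to converge."""
    return eta_large if t < t_switch else eta_small

# Usage inside a generic SGD loop (theta and grad are parameter and
# gradient arrays): theta -= step_size(t) * grad
```

On this reading, how long eta_large is held (t_switch) is the knob a practitioner
would tune.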
- Towards Understanding Graph Neural Networks: An Algorithm Unrolling Perspective (arXiv, 2022-06-09)
We introduce a class of unrolled networks built on truncated optimization algorithms for graph signal denoising (GSD) problems.
The training process of a GNN model can be seen as solving a bilevel optimization problem with a GSD problem at the lower level.
UGDGNN, i.e., unrolled gradient descent GNN, an expressive model that inherits appealing theoretical properties, is proposed; the unrolling idea is sketched below.
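A minimal sketch, assuming the usual GSD objective
$\min_x \tfrac{1}{2}\|x - y\|_2^2 + \tfrac{\lambda}{2}\, x^\top L x$ with $L$ the graph
Laplacian; UGDGNN presumably learns per-layer coefficients, whereas this
fixed-coefficient version only shows why each unrolled step looks like a GNN layer:

```python
import numpy as np

def unrolled_gsd(y: np.ndarray, L: np.ndarray, num_layers: int = 8,
                 eta: float = 0.1, lam: float = 1.0) -> np.ndarray:
    """Unroll gradient descent on the graph signal denoising objective.
    Each iteration x <- x - eta*((x - y) + lam * L @ x) combines a graph
    filter (L @ x) with skip connections to x and y -- the same building
    blocks as a message-passing GNN layer."""
    x = y.copy()
    for _ in range(num_layers):
        x = x - eta * ((x - y) + lam * (L @ x))
    return x
```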
- Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis (arXiv, 2022-05-11)
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by simply filtering out "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
- Dynamics-aware Adversarial Attack of 3D Sparse Convolution Network (arXiv, 2021-12-17)
We investigate the dynamics-aware adversarial attack problem in deep neural networks.
Most existing adversarial attack algorithms are designed under a basic assumption -- the network architecture is fixed throughout the attack process.
We propose a Leaded Gradient Method (LGM) and show the significant effects of the lagged gradient.
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.