Gradient Derivation for Learnable Parameters in Graph Attention Networks
- URL: http://arxiv.org/abs/2304.10939v1
- Date: Fri, 21 Apr 2023 13:23:38 GMT
- Title: Gradient Derivation for Learnable Parameters in Graph Attention Networks
- Authors: Marion Neumeier, Andreas Tollkühn, Sebastian Dorn, Michael Botsch,
Wolfgang Utschick
- Abstract summary: This work provides a comprehensive derivation of the parameter gradients for GATv2 [4], a widely used implementation of Graph Attention Networks (GATs).
As the gradient flow provides valuable insights into the training dynamics of statistical learning models, this work obtains the gradients for the trainable model parameters of GATv2.
- Score: 11.581071131903775
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work provides a comprehensive derivation of the parameter gradients for
GATv2 [4], a widely used implementation of Graph Attention Networks (GATs).
GATs have proven to be powerful frameworks for processing graph-structured data
and, hence, have been used in a range of applications. However, the performance
achieved by these approaches has been found to be inconsistent across different
datasets, and the reasons for this remain an open research question.
As the gradient flow provides valuable insights into the training dynamics of
statistical learning models, this work obtains the gradients for the
trainable model parameters of GATv2. The gradient derivations supplement the
efforts of [2], where potential pitfalls of GATv2 are investigated.
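As a rough numerical companion to the analytical derivations (not the authors' code), the sketch below builds a single GATv2 attention head on a toy graph and lets PyTorch autograd produce the gradients of the learnable weight matrix W and attention vector a; the graph, shapes, and placeholder loss are illustrative assumptions.

```python
# Minimal sketch, not the paper's code: one GATv2 attention head on a toy graph.
# Autograd supplies the gradients of the learnable parameters (W, a) that the
# paper derives analytically; shapes, names, and the loss are assumptions.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, F_in, F_out = 4, 3, 2                       # nodes, input dim, output dim
H = torch.randn(N, F_in)                        # node features
adj = torch.ones(N, N)                          # toy adjacency (fully connected)

W = torch.randn(F_out, 2 * F_in, requires_grad=True)  # shared linear transform
a = torch.randn(F_out, requires_grad=True)             # attention vector

# GATv2 scoring: e_ij = a^T LeakyReLU(W [h_i || h_j])
# (the original GAT instead uses e_ij = LeakyReLU(a^T [W h_i || W h_j]))
h_i = H.unsqueeze(1).expand(N, N, F_in)
h_j = H.unsqueeze(0).expand(N, N, F_in)
pair = torch.cat([h_i, h_j], dim=-1)            # (N, N, 2*F_in)
e = F.leaky_relu(pair @ W.T, 0.2) @ a           # (N, N) attention logits

alpha = torch.softmax(e.masked_fill(adj == 0, float("-inf")), dim=-1)
out = alpha @ (H @ W[:, F_in:].T)               # aggregate transformed neighbors

loss = out.sum()                                 # placeholder objective
loss.backward()
print(W.grad.shape, a.grad.shape)                # gradients w.r.t. W and a
```

Comparing such autograd values against the hand-derived expressions is a common way to cross-check gradient derivations of this kind.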
Related papers
- Dynamic Decoupling of Placid Terminal Attractor-based Gradient Descent Algorithm [56.06235614890066]
Gradient descent (GD) and stochastic gradient descent (SGD) have been widely used in a number of application domains.
This paper carefully analyzes the dynamics of GD based on the terminal attractor at different stages of its gradient flow.
arXiv Detail & Related papers (2024-09-10T14:15:56Z)
- Graph neural network surrogate for strategic transport planning [2.175217022338634]
This paper explores the application of advanced Graph Neural Network (GNN) architectures as surrogate models for strategic transport planning.
Building upon prior work that laid the foundation with graph convolutional networks (GCNs), our study delves into a comparative analysis of the established GCN and the more expressive Graph Attention Network (GAT).
We propose a novel GAT variant (namely GATv3) to address over-smoothing issues in graph-based models.
arXiv Detail & Related papers (2024-08-14T14:18:47Z)
- The Benefits of Reusing Batches for Gradient Descent in Two-Layer Networks: Breaking the Curse of Information and Leap Exponents [28.102697445508976]
We investigate the training dynamics of two-layer neural networks when learning multi-index target functions.
We focus on multi-pass gradient descent (GD) that reuses the batches multiple times and show that it significantly changes the conclusion about which functions are learnable.
We show that upon re-using batches, the network achieves in just two time steps an overlap with the target subspace even for functions not satisfying the staircase property.
arXiv Detail & Related papers (2024-02-05T17:30:42Z)
- Are GATs Out of Balance? [73.2500577189791]
We study the Graph Attention Network (GAT) in which a node's neighborhood aggregation is weighted by parameterized attention coefficients.
Our main theorem serves as a stepping stone to studying the learning dynamics of positive homogeneous models with attention mechanisms.
arXiv Detail & Related papers (2023-10-11T06:53:05Z)
- Optimization and Interpretability of Graph Attention Networks for Small Sparse Graph Structures in Automotive Applications [11.581071131903775]
This work aims at a better understanding of the attention mechanism and analyzes its interpretability in identifying causal importance.
For automotive applications, the Graph Attention Network (GAT) is a prominently used architecture to include relational information of a traffic scenario during feature embedding.
arXiv Detail & Related papers (2023-05-25T15:55:59Z)
- Adaptive Depth Graph Attention Networks [19.673509341792606]
The graph attention network (GAT) is considered the most advanced learning architecture for graph representation.
We find that the main factor limiting the accuracy of GAT models as the number of layers increases is the over-squashing phenomenon.
We propose a GAT variant, ADGAT, that adaptively selects the number of layers based on the sparsity of the graph.
arXiv Detail & Related papers (2023-01-16T05:22:29Z)
- Comprehensive Graph Gradual Pruning for Sparse Training in Graph Neural Networks [52.566735716983956]
We propose a graph gradual pruning framework termed CGP to dynamically prune GNNs.
Unlike LTH-based methods, the proposed CGP approach requires no re-training, which significantly reduces the computation costs.
Our proposed strategy greatly improves both training and inference efficiency while matching or even exceeding the accuracy of existing methods.
arXiv Detail & Related papers (2022-07-18T14:23:31Z)
- Deep Manifold Learning with Graph Mining [80.84145791017968]
We propose a novel graph deep model with a non-gradient decision layer for graph mining.
The proposed model has achieved state-of-the-art performance compared to the current models.
arXiv Detail & Related papers (2022-07-18T04:34:08Z)
- On Training Implicit Models [75.20173180996501]
We propose a novel gradient estimate for implicit models, named phantom gradient, that forgoes the costly computation of the exact gradient.
Experiments on large-scale tasks demonstrate that these lightweight phantom gradients significantly accelerate the backward passes in training implicit models by roughly 1.7 times.
arXiv Detail & Related papers (2021-11-09T14:40:24Z)
- Robust Optimization as Data Augmentation for Large-scale Graphs [117.2376815614148]
We propose FLAG (Free Large-scale Adversarial Augmentation on Graphs), which iteratively augments node features with gradient-based adversarial perturbations during training.
FLAG is a general-purpose approach for graph data, which universally works in node classification, link prediction, and graph classification tasks.
arXiv Detail & Related papers (2020-10-19T21:51:47Z)
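As a rough illustration of the FLAG entry above, the following hedged sketch applies gradient-based adversarial perturbations to node features inside a single training step; the model interface, step sizes, and ascent-step count are assumptions rather than the authors' implementation.

```python
# Rough sketch of FLAG-style augmentation, assuming a generic PyTorch GNN
# `model(x, edge_index)`; step sizes, loop counts, and names are illustrative.
import torch

def flag_step(model, x, edge_index, y, loss_fn, optimizer,
              ascent_steps=3, step_size=1e-3):
    # Start from a small random perturbation of the node features.
    perturb = torch.zeros_like(x).uniform_(-step_size, step_size)
    perturb.requires_grad_(True)

    optimizer.zero_grad()
    loss = loss_fn(model(x + perturb, edge_index), y) / ascent_steps
    for _ in range(ascent_steps - 1):
        loss.backward()
        # Ascend on the perturbation (adversarial direction) ...
        perturb.data.add_(step_size * torch.sign(perturb.grad))
        perturb.grad.zero_()
        # ... while the model's gradients keep accumulating over the averaged loss.
        loss = loss_fn(model(x + perturb, edge_index), y) / ascent_steps
    loss.backward()
    optimizer.step()
    return loss.item()
```

A typical usage would call `flag_step` once per mini-batch in place of a plain forward/backward/step.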