Relieving the Over-Aggregating Effect in Graph Transformers
- URL: http://arxiv.org/abs/2510.21267v1
- Date: Fri, 24 Oct 2025 08:56:48 GMT
- Title: Relieving the Over-Aggregating Effect in Graph Transformers
- Authors: Junshu Sun, Wanxing Chang, Chenxue Yang, Qingming Huang, Shuhui Wang,
- Abstract summary: Over-aggregating occurs when a large volume of messages is aggregated into a single node with little discrimination. We propose Wideformer, a plug-and-play method for graph attention.
- Score: 76.05464934030516
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Graph attention has demonstrated superior performance in graph learning tasks. However, learning from global interactions can be challenging due to the large number of nodes. In this paper, we discover a new phenomenon termed over-aggregating. Over-aggregating arises when a large volume of messages is aggregated into a single node with little discrimination, diluting the key messages and potentially losing information. To address this, we propose Wideformer, a plug-and-play method for graph attention. Wideformer divides the aggregation over all nodes into parallel processes and guides the model to focus on specific subsets of these processes. The division limits the input volume per aggregation, avoiding message dilution and reducing information loss. The guiding step sorts and weights the aggregation outputs, prioritizing the informative messages. Evaluations show that Wideformer effectively mitigates over-aggregating: the backbone methods can focus on the informative messages and achieve superior performance compared to baseline methods.
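To make the described mechanism concrete, here is a minimal sketch of the idea from the abstract: split the senders into parallel groups, aggregate each group separately, then sort and weight the group outputs. The random grouping, the attention-mass proxy for informativeness, and all names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def wide_aggregate(q, K, V, num_groups=4, rng=None):
    """Wideformer-style aggregation for one query node (sketch).

    Instead of one softmax over all n messages, which dilutes the key
    messages as n grows, split the senders into groups, aggregate each
    group separately, then sort and weight the group outputs.
    """
    rng = rng or np.random.default_rng(0)
    n, d = K.shape
    groups = np.array_split(rng.permutation(n), num_groups)  # illustrative split
    outs, masses = [], []
    for g in groups:
        scores = K[g] @ q / np.sqrt(d)
        outs.append(softmax(scores) @ V[g])   # per-group aggregation
        masses.append(np.exp(scores).sum())   # attention mass as an informativeness proxy
    outs, masses = np.stack(outs), np.array(masses)
    order = np.argsort(-masses)               # "guiding": prioritize informative groups
    return softmax(masses[order]) @ outs[order]

rng = np.random.default_rng(0)
print(wide_aggregate(rng.standard_normal(16),
                     rng.standard_normal((100, 16)),
                     rng.standard_normal((100, 16))).shape)  # (16,)
```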
Related papers
- Beyond Message Passing: Neural Graph Pattern Machine [50.78679002846741]
We introduce the Neural Graph Pattern Machine (GPM), a novel framework that bypasses message passing by learning directly from graph substructures. GPM efficiently extracts, encodes, and prioritizes task-relevant graph patterns, offering greater expressivity and improved ability to capture long-range dependencies.
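A rough sketch of the extract-encode-prioritize pipeline the summary describes, with random walks standing in for the extracted substructures; the pattern type, encoder, and scoring below are illustrative assumptions:

```python
import numpy as np

def sample_patterns(adj, num_walks=8, walk_len=4, rng=None):
    """Extract substructures as random walks (one illustrative notion
    of a 'graph pattern'); adj is a dense 0/1 adjacency matrix."""
    rng = rng or np.random.default_rng(0)
    walks = []
    for _ in range(num_walks):
        walk = [int(rng.integers(adj.shape[0]))]
        for _ in range(walk_len - 1):
            nbrs = np.flatnonzero(adj[walk[-1]])
            if nbrs.size == 0:
                break
            walk.append(int(rng.choice(nbrs)))
        walks.append(walk)
    return walks

def encode_and_prioritize(walks, X, w):
    """Encode each pattern by mean-pooling node features X, then weight
    patterns by a softmax score -- a stand-in for learned prioritization."""
    enc = np.stack([X[walk].mean(axis=0) for walk in walks])  # (p, d)
    s = enc @ w
    a = np.exp(s - s.max()); a /= a.sum()
    return a @ enc
```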
arXiv Detail & Related papers (2025-01-30T20:37:47Z) - Preventing Representational Rank Collapse in MPNNs by Splitting the Computational Graph [9.498398257062641]
We show that operating on multiple directed acyclic graphs always satisfies our condition and propose to obtain these by defining a strict partial ordering of the nodes. We conduct comprehensive experiments that confirm the benefits of operating on multi-relational graphs to achieve more informative node representations.
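A minimal sketch of the splitting step, assuming the strict partial order is given (e.g., BFS depth); edges between equally-ordered nodes are simply dropped here, which the paper likely handles differently:

```python
def split_into_dags(edges, order):
    """Split an edge list into two relation-typed edge sets that are
    each acyclic with respect to a strict partial ordering of the nodes
    (order[v], e.g. BFS depth). Ties are dropped in this sketch."""
    forward = [(u, v) for u, v in edges if order[u] < order[v]]
    backward = [(u, v) for u, v in edges if order[u] > order[v]]
    return forward, backward
```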
arXiv Detail & Related papers (2024-09-17T19:16:03Z) - Task-Oriented Communication for Graph Data: A Graph Information Bottleneck Approach [12.451324619122405]
This paper introduces a method to extract a smaller, task-focused subgraph that maintains key information while reducing communication overhead.
Our approach utilizes graph neural networks (GNNs) and the graph information bottleneck (GIB) principle to create a compact, informative, and robust graph representation suitable for transmission.
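A minimal sketch of the extraction step, assuming edge scores come from a trained GNN scorer; the full GIB objective (task loss plus a compression penalty) is omitted:

```python
import numpy as np

def extract_subgraph(edges, edge_scores, keep_ratio=0.3):
    """Keep the highest-scoring edges to form a compact subgraph for
    transmission. In a full GIB objective the scorer would be trained
    with loss = task_loss + beta * compression_term."""
    k = max(1, int(keep_ratio * len(edges)))
    keep = np.argsort(-np.asarray(edge_scores))[:k]
    return [edges[i] for i in keep]
```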
arXiv Detail & Related papers (2024-09-04T14:01:56Z) - Neighbour-level Message Interaction Encoding for Improved Representation Learning on Graphs [13.83680253264399]
We propose a neighbour-level message interaction information encoding method for improving graph representation learning.
The proposed encoding method is a generic method which can be integrated into message-passing graph convolutional networks.
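A minimal sketch of what a neighbour-level interaction encoding could look like inside a message-passing layer; the pairwise elementwise product is an illustrative choice, not the paper's encoding:

```python
import numpy as np

def interaction_aggregate(msgs):
    """Aggregate neighbour messages msgs (k, d): alongside the standard
    sum, encode neighbour-level interactions as the mean elementwise
    product over all message pairs, and concatenate the two views."""
    agg = msgs.sum(axis=0)
    k = msgs.shape[0]
    if k > 1:
        pairs = [msgs[i] * msgs[j] for i in range(k) for j in range(i + 1, k)]
        inter = np.mean(pairs, axis=0)
    else:
        inter = np.zeros_like(agg)
    return np.concatenate([agg, inter])
```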
arXiv Detail & Related papers (2024-04-15T14:07:33Z) - Hierarchical Aggregations for High-Dimensional Multiplex Graph Embedding [7.271256448682229]
HMGE is a novel embedding method based on hierarchical aggregation for high-dimensional multiplex graphs.
We leverage mutual information between local patches and global summaries to train the model without supervision.
Detailed experiments on synthetic and real-world data illustrate the suitability of our approach to downstream supervised tasks.
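A minimal sketch of two-level aggregation over a multiplex graph's layers; the grouping, the weights, and the omission of the mutual-information training are all assumptions of this sketch:

```python
import numpy as np

def hierarchical_aggregate(layer_embs, w1, w2):
    """Two-level aggregation over a multiplex graph's layers.
    layer_embs: (L, n, d) per-layer node embeddings; w1: (G, L) mixes
    layers into G intermediate groups; w2: (G,) mixes the groups.
    The unsupervised mutual-information training is omitted."""
    groups = np.einsum('gl,lnd->gnd', w1, layer_embs)  # level 1
    return np.einsum('g,gnd->nd', w2, groups)          # level 2
```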
arXiv Detail & Related papers (2023-12-28T05:39:33Z) - Flood and Echo Net: Algorithmically Aligned GNNs that Generalize [18.95453617434051]
We propose the Flood and Echo Net as a new type of Graph Neural Network.
The mechanism's inherent ability to generalize across graphs of varying sizes positions it as a practical architecture for the task of algorithmic learning.
We test the Flood and Echo Net on a variety of synthetic tasks and the SALSA-CLRS benchmark and find that the algorithmic alignment of the execution improves generalization to larger graph sizes.
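A minimal sketch of the flood/echo propagation pattern, with identity mixing in place of the paper's learned update functions:

```python
import numpy as np
from collections import deque

def flood_and_echo(adj, x, origin=0):
    """Propagate activity outward from an origin along BFS layers
    (flood), then back toward the origin (echo). adj: (n, n) 0/1
    matrix, x: (n, d) float features."""
    n = x.shape[0]
    dist = np.full(n, -1)
    dist[origin] = 0
    q = deque([origin])
    while q:                                   # BFS distances from origin
        u = q.popleft()
        for v in np.flatnonzero(adj[u]):
            if dist[v] < 0:
                dist[v] = dist[u] + 1
                q.append(v)
    h = x.astype(float).copy()
    for d in range(1, dist.max() + 1):         # flood phase
        for v in np.flatnonzero(dist == d):
            prev = np.flatnonzero(adj[v].astype(bool) & (dist == d - 1))
            h[v] += h[prev].sum(axis=0)
    for d in range(dist.max() - 1, -1, -1):    # echo phase
        for v in np.flatnonzero(dist == d):
            nxt = np.flatnonzero(adj[v].astype(bool) & (dist == d + 1))
            h[v] += h[nxt].sum(axis=0)
    return h
```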
arXiv Detail & Related papers (2023-10-10T19:47:58Z) - NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification [70.51126383984555]
We introduce a novel all-pair message passing scheme for efficiently propagating node signals between arbitrary nodes.
The efficient computation is enabled by a kernelized Gumbel-Softmax operator.
Experiments demonstrate the promising efficacy of the method in various tasks including node classification on graphs.
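A minimal sketch of kernelized attention that makes all-pair message passing linear in the number of nodes, using Performer-style positive random features; the Gumbel-Softmax structure sampling is omitted, so this is not NodeFormer itself:

```python
import numpy as np

def kernelized_attention(Q, K, V, num_feats=64, rng=None):
    """Approximate softmax(Q K^T / sqrt(d)) V with positive random
    features so all-pair aggregation costs O(n * num_feats) instead
    of O(n^2)."""
    rng = rng or np.random.default_rng(0)
    d = Q.shape[1]
    W = rng.standard_normal((d, num_feats))
    def phi(X):  # positive random-feature map for the softmax kernel
        return np.exp(X @ W / d**0.25
                      - (X**2).sum(-1, keepdims=True) / (2 * np.sqrt(d)))
    Qf, Kf = phi(Q), phi(K)            # (n, num_feats)
    KV = Kf.T @ V                      # aggregate keys/values once: (num_feats, d_v)
    Z = Qf @ Kf.sum(axis=0)            # per-query normalizer: (n,)
    return (Qf @ KV) / Z[:, None]
```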
arXiv Detail & Related papers (2023-06-14T09:21:15Z) - Distributed Learning over Networks with Graph-Attention-Based Personalization [49.90052709285814]
We propose a graph-based personalized algorithm (GATTA) for distributed deep learning.
In particular, the personalized model in each agent is composed of a global part and a node-specific part.
By treating each agent as one node in a graph and the node-specific parameters as its features, the benefits of the graph attention mechanism can be inherited.
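A minimal sketch of one personalized update: the split into global and node-specific parameters follows the summary, while the similarity-based attention scoring and all shapes are illustrative assumptions:

```python
import numpy as np

def gatta_step(h_self, h_nbrs, global_w, local_w):
    """One personalized update: combine a shared global transform with a
    node-specific one, mixing neighbour features by attention weights
    computed from feature similarity (illustrative scoring).
    h_self: (d,), h_nbrs: (k, d), global_w/local_w: (d, d_out)."""
    s = h_nbrs @ h_self
    a = np.exp(s - s.max()); a /= a.sum()       # attention over neighbours
    return np.tanh(h_self @ global_w + (a @ h_nbrs) @ local_w)
```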
arXiv Detail & Related papers (2023-05-22T13:48:30Z) - Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network that significantly reduces the computational complexity.
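A minimal sketch of the sampling idea: each node aggregates a small sampled set of partners with weights predicted from its own features, avoiding the O(n^2) fully-connected graph; the sampler and weight predictor below are illustrative:

```python
import numpy as np

def dynamic_message_passing(X, num_samples=9, rng=None):
    """Each node aggregates a small sampled set of partner nodes with
    weights predicted from its own features, instead of attending over
    a fully-connected graph. Requires num_samples <= n."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    W_pred = 0.1 * rng.standard_normal((d, num_samples))  # illustrative weight predictor
    out = np.empty_like(X)
    for i in range(n):
        partners = rng.choice(n, size=num_samples, replace=False)
        w = X[i] @ W_pred                                 # content-dependent weights
        w = np.exp(w - w.max()); w /= w.sum()
        out[i] = w @ X[partners]
    return out
```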
arXiv Detail & Related papers (2022-09-20T14:41:37Z) - Multiple Graph Learning for Scalable Multi-view Clustering [26.846642220480863]
We propose an efficient multiple graph learning model via a small number of anchor points and tensor Schatten p-norm minimization.
Specifically, we construct a hidden and tractable large graph via an anchor graph for each view.
We develop an efficient algorithm, which scales linearly with the data size, to solve our proposed model.
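A minimal sketch of the anchor-graph construction that gives linear scaling in the number of samples; the Gaussian affinities are one standard choice, and the tensor Schatten p-norm coupling across views is omitted:

```python
import numpy as np

def anchor_graph(X, anchors, sigma=1.0):
    """Build an n x m anchor graph B instead of an n x n similarity
    graph, so later steps scale linearly in n. The implicit full graph
    is S = B diag(B^T 1)^{-1} B^T."""
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)  # (n, m) sq. distances
    B = np.exp(-d2 / (2 * sigma ** 2))
    return B / B.sum(axis=1, keepdims=True)                    # row-normalized
```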
arXiv Detail & Related papers (2021-06-29T13:10:56Z) - Text Information Aggregation with Centrality Attention [86.91922440508576]
We propose a new way of obtaining aggregation weights, called eigen-centrality self-attention.
We build a fully-connected graph for all the words in a sentence, then compute the eigen-centrality as the attention score of each word.
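A minimal sketch of eigen-centrality as an aggregation weight: build a dense word similarity graph, take its principal eigenvector by power iteration, and pool with those weights; the dot-product similarity is an illustrative assumption:

```python
import numpy as np

def eigen_centrality_pool(H, iters=100):
    """Build a dense word graph from pairwise similarities, take its
    principal eigenvector (eigen-centrality) by power iteration, and
    use it as attention weights to pool the sentence. H: (n, d)."""
    A = H @ H.T
    A = np.exp(A - A.max())                    # positive affinities
    c = np.ones(A.shape[0]) / A.shape[0]
    for _ in range(iters):
        c = A @ c
        c /= np.linalg.norm(c)                 # power iteration
    attn = c / c.sum()                         # centrality as attention scores
    return attn @ H
```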
arXiv Detail & Related papers (2020-11-16T13:08:48Z)