Gramformer: Learning Crowd Counting via Graph-Modulated Transformer
- URL: http://arxiv.org/abs/2401.03870v1
- Date: Mon, 8 Jan 2024 13:01:54 GMT
- Title: Gramformer: Learning Crowd Counting via Graph-Modulated Transformer
- Authors: Hui Lin and Zhiheng Ma and Xiaopeng Hong and Qinnan Shangguan and Deyu Meng
- Abstract summary: Gramformer is a graph-modulated transformer that enhances the network by adjusting the attention and the input node features, respectively, based on two different types of graphs.
A feature-based encoding is proposed to discover the centrality positions or importance of nodes.
Experiments on four challenging crowd counting datasets have validated the competitiveness of the proposed method.
- Score: 68.26599222077466
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer has been popular in recent crowd counting work since it breaks
the limited receptive field of traditional CNNs. However, since crowd images
always contain a large number of similar patches, the self-attention mechanism
in Transformer tends to find a homogenized solution where the attention maps of
almost all patches are identical. In this paper, we address this problem by
proposing Gramformer: a graph-modulated transformer to enhance the network by
adjusting the attention and input node features respectively on the basis of
two different types of graphs. Firstly, an attention graph is proposed to
diversify the attention maps so that they attend to complementary information.
The graph is built upon the dissimilarities between patches, modulating the
attention in an anti-similarity fashion. Secondly, a feature-based centrality
encoding is proposed to discover the centrality positions or importance of
nodes. We encode them with a proposed centrality indexing scheme to modulate the node features
and similarity relationships. Extensive experiments on four challenging crowd
counting datasets have validated the competitiveness of the proposed method.
Code is available at {https://github.com/LoraLinH/Gramformer}.
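To make the two modulations concrete, below is a minimal, self-contained PyTorch sketch of the general idea: an additive attention bias built from patch dissimilarities, plus a crude feature-based centrality index added to the node features. All names are hypothetical and the details differ from the released code linked above; treat it as an illustration of the anti-similarity principle, not the authors' implementation.

```python
# Illustrative only: hypothetical names, not the released Gramformer code.
import torch
import torch.nn.functional as F

def anti_similarity_bias(x, lam=1.0):
    """Additive attention bias that favours DISSIMILAR patch tokens.
    x: (B, N, C) patch features -> (B, N, N) bias, large where cosine
    similarity is low, pushing the per-patch attention maps apart."""
    xn = F.normalize(x, dim=-1)
    sim = xn @ xn.transpose(1, 2)            # cosine similarity in [-1, 1]
    return lam * (1.0 - sim)                 # dissimilarity as a positive bias

def centrality_bins(x, num_bins=8):
    """Crude feature-based centrality: rank each token by its total similarity
    to all other tokens and bucket the ranks into a few discrete indices."""
    xn = F.normalize(x, dim=-1)
    degree = (xn @ xn.transpose(1, 2)).sum(-1)        # (B, N)
    ranks = degree.argsort(-1).argsort(-1).float()    # 0 .. N-1 per token
    return (ranks / ranks.size(-1) * num_bins).long().clamp(max=num_bins - 1)

class GraphModulatedAttention(torch.nn.Module):
    """Single-head self-attention with both modulations applied."""
    def __init__(self, dim, num_bins=8):
        super().__init__()
        self.qkv = torch.nn.Linear(dim, 3 * dim)
        self.centrality = torch.nn.Embedding(num_bins, dim)
        self.scale = dim ** -0.5

    def forward(self, x):                               # x: (B, N, dim)
        x = x + self.centrality(centrality_bins(x))     # modulate node features
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        logits = q @ k.transpose(1, 2) * self.scale
        logits = logits + anti_similarity_bias(x)       # modulate the attention
        return torch.softmax(logits, dim=-1) @ v
```

The bias strength, the number of centrality bins, and the single-head layout are arbitrary choices made only to keep the sketch short.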
Related papers
- InstructG2I: Synthesizing Images from Multimodal Attributed Graphs [50.852150521561676]
We propose a graph context-conditioned diffusion model called InstructG2I.
InstructG2I first exploits the graph structure and multimodal information to conduct informative neighbor sampling.
A Graph-QFormer encoder adaptively encodes the graph nodes into an auxiliary set of graph prompts to guide the denoising process.
arXiv Detail & Related papers (2024-10-09T17:56:15Z) - Graph as Point Set [31.448841287258116]
This paper introduces a novel graph-to-set conversion method that transforms interconnected nodes into a set of independent points.
It enables using set encoders to learn from graphs, thereby significantly expanding the design space of Graph Neural Networks.
To demonstrate the effectiveness of our approach, we introduce Point Set Transformer (PST), a transformer architecture that accepts a point set converted from a graph as input.
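As a loose illustration of the graph-to-set idea only (the abstract does not spell out the paper's actual conversion, and this generic recipe is not it), one common way to let a plain set encoder see graph structure is to append spectral coordinates to each node feature; the helper below is hypothetical.

```python
# Generic graph-to-point-set stand-in; not the conversion used in the paper.
import torch

def graph_to_points(adj, x, k=4):
    """Append spectral coordinates (low-frequency Laplacian eigenvectors) to
    the node features so that an order-invariant encoder still sees structure.
    adj: (N, N) dense adjacency, x: (N, C) node features -> (N, C + k) points."""
    lap = torch.diag(adj.sum(-1)) - adj
    _, vecs = torch.linalg.eigh(lap)       # eigenvectors, ascending eigenvalues
    coords = vecs[:, 1:k + 1]              # drop the constant eigenvector
    return torch.cat([x, coords], dim=-1)

# The resulting points can be fed to any permutation-equivariant encoder, e.g.
# a Transformer encoder without positional encodings, in place of a GNN.
```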
arXiv Detail & Related papers (2024-05-05T02:29:41Z) - GvT: A Graph-based Vision Transformer with Talking-Heads Utilizing Sparsity, Trained from Scratch on Small Datasets [1.1586742546971471]
We propose a Graph-based Vision Transformer (GvT) that utilizes graph convolutional projection and graph-pooling.
GvT produces comparable or superior outcomes to deep convolutional networks without pre-training on large datasets.
arXiv Detail & Related papers (2024-04-07T11:48:07Z) - NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification [70.51126383984555]
We introduce a novel all-pair message passing scheme for efficiently propagating node signals between arbitrary nodes.
The efficient computation is enabled by a kernelized Gumbel-Softmax operator.
Experiments demonstrate the promising efficacy of the method in various tasks including node classification on graphs.
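For intuition only: the snippet below relaxes all-pair attention logits into stochastic, differentiable edge weights with the standard (non-kernelized) Gumbel-Softmax, at quadratic cost. NodeFormer's contribution is precisely the kernelized operator that avoids this cost, which the sketch does not reproduce.

```python
# Toy, quadratic-cost relaxation; NodeFormer's kernelized operator avoids the
# O(N^2) cost and is NOT reproduced here.
import torch
import torch.nn.functional as F

def gumbel_adjacency(x, w_q, w_k, tau=0.5):
    """Relax all-pair attention logits into stochastic, differentiable edge
    weights with the standard Gumbel-Softmax.
    x: (N, D) node features; w_q, w_k: (D, D) projections -> (N, N) weights."""
    q, k = x @ w_q, x @ w_k
    logits = q @ k.t() / q.size(-1) ** 0.5
    return F.gumbel_softmax(logits, tau=tau, dim=-1)    # row-stochastic edges

def propagate(x, w_q, w_k):
    """One round of all-pair message passing over the sampled edge weights."""
    return gumbel_adjacency(x, w_q, w_k) @ x
```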
arXiv Detail & Related papers (2023-06-14T09:21:15Z) - Discrete Graph Auto-Encoder [52.50288418639075]
We introduce a new framework named Discrete Graph Auto-Encoder (DGAE)
We first use a permutation-equivariant auto-encoder to convert graphs into sets of discrete latent node representations.
In the second step, we sort the sets of discrete latent representations and learn their distribution with a specifically designed auto-regressive model.
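A rough sketch of that two-step recipe, with hypothetical names and none of DGAE's architectural specifics: quantize node embeddings against a learned codebook, then sort the resulting codes into a canonical sequence for an autoregressive model.

```python
# Rough sketch with hypothetical names; DGAE's actual encoder, decoder, and
# autoregressive model are described in the paper.
import torch

class NodeCodeQuantizer(torch.nn.Module):
    """Step 1: map continuous node embeddings to their nearest codebook entry,
    giving each graph a set of discrete latent node codes."""
    def __init__(self, dim, codebook_size=256):
        super().__init__()
        self.codebook = torch.nn.Embedding(codebook_size, dim)

    def forward(self, z):                                  # z: (N, D)
        dists = torch.cdist(z, self.codebook.weight)       # (N, K) distances
        return dists.argmin(dim=-1)                        # (N,) discrete codes

def to_canonical_sequence(codes):
    """Step 2 (prefix): sort the codes so the set becomes an order-free
    sequence, which an autoregressive model can then learn to generate."""
    return codes.sort().values
```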
arXiv Detail & Related papers (2023-06-13T12:40:39Z) - Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network, that significantly reduces the computational complexity.
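The summary does not detail the paper's dynamic sampling scheme; the generic top-k variant below only shows why restricting each node to a few partners reduces the aggregation cost relative to a fully connected graph.

```python
# Generic top-k sparsification, not the paper's learned dynamic sampling.
import torch

def top_k_propagate(x, w, k=8):
    """Each node aggregates messages from only its k highest-affinity partners,
    so aggregation costs O(N*k*D) instead of O(N^2*D).  Note the affinity
    matrix itself is still computed densely here, which the paper also avoids.
    x: (N, D) node features, w: (D, D) projection."""
    scores = (x @ w) @ x.t()                      # (N, N) affinities
    vals, idx = scores.topk(k, dim=-1)            # keep k partners per node
    weights = torch.softmax(vals, dim=-1)         # (N, k) normalized weights
    neighbours = x[idx]                           # (N, k, D) gathered features
    return (weights.unsqueeze(-1) * neighbours).sum(dim=1)
```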
arXiv Detail & Related papers (2022-09-20T14:41:37Z) - Graph Reasoning Transformer for Image Parsing [67.76633142645284]
We propose a novel Graph Reasoning Transformer (GReaT) for image parsing to enable image patches to interact following a relation reasoning pattern.
Compared to the conventional transformer, GReaT has higher interaction efficiency and a more purposeful interaction pattern.
Results show that GReaT achieves consistent performance gains with slight computational overheads on the state-of-the-art transformer baselines.
arXiv Detail & Related papers (2022-09-20T08:21:37Z) - Gransformer: Transformer-based Graph Generation [14.161975556325796]
Gransformer is a Transformer-based algorithm for generating graphs.
We modify the Transformer encoder to exploit the structural information of the given graph.
We also introduce a graph-based familiarity measure between node pairs.
arXiv Detail & Related papers (2022-03-25T14:05:12Z) - A Generalization of Transformer Networks to Graphs [5.736353542430439]
We introduce a graph transformer with four new properties compared to the standard model.
The architecture is extended to edge feature representation, which can be critical to tasks such as chemistry (bond type) or link prediction (entity relationships in knowledge graphs).
arXiv Detail & Related papers (2020-12-17T16:11:47Z) - AEGCN: An Autoencoder-Constrained Graph Convolutional Network [5.023274927781062]
We propose a novel neural network architecture, called autoencoder-constrained graph convolutional network.
The core of this model is a convolutional network operating directly on graphs, whose hidden layers are constrained by an autoencoder.
We show that adding autoencoder constraints significantly improves the performance of graph convolutional networks.
arXiv Detail & Related papers (2020-07-03T16:42:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.