Graph Reasoning Transformer for Image Parsing
- URL: http://arxiv.org/abs/2209.09545v1
- Date: Tue, 20 Sep 2022 08:21:37 GMT
- Title: Graph Reasoning Transformer for Image Parsing
- Authors: Dong Zhang, Jinhui Tang, and Kwang-Ting Cheng
- Abstract summary: We propose a novel Graph Reasoning Transformer (GReaT) for image parsing to enable image patches to interact following a relation reasoning pattern.
Compared to the conventional transformer, GReaT has higher interaction efficiency and a more purposeful interaction pattern.
Results show that GReaT achieves consistent performance gains with slight computational overheads on the state-of-the-art transformer baselines.
- Score: 67.76633142645284
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Capturing long-range dependencies has empirically proven to be effective
on a wide range of computer vision tasks. Progressive advances on this
topic have been made through the transformer framework and its multi-head
attention mechanism. However, the attention-based
image patch interaction potentially suffers from problems of redundant
interactions of intra-class patches and unoriented interactions of inter-class
patches. In this paper, we propose a novel Graph Reasoning Transformer (GReaT)
for image parsing to enable image patches to interact following a relation
reasoning pattern. Specifically, the linearly embedded image patches are first
projected into the graph space, where each node represents the implicit visual
center for a cluster of image patches and each edge reflects the relation
weight between two adjacent nodes. After that, global relation reasoning is
performed on this graph. Finally, all nodes, now carrying the relation
information, are mapped back into the original space for subsequent processing.
Compared to the conventional transformer, GReaT has higher interaction
efficiency and a more purposeful interaction pattern. Experiments are carried
out on the challenging Cityscapes and ADE20K datasets. Results show that GReaT
achieves consistent performance gains with slight computational overheads on
the state-of-the-art transformer baselines.
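The paper itself does not include code, but the three-stage pipeline described in the abstract (projection into graph space, global relation reasoning, and reprojection) can be summarized in a short sketch. The module below is a minimal PyTorch illustration of that pattern; the module name, the choice of a single graph-convolution step with a learnable adjacency, and all dimensions are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphReasoningUnit(nn.Module):
    """Hypothetical sketch of a project-reason-reproject unit, not the
    authors' code. Patch tokens are softly assigned to K graph nodes
    (the "implicit visual centers"), one graph-convolution step performs
    global relation reasoning over the nodes, and the reasoned nodes are
    mapped back onto the patch tokens through the same assignment."""

    def __init__(self, dim: int, num_nodes: int = 16):
        super().__init__()
        self.to_assign = nn.Linear(dim, num_nodes)     # patch -> node assignment logits
        self.to_state = nn.Linear(dim, dim)            # patch -> node state features
        self.adj = nn.Parameter(torch.eye(num_nodes))  # learnable edge/relation weights
        self.gcn = nn.Linear(dim, dim)                 # node update after message passing
        self.proj_back = nn.Linear(dim, dim)           # back to patch space

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) linearly embedded image patches
        assign = F.softmax(self.to_assign(x), dim=1)   # (B, N, K) soft clustering over patches
        nodes = torch.einsum("bnk,bnc->bkc", assign, self.to_state(x))  # (B, K, C)
        # Global relation reasoning on the K-node graph: aggregate neighbor
        # states with the row-normalized adjacency, then update node states.
        agg = torch.einsum("kj,bjc->bkc", F.softmax(self.adj, dim=-1), nodes)
        nodes = F.relu(self.gcn(agg))
        # Reproject reasoned nodes onto patches; the residual keeps local detail.
        out = torch.einsum("bnk,bkc->bnc", assign, nodes)
        return x + self.proj_back(out)
```

For example, with 196 patch tokens of width 256, `GraphReasoningUnit(256)(torch.randn(2, 196, 256))` returns a tensor of the same shape. Because the number of nodes K is much smaller than the number of patches N, the reasoning step costs O(K^2) rather than the O(N^2) of full patch-to-patch self-attention, which is consistent with the efficiency claim in the abstract.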
Related papers
- SAG-ViT: A Scale-Aware, High-Fidelity Patching Approach with Graph Attention for Vision Transformers [0.0]
We introduce the Scale-Aware Graph Attention Vision Transformer (SAG-ViT), a novel framework that addresses this challenge by integrating multi-scale features.
Using EfficientNet as a backbone, the model extracts multi-scale feature maps, which are divided into patches to preserve semantic information.
The SAG-ViT is evaluated on benchmark datasets, demonstrating its effectiveness in enhancing image classification performance.
arXiv Detail & Related papers (2024-11-14T13:15:27Z) - SGFormer: Single-Layer Graph Transformers with Approximation-Free Linear Complexity [74.51827323742506]
We evaluate the necessity of adopting multi-layer attention in Transformers on graphs.
We show that multi-layer attention can be reduced to one-layer propagation, with the same capability for representation learning.
It suggests a new technical path for building powerful and efficient Transformers on graphs.
arXiv Detail & Related papers (2024-09-13T17:37:34Z) - Gramformer: Learning Crowd Counting via Graph-Modulated Transformer [68.26599222077466]
Gramformer is a graph-modulated transformer that enhances the network by adjusting both the attention and the input node features.
A feature-based encoding is proposed to discover the centrality, or importance, of nodes.
Experiments on four challenging crowd counting datasets have validated the competitiveness of the proposed method.
arXiv Detail & Related papers (2024-01-08T13:01:54Z) - Graph-Segmenter: Graph Transformer with Boundary-aware Attention for
Semantic Segmentation [14.716537714651576]
We propose a Graph-Segmenter, including a Graph Transformer and a Boundary-aware Attention module.
Our proposed network, a Graph Transformer with Boundary-aware Attention, can achieve state-of-the-art segmentation performance.
arXiv Detail & Related papers (2023-08-15T06:30:19Z) - Graph Transformer GANs for Graph-Constrained House Generation [223.739067413952]
We present a novel graph Transformer generative adversarial network (GTGAN) that learns effective graph node relations in an end-to-end fashion for the challenging graph-constrained house generation task.
arXiv Detail & Related papers (2023-03-14T20:35:45Z) - Accurate Image Restoration with Attention Retractable Transformer [50.05204240159985]
We propose Attention Retractable Transformer (ART) for image restoration.
ART presents both dense and sparse attention modules in the network.
We conduct extensive experiments on image super-resolution, denoising, and JPEG compression artifact reduction tasks.
arXiv Detail & Related papers (2022-10-04T07:35:01Z) - TFill: Image Completion via a Transformer-Based Architecture [69.62228639870114]
We propose treating image completion as a directionless sequence-to-sequence prediction task.
We employ a restrictive CNN with small and non-overlapping receptive fields (RF) for token representation.
In a second phase, to improve appearance consistency between visible and generated regions, a novel attention-aware layer (AAL) is introduced.
arXiv Detail & Related papers (2021-04-02T01:42:01Z) - Relation Transformer Network [25.141472361426818]
We propose a novel transformer formulation for scene graph generation and relation prediction.
We leverage the encoder-decoder architecture of the transformer for rich feature embedding of nodes and edges.
Our relation prediction module classifies the directed relation from the learned node and edge embedding.
arXiv Detail & Related papers (2020-04-13T20:47:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.