Vision GNN: An Image is Worth Graph of Nodes
- URL: http://arxiv.org/abs/2206.00272v1
- Date: Wed, 1 Jun 2022 07:01:04 GMT
- Title: Vision GNN: An Image is Worth Graph of Nodes
- Authors: Kai Han, Yunhe Wang, Jianyuan Guo, Yehui Tang, Enhua Wu
- Abstract summary: We propose to represent the image as a graph structure and introduce a new Vision GNN (ViG) architecture to extract graph-level features for visual tasks.
Based on the graph representation of images, we build our ViG model to transform and exchange information among all the nodes.
Extensive experiments on image recognition and object detection tasks demonstrate the superiority of our ViG architecture.
- Score: 49.3335689216822
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Network architecture plays a key role in the deep learning-based computer
vision system. The widely used convolutional neural network and transformer
treat the image as a grid or sequence structure, which is not flexible enough to
capture irregular and complex objects. In this paper, we propose to represent
the image as a graph structure and introduce a new Vision GNN (ViG)
architecture to extract graph-level features for visual tasks. We first split
the image into a number of patches, which are viewed as nodes, and construct a
graph by connecting the nearest neighbors. Based on the graph representation of
images, we build our ViG model to transform and exchange information among all
the nodes. ViG consists of two basic modules: the Grapher module, which uses graph
convolution to aggregate and update graph information, and the FFN module, which uses
two linear layers for node feature transformation. Both isotropic and pyramid
architectures of ViG are built with different model sizes. Extensive
experiments on image recognition and object detection tasks demonstrate the
superiority of our ViG architecture. We hope this pioneering study of GNN on
general visual tasks will provide useful inspiration and experience for future
research. The PyTorch code will be available at
https://github.com/huawei-noah/CV-Backbones and the MindSpore code will be
available at https://gitee.com/mindspore/models.
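The abstract describes the pipeline concretely enough to sketch: patches become nodes, each node is linked to its nearest neighbors, and a Grapher module plus an FFN module update the node features. Below is a minimal PyTorch sketch of that flow; the patch size, feature dimension, number of neighbors, and the max-relative-style aggregation are illustrative assumptions rather than the authors' exact implementation (see the released code at the URLs above for that).

```python
import torch
import torch.nn as nn


def knn_graph(x, k=9):
    """x: (N, D) node features -> (N, k) indices of each node's nearest neighbors."""
    dist = torch.cdist(x, x)                                 # pairwise distances
    return dist.topk(k + 1, largest=False).indices[:, 1:]    # drop the self-match


class Grapher(nn.Module):
    """Graph convolution: aggregate neighbor information and update each node."""
    def __init__(self, dim, k=9):
        super().__init__()
        self.k = k
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, x):                                    # x: (N, D)
        idx = knn_graph(x, self.k)                           # (N, k)
        neighbors = x[idx]                                   # (N, k, D)
        agg = (neighbors - x.unsqueeze(1)).max(dim=1).values   # max-relative aggregation
        return x + self.update(torch.cat([x, agg], dim=-1))    # residual node update


class FFN(nn.Module):
    """Two linear layers for node feature transformation."""
    def __init__(self, dim, ratio=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, ratio * dim),
                                 nn.GELU(),
                                 nn.Linear(ratio * dim, dim))

    def forward(self, x):
        return x + self.net(x)


class ViGBlock(nn.Module):
    """One ViG-style block: Grapher followed by FFN."""
    def __init__(self, dim, k=9):
        super().__init__()
        self.grapher = Grapher(dim, k)
        self.ffn = FFN(dim)

    def forward(self, x):
        return self.ffn(self.grapher(x))


# Usage: one 224x224 image -> 14x14 = 196 patch nodes with 192-dim features.
patchify = nn.Conv2d(3, 192, kernel_size=16, stride=16)
image = torch.randn(1, 3, 224, 224)
nodes = patchify(image).flatten(2).squeeze(0).t()            # (196, 192)
model = nn.Sequential(*[ViGBlock(192) for _ in range(4)])
print(model(nodes).shape)                                    # torch.Size([196, 192])
```

Stacking such blocks at a fixed resolution corresponds to the isotropic variant mentioned in the abstract; a pyramid variant would additionally reduce the number of nodes and grow the feature dimension across stages.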
Related papers
- ClusterViG: Efficient Globally Aware Vision GNNs via Image Partitioning [7.325055402812975]
Convolutional Neural Networks (CNN) and Vision Transformers (ViT) have dominated the field of Computer Vision (CV).
Recent works addressing this bottleneck impose constraints on the flexibility of GNNs to build unstructured graphs.
We propose a novel method called Dynamic Efficient Graph Convolution (DEGC) for designing efficient and globally aware ViGs.
arXiv Detail & Related papers (2025-01-18T02:59:10Z)
- GIMS: Image Matching System Based on Adaptive Graph Construction and Graph Neural Network [7.711922592226936]
We introduce an innovative adaptive graph construction method that utilizes a filtering mechanism based on distance and dynamic threshold similarity.
We also combine the global awareness capabilities of Transformers to enhance the model's representation of graph structures.
Our system achieves an average improvement of 3.8x-40.3x in overall matching performance.
arXiv Detail & Related papers (2024-12-24T07:05:55Z)
- SAG-ViT: A Scale-Aware, High-Fidelity Patching Approach with Graph Attention for Vision Transformers [0.0]
Vision Transformers (ViTs) have redefined image classification by leveraging self-attention to capture complex patterns and long-range dependencies between image patches.
A key challenge for ViTs is efficiently incorporating multi-scale feature representations, which is inherent in convolutional neural networks (CNNs) through their hierarchical structure.
We propose SAG-ViT, a Scale-Aware Graph Attention ViT that integrates the multi-scale feature capabilities of CNNs, the representational power of ViTs, and graph-attended patching to enable richer contextual representations.
arXiv Detail & Related papers (2024-11-14T13:15:27Z)
- InstructG2I: Synthesizing Images from Multimodal Attributed Graphs [50.852150521561676]
We propose a graph context-conditioned diffusion model called InstructG2I.
InstructG2I first exploits the graph structure and multimodal information to conduct informative neighbor sampling.
A Graph-QFormer encoder adaptively encodes the graph nodes into an auxiliary set of graph prompts to guide the denoising process.
arXiv Detail & Related papers (2024-10-09T17:56:15Z)
- UniG-Encoder: A Universal Feature Encoder for Graph and Hypergraph Node Classification [6.977634174845066]
A universal feature encoder for both graph and hypergraph representation learning is designed, called UniG-Encoder.
The architecture starts with a forward transformation of the topological relationships of connected nodes into edge or hyperedge features.
The encoded node embeddings are then derived from the reversed transformation, described by the transpose of the projection matrix.
arXiv Detail & Related papers (2023-08-03T09:32:50Z)
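The UniG-Encoder entry above describes a projection of node features onto (hyper)edges followed by a reverse projection through the transpose of the same matrix. A minimal sketch of that pattern, assuming a toy incidence matrix, a simple mean normalization, and a small MLP (all illustrative, not the paper's exact encoder):

```python
import torch
import torch.nn as nn

num_nodes, num_edges, dim = 6, 3, 16
X = torch.randn(num_nodes, dim)                  # node features
# Incidence matrix: H[i, e] = 1 if node i belongs to hyperedge e.
H = torch.zeros(num_nodes, num_edges)
H[[0, 1, 2], 0] = 1.0
H[[2, 3], 1] = 1.0
H[[3, 4, 5], 2] = 1.0

# Forward transformation: project connected nodes onto (hyper)edge features
# (rows normalized so each edge averages its member nodes).
P = H.t() / H.t().sum(dim=1, keepdim=True)       # (num_edges, num_nodes)
edge_feats = P @ X                               # (num_edges, dim)
edge_feats = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())(edge_feats)

# Reversed transformation: node embeddings via the transpose of the
# projection matrix, normalized by each node's degree.
node_emb = (P.t() @ edge_feats) / H.sum(dim=1, keepdim=True).clamp(min=1)
print(node_emb.shape)                            # torch.Size([6, 16])
```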
- Graph Neural Networks in Vision-Language Image Understanding: A Survey [6.813036707969848]
2D image understanding is a complex problem within computer vision.
It holds the key to providing human-level scene comprehension.
In recent years graph neural networks (GNNs) have become a standard component of many 2D image understanding pipelines.
arXiv Detail & Related papers (2023-03-07T09:56:23Z)
- Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z)
- Graph Neural Networks with Learnable Structural and Positional Representations [83.24058411666483]
A major issue with arbitrary graphs is the absence of canonical positional information of nodes.
We introduce Positional Encoding (PE) of nodes and inject it into the input layer, as in Transformers.
We observe a performance increase for molecular datasets, from 2.87% up to 64.14% when considering learnable PE for both GNN classes.
arXiv Detail & Related papers (2021-10-15T05:59:15Z)
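The entry above injects a learnable positional encoding (PE) of nodes at the GNN input layer, in the spirit of Transformers. A minimal sketch of one way to do this, assuming random-walk landing probabilities as the initial positional signal and concatenation with the node features (both are illustrative choices, not necessarily the paper's):

```python
import torch
import torch.nn as nn

num_nodes, feat_dim, pe_steps, pe_dim = 5, 8, 4, 8
X = torch.randn(num_nodes, feat_dim)             # raw node features
A = torch.tensor([[0, 1, 1, 0, 0],
                  [1, 0, 1, 0, 0],
                  [1, 1, 0, 1, 0],
                  [0, 0, 1, 0, 1],
                  [0, 0, 0, 1, 0]], dtype=torch.float)

# Initial PE: diagonal of the k-step random-walk matrix for k = 1..pe_steps.
RW = A / A.sum(dim=1, keepdim=True)
pe_init = torch.stack([torch.matrix_power(RW, k).diagonal()
                       for k in range(1, pe_steps + 1)], dim=1)  # (N, pe_steps)

# Learnable projection of the PE, injected at the input layer by
# concatenation with the node features (addition is another common choice).
pe_proj = nn.Linear(pe_steps, pe_dim)
h0 = torch.cat([X, pe_proj(pe_init)], dim=-1)    # (N, feat_dim + pe_dim)
print(h0.shape)                                  # torch.Size([5, 16])
```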
- GraphSVX: Shapley Value Explanations for Graph Neural Networks [81.83769974301995]
Graph Neural Networks (GNNs) achieve significant performance for various learning tasks on geometric data.
In this paper, we propose a unified framework satisfied by most existing GNN explainers.
We introduce GraphSVX, a post hoc local model-agnostic explanation method specifically designed for GNNs.
arXiv Detail & Related papers (2021-04-18T10:40:37Z)
- Graph Contrastive Learning with Augmentations [109.23158429991298]
We propose a graph contrastive learning (GraphCL) framework for learning unsupervised representations of graph data.
We show that our framework can produce graph representations of similar or better generalizability, transferability, and robustness compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-10-22T20:13:43Z)
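The last entry above builds unsupervised graph representations by contrasting two augmented views of each graph. A minimal sketch of such a contrastive objective, assuming the two views are encoded by a shared GNN and using an NT-Xent-style loss; the temperature, batch size, and random stand-in embeddings are placeholders, not the paper's configuration:

```python
import torch
import torch.nn.functional as F


def nt_xent(z1, z2, temperature=0.5):
    """z1, z2: (B, D) embeddings of two augmented views of the same B graphs."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature        # (B, B) cosine similarities
    targets = torch.arange(z1.size(0))        # row i of z1 matches row i of z2
    # symmetric cross-entropy: each view must identify its positive partner
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


# Stand-in embeddings; in practice z1 and z2 come from a GNN applied to two
# augmentations (e.g. node dropping, edge perturbation) of the same graphs.
z1, z2 = torch.randn(32, 128), torch.randn(32, 128)
print(nt_xent(z1, z2))
```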
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.