Vision GNN: An Image is Worth Graph of Nodes
- URL: http://arxiv.org/abs/2206.00272v1
- Date: Wed, 1 Jun 2022 07:01:04 GMT
- Title: Vision GNN: An Image is Worth Graph of Nodes
- Authors: Kai Han, Yunhe Wang, Jianyuan Guo, Yehui Tang, Enhua Wu
- Abstract summary: We propose to represent the image as a graph structure and introduce a new Vision GNN (ViG) architecture to extract graph-level features for visual tasks.
Based on the graph representation of images, we build our ViG model to transform and exchange information among all the nodes.
Extensive experiments on image recognition and object detection tasks demonstrate the superiority of our ViG architecture.
- Score: 49.3335689216822
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Network architecture plays a key role in the deep learning-based computer
vision system. The widely used convolutional neural network and transformer
treat the image as a grid or sequence structure, which is not flexible enough to
capture irregular and complex objects. In this paper, we propose to represent
the image as a graph structure and introduce a new Vision GNN (ViG)
architecture to extract graph-level features for visual tasks. We first split
the image into a number of patches, which are viewed as nodes, and construct a
graph by connecting the nearest neighbors. Based on the graph representation of
images, we build our ViG model to transform and exchange information among all
the nodes. ViG consists of two basic modules: the Grapher module, which uses graph
convolution to aggregate and update graph information, and the FFN module, which uses
two linear layers for node feature transformation. Both isotropic and pyramid
architectures of ViG are built with different model sizes. Extensive
experiments on image recognition and object detection tasks demonstrate the
superiority of our ViG architecture. We hope this pioneering study of GNN on
general visual tasks will provide useful inspiration and experience for future
research. The PyTorch code will be available at
https://github.com/huawei-noah/CV-Backbones and the MindSpore code will be
available at https://gitee.com/mindspore/models.
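The abstract describes the pipeline concretely enough to sketch: patches become nodes, each node is linked to its nearest neighbors, and a Grapher module plus an FFN module update the node features. Below is a minimal PyTorch sketch of that flow; the patch size, feature dimension, number of neighbors, and the max-relative-style aggregation are illustrative assumptions rather than the authors' exact implementation (see the released code at the URLs above for that).

```python
import torch
import torch.nn as nn


def knn_graph(x, k=9):
    """x: (N, D) node features -> (N, k) indices of each node's nearest neighbors."""
    dist = torch.cdist(x, x)                                 # pairwise distances
    return dist.topk(k + 1, largest=False).indices[:, 1:]    # drop the self-match


class Grapher(nn.Module):
    """Graph convolution: aggregate neighbor information and update each node."""
    def __init__(self, dim, k=9):
        super().__init__()
        self.k = k
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, x):                                    # x: (N, D)
        idx = knn_graph(x, self.k)                           # (N, k)
        neighbors = x[idx]                                   # (N, k, D)
        agg = (neighbors - x.unsqueeze(1)).max(dim=1).values   # max-relative aggregation
        return x + self.update(torch.cat([x, agg], dim=-1))    # residual node update


class FFN(nn.Module):
    """Two linear layers for node feature transformation."""
    def __init__(self, dim, ratio=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, ratio * dim),
                                 nn.GELU(),
                                 nn.Linear(ratio * dim, dim))

    def forward(self, x):
        return x + self.net(x)


class ViGBlock(nn.Module):
    """One ViG-style block: Grapher followed by FFN."""
    def __init__(self, dim, k=9):
        super().__init__()
        self.grapher = Grapher(dim, k)
        self.ffn = FFN(dim)

    def forward(self, x):
        return self.ffn(self.grapher(x))


# Usage: one 224x224 image -> 14x14 = 196 patch nodes with 192-dim features.
patchify = nn.Conv2d(3, 192, kernel_size=16, stride=16)
image = torch.randn(1, 3, 224, 224)
nodes = patchify(image).flatten(2).squeeze(0).t()            # (196, 192)
model = nn.Sequential(*[ViGBlock(192) for _ in range(4)])
print(model(nodes).shape)                                    # torch.Size([196, 192])
```

Stacking such blocks at a fixed resolution corresponds to the isotropic variant mentioned in the abstract; a pyramid variant would additionally reduce the number of nodes and grow the feature dimension across stages.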
Related papers
- ClusterViG: Efficient Globally Aware Vision GNNs via Image Partitioning [7.325055402812975]
Convolutional Neural Networks (CNN) and Vision Transformers (ViT) have dominated the field of Computer Vision (CV).
Recent works addressing this bottleneck impose constraints on the flexibility of GNNs to build unstructured graphs.
We propose a novel method called Dynamic Efficient Graph Convolution (DEGC) for designing efficient and globally aware ViGs.
arXiv Detail & Related papers (2025-01-18T02:59:10Z)
- GIMS: Image Matching System Based on Adaptive Graph Construction and Graph Neural Network [7.711922592226936]
We introduce an innovative adaptive graph construction method that utilizes a filtering mechanism based on distance and dynamic threshold similarity.
We also combine the global awareness capabilities of Transformers to enhance the model's representation of graph structures.
Our system achieves an average improvement of 3.8x-40.3x in overall matching performance.
arXiv Detail & Related papers (2024-12-24T07:05:55Z)
- SAG-ViT: A Scale-Aware, High-Fidelity Patching Approach with Graph Attention for Vision Transformers [0.0]
Vision Transformers (ViTs) have redefined image classification by leveraging self-attention to capture complex patterns and long-range dependencies between image patches.
A key challenge for ViTs is efficiently incorporating multi-scale feature representations, which is inherent in convolutional neural networks (CNNs) through their hierarchical structure.
We propose SAG-ViT, a Scale-Aware Graph Attention ViT that integrates the multi-scale feature capabilities of CNNs, the representational power of ViTs, and graph-attended patching to enable richer contextual representations.
arXiv Detail & Related papers (2024-11-14T13:15:27Z)
- InstructG2I: Synthesizing Images from Multimodal Attributed Graphs [50.852150521561676]
We propose a graph context-conditioned diffusion model called InstructG2I.
InstructG2I first exploits the graph structure and multimodal information to conduct informative neighbor sampling.
A Graph-QFormer encoder adaptively encodes the graph nodes into an auxiliary set of graph prompts to guide the denoising process.
arXiv Detail & Related papers (2024-10-09T17:56:15Z)
- UniG-Encoder: A Universal Feature Encoder for Graph and Hypergraph Node Classification [6.977634174845066]
A universal feature encoder for both graph and hypergraph representation learning is designed, called UniG-Encoder.
The architecture starts with a forward transformation of the topological relationships of connected nodes into edge or hyperedge features.
The encoded node embeddings are then derived from the reversed transformation, described by the transpose of the projection matrix.
arXiv Detail & Related papers (2023-08-03T09:32:50Z)
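The UniG-Encoder entry above describes a projection of node features onto (hyper)edges followed by a reverse projection through the transpose of the same matrix. A minimal sketch of that pattern, assuming a toy incidence matrix, a simple mean normalization, and a small MLP (all illustrative, not the paper's exact encoder):

```python
import torch
import torch.nn as nn

num_nodes, num_edges, dim = 6, 3, 16
X = torch.randn(num_nodes, dim)                  # node features
# Incidence matrix: H[i, e] = 1 if node i belongs to hyperedge e.
H = torch.zeros(num_nodes, num_edges)
H[[0, 1, 2], 0] = 1.0
H[[2, 3], 1] = 1.0
H[[3, 4, 5], 2] = 1.0

# Forward transformation: project connected nodes onto (hyper)edge features
# (rows normalized so each edge averages its member nodes).
P = H.t() / H.t().sum(dim=1, keepdim=True)       # (num_edges, num_nodes)
edge_feats = P @ X                               # (num_edges, dim)
edge_feats = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())(edge_feats)

# Reversed transformation: node embeddings via the transpose of the
# projection matrix, normalized by each node's degree.
node_emb = (P.t() @ edge_feats) / H.sum(dim=1, keepdim=True).clamp(min=1)
print(node_emb.shape)                            # torch.Size([6, 16])
```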
- Graph Neural Networks in Vision-Language Image Understanding: A Survey [6.813036707969848]
2D image understanding is a complex problem within computer vision.
It holds the key to providing human-level scene comprehension.
In recent years graph neural networks (GNNs) have become a standard component of many 2D image understanding pipelines.
arXiv Detail & Related papers (2023-03-07T09:56:23Z)
- Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z)
- Graph Neural Networks with Learnable Structural and Positional Representations [83.24058411666483]
A major issue with arbitrary graphs is the absence of canonical positional information of nodes.
We introduce Positional Encoding (PE) of nodes and inject it into the input layer, as in Transformers.
We observe a performance increase for molecular datasets, from 2.87% up to 64.14% when considering learnable PE for both GNN classes.
arXiv Detail & Related papers (2021-10-15T05:59:15Z)
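The entry above injects a learnable positional encoding (PE) of nodes at the GNN input layer, in the spirit of Transformers. A minimal sketch of one way to do this, assuming random-walk landing probabilities as the initial positional signal and concatenation with the node features (both are illustrative choices, not necessarily the paper's):

```python
import torch
import torch.nn as nn

num_nodes, feat_dim, pe_steps, pe_dim = 5, 8, 4, 8
X = torch.randn(num_nodes, feat_dim)             # raw node features
A = torch.tensor([[0, 1, 1, 0, 0],
                  [1, 0, 1, 0, 0],
                  [1, 1, 0, 1, 0],
                  [0, 0, 1, 0, 1],
                  [0, 0, 0, 1, 0]], dtype=torch.float)

# Initial PE: diagonal of the k-step random-walk matrix for k = 1..pe_steps.
RW = A / A.sum(dim=1, keepdim=True)
pe_init = torch.stack([torch.matrix_power(RW, k).diagonal()
                       for k in range(1, pe_steps + 1)], dim=1)  # (N, pe_steps)

# Learnable projection of the PE, injected at the input layer by
# concatenation with the node features (addition is another common choice).
pe_proj = nn.Linear(pe_steps, pe_dim)
h0 = torch.cat([X, pe_proj(pe_init)], dim=-1)    # (N, feat_dim + pe_dim)
print(h0.shape)                                  # torch.Size([5, 16])
```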
- GraphSVX: Shapley Value Explanations for Graph Neural Networks [81.83769974301995]
Graph Neural Networks (GNNs) achieve significant performance for various learning tasks on geometric data.
In this paper, we propose a unified framework satisfied by most existing GNN explainers.
We introduce GraphSVX, a post hoc local model-agnostic explanation method specifically designed for GNNs.
arXiv Detail & Related papers (2021-04-18T10:40:37Z)
- Graph Contrastive Learning with Augmentations [109.23158429991298]
We propose a graph contrastive learning (GraphCL) framework for learning unsupervised representations of graph data.
We show that our framework can produce graph representations of similar or better generalizability, transferability, and robustness compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-10-22T20:13:43Z)
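The last entry above builds unsupervised graph representations by contrasting two augmented views of each graph. A minimal sketch of such a contrastive objective, assuming the two views are encoded by a shared GNN and using an NT-Xent-style loss; the temperature, batch size, and random stand-in embeddings are placeholders, not the paper's configuration:

```python
import torch
import torch.nn.functional as F


def nt_xent(z1, z2, temperature=0.5):
    """z1, z2: (B, D) embeddings of two augmented views of the same B graphs."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature        # (B, B) cosine similarities
    targets = torch.arange(z1.size(0))        # row i of z1 matches row i of z2
    # symmetric cross-entropy: each view must identify its positive partner
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


# Stand-in embeddings; in practice z1 and z2 come from a GNN applied to two
# augmentations (e.g. node dropping, edge perturbation) of the same graphs.
z1, z2 = torch.randn(32, 128), torch.randn(32, 128)
print(nt_xent(z1, z2))
```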
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.