GKGNet: Group K-Nearest Neighbor based Graph Convolutional Network for Multi-Label Image Recognition
- URL: http://arxiv.org/abs/2308.14378v3
- Date: Fri, 19 Jul 2024 02:41:49 GMT
- Title: GKGNet: Group K-Nearest Neighbor based Graph Convolutional Network for Multi-Label Image Recognition
- Authors: Ruijie Yao, Sheng Jin, Lumin Xu, Wang Zeng, Wentao Liu, Chen Qian, Ping Luo, Ji Wu
- Abstract summary: Multi-Label Image Recognition (MLIR) is a challenging task that aims to predict multiple object labels in a single image.
We present the first fully graph convolutional model, Group K-nearest neighbor based Graph convolutional Network (GKGNet).
Our experiments demonstrate that GKGNet achieves state-of-the-art performance with significantly lower computational costs.
- Score: 37.02054260449195
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-Label Image Recognition (MLIR) is a challenging task that aims to predict multiple object labels in a single image while modeling the complex relationships between labels and image regions. Although convolutional neural networks and vision transformers have succeeded in processing images as regular grids of pixels or patches, these representations are sub-optimal for capturing irregular and discontinuous regions of interest. In this work, we present the first fully graph convolutional model, Group K-nearest neighbor based Graph convolutional Network (GKGNet), which models the connections between semantic label embeddings and image patches in a flexible and unified graph structure. To address the scale variance of different objects and to capture information from multiple perspectives, we propose the Group KGCN module for dynamic graph construction and message passing. Our experiments demonstrate that GKGNet achieves state-of-the-art performance with significantly lower computational costs on the challenging multi-label datasets MS-COCO and VOC2007. Code is available at https://github.com/jin-s13/GKGNet.
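The abstract describes label embeddings connecting to image patches through a group-wise K-nearest-neighbor graph followed by message passing. The following is a minimal NumPy sketch of that general idea, not the authors' implementation: the function name `group_knn_aggregate`, the choice to form groups by splitting feature channels, the squared Euclidean distance, and the mean aggregation are all assumptions made for illustration.

```python
import numpy as np


def group_knn_aggregate(labels, patches, num_groups=2, k=1):
    """Hypothetical group KNN aggregation (illustrative assumption only).

    labels:  (L, C) array of label-node features.
    patches: (P, C) array of patch-node features.
    Returns an (L, C) array in which each channel group of every label
    node holds the mean of its k nearest patches within that group.
    """
    num_labels, channels = labels.shape
    num_patches, patch_channels = patches.shape
    if channels != patch_channels:
        raise ValueError("labels and patches must share the channel size")
    if channels % num_groups != 0:
        raise ValueError("channels must be divisible by num_groups")
    if not 1 <= k <= num_patches:
        raise ValueError("k must be between 1 and the number of patches")

    width = channels // num_groups
    out = np.empty((num_labels, channels), dtype=float)
    for g in range(num_groups):
        cols = slice(g * width, (g + 1) * width)
        q = labels[:, cols].astype(float)   # (L, width)
        x = patches[:, cols].astype(float)  # (P, width)
        # Pairwise squared Euclidean distances, shape (L, P).
        dist = ((q[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
        # Indices of the k closest patches for each label, shape (L, k).
        idx = np.argsort(dist, axis=1)[:, :k]
        # Gather the neighbors (L, k, width) and average over them.
        out[:, cols] = x[idx].mean(axis=1)
    return out


labels = np.array([[0.0, 0.0, 10.0, 10.0]])
patches = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 9.0, 9.0],
    [5.0, 5.0, 10.0, 10.0],
])
result = group_knn_aggregate(labels, patches, num_groups=2, k=1)
```

In this toy input the first channel group of the label node selects the first patch while the second group selects the third patch, so a single label can connect to different patches in different groups. A trained model would also apply learned transformations around this step, which the sketch omits.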
Related papers
- SAG-ViT: A Scale-Aware, High-Fidelity Patching Approach with Graph Attention for Vision Transformers [0.0]
We introduce the Scale-Aware Graph Attention Vision Transformer (SAG-ViT), a novel framework that addresses this challenge by integrating multi-scale features.
Using EfficientNet as a backbone, the model extracts multi-scale feature maps, which are divided into patches to preserve semantic information.
The SAG-ViT is evaluated on benchmark datasets, demonstrating its effectiveness in enhancing image classification performance.
arXiv Detail & Related papers (2024-11-14T13:15:27Z)
- Two Stream Scene Understanding on Graph Embedding [4.78180589767256]
The paper presents a novel two-stream network architecture for enhancing scene understanding in computer vision.
The graph feature stream network comprises a segmentation structure, scene graph generation, and a graph representation module.
Experiments conducted on the ADE20K dataset demonstrate the effectiveness of the proposed two-stream network in improving image classification accuracy.
arXiv Detail & Related papers (2023-11-12T05:57:56Z)
- Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z)
- Learning Hierarchical Graph Representation for Image Manipulation Detection [50.04902159383709]
The objective of image manipulation detection is to identify and locate the manipulated regions in the images.
Recent approaches mostly adopt sophisticated Convolutional Neural Networks (CNNs) to capture the tampering artifacts left in the images.
We propose a hierarchical Graph Convolutional Network (HGCN-Net), which consists of two parallel branches.
arXiv Detail & Related papers (2022-01-15T01:54:25Z)
- BigDatasetGAN: Synthesizing ImageNet with Pixel-wise Annotations [89.42397034542189]
We synthesize a large labeled dataset via a generative adversarial network (GAN).
We take image samples from the class-conditional generative model BigGAN trained on ImageNet, and manually annotate 5 images per class, for all 1k classes.
We create a new ImageNet benchmark by labeling an additional set of 8k real images and evaluate segmentation performance in a variety of settings.
arXiv Detail & Related papers (2022-01-12T20:28:34Z)
- GM-MLIC: Graph Matching based Multi-Label Image Classification [20.118173194957052]
Multi-Label Image Classification (MLIC) aims to predict the set of labels that are present in an image.
In this paper, we treat each image as a bag of instances, and reformulate the task of MLIC as an instance-label matching selection problem.
We propose a novel deep learning framework named Graph Matching based Multi-Label Image Classification (GM-MLIC).
arXiv Detail & Related papers (2021-04-30T05:36:25Z)
- Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z)
- Attention-Driven Dynamic Graph Convolutional Network for Multi-Label Image Recognition [53.17837649440601]
We propose an Attention-Driven Dynamic Graph Convolutional Network (ADD-GCN) to dynamically generate a specific graph for each image.
Experiments on public multi-label benchmarks demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2020-12-05T10:10:12Z)
- Sequential Graph Convolutional Network for Active Learning [53.99104862192055]
We propose a novel pool-based Active Learning framework constructed on a sequential Graph Convolution Network (GCN).
With a small number of randomly sampled images as seed labelled examples, we learn the parameters of the graph to distinguish labelled vs unlabelled nodes.
We exploit these characteristics of GCN to select the unlabelled examples which are sufficiently different from labelled ones.
arXiv Detail & Related papers (2020-06-18T00:55:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.