Spatial-context-aware deep neural network for multi-class image
classification
- URL: http://arxiv.org/abs/2111.12296v1
- Date: Wed, 24 Nov 2021 06:36:10 GMT
- Title: Spatial-context-aware deep neural network for multi-class image
classification
- Authors: Jialu Zhang, Qian Zhang, Jianfeng Ren, Yitian Zhao, Jiang Liu
- Abstract summary: A spatial-context-aware deep neural network is proposed to predict labels taking into account both semantic and spatial information.
This proposed framework is evaluated on Microsoft COCO and PASCAL VOC, two widely used benchmark datasets for image multi-labelling.
- Score: 12.961070515143161
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-label image classification is a fundamental but challenging task in
computer vision. Over the past few decades, solutions exploring relationships
between semantic labels have made great progress. However, the underlying
spatial-contextual information of labels is under-exploited. To tackle this
problem, a spatial-context-aware deep neural network is proposed to predict
labels taking into account both semantic and spatial information. This proposed
framework is evaluated on Microsoft COCO and PASCAL VOC, two widely used
benchmark datasets for image multi-labelling. The results show that the
proposed approach is superior to the state-of-the-art solutions on dealing with
the multi-label image classification problem.
Related papers
- Semi-Supervised Semantic Segmentation Based on Pseudo-Labels: A Survey [49.47197748663787]
This review aims to provide a first comprehensive and organized overview of the state-of-the-art research results on pseudo-label methods in the field of semi-supervised semantic segmentation.
In addition, we explore the application of pseudo-label technology in medical and remote-sensing image segmentation.
arXiv Detail & Related papers (2024-03-04T10:18:38Z) - Fine-grained Recognition with Learnable Semantic Data Augmentation [68.48892326854494]
Fine-grained image recognition is a longstanding computer vision challenge.
We propose diversifying the training data at the feature-level to alleviate the discriminative region loss problem.
Our method significantly improves the generalization performance on several popular classification networks.
arXiv Detail & Related papers (2023-09-01T11:15:50Z) - Structured Semantic Transfer for Multi-Label Recognition with Partial
Labels [85.6967666661044]
We propose a structured semantic transfer (SST) framework that enables training multi-label recognition models with partial labels.
The framework consists of two complementary transfer modules that explore within-image and cross-image semantic correlations.
Experiments on the Microsoft COCO, Visual Genome and Pascal VOC datasets show that the proposed SST framework obtains superior performance over current state-of-the-art algorithms.
arXiv Detail & Related papers (2021-12-21T02:15:01Z) - Maximize the Exploration of Congeneric Semantics for Weakly Supervised
Semantic Segmentation [27.155133686127474]
We construct a graph neural network (P-GNN) based on the self-detected patches from different images that contain the same class labels.
We conduct experiments on the popular PASCAL VOC 2012 benchmarks, and our model yields state-of-the-art performance.
arXiv Detail & Related papers (2021-10-08T08:59:16Z) - Multi-layered Semantic Representation Network for Multi-label Image
Classification [8.17894017454724]
Multi-label image classification (MLIC) is a fundamental and practical task, which aims to assign multiple possible labels to an image.
In recent years, many deep convolutional neural network (CNN) based approaches have been proposed which model label correlations.
This paper advances this research direction by improving the modeling of label correlations and the learning of semantic representations.
arXiv Detail & Related papers (2021-06-22T08:04:22Z) - Semantic Segmentation with Generative Models: Semi-Supervised Learning
and Strong Out-of-Domain Generalization [112.68171734288237]
We propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels.
We learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images.
We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization.
arXiv Detail & Related papers (2021-04-12T21:41:25Z) - Knowledge-Guided Multi-Label Few-Shot Learning for General Image
Recognition [75.44233392355711]
KGGR framework exploits prior knowledge of statistical label correlations with deep neural networks.
It first builds a structured knowledge graph to correlate different labels based on statistical label co-occurrence.
Then, it introduces the label semantics to guide learning semantic-specific features.
It exploits a graph propagation network to explore graph node interactions.
arXiv Detail & Related papers (2020-09-20T15:05:29Z) - SSKD: Self-Supervised Knowledge Distillation for Cross Domain Adaptive
Person Re-Identification [25.96221714337815]
Domain adaptive person re-identification (re-ID) is a challenging task due to the large discrepancy between the source domain and the target domain.
Existing methods mainly attempt to generate pseudo labels for unlabeled target images by clustering algorithms.
We propose a Self-Supervised Knowledge Distillation (SSKD) technique containing two modules, the identity learning and the soft label learning.
arXiv Detail & Related papers (2020-09-13T10:12:02Z) - Reconstruction Regularized Deep Metric Learning for Multi-label Image
Classification [39.055689258395624]
We present a novel deep metric learning method to tackle the multi-label image classification problem.
Our model can be trained in an end-to-end manner.
arXiv Detail & Related papers (2020-07-27T13:28:50Z) - Adversarial Learning for Personalized Tag Recommendation [61.76193196463919]
We propose an end-to-end deep network which can be trained on large-scale datasets.
A joint training of user-preference and visual encoding allows the network to efficiently integrate the visual preference with tagging behavior.
We demonstrate the effectiveness of the proposed model on two different large-scale and publicly available datasets.
arXiv Detail & Related papers (2020-04-01T20:41:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.