Semantic Embedded Deep Neural Network: A Generic Approach to Boost
Multi-Label Image Classification Performance
- URL: http://arxiv.org/abs/2305.05228v4
- Date: Mon, 5 Jun 2023 21:30:25 GMT
- Title: Semantic Embedded Deep Neural Network: A Generic Approach to Boost
Multi-Label Image Classification Performance
- Authors: Xin Shen, Xiaonan Zhao, Rui Luo
- Abstract summary: We introduce a generic semantic-embedding deep neural network that applies a spatially aware semantic feature.
We observed an average relative improvement of 15.27% in AUC score across all labels compared to the baseline approach.
- Score: 10.257208600853199
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Fine-grained multi-label classification models have broad applications in
e-commerce, such as visual label prediction ranging from fashion
attribute detection to brand recognition. One challenge in achieving satisfactory
performance on these classification tasks in the real world is the in-the-wild
visual background signal, which contains irrelevant pixels that make it hard for
the model to focus on the region of interest and base its prediction on that
region. In this paper, we introduce a generic semantic-embedding deep neural
network that applies a spatially aware semantic feature and incorporates a
channel-wise attention-based model, leveraging localization guidance to boost
model performance for multi-label prediction. We observed an average relative
improvement of 15.27% in AUC score across all labels compared to the baseline
approach. The core experiment and ablation studies involve multi-label fashion
attribute classification performed on Instagram fashion apparel images. We
compared model performance among our approach, the baseline approach, and 3
alternative approaches to leveraging semantic features. Results show favorable
performance for our approach.
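The headline metric in the abstract, an average relative improvement in AUC across all labels, can be illustrated with a minimal sketch. The abstract does not spell out the formula, so this assumes it means the per-label relative gain (model AUC minus baseline AUC, divided by baseline AUC) averaged over labels. The function names and the per-label AUC values in the usage note are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC for one binary label: the fraction of (positive, negative)
    pairs in which the positive example gets the higher score.
    Tied pairs count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def avg_relative_improvement(
    baseline_auc: Sequence[float], model_auc: Sequence[float]
) -> float:
    """Assumed definition: mean over labels of
    (model - baseline) / baseline, returned as a percentage."""
    if len(baseline_auc) != len(model_auc) or len(baseline_auc) == 0:
        raise ValueError("need one baseline and one model AUC per label")
    gains = [(m - b) / b for b, m in zip(baseline_auc, model_auc)]
    return 100.0 * sum(gains) / len(gains)
```

As a usage example, calling `avg_relative_improvement([0.5, 0.8], [0.6, 0.88])` with two made-up labels averages a 20% gain and a 10% gain. Note that a mean of per-label relative gains weights every label equally, so labels with a low baseline AUC can dominate the figure; the paper may aggregate differently.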
Related papers
- A Deep Features-Based Approach Using Modified ResNet50 and Gradient Boosting for Visual Sentiments Classification [1.2434714657059942]
This research develops a fusion of deep learning and machine learning algorithms.
A deep feature-based method for multiclass classification has been used to extract deep features from a modified ResNet50.
A gradient boosting algorithm has been used to classify photos containing emotional content.
arXiv Detail & Related papers (2024-08-15T04:18:40Z)
- Retinal IPA: Iterative KeyPoints Alignment for Multimodal Retinal Imaging [11.70130626541926]
We propose a novel framework for learning cross-modality features to enhance matching and registration across multi-modality retinal images.
Our model draws on the success of previous learning-based feature detection and description methods.
It is trained in a self-supervised manner by enforcing segmentation consistency between different augmentations of the same image.
arXiv Detail & Related papers (2024-07-25T19:51:27Z)
- Weakly Supervised Semantic Segmentation by Knowledge Graph Inference [11.056545020611397]
This paper introduces a graph reasoning-based approach to enhance Weakly Supervised Semantic Segmentation (WSSS).
The aim is to improve WSSS holistically by simultaneously enhancing both the multi-label classification and segmentation network stages.
We have achieved state-of-the-art performance in WSSS on the PASCAL VOC 2012 and MS-COCO datasets.
arXiv Detail & Related papers (2023-09-25T11:50:19Z)
- Fine-grained Recognition with Learnable Semantic Data Augmentation [68.48892326854494]
Fine-grained image recognition is a longstanding computer vision challenge.
We propose diversifying the training data at the feature-level to alleviate the discriminative region loss problem.
Our method significantly improves the generalization performance on several popular classification networks.
arXiv Detail & Related papers (2023-09-01T11:15:50Z)
- Domain Adaptive Nuclei Instance Segmentation and Classification via Category-aware Feature Alignment and Pseudo-labelling [65.40672505658213]
We propose a novel deep neural network, namely Category-Aware feature alignment and Pseudo-Labelling Network (CAPL-Net) for UDA nuclei instance segmentation and classification.
Our approach outperforms state-of-the-art UDA methods by a remarkable margin.
arXiv Detail & Related papers (2022-07-04T07:05:06Z)
- Semantic Representation and Dependency Learning for Multi-Label Image Recognition [76.52120002993728]
We propose a novel and effective semantic representation and dependency learning (SRDL) framework to learn category-specific semantic representation for each category.
Specifically, we design a category-specific attentional regions (CAR) module to generate channel/spatial-wise attention matrices to guide the model.
We also design an object erasing (OE) module to implicitly learn semantic dependency among categories by erasing semantic-aware regions.
arXiv Detail & Related papers (2022-04-08T00:55:15Z)
- Distribution Alignment: A Unified Framework for Long-tail Visual Recognition [52.36728157779307]
We propose a unified distribution alignment strategy for long-tail visual recognition.
We then introduce a generalized re-weight method in the two-stage learning to balance the class prior.
Our approach achieves state-of-the-art results across all four recognition tasks with a simple and unified framework.
arXiv Detail & Related papers (2021-03-30T14:09:53Z)
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
- Progressive Co-Attention Network for Fine-grained Visual Classification [20.838908090777885]
Fine-grained visual classification aims to recognize images belonging to multiple sub-categories within the same category.
Most existing methods take only an individual image as input.
We propose an effective method called progressive co-attention network (PCA-Net) to tackle this problem.
arXiv Detail & Related papers (2021-01-21T10:19:02Z)
- Attention Model Enhanced Network for Classification of Breast Cancer Image [54.83246945407568]
AMEN is formulated in a multi-branch fashion with a pixel-wise attention model and a classification submodule.
To focus more on subtle detail information, the sample image is enhanced by the pixel-wise attention map generated from the former branch.
Experiments conducted on three benchmark datasets demonstrate the superiority of the proposed method under various scenarios.
arXiv Detail & Related papers (2020-10-07T08:44:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.