Boosting Multi-Label Image Classification with Complementary Parallel
Self-Distillation
- URL: http://arxiv.org/abs/2205.10986v1
- Date: Mon, 23 May 2022 01:28:38 GMT
- Title: Boosting Multi-Label Image Classification with Complementary Parallel
Self-Distillation
- Authors: Jiazhi Xu and Sheng Huang and Fengtao Zhou and Luwen Huangfu and
Daniel Zeng and Bo Liu
- Abstract summary: Multi-Label Image Classification approaches usually exploit label correlations to achieve good performance.
However, emphasizing correlation like co-occurrence may overlook discriminative features of the target itself and lead to model overfitting.
In this study, we propose a generic framework named Parallel Self-Distillation (PSD) for boosting MLIC models.
- Score: 15.518137695660668
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Multi-Label Image Classification (MLIC) approaches usually exploit label
correlations to achieve good performance. However, emphasizing correlation like
co-occurrence may overlook discriminative features of the target itself and
lead to model overfitting, thus undermining the performance. In this study, we
propose a generic framework named Parallel Self-Distillation (PSD) for boosting
MLIC models. PSD decomposes the original MLIC task into several simpler MLIC
sub-tasks via two elaborated complementary task decomposition strategies named
Co-occurrence Graph Partition (CGP) and Dis-occurrence Graph Partition (DGP).
Then, the MLIC models of fewer categories are trained with these sub-tasks in
parallel for respectively learning the joint patterns and the category-specific
patterns of labels. Finally, knowledge distillation is leveraged to learn a
compact global ensemble of full categories with these learned patterns for
reconciling the label correlation exploitation and model overfitting. Extensive
results on MS-COCO and NUS-WIDE datasets demonstrate that our framework can be
easily plugged into many MLIC approaches and improve the performance of recent
state-of-the-art methods. Explainable visual studies further validate that our
method learns both category-specific and co-occurring features. The source code
is released at
https://github.com/Robbie-Xu/CPSD.
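The released code holds the actual pipeline; as a rough illustration of the decomposition idea only, the sketch below builds a label co-occurrence graph from multi-hot annotations and greedily splits the label set into sub-tasks. The greedy heuristic, the toy data, and all function names are illustrative stand-ins, not the paper's CGP/DGP implementation; a DGP-style split would instead group labels that rarely co-occur, e.g. by inverting the edge weights.

```python
import numpy as np

def cooccurrence_graph(labels):
    """Label co-occurrence counts from a multi-hot label matrix (N x C)."""
    G = labels.T @ labels            # G[i, j] = #samples where i and j co-occur
    np.fill_diagonal(G, 0)           # ignore self-co-occurrence
    return G

def partition_labels(G, k):
    """Greedily split the C labels into k groups, keeping strongly
    co-occurring labels together (a stand-in for graph partitioning).
    For a DGP-style split, pass inverted weights such as G.max() - G."""
    groups = [[] for _ in range(k)]
    for c in np.argsort(-G.sum(axis=1)):   # most-connected labels first
        # assign label c to the group it is most connected to;
        # the small size penalty breaks ties toward balanced groups
        scores = [sum(G[c, m] for m in g) - 0.01 * len(g) for g in groups]
        groups[int(np.argmax(scores))].append(int(c))
    return groups

# toy data: 6 labels where {0,1,2} and {3,4,5} only co-occur within their block
rng = np.random.default_rng(0)
Y = np.zeros((200, 6), dtype=int)
for n in range(200):
    block = rng.integers(2)
    Y[n, 3 * block : 3 * block + 3] = rng.integers(0, 2, size=3)

groups = partition_labels(cooccurrence_graph(Y), k=2)
print(sorted(sorted(g) for g in groups))   # the two label blocks are recovered
```

Each sub-task's model then classifies only its group of labels; in PSD, a full-category student is subsequently distilled from the soft predictions of all such sub-models.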
Related papers
- Dual-View Alignment Learning with Hierarchical-Prompt for Class-Imbalance Multi-Label Classification [45.76234309840256]
Real-world datasets often exhibit class imbalance across multiple categories, manifesting as long-tailed distributions and few-shot scenarios.
This is especially challenging in Class-Imbalanced Multi-Label Image Classification (CI-MLIC) tasks, where data imbalance and multi-object recognition present significant obstacles.
We propose a novel method termed Dual-View Alignment Learning with Hierarchical Prompt (HP-DVAL) to mitigate the class-imbalance problem in multi-label settings.
arXiv Detail & Related papers (2025-09-22T13:11:12Z)
- Semantic-Aligned Learning with Collaborative Refinement for Unsupervised VI-ReID [82.12123628480371]
Unsupervised visible-infrared person re-identification (USL-VI-ReID) seeks to match pedestrian images of the same individual across different modalities without human annotations for model learning.
Previous methods unify pseudo-labels of cross-modality images through label association algorithms and then design a contrastive learning framework for global feature learning.
We propose a Semantic-Aligned Learning with Collaborative Refinement (SALCR) framework, which builds up objectives for the specific fine-grained patterns emphasized by each modality.
arXiv Detail & Related papers (2025-04-27T13:58:12Z)
- Correlative and Discriminative Label Grouping for Multi-Label Visual Prompt Tuning [12.052388861361937]
Recent studies have overemphasized co-occurrence relationships among labels, leading to suboptimal models.
We propose the Multi-Label Visual Prompt Tuning framework to balance correlative and discriminative relationships among labels.
Our proposed approach achieves competitive results and outperforms SOTA methods on multiple pre-trained models.
arXiv Detail & Related papers (2025-04-14T08:52:50Z)
- Semantic-guided Representation Learning for Multi-Label Recognition [13.046479112800608]
Multi-label Recognition (MLR) involves assigning multiple labels to each image.
Recent Vision and Language Pre-training methods have made significant progress in tackling zero-shot MLR tasks.
We introduce a Semantic-guided Representation Learning approach (SigRL) that enables the model to learn effective visual and textual representations.
arXiv Detail & Related papers (2025-04-04T08:15:08Z)
- Partially Supervised Unpaired Multi-Modal Learning for Label-Efficient Medical Image Segmentation [53.723234136550055]
We term this new learning paradigm Partially Supervised Unpaired Multi-Modal Learning (PSUMML).
We propose a novel Decomposed partial class adaptation with snapshot Ensembled Self-Training (DEST) framework for it.
Our framework consists of a compact segmentation network with modality-specific normalization layers for learning with partially labeled unpaired multi-modal data.
arXiv Detail & Related papers (2025-03-07T07:22:42Z)
- Language Models are Graph Learners [70.14063765424012]
Language Models (LMs) are challenging the dominance of domain-specific models, including Graph Neural Networks (GNNs) and Graph Transformers (GTs).
We propose a novel approach that empowers off-the-shelf LMs to achieve performance comparable to state-of-the-art GNNs on node classification tasks.
arXiv Detail & Related papers (2024-10-03T08:27:54Z)
- Investigating Self-Supervised Methods for Label-Efficient Learning [27.029542823306866]
We study different self-supervised pretext tasks, namely contrastive learning, clustering, and masked image modelling, for their low-shot capabilities.
We introduce a framework involving both mask image modelling and clustering as pretext tasks, which performs better across all low-shot downstream tasks.
When testing the model on full scale datasets, we show performance gains in multi-class classification, multi-label classification and semantic segmentation.
arXiv Detail & Related papers (2024-06-25T10:56:03Z)
- Improving Multi-label Recognition using Class Co-Occurrence Probabilities [7.062238472483738]
Multi-label Recognition (MLR) involves the identification of multiple objects within an image.
Recent works have leveraged information from vision-language models (VLMs) trained on large image-text datasets for the task.
We propose a framework to extend the independent classifiers by incorporating the co-occurrence information for object pairs.
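As a generic illustration of this idea (not the paper's actual formulation), the sketch below blends each label's independent probability with support from co-occurring labels via a conditional co-occurrence matrix; the function name, the blending rule, and `alpha` are all hypothetical.

```python
import numpy as np

def refine_with_cooccurrence(probs, P_cond, alpha=0.3):
    """Blend independent per-label probabilities (N x C) with evidence from
    co-occurring labels. P_cond[i, j] approximates P(label i | label j),
    e.g. estimated from training-set label counts. `alpha` controls how much
    the co-occurrence prior is trusted; both are illustrative choices."""
    total = np.clip(probs.sum(axis=1, keepdims=True), 1e-8, None)
    # expected probability of each label given current beliefs about the others
    support = (probs @ P_cond.T) / total
    return (1 - alpha) * probs + alpha * support

# two labels that always co-occur: confidence in one lifts the other
probs = np.array([[0.9, 0.2]])
P_cond = np.ones((2, 2))
refined = refine_with_cooccurrence(probs, P_cond)
print(refined)   # label 1's score rises from 0.2 to 0.44
```

With `alpha=0.3`, label 1's refined score is 0.7 × 0.2 + 0.3 × 1.0 = 0.44: the strong belief in label 0, combined with their perfect co-occurrence, lifts the weaker prediction.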
arXiv Detail & Related papers (2024-04-24T20:33:25Z)
- Explore In-Context Segmentation via Latent Diffusion Models [132.26274147026854]
The latent diffusion model (LDM) is an effective minimalist framework for in-context segmentation.
We build a new and fair in-context segmentation benchmark that includes both image and video datasets.
arXiv Detail & Related papers (2024-03-14T17:52:31Z)
- CARAT: Contrastive Feature Reconstruction and Aggregation for Multi-Modal Multi-Label Emotion Recognition [18.75994345925282]
Multi-modal multi-label emotion recognition (MMER) aims to identify relevant emotions from multiple modalities.
The challenge of MMER is how to effectively capture discriminative features for multiple labels from heterogeneous data.
This paper presents ContrAstive feature Reconstruction and AggregaTion (CARAT) for the MMER task.
arXiv Detail & Related papers (2023-12-15T20:58:05Z)
- Class-level Structural Relation Modelling and Smoothing for Visual Representation Learning [12.247343963572732]
This paper presents a framework termed Class-level Structural Relation Modeling and Smoothing for Visual Representation Learning (CSRMS).
It includes the Class-level Relation Modelling, Class-aware Graph-Guided Sampling, and Graph-Guided Representation Learning modules.
Experiments demonstrate the effectiveness of structured knowledge modelling for enhanced representation learning and show that CSRMS can be incorporated with any state-of-the-art visual representation learning models for performance gains.
arXiv Detail & Related papers (2023-08-08T09:03:46Z)
- Hybrid Relation Guided Set Matching for Few-shot Action Recognition [51.3308583226322]
We propose a novel Hybrid Relation guided Set Matching (HyRSM) approach that incorporates two key components.
The purpose of the hybrid relation module is to learn task-specific embeddings by fully exploiting associated relations within and cross videos in an episode.
We evaluate HyRSM on six challenging benchmarks, and the experimental results show its superiority over the state-of-the-art methods by a convincing margin.
arXiv Detail & Related papers (2022-04-28T11:43:41Z)
- Semantic Representation and Dependency Learning for Multi-Label Image Recognition [76.52120002993728]
We propose a novel and effective semantic representation and dependency learning (SRDL) framework to learn category-specific semantic representation for each category.
Specifically, we design a category-specific attentional regions (CAR) module to generate channel/spatial-wise attention matrices to guide the model.
We also design an object erasing (OE) module to implicitly learn semantic dependency among categories by erasing semantic-aware regions.
arXiv Detail & Related papers (2022-04-08T00:55:15Z)
- AGCN: Augmented Graph Convolutional Network for Lifelong Multi-label Image Recognition [8.06616666179388]
Lifelong Multi-Label (LML) image recognition builds an online class-incremental classifier in a sequential multi-label image recognition data stream.
The key challenges of LML image recognition are the construction of label relationships on Partial Labels of training data and the Catastrophic Forgetting on old classes.
We propose an Augmented Graph Convolutional Network (AGCN) model that can construct the label relationships across the sequential recognition tasks.
arXiv Detail & Related papers (2022-03-10T18:37:56Z)
- Deep Relational Metric Learning [84.95793654872399]
This paper presents a deep relational metric learning framework for image clustering and retrieval.
We learn an ensemble of features that characterizes an image from different aspects to model both interclass and intraclass distributions.
Experiments on the widely-used CUB-200-2011, Cars196, and Stanford Online Products datasets demonstrate that our framework improves existing deep metric learning methods and achieves very competitive results.
arXiv Detail & Related papers (2021-08-23T09:31:18Z)
- MetaDistiller: Network Self-Boosting via Meta-Learned Top-Down Distillation [153.56211546576978]
In this work, we propose that better soft targets with higher compatibility can be generated by using a label generator.
We can employ the meta-learning technique to optimize this label generator.
The experiments are conducted on two standard classification benchmarks, namely CIFAR-100 and ILSVRC2012.
arXiv Detail & Related papers (2020-08-27T13:04:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.