LMPT: Prompt Tuning with Class-Specific Embedding Loss for Long-tailed Multi-Label Visual Recognition
- URL: http://arxiv.org/abs/2305.04536v2
- Date: Tue, 18 Jun 2024 06:45:57 GMT
- Title: LMPT: Prompt Tuning with Class-Specific Embedding Loss for Long-tailed Multi-Label Visual Recognition
- Authors: Peng Xia, Di Xu, Ming Hu, Lie Ju, Zongyuan Ge
- Abstract summary: We propose a unified framework for LTML, namely prompt tuning with class-specific embedding loss (LMPT).
Our method significantly surpasses the previous state-of-the-art methods and zero-shot CLIP in LTML.
- Score: 12.62835357920401
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The long-tailed multi-label visual recognition (LTML) task is highly challenging due to label co-occurrence and imbalanced data distribution. In this work, we propose a unified framework for LTML, namely prompt tuning with class-specific embedding loss (LMPT), which captures the semantic feature interactions between categories by combining text and image modality data and improves performance on both head and tail classes simultaneously. Specifically, LMPT introduces an embedding loss function with class-aware soft margin and re-weighting to learn class-specific contexts with the benefit of textual descriptions (captions), which could help establish semantic relationships between classes, especially between the head and tail classes. Furthermore, taking the class imbalance into account, the distribution-balanced loss is adopted as the classification loss function to further improve performance on the tail classes without compromising the head classes. Extensive experiments on the VOC-LT and COCO-LT datasets demonstrate that our method significantly surpasses the previous state-of-the-art methods and zero-shot CLIP in LTML. Our code is publicly available at https://github.com/richard-peng-xia/LMPT.
Related papers
- SLCA++: Unleash the Power of Sequential Fine-tuning for Continual Learning with Pre-training [68.7896349660824]
We present an in-depth analysis of the progressive overfitting problem through the lens of Seq FT.
Considering that overly fast representation learning and a biased classification layer constitute this particular problem, we introduce the advanced Slow Learner with Classifier Alignment (SLCA++) framework.
Our approach involves a Slow Learner to selectively reduce the learning rate of backbone parameters, and a Classifier Alignment to align the disjoint classification layers in a post-hoc fashion.
arXiv Detail & Related papers (2024-08-15T17:50:07Z)
- Category-Prompt Refined Feature Learning for Long-Tailed Multi-Label Image Classification [8.139529179222844]
Category-Prompt Refined Feature Learning (CPRFL) is a novel approach for Long-Tailed Multi-Label Image Classification (LTMLC).
CPRFL initializes category-prompts from the pretrained CLIP's embeddings and decouples category-specific visual representations.
We validate the effectiveness of our method on two LTMLC benchmarks and extensive experiments demonstrate the superiority of our work over baselines.
arXiv Detail & Related papers (2024-08-15T12:51:57Z)
- SFC: Shared Feature Calibration in Weakly Supervised Semantic Segmentation [28.846513129022803]
Image-level weakly supervised semantic segmentation has received increasing attention due to its low annotation cost.
Existing methods mainly rely on Class Activation Mapping (CAM) to obtain pseudo-labels for training semantic segmentation models.
In this work, we are the first to demonstrate that a long-tailed distribution in the training data can cause the CAM calculated through weights to be over-activated for head classes and under-activated for tail classes, due to the features shared among head and tail classes.
arXiv Detail & Related papers (2024-01-22T06:43:13Z)
- Long-Tailed Classification Based on Coarse-Grained Leading Forest and Multi-Center Loss [20.10399273585125]
Long-tailed (LT) classification is an unavoidable and challenging problem in the real world.
We propose a novel long-tailed classification framework, aiming to build a multi-granularity classification model by means of invariant feature learning.
Our approach achieves state-of-the-art performance on both existing benchmarks ImageNet-GLT and MSCOCO-GLT.
arXiv Detail & Related papers (2023-10-12T10:51:23Z)
- Learning in Imperfect Environment: Multi-Label Classification with Long-Tailed Distribution and Partial Labels [53.68653940062605]
We introduce a novel task, Partial labeling and Long-Tailed Multi-Label Classification (PLT-MLC).
We find that most LT-MLC and PL-MLC approaches fail to solve the degradation-MLC.
We propose an end-to-end learning framework: COrrection → ModificatIon → balanCe.
arXiv Detail & Related papers (2023-04-20T20:05:08Z)
- Distinguishability Calibration to In-Context Learning [31.375797763897104]
We propose a method to map a PLM-encoded embedding into a new metric space to guarantee the distinguishability of the resulting embeddings.
We also take advantage of hyperbolic embeddings to capture the hierarchical relations among fine-grained class-associated token embeddings.
arXiv Detail & Related papers (2023-02-13T09:15:00Z)
- Class-Incremental Lifelong Learning in Multi-Label Classification [3.711485819097916]
This paper studies Lifelong Multi-Label (LML) classification, which builds an online class-incremental classifier in a sequential multi-label classification data stream.
To solve the problem, the study proposes an Augmented Graph Convolutional Network (AGCN) with a built Augmented Correlation Matrix (ACM) across sequential partial-label tasks.
arXiv Detail & Related papers (2022-07-16T05:14:07Z)
- CSS-LM: A Contrastive Framework for Semi-supervised Fine-tuning of Pre-trained Language Models [59.49705076369856]
We introduce a novel framework to improve the fine-tuning phase of pre-trained language models (PLMs).
We retrieve positive and negative instances from large-scale unlabeled corpora according to their domain-level and class-level semantic relatedness to a task.
We then perform contrastive semi-supervised learning on both the retrieved unlabeled and original labeled instances to help PLMs capture crucial task-related semantic features.
arXiv Detail & Related papers (2021-02-07T09:27:26Z)
- Revisiting LSTM Networks for Semi-Supervised Text Classification via Mixed Objective Function [106.69643619725652]
We develop a training strategy that allows even a simple BiLSTM model, when trained with cross-entropy loss, to achieve competitive results.
We report state-of-the-art results for the text classification task on several benchmark datasets.
arXiv Detail & Related papers (2020-09-08T21:55:22Z)
- Feature Space Augmentation for Long-Tailed Data [74.65615132238291]
Real-world data often follow a long-tailed distribution as the frequency of each class is typically different.
Class-balanced loss and advanced methods on data re-sampling and augmentation are among the best practices to alleviate the data imbalance problem.
We present a novel approach to address the long-tailed problem by augmenting the under-represented classes in the feature space with the features learned from the classes with ample samples.
arXiv Detail & Related papers (2020-08-09T06:38:00Z)
- Boosting Few-Shot Learning With Adaptive Margin Loss [109.03665126222619]
This paper proposes an adaptive margin principle to improve the generalization ability of metric-based meta-learning approaches for few-shot learning problems.
Extensive experiments demonstrate that the proposed method can boost the performance of current metric-based meta-learning approaches.
arXiv Detail & Related papers (2020-05-28T07:58:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.