Optimizing Classification of Infrequent Labels by Reducing Variability in Label Distribution
- URL: http://arxiv.org/abs/2511.07459v1
- Date: Wed, 12 Nov 2025 01:00:55 GMT
- Title: Optimizing Classification of Infrequent Labels by Reducing Variability in Label Distribution
- Authors: Ashutosh Agarwal,
- Abstract summary: LEVER is designed to address the challenges posed by underperforming infrequent categories in Extreme Classification (XC) tasks. Infrequent categories, often characterized by sparse samples, suffer from high label inconsistency, which undermines classification performance. LEVER mitigates this problem by adopting a robust Siamese-style architecture, leveraging knowledge transfer to reduce label inconsistency. Comprehensive testing across multiple XC datasets reveals substantial improvements in the handling of infrequent categories.
- Score: 0.0
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: This paper presents a novel solution, LEVER, designed to address the challenges posed by underperforming infrequent categories in Extreme Classification (XC) tasks. Infrequent categories, often characterized by sparse samples, suffer from high label inconsistency, which undermines classification performance. LEVER mitigates this problem by adopting a robust Siamese-style architecture, leveraging knowledge transfer to reduce label inconsistency and enhance the performance of One-vs-All classifiers. Comprehensive testing across multiple XC datasets reveals substantial improvements in the handling of infrequent categories, setting a new benchmark for the field. Additionally, the paper introduces two newly created multi-intent datasets, offering essential resources for future XC research.
Related papers
- Consistent Supervised-Unsupervised Alignment for Generalized Category Discovery [49.67913741459179]
Generalized Category Discovery (GCD) focuses on classifying known categories while simultaneously discovering novel categories from unlabeled data. Previous GCD methods face challenges due to inconsistent optimization objectives and category confusion. We propose the Neural Collapse-inspired Generalized Category Discovery (NC-GCD) framework.
arXiv Detail & Related papers (2025-07-07T07:34:41Z)
- Active Generalized Category Discovery [60.69060965936214]
Generalized Category Discovery (GCD) endeavors to cluster unlabeled samples from both novel and old classes.
In the spirit of active learning, we propose a new setting called Active Generalized Category Discovery (AGCD).
Our method achieves state-of-the-art performance on both generic and fine-grained datasets.
arXiv Detail & Related papers (2024-03-07T07:12:24Z)
- Prompt Tuned Embedding Classification for Multi-Label Industry Sector Allocation [2.024620791810963]
This study benchmarks the performance of Prompt Tuning and baselines for multi-label text classification.
It is applied to classifying companies into an investment firm's proprietary industry taxonomy.
We confirm that the model's performance is consistent across both well-known and less-known companies.
arXiv Detail & Related papers (2023-09-21T13:45:32Z)
- Retrieval-augmented Multi-label Text Classification [20.100081284294973]
Multi-label text classification is a challenging task in settings of large label sets.
Retrieval augmentation aims to improve the sample efficiency of classification models.
We evaluate this approach on four datasets from the legal and biomedical domains.
arXiv Detail & Related papers (2023-05-22T14:16:23Z)
- Class-Distribution-Aware Pseudo Labeling for Semi-Supervised Multi-Label Learning [97.88458953075205]
Pseudo-labeling has emerged as a popular and effective approach for utilizing unlabeled data.
This paper proposes a novel solution called Class-Aware Pseudo-Labeling (CAP) that performs pseudo-labeling in a class-aware manner.
arXiv Detail & Related papers (2023-05-04T12:52:18Z)
- Dynamic Conceptional Contrastive Learning for Generalized Category Discovery [76.82327473338734]
Generalized category discovery (GCD) aims to automatically cluster partially labeled data.
Unlabeled data contain instances that are not only from known categories of the labeled data but also from novel categories.
One effective approach to GCD is to apply self-supervised learning to learn discriminative representations for unlabeled data.
We propose a Dynamic Conceptional Contrastive Learning framework, which can effectively improve clustering accuracy.
arXiv Detail & Related papers (2023-03-30T14:04:39Z)
- An Effective Approach for Multi-label Classification with Missing Labels [8.470008570115146]
We propose a pseudo-label based approach to reduce the cost of annotation without bringing additional complexity to the classification networks.
By designing a novel loss function, we are able to relax the requirement that each instance must contain at least one positive label.
We show that our method can handle the imbalance between positive labels and negative labels, while still outperforming existing missing-label learning approaches.
arXiv Detail & Related papers (2022-10-24T23:13:57Z)
- Long-tailed Extreme Multi-label Text Classification with Generated Pseudo Label Descriptions [28.416742933744942]
This paper addresses the challenge of tail label prediction by proposing a novel approach.
It leverages the effectiveness of a trained bag-of-words (BoW) classifier in generating informative label descriptions under severely data-scarce conditions.
The proposed approach achieves state-of-the-art performance on XMTC benchmark datasets and significantly outperforms the best previous methods in tail label prediction.
arXiv Detail & Related papers (2022-04-02T23:42:32Z)
- Coherent Hierarchical Multi-Label Classification Networks [56.41950277906307]
C-HMCNN(h) is a novel approach for HMC problems, which exploits hierarchy information in order to produce predictions coherent with the constraint and improve performance.
We conduct an extensive experimental analysis showing the superior performance of C-HMCNN(h) when compared to state-of-the-art models.
arXiv Detail & Related papers (2020-10-20T09:37:02Z)
- On Leveraging Unlabeled Data for Concurrent Positive-Unlabeled Classification and Robust Generation [72.062661402124]
We present a novel training framework to jointly target PU classification and conditional generation when exposed to extra data. We prove the optimal condition of CNI-CGAN and conduct extensive experimental evaluations on diverse datasets.
arXiv Detail & Related papers (2020-06-14T08:27:40Z)
- Global Multiclass Classification and Dataset Construction via Heterogeneous Local Experts [37.27708297562079]
We show how to minimize the number of labelers while ensuring the reliability of the resulting dataset.
Experiments with the MNIST and CIFAR-10 datasets demonstrate the favorable accuracy of our aggregation scheme.
arXiv Detail & Related papers (2020-05-21T18:07:42Z)