Unbiased Loss Functions for Extreme Classification With Missing Labels
- URL: http://arxiv.org/abs/2007.00237v1
- Date: Wed, 1 Jul 2020 04:42:12 GMT
- Title: Unbiased Loss Functions for Extreme Classification With Missing Labels
- Authors: Erik Schultheis, Mohammadreza Qaraei, Priyanshu Gupta, and Rohit
Babbar
- Abstract summary: The goal in extreme multi-label classification (XMC) is to tag an instance with a small subset of relevant labels from an extremely large set of possible labels.
In this work, we derive an unbiased estimator for a general formulation of loss functions that decompose over labels.
We show that the derived unbiased estimators can be easily incorporated in state-of-the-art algorithms for extreme classification.
- Score: 1.6011907050002954
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal in extreme multi-label classification (XMC) is to tag an instance
with a small subset of relevant labels from an extremely large set of possible
labels. In addition to the computational burden arising from the large number of
training instances, features, and labels, problems in XMC face two
statistical challenges: (i) a large number of 'tail labels' -- those which occur
very infrequently, and (ii) missing labels, as it is virtually impossible to
manually assign every relevant label to an instance. In this work, we derive an
unbiased estimator for a general formulation of loss functions that decompose
over labels, and then infer the forms for commonly used loss functions such as
the hinge, squared-hinge, and binary cross-entropy losses. We show that the
derived unbiased estimators, in the form of appropriate weighting factors, can
be easily incorporated into state-of-the-art algorithms for extreme
classification, thereby scaling to datasets with hundreds of thousands of labels.
Empirically, however, we find that a slightly altered version, which gives more
relative weight to tail labels, performs even better. We suspect this is due to the
label imbalance in the dataset, which is not explicitly addressed by our
theoretically derived estimator. Minimizing the proposed loss functions leads
to significant improvements over existing methods (up to 20% in some cases) on
benchmark datasets in XMC.
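To illustrate the weighting-factor idea described in the abstract, the sketch below shows how a per-label unbiased correction can be applied to the binary cross-entropy loss. It is a minimal illustration, not the authors' code: it assumes the standard missing-label model in which a relevant label is observed with per-label propensity p_l and irrelevant labels are never observed as positive, and the function name `unbiased_bce` and the propensity values are placeholders.

```python
import numpy as np

def unbiased_bce(logits, observed, propensity):
    """
    Propensity-weighted ("unbiased") binary cross-entropy under missing labels.

    Assumed observation model: a relevant label is observed with probability
    p_l, an irrelevant label is never observed as positive. The per-label loss
    ell(s, y) is then replaced by
        y = 1:  ell(s, 1) / p_l + (1 - 1/p_l) * ell(s, 0)
        y = 0:  ell(s, 0)
    whose expectation over the observation process equals the loss evaluated
    on the (unobserved) true labels.

    logits:     (n, L) array of raw scores
    observed:   (n, L) array of observed 0/1 labels (positives may be missing)
    propensity: (L,) array of per-label observation probabilities p_l
    """
    # Numerically stable BCE pieces for y = 1 and y = 0.
    loss_pos = np.logaddexp(0.0, -logits)   # ell(s, 1) = log(1 + exp(-s))
    loss_neg = np.logaddexp(0.0, logits)    # ell(s, 0) = log(1 + exp( s))

    inv_p = 1.0 / propensity                # broadcasts over the label axis
    per_label = (observed * (inv_p * loss_pos + (1.0 - inv_p) * loss_neg)
                 + (1.0 - observed) * loss_neg)
    return per_label.sum(axis=1).mean()
```

With all propensities equal to 1 this reduces to the ordinary binary cross-entropy; the empirically preferred variant mentioned in the abstract further increases the relative weight on tail labels beyond this correction.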
Related papers
- Generating Unbiased Pseudo-labels via a Theoretically Guaranteed
Chebyshev Constraint to Unify Semi-supervised Classification and Regression [57.17120203327993]
The threshold-to-pseudo-label process (T2L) in classification uses confidence to determine the quality of labels.
In nature, regression also requires unbiased methods to generate high-quality labels.
We propose a theoretically guaranteed constraint for generating unbiased labels based on Chebyshev's inequality.
arXiv Detail & Related papers (2023-11-03T08:39:35Z) - Towards Imbalanced Large Scale Multi-label Classification with Partially
Annotated Labels [8.977819892091]
Multi-label classification is a widely encountered problem in daily life, where an instance can be associated with multiple classes.
In this work, we address the issue of label imbalance and investigate how to train neural networks using partial labels.
arXiv Detail & Related papers (2023-07-31T21:50:48Z) - Bridging the Gap between Model Explanations in Partially Annotated
Multi-label Classification [85.76130799062379]
We study how false negative labels affect the model's explanation.
We propose to boost the attribution scores of the model trained with partial labels to make its explanation resemble that of the model trained with full labels.
arXiv Detail & Related papers (2023-04-04T14:00:59Z) - Complementary to Multiple Labels: A Correlation-Aware Correction
Approach [65.59584909436259]
We show theoretically how the estimated transition matrix in multi-class CLL could be distorted in multi-labeled cases.
We propose a two-step method to estimate the transition matrix from candidate labels.
arXiv Detail & Related papers (2023-02-25T04:48:48Z) - An Effective Approach for Multi-label Classification with Missing Labels [8.470008570115146]
We propose a pseudo-label based approach to reduce the cost of annotation without bringing additional complexity to the classification networks.
By designing a novel loss function, we are able to relax the requirement that each instance must contain at least one positive label.
We show that our method can handle the imbalance between positive labels and negative labels, while still outperforming existing missing-label learning approaches.
arXiv Detail & Related papers (2022-10-24T23:13:57Z) - Acknowledging the Unknown for Multi-label Learning with Single Positive
Labels [65.5889334964149]
Traditionally, all unannotated labels are assumed to be negative labels in single positive multi-label learning (SPML).
We propose entropy-maximization (EM) loss to maximize the entropy of predicted probabilities for all unannotated labels.
Considering the positive-negative label imbalance of unannotated labels, we propose asymmetric pseudo-labeling (APL) with asymmetric-tolerance strategies and a self-paced procedure to provide more precise supervision.
arXiv Detail & Related papers (2022-03-30T11:43:59Z) - Unbiased Loss Functions for Multilabel Classification with Missing
Labels [2.1549398927094874]
Missing labels are a ubiquitous phenomenon in extreme multi-label classification (XMC) tasks.
This paper derives the unique unbiased estimators for the different multilabel reductions.
arXiv Detail & Related papers (2021-09-23T10:39:02Z) - Disentangling Sampling and Labeling Bias for Learning in Large-Output
Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade-off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
arXiv Detail & Related papers (2021-05-12T15:40:13Z) - Comparing the Value of Labeled and Unlabeled Data in Method-of-Moments
Latent Variable Estimation [17.212805760360954]
We use a framework centered on model misspecification in method-of-moments latent variable estimation.
We then introduce a correction that provably removes this bias in certain cases.
We observe theoretically and with synthetic experiments that for well-specified models, labeled points are worth a constant factor more than unlabeled points.
arXiv Detail & Related papers (2021-03-03T23:52:38Z) - Label Confusion Learning to Enhance Text Classification Models [3.0251266104313643]
Label Confusion Model (LCM) learns label confusion to capture semantic overlap among labels.
LCM can generate a better label distribution to replace the original one-hot label vector.
Experiments on five text classification benchmark datasets reveal the effectiveness of LCM for several widely used deep learning classification models.
arXiv Detail & Related papers (2020-12-09T11:34:35Z) - A Study on the Autoregressive and non-Autoregressive Multi-label
Learning [77.11075863067131]
We propose a self-attention-based variational encoder model to extract the label-label and label-feature dependencies jointly.
Our model can therefore be used to predict all labels in parallel while still including both label-label and label-feature dependencies.
arXiv Detail & Related papers (2020-12-03T05:41:44Z)