sigmoidF1: A Smooth F1 Score Surrogate Loss for Multilabel
Classification
- URL: http://arxiv.org/abs/2108.10566v1
- Date: Tue, 24 Aug 2021 08:11:33 GMT
- Title: sigmoidF1: A Smooth F1 Score Surrogate Loss for Multilabel
Classification
- Authors: Gabriel Bénédict, Vincent Koops, Daan Odijk, Maarten de Rijke
- Abstract summary: We propose a loss function, sigmoidF1, to account for the complexity of multilabel classification evaluation.
We show that sigmoidF1 outperforms other loss functions on four datasets and several metrics.
- Score: 42.37189502220329
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multiclass multilabel classification refers to the task of attributing
multiple labels to examples via predictions. Current models reduce the
multilabel setting to either multiple binary classifications or a multiclass
classification, allowing for the use of existing loss functions (sigmoid,
cross-entropy, logistic, etc.). Empirically, these methods have been reported
to achieve good performance on different metrics (F1 score, Recall, Precision,
etc.). Theoretically, though, these multilabel classification reductions do not
accommodate the prediction of varying numbers of labels per example, and the
underlying losses are distant estimates of the performance metrics.
We propose a loss function, sigmoidF1. It is an approximation of the F1 score
that (I) is smooth and tractable for stochastic gradient descent, (II)
naturally approximates a multilabel metric, (III) estimates label propensities
and label counts. More generally, we show that any confusion matrix metric can
be formulated with a smooth surrogate. We evaluate the proposed loss function
on different text and image datasets, and with a variety of metrics, to account
for the complexity of multilabel classification evaluation. In our experiments,
we embed the sigmoidF1 loss in a classification head that is attached to
state-of-the-art efficient pretrained neural networks MobileNetV2 and
DistilBERT.
Our experiments show that sigmoidF1 outperforms other loss functions on four
datasets and several metrics. These results demonstrate the effectiveness of
using inference-time metrics as loss functions at training time in general, and
their potential on non-trivial classification problems like multilabel
classification.
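The idea in the abstract — replacing hard 0/1 thresholding with a sigmoid so that confusion-matrix counts, and hence F1, become differentiable — can be sketched as follows. This is a minimal illustration, not the authors' reference implementation; the hyperparameter names `beta` (sigmoid slope) and `eta` (offset) and the exact batching are assumptions for the sketch.

```python
import numpy as np

def sigmoid_f1_loss(logits, targets, beta=1.0, eta=0.0):
    """Smooth F1 surrogate loss over a batch (illustrative sketch).

    logits:  (batch, n_labels) raw scores from a classification head
    targets: (batch, n_labels) binary multilabel ground truth
    beta, eta: slope and offset of the sigmoid (tunable)
    """
    # Soft "predictions" in (0, 1) replace hard thresholding, which keeps
    # the confusion-matrix entries below differentiable w.r.t. the logits.
    s = 1.0 / (1.0 + np.exp(-beta * (logits + eta)))

    tp = np.sum(s * targets)           # soft true positives
    fp = np.sum(s * (1.0 - targets))   # soft false positives
    fn = np.sum((1.0 - s) * targets)   # soft false negatives

    # Smooth F1; the small epsilon guards against division by zero.
    f1 = 2.0 * tp / (2.0 * tp + fp + fn + 1e-8)
    return 1.0 - f1  # minimizing the loss maximizes the smooth F1
```

The same pattern extends to any confusion-matrix metric, as the abstract notes: build soft tp/fp/fn/tn counts from the sigmoid outputs and plug them into the metric's formula.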
Related papers
- Generating Unbiased Pseudo-labels via a Theoretically Guaranteed
Chebyshev Constraint to Unify Semi-supervised Classification and Regression [57.17120203327993]
The threshold-to-pseudo-label process (T2L) in classification uses confidence to determine the quality of a label.
In nature, regression also requires unbiased methods to generate high-quality labels.
We propose a theoretically guaranteed constraint for generating unbiased labels based on Chebyshev's inequality.
arXiv Detail & Related papers (2023-11-03T08:39:35Z) - Combining Metric Learning and Attention Heads For Accurate and Efficient
Multilabel Image Classification [0.0]
We revisit two popular approaches to multilabel classification: transformer-based heads and labels relations information graph processing branches.
Although transformer-based heads are considered to achieve better results than graph-based branches, we argue that, with the proper training strategy, graph-based methods can demonstrate just a small accuracy drop.
arXiv Detail & Related papers (2022-09-14T12:06:47Z) - Unbiased Loss Functions for Multilabel Classification with Missing
Labels [2.1549398927094874]
Missing labels are a ubiquitous phenomenon in extreme multi-label classification (XMC) tasks.
This paper derives the unique unbiased estimators for the different multilabel reductions.
arXiv Detail & Related papers (2021-09-23T10:39:02Z) - MSE Loss with Outlying Label for Imbalanced Classification [10.305130700118399]
We propose a mean squared error (MSE) loss with an outlying label for class-imbalanced classification.
MSE loss makes it possible to equalize the number of back-propagation updates for all classes and, as in metric learning, to learn a feature space that reflects the relationships between classes.
This makes it possible to create a feature space that separates high-difficulty classes from low-difficulty ones.
arXiv Detail & Related papers (2021-07-06T05:17:00Z) - PLM: Partial Label Masking for Imbalanced Multi-label Classification [59.68444804243782]
Neural networks trained on real-world datasets with long-tailed label distributions are biased towards frequent classes and perform poorly on infrequent classes.
We propose a method, Partial Label Masking (PLM), which utilizes this ratio during training.
Our method achieves strong performance when compared to existing methods on both multi-label (MultiMNIST and MSCOCO) and single-label (imbalanced CIFAR-10 and CIFAR-100) image classification datasets.
arXiv Detail & Related papers (2021-05-22T18:07:56Z) - Minimax Active Learning [61.729667575374606]
Active learning aims to develop label-efficient algorithms by querying the most representative samples to be labeled by a human annotator.
Current active learning techniques either rely on model uncertainty to select the most uncertain samples or use clustering or reconstruction to choose the most diverse set of unlabeled examples.
We develop a semi-supervised minimax entropy-based active learning algorithm that leverages both uncertainty and diversity in an adversarial manner.
arXiv Detail & Related papers (2020-12-18T19:03:40Z) - Joint Visual and Temporal Consistency for Unsupervised Domain Adaptive
Person Re-Identification [64.37745443119942]
This paper jointly enforces visual and temporal consistency in the combination of a local one-hot classification and a global multi-class classification.
Experimental results on three large-scale ReID datasets demonstrate the superiority of the proposed method in both unsupervised and unsupervised domain adaptive ReID tasks.
arXiv Detail & Related papers (2020-07-21T14:31:27Z) - Learning Gradient Boosted Multi-label Classification Rules [4.842945656927122]
We propose an algorithm for learning multi-label classification rules that is able to minimize decomposable as well as non-decomposable loss functions.
We analyze the abilities and limitations of our approach on synthetic data and evaluate its predictive performance on multi-label benchmarks.
arXiv Detail & Related papers (2020-06-23T21:39:23Z) - Unsupervised Person Re-identification via Multi-label Classification [55.65870468861157]
This paper formulates unsupervised person ReID as a multi-label classification task to progressively seek true labels.
Our method starts by assigning each person image with a single-class label, then evolves to multi-label classification by leveraging the updated ReID model for label prediction.
To boost the ReID model training efficiency in multi-label classification, we propose the memory-based multi-label classification loss (MMCL).
arXiv Detail & Related papers (2020-04-20T12:13:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.