A two-head loss function for deep Average-K classification
- URL: http://arxiv.org/abs/2303.18118v1
- Date: Fri, 31 Mar 2023 15:04:53 GMT
- Title: A two-head loss function for deep Average-K classification
- Authors: Camille Garcin, Maximilien Servajean, Alexis Joly, Joseph Salmon
- Abstract summary: We propose a new loss function based on a multi-label classification head in addition to the classical softmax head.
We show that this approach allows the model to better capture ambiguities between classes and, as a result, to return more consistent sets of possible classes.
- Score: 8.189630642296416
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Average-K classification is an alternative to top-K classification in which
the number of labels returned varies with the ambiguity of the input image but
must average to K over all the samples. A simple method to solve this task is
to threshold the softmax output of a model trained with the cross-entropy loss.
This approach is theoretically proven to be asymptotically consistent, but it
is not guaranteed to be optimal for a finite set of samples. In this paper, we
propose a new loss function based on a multi-label classification head in
addition to the classical softmax. This second head is trained using
pseudo-labels generated by thresholding the softmax head while guaranteeing
that K classes are returned on average. We show that this approach allows the
model to better capture ambiguities between classes and, as a result, to return
more consistent sets of possible classes. Experiments on two datasets from the
literature demonstrate that our approach outperforms the softmax baseline, as
well as several other loss functions more generally designed for weakly
supervised multi-label classification. The gains are larger the higher the
uncertainty, especially for classes with few samples.
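As a rough sketch of the idea (not the authors' implementation), the snippet below calibrates a single threshold on the detached softmax output so that roughly K classes per sample survive on average within a batch, then trains the multi-label head on the resulting pseudo-labels with a binary cross-entropy term. The batch-level calibration, the weighting `lam`, and all function names are assumptions; the paper enforces the average-K guarantee more carefully.
```python
import torch
import torch.nn.functional as F

def average_k_threshold(probs: torch.Tensor, k: float) -> float:
    """Global threshold such that, over this batch, about k class
    probabilities per sample exceed it on average."""
    flat = probs.flatten().sort(descending=True).values
    n_keep = max(1, min(int(round(k * probs.shape[0])), flat.numel()))
    return flat[n_keep - 1].item()

def two_head_loss(logits_softmax: torch.Tensor,
                  logits_multilabel: torch.Tensor,
                  targets: torch.Tensor,
                  k: float = 3.0, lam: float = 1.0) -> torch.Tensor:
    """Cross-entropy on the softmax head plus BCE on the multi-label head,
    the latter trained against pseudo-labels obtained by thresholding the
    (detached) softmax output at the average-k level."""
    ce = F.cross_entropy(logits_softmax, targets)
    with torch.no_grad():
        probs = F.softmax(logits_softmax, dim=1)
        tau = average_k_threshold(probs, k)
        pseudo = (probs >= tau).float()  # ~k positives per sample on average
    bce = F.binary_cross_entropy_with_logits(logits_multilabel, pseudo)
    return ce + lam * bce

# Toy usage: two heads sharing a backbone would each emit (batch, classes) logits.
logits_a = torch.randn(8, 10, requires_grad=True)
logits_b = torch.randn(8, 10, requires_grad=True)
loss = two_head_loss(logits_a, logits_b, torch.randint(0, 10, (8,)), k=3.0)
loss.backward()
```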
Related papers
- Dirichlet-Based Prediction Calibration for Learning with Noisy Labels [40.78497779769083]
Learning with noisy labels can significantly hinder the generalization performance of deep neural networks (DNNs).
Existing approaches address this issue through loss correction or example selection methods.
We propose the Dirichlet-based Prediction Calibration (DPC) method as a solution.
arXiv Detail & Related papers (2024-01-13T12:33:04Z)
- Generating Unbiased Pseudo-labels via a Theoretically Guaranteed Chebyshev Constraint to Unify Semi-supervised Classification and Regression [57.17120203327993]
The threshold-to-pseudo-label (T2L) process in classification uses confidence to determine label quality.
By nature, regression likewise requires unbiased methods to generate high-quality labels.
We propose a theoretically guaranteed constraint for generating unbiased labels based on Chebyshev's inequality.
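For reference, the classical inequality invoked above is stated below; the paper's specific constraint is not reproduced in this summary.
```latex
% Chebyshev's inequality: for a random variable $X$ with mean $\mu$
% and finite variance $\sigma^2$, and any $t > 0$,
\[
  \Pr\left( \lvert X - \mu \rvert \ge t \right) \le \frac{\sigma^2}{t^2}
\]
```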
arXiv Detail & Related papers (2023-11-03T08:39:35Z)
- Shrinking Class Space for Enhanced Certainty in Semi-Supervised Learning [59.44422468242455]
We propose a novel method dubbed ShrinkMatch to learn uncertain samples.
For each uncertain sample, it adaptively seeks a shrunk class space, which merely contains the original top-1 class.
We then impose a consistency regularization between a pair of strongly and weakly augmented samples in the shrunk space to strive for discriminative representations.
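A minimal sketch of one plausible reading of the shrinking step, assuming the shrunk space is built by discarding the strongest competitors of the top-1 class until its renormalized confidence passes a threshold `tau`; the function name and this exact rule are assumptions, not the paper's code.
```python
import torch

def shrink_class_space(p_weak: torch.Tensor, tau: float = 0.95) -> torch.Tensor:
    """Boolean mask over classes for one uncertain sample: keep the top-1
    class and drop just enough of its strongest competitors for the
    renormalized top-1 confidence to reach tau."""
    order = p_weak.argsort(descending=True)
    keep = torch.ones_like(p_weak, dtype=torch.bool)
    top1 = order[0]
    for cls in order[1:]:
        if p_weak[top1] / p_weak[keep].sum() >= tau:
            break
        keep[cls] = False  # remove the next strongest competitor
    return keep

# The consistency term would then be a cross-entropy, restricted to the kept
# classes, between the strongly augmented prediction and the top-1
# pseudo-label from the weak view.
```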
arXiv Detail & Related papers (2023-08-13T14:05:24Z)
- Variational Classification [51.2541371924591]
We derive a variational objective to train the model, analogous to the evidence lower bound (ELBO) used to train variational auto-encoders.
Treating inputs to the softmax layer as samples of a latent variable, our abstracted perspective reveals a potential inconsistency.
We induce a chosen latent distribution in place of the one implicitly assumed by a standard softmax layer.
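Concretely, if the pre-softmax activations are read as samples of a latent variable $z$, the objective takes the familiar ELBO form; this is the generic statement of the bound, not the paper's exact objective.
```latex
\[
  \log p(y \mid x) \;\ge\;
  \mathbb{E}_{q(z \mid x)}\!\left[ \log p(y \mid z) \right]
  \;-\; \mathrm{KL}\!\left( q(z \mid x) \,\|\, p(z) \right)
\]
```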
arXiv Detail & Related papers (2023-05-17T17:47:19Z)
- Maximally Compact and Separated Features with Regular Polytope Networks [22.376196701232388]
We show how to obtain CNN features with the properties of maximum inter-class separability and maximum intra-class compactness.
We obtain features similar to those produced by the well-known center loss of Wen et al. (2016) and other similar approaches.
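A minimal sketch of the construction suggested by the title: fixing the classifier weights to the vertices of a regular simplex, so class prototypes are maximally separated by design. Taking the feature dimension equal to the class count, and all names here, are simplifying assumptions.
```python
import torch
import torch.nn as nn

def simplex_weights(num_classes: int) -> torch.Tensor:
    """C maximally separated unit vectors: vertices of a regular simplex,
    obtained by centering the standard basis and normalizing rows."""
    eye = torch.eye(num_classes)
    w = eye - eye.mean(dim=0, keepdim=True)
    return w / w.norm(dim=1, keepdim=True)

class FixedSimplexHead(nn.Module):
    """Linear classifier with frozen, maximally separated prototypes."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.register_buffer("weight", simplex_weights(num_classes))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return feats @ self.weight.t()
```
Any two such prototypes have cosine similarity -1/(C-1), the most negative value achievable simultaneously by C unit vectors, which is what makes the separation maximal.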
arXiv Detail & Related papers (2023-01-15T15:20:57Z)
- Distinction Maximization Loss: Efficiently Improving Classification Accuracy, Uncertainty Estimation, and Out-of-Distribution Detection Simply Replacing the Loss and Calibrating [2.262407399039118]
We propose training deterministic deep neural networks using our DisMax loss.
DisMax usually outperforms all current approaches simultaneously in classification accuracy, uncertainty estimation, inference efficiency, and out-of-distribution detection.
arXiv Detail & Related papers (2022-05-12T04:37:35Z)
- X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim at improving data efficiency for both classification and regression setups in deep learning.
To combine the strengths of both worlds, we propose a novel X-model.
X-model plays a minimax game between the feature extractor and task-specific heads.
arXiv Detail & Related papers (2021-10-09T13:56:48Z)
- Minimax Active Learning [61.729667575374606]
Active learning aims to develop label-efficient algorithms by querying the most representative samples to be labeled by a human annotator.
Current active learning techniques either rely on model uncertainty to select the most uncertain samples or use clustering or reconstruction to choose the most diverse set of unlabeled examples.
We develop a semi-supervised minimax entropy-based active learning algorithm that leverages both uncertainty and diversity in an adversarial manner.
arXiv Detail & Related papers (2020-12-18T19:03:40Z)
- Learning Gradient Boosted Multi-label Classification Rules [4.842945656927122]
We propose an algorithm for learning multi-label classification rules that is able to minimize decomposable as well as non-decomposable loss functions.
We analyze the abilities and limitations of our approach on synthetic data and evaluate its predictive performance on multi-label benchmarks.
arXiv Detail & Related papers (2020-06-23T21:39:23Z)
- Few-Shot Open-Set Recognition using Meta-Learning [72.15940446408824]
The problem of open-set recognition is considered.
A new oPen sEt mEta LEaRning (PEELER) algorithm is introduced.
arXiv Detail & Related papers (2020-05-27T23:49:26Z)
- Being Bayesian about Categorical Probability [6.875312133832079]
We treat the categorical probability over class labels as a random variable.
In this framework, the prior distribution explicitly models the presumed noise inherent in the observed label.
Our method can be implemented as a plug-and-play loss function with negligible computational overhead.
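A sketch in the spirit of that description, assuming the network output parameterizes a Dirichlet over the categorical class probabilities. The closed forms below (expected negative log-likelihood and Dirichlet-to-Dirichlet KL) are standard, but the exact combination and the names `dirichlet_loss`, `prior_conc`, and `kl_weight` are assumptions, not the paper's objective.
```python
import torch

def dirichlet_loss(logits: torch.Tensor, targets: torch.Tensor,
                   prior_conc: float = 1.0, kl_weight: float = 1e-2) -> torch.Tensor:
    """Plug-and-play style loss: exp(logits) + 1 parameterizes a Dirichlet
    over class probabilities; minimize the expected negative log-likelihood
    of the label under that Dirichlet, plus a KL term to a symmetric
    Dirichlet prior encoding the presumed label noise."""
    alpha = logits.exp() + 1.0                 # concentration parameters, > 1
    s = alpha.sum(dim=1)                       # Dirichlet strength per sample
    # E_{p ~ Dir(alpha)}[-log p_y] = digamma(S) - digamma(alpha_y)
    alpha_y = alpha.gather(1, targets.unsqueeze(1)).squeeze(1)
    nll = torch.digamma(s) - torch.digamma(alpha_y)
    # KL( Dir(alpha) || Dir(beta) ) against a symmetric prior beta
    beta = torch.full_like(alpha, prior_conc)
    kl = (torch.lgamma(s) - torch.lgamma(alpha).sum(dim=1)
          - torch.lgamma(beta.sum(dim=1)) + torch.lgamma(beta).sum(dim=1)
          + ((alpha - beta)
             * (torch.digamma(alpha) - torch.digamma(s).unsqueeze(1))).sum(dim=1))
    return (nll + kl_weight * kl).mean()
```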
arXiv Detail & Related papers (2020-02-19T02:35:32Z)