Asymmetric Polynomial Loss For Multi-Label Classification
- URL: http://arxiv.org/abs/2304.05361v1
- Date: Mon, 10 Apr 2023 14:35:47 GMT
- Title: Asymmetric Polynomial Loss For Multi-Label Classification
- Authors: Yusheng Huang, Jiexing Qi, Xinbing Wang, Zhouhan Lin
- Abstract summary: We propose an effective Asymmetric Polynomial Loss (APL) to mitigate the above issues.
We employ the asymmetric focusing mechanism to recalibrate the gradient contribution from the negative and positive samples.
Experiments show that our APL loss can consistently improve performance without extra training burden.
- Score: 24.67744795531103
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Various tasks are reformulated as multi-label classification problems, in
which the binary cross-entropy (BCE) loss is frequently utilized for optimizing
well-designed models. However, the vanilla BCE loss cannot be tailored for
diverse tasks, resulting in suboptimal performance for different models.
Besides, the imbalance between redundant negative samples and rare positive
samples could degrade the model performance. In this paper, we propose an
effective Asymmetric Polynomial Loss (APL) to mitigate the above issues.
Specifically, we first perform Taylor expansion on BCE loss. Then we ameliorate
the coefficients of polynomial functions. We further employ the asymmetric
focusing mechanism to decouple the gradient contribution from the negative and
positive samples. Moreover, we validate that the polynomial coefficients can
recalibrate the asymmetric focusing hyperparameters. Experiments on relation
extraction, text classification, and image classification show that our APL
loss can consistently improve performance without extra training burden.
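As an illustration of the ingredients named in the abstract, the sketch below combines asymmetric focusing of positive and negative labels with a leading-order polynomial correction to the Taylor expansion of BCE. This is a minimal PyTorch sketch rather than the paper's reference implementation; gamma_pos, gamma_neg, eps_pos, and eps_neg are placeholder hyperparameters.

```python
import torch

def asymmetric_polynomial_loss(logits, targets,
                               gamma_pos=1.0, gamma_neg=4.0,
                               eps_pos=1.0, eps_neg=1.0):
    """Illustrative multi-label loss: asymmetric focusing + polynomial correction.

    For each label, BCE is expanded as a power series in (1 - p) for positives
    and p for negatives; only the leading correction term is kept here, and the
    focusing powers differ between positives and negatives.  The exact
    coefficients and form used in the paper may differ.
    """
    p = torch.sigmoid(logits)
    # Positive labels: focused -log(p) plus the first polynomial term (1 - p)^(gamma+1)
    loss_pos = targets * ((1 - p) ** gamma_pos * (-torch.log(p.clamp(min=1e-8)))
                          + eps_pos * (1 - p) ** (gamma_pos + 1))
    # Negative labels: focused -log(1 - p) plus the first polynomial term p^(gamma+1)
    loss_neg = (1 - targets) * (p ** gamma_neg * (-torch.log((1 - p).clamp(min=1e-8)))
                                + eps_neg * p ** (gamma_neg + 1))
    return (loss_pos + loss_neg).mean()


# Toy usage: batch of 2 samples with 5 labels each
logits = torch.randn(2, 5)
targets = torch.randint(0, 2, (2, 5)).float()
print(asymmetric_polynomial_loss(logits, targets))
```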
Related papers
- Joint Asymmetric Loss for Learning with Noisy Labels [95.14298444251044]
Symmetric losses usually suffer from the underfitting issue due to the overly strict constraint.
Within the Active Passive Loss (APL) framework, symmetric losses have been successfully extended, yielding advanced robust loss functions.
We introduce a novel robust loss framework termed Joint Asymmetric Loss (JAL).
arXiv Detail & Related papers (2025-07-23T16:57:43Z)
- The Implicit Bias of Gradient Descent on Separable Multiclass Data [38.05903703331163]
We employ the framework of Permutation Equivariant and Relative Margin-based (PERM) losses to introduce a multiclass extension of the exponential tail property.
Our proof techniques closely mirror those of the binary case, thus illustrating the power of the PERM framework for bridging the binary-multiclass gap.
arXiv Detail & Related papers (2024-11-02T19:39:21Z)
- Polyhedral Conic Classifier for CTR Prediction [8.728085874038229]
This paper introduces a novel approach for click-through rate (CTR) prediction within industrial recommender systems.
It addresses the inherent challenges of numerical imbalance and geometric asymmetry.
We use a deep neural network classifier built on polyhedral conic functions; the functional form is sketched below.
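A minimal sketch of a polyhedral conic decision function, assuming the standard form f(x) = w^T (x - c) + gamma * ||x - c||_1 - b applied to a learned embedding; how the paper integrates and trains such a head inside its CTR model may differ.

```python
import torch
import torch.nn as nn

class PolyhedralConicHead(nn.Module):
    """Polyhedral conic function f(x) = w^T (x - c) + gamma * ||x - c||_1 - b.

    Points with f(x) <= 0 lie inside the polyhedral region and can be treated
    as the positive class.  This is an illustrative head, not the paper's
    architecture.
    """
    def __init__(self, dim):
        super().__init__()
        self.w = nn.Parameter(torch.zeros(dim))
        self.c = nn.Parameter(torch.zeros(dim))          # vertex of the cone
        self.gamma = nn.Parameter(torch.tensor(1.0))
        self.b = nn.Parameter(torch.tensor(1.0))

    def forward(self, x):
        d = x - self.c
        return d @ self.w + self.gamma * d.abs().sum(dim=-1) - self.b


head = PolyhedralConicHead(dim=16)
scores = head(torch.randn(8, 16))   # negative scores -> inside the region (positive class)
```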
arXiv Detail & Related papers (2024-06-06T09:26:48Z)
- Learning Layer-wise Equivariances Automatically using Gradients [66.81218780702125]
Convolutions encode equivariance symmetries into neural networks leading to better generalisation performance.
However, symmetries provide fixed hard constraints on the functions a network can represent; they need to be specified in advance and cannot be adapted.
Our goal is to allow flexible symmetry constraints that can automatically be learned from data using gradients.
arXiv Detail & Related papers (2023-10-09T20:22:43Z)
- On the Implicit Geometry of Cross-Entropy Parameterizations for Label-Imbalanced Data [26.310275682709776]
Various logit-adjusted parameterizations of the cross-entropy (CE) loss have been proposed as alternatives to weighted CE for training large models on label-imbalanced data.
We show that logit-adjusted parameterizations can be appropriately tuned to learn effectively irrespective of the minority imbalance ratio; a sketch of such a parameterization is given below.
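A minimal sketch of a logit-adjusted cross-entropy, assuming both an additive shift and a multiplicative scaling of the logits derived from class priors; the parameterizations and tuning analyzed in the paper may differ, and tau and gamma here are illustrative hyperparameters.

```python
import torch
import torch.nn.functional as F

def logit_adjusted_ce(logits, targets, class_priors, tau=1.0, gamma=0.1):
    """Cross-entropy with prior-dependent logit adjustments (illustrative).

    additive shift:        tau * log(prior_c)
    multiplicative scale:  (prior_c / max prior) ** gamma
    """
    iota = tau * torch.log(class_priors)
    delta = (class_priors / class_priors.max()) ** gamma
    return F.cross_entropy(delta * logits + iota, targets)


# Toy usage: 3 classes with heavily imbalanced priors
priors = torch.tensor([0.80, 0.15, 0.05])
logits = torch.randn(4, 3)
targets = torch.tensor([0, 1, 2, 1])
print(logit_adjusted_ce(logits, targets, priors))
```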
arXiv Detail & Related papers (2023-03-14T03:04:37Z)
- Leveraging Heteroscedastic Uncertainty in Learning Complex Spectral Mapping for Single-channel Speech Enhancement [20.823177372464414]
Most speech enhancement (SE) models learn a point estimate, and do not make use of uncertainty estimation in the learning process.
We show that modeling heteroscedastic uncertainty by minimizing a multivariate Gaussian negative log-likelihood (NLL) improves SE performance at no extra cost.
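A simplified sketch of a heteroscedastic Gaussian NLL with per-element (diagonal) variance, assuming the network predicts a mean and a log-variance for every spectral bin; the paper minimizes a multivariate Gaussian NLL for complex spectral mapping, so this only illustrates the principle of uncertainty-weighted regression.

```python
import torch

def heteroscedastic_gaussian_nll(mean, log_var, target):
    """Per-element Gaussian NLL: uncertain predictions are down-weighted."""
    var = log_var.exp()
    return 0.5 * (log_var + (target - mean) ** 2 / var).mean()


# Toy usage: predicted mean/log-variance for a (batch, freq, frames) spectrogram
mean = torch.randn(2, 257, 100)
log_var = torch.zeros(2, 257, 100, requires_grad=True)
target = torch.randn(2, 257, 100)
loss = heteroscedastic_gaussian_nll(mean, log_var, target)
loss.backward()
```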
arXiv Detail & Related papers (2022-11-16T02:29:05Z)
- Learning Graphical Factor Models with Riemannian Optimization [70.13748170371889]
This paper proposes a flexible algorithmic framework for graph learning under low-rank structural constraints.
The problem is expressed as penalized maximum likelihood estimation of an elliptical distribution.
We leverage geometries of positive definite matrices and positive semi-definite matrices of fixed rank that are well suited to elliptical models.
arXiv Detail & Related papers (2022-10-21T13:19:45Z)
- Learning to Re-weight Examples with Optimal Transport for Imbalanced Classification [74.62203971625173]
Imbalanced data pose challenges for deep learning based classification models.
One of the most widely-used approaches for tackling imbalanced data is re-weighting.
We propose a novel re-weighting method based on optimal transport (OT) from a distributional point of view.
arXiv Detail & Related papers (2022-08-05T01:23:54Z)
- On how to avoid exacerbating spurious correlations when models are overparameterized [33.315813572333745]
We show that VS-loss learns a model that is fair towards minorities even when spurious features are strong.
Compared to previous works, our bounds hold for more general models, are non-asymptotic, and apply even in scenarios of extreme imbalance.
arXiv Detail & Related papers (2022-06-25T21:53:44Z)
- Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is a linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to non linear settings via deep learning with bias constraints.
A second motivation for bias-constrained estimation (BCE) arises in applications where multiple estimates of the same unknown are averaged for improved performance; a sketch of such an objective is given below.
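A minimal sketch of a bias-constrained objective, assuming a plain MSE term plus a penalty on the squared empirical bias computed over a batch of samples that share the same unknown parameter; the constraint formulation and training scheme in the paper may differ.

```python
import torch

def bias_constrained_loss(estimates, true_params, lam=1.0):
    """MSE plus a squared-empirical-bias penalty (illustrative).

    The penalty pushes the average error towards zero, so that averaging
    several estimates of the same unknown reduces the overall error.
    """
    err = estimates - true_params
    mse = (err ** 2).sum(dim=-1).mean()
    bias_penalty = (err.mean(dim=0) ** 2).sum()   # squared empirical bias
    return mse + lam * bias_penalty


# Toy usage: 32 noisy estimates of the same 3-dimensional parameter
estimates = torch.randn(32, 3, requires_grad=True)
true_params = torch.ones(32, 3)
loss = bias_constrained_loss(estimates, true_params)
loss.backward()
```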
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
- Shaping Deep Feature Space towards Gaussian Mixture for Visual Classification [74.48695037007306]
We propose a Gaussian mixture (GM) loss function for deep neural networks for visual classification.
With a classification margin and a likelihood regularization, the GM loss facilitates both high classification performance and accurate modeling of the feature distribution.
The proposed model can be implemented easily and efficiently without using extra trainable parameters.
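A simplified sketch of a GM loss with identity covariances, combining a margin-based classification term over negative squared distances to learnable class means with a likelihood regularization that pulls features towards their class mean; the paper's full formulation (including learned covariances) may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GMLoss(nn.Module):
    """Gaussian-mixture loss with identity covariances (illustrative)."""

    def __init__(self, feat_dim, num_classes, margin=0.1, lam=0.1):
        super().__init__()
        self.means = nn.Parameter(0.01 * torch.randn(num_classes, feat_dim))
        self.margin, self.lam = margin, lam

    def forward(self, feats, labels):
        # Squared distances to every class mean: (batch, num_classes)
        d = torch.cdist(feats, self.means) ** 2
        logits = -0.5 * d
        # Margin: enlarge the true-class distance during training
        margin_term = torch.zeros_like(logits)
        margin_term.scatter_(1, labels.unsqueeze(1), self.margin)
        cls_loss = F.cross_entropy(logits * (1 + margin_term), labels)
        # Likelihood regularization: distance of each feature to its own class mean
        lkd = d.gather(1, labels.unsqueeze(1)).mean()
        return cls_loss + self.lam * lkd


loss_fn = GMLoss(feat_dim=64, num_classes=10)
feats, labels = torch.randn(16, 64), torch.randint(0, 10, (16,))
print(loss_fn(feats, labels))
```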
arXiv Detail & Related papers (2020-11-18T03:32:27Z)
- Probabilistic Circuits for Variational Inference in Discrete Graphical Models [101.28528515775842]
Inference in discrete graphical models with variational methods is difficult.
Many sampling-based methods have been proposed for estimating the Evidence Lower Bound (ELBO).
We propose a new approach that leverages the tractability of probabilistic circuit models, such as Sum Product Networks (SPNs).
We show that selective-SPNs are suitable as an expressive variational distribution, and prove that when the log-density of the target model is a polynomial the corresponding ELBO can be computed analytically.
arXiv Detail & Related papers (2020-10-22T05:04:38Z)