Rethinking and Reweighting the Univariate Losses for Multi-Label
Ranking: Consistency and Generalization
- URL: http://arxiv.org/abs/2105.05026v1
- Date: Mon, 10 May 2021 09:23:27 GMT
- Title: Rethinking and Reweighting the Univariate Losses for Multi-Label
Ranking: Consistency and Generalization
- Authors: Guoqiang Wu, Chongxuan Li, Kun Xu, Jun Zhu
- Abstract summary: (Partial) ranking loss is a commonly used evaluation measure for multi-label classification.
There is a gap between existing theory and practice -- some pairwise losses can lead to promising performance but lack consistency.
- Score: 44.73295800450414
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: (Partial) ranking loss is a commonly used evaluation measure for multi-label
classification, which is usually optimized with convex surrogates for
computational efficiency. Prior theoretical work on multi-label ranking mainly
focuses on (Fisher) consistency analyses. However, there is a gap between
existing theory and practice -- some pairwise losses can lead to promising
performance but lack consistency, while some univariate losses are consistent
but usually have no clear superiority in practice. In this paper, we attempt to
fill this gap through a systematic study from two complementary perspectives of
consistency and generalization error bounds of learning algorithms. Our results
show that learning algorithms with the consistent univariate loss have an error
bound of $O(c)$ ($c$ is the number of labels), while algorithms with the
inconsistent pairwise loss have a dependence of $O(\sqrt{c})$, as shown in prior work. This
explains why the latter can achieve better performance than the former in
practice. Moreover, we present an inconsistent reweighted univariate loss-based
learning algorithm that enjoys an error bound of $O(\sqrt{c})$ for promising
performance as well as the computational efficiency of univariate losses.
Finally, experimental results validate our theoretical analyses.
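To make the comparison concrete, below is a minimal NumPy sketch of the three kinds of surrogates discussed in the abstract: a consistent univariate (label-wise) loss, an inconsistent pairwise ranking loss, and a reweighted univariate loss. The logistic base loss and the specific weights are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def univariate_loss(scores, labels):
    """Consistent univariate surrogate: a logistic loss applied to each of the
    c labels independently (labels are 0/1 relevance indicators)."""
    margins = (2 * labels - 1) * scores           # positive margin = correct side
    return np.log1p(np.exp(-margins)).sum()

def pairwise_loss(scores, labels):
    """Pairwise surrogate for the (partial) ranking loss: penalize every
    (relevant, irrelevant) label pair whose scores are close to mis-ordered."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diffs = pos[:, None] - neg[None, :]           # s_relevant - s_irrelevant
    return np.log1p(np.exp(-diffs)).sum()

def reweighted_univariate_loss(scores, labels):
    """Illustrative reweighted univariate surrogate: each label's loss is scaled
    by the size of the opposite label set, mimicking how many pairs it appears
    in (the paper's exact weighting may differ)."""
    n_pos, n_neg = labels.sum(), (1 - labels).sum()
    weights = np.where(labels == 1, n_neg, n_pos)
    margins = (2 * labels - 1) * scores
    return (weights * np.log1p(np.exp(-margins))).sum()

# toy instance with c = 5 labels
scores = np.array([2.0, -1.0, 0.5, -0.3, 1.2])
labels = np.array([1, 0, 1, 0, 0])
print(univariate_loss(scores, labels),
      pairwise_loss(scores, labels),
      reweighted_univariate_loss(scores, labels))
```

Note that both univariate variants touch each of the c labels once, while the pairwise loss iterates over all relevant-irrelevant pairs, which is where the computational advantage of univariate losses comes from.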
Related papers
- A Unified Generalization Analysis of Re-Weighting and Logit-Adjustment
for Imbalanced Learning [129.63326990812234]
We propose a technique named data-dependent contraction to capture how modified losses handle different classes.
On top of this technique, a fine-grained generalization bound is established for imbalanced learning, which helps reveal the mystery of re-weighting and logit-adjustment.
arXiv Detail & Related papers (2023-10-07T09:15:08Z)
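As a rough illustration of the two loss modifications analyzed there, the sketch below implements per-class re-weighting and logit adjustment on top of a softmax cross-entropy; the inverse-frequency weights and the tau parameter are common choices assumed here, not necessarily the paper's.

```python
import numpy as np

def softmax_ce(logits, y):
    """Plain cross-entropy for one sample with integer class label y."""
    z = logits - logits.max()
    return -(z[y] - np.log(np.exp(z).sum()))

def reweighted_ce(logits, y, class_freq):
    """Re-weighting: rescale each sample's loss by the inverse class frequency."""
    return softmax_ce(logits, y) / class_freq[y]

def logit_adjusted_ce(logits, y, class_priors, tau=1.0):
    """Logit adjustment: add tau * log-prior to the logits before the softmax,
    which penalizes head classes and favors tail classes during training."""
    return softmax_ce(logits + tau * np.log(class_priors), y)

# toy 3-class imbalanced setting (90% / 9% / 1% class priors), true class = 2 (tail)
priors = np.array([0.90, 0.09, 0.01])
logits = np.array([2.0, 1.0, 0.5])
print(softmax_ce(logits, 2),
      reweighted_ce(logits, 2, priors),
      logit_adjusted_ce(logits, 2, priors))
```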
- Cross-Entropy Loss Functions: Theoretical Analysis and Applications [27.3569897539488]
We present a theoretical analysis of a broad family of loss functions, that includes cross-entropy (or logistic loss), generalized cross-entropy, the mean absolute error and other cross-entropy-like loss functions.
We show that these loss functions are beneficial in the adversarial setting by proving that they admit $H$-consistency bounds.
This leads to new adversarial robustness algorithms that consist of minimizing a regularized smooth adversarial comp-sum loss.
arXiv Detail & Related papers (2023-04-14T17:58:23Z)
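A minimal sketch of one member of this family, the generalized cross-entropy L_q loss, which interpolates between the usual cross-entropy (q -> 0) and the mean absolute error (q = 1); the particular q values below are arbitrary.

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def generalized_ce(logits, y, q=0.7):
    """Generalized cross-entropy L_q(p_y) = (1 - p_y**q) / q.
    As q -> 0 this recovers the usual cross-entropy -log p_y,
    and at q = 1 it equals the mean absolute error 1 - p_y."""
    p_y = softmax(logits)[y]
    return (1.0 - p_y ** q) / q

logits, y = np.array([2.0, 0.5, -1.0]), 0
for q in (1e-6, 0.5, 1.0):            # near-CE, intermediate, MAE
    print(q, generalized_ce(logits, y, q))
```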
- Benchmarking Deep AUROC Optimization: Loss Functions and Algorithmic
Choices [37.559461866831754]
We benchmark a variety of loss functions with different algorithmic choices for deep AUROC optimization problem.
We highlight the essential choices such as positive sampling rate, regularization, normalization/activation, and weights.
Our findings show that although the Adam-type method is more competitive from the training perspective, it does not outperform others from the testing perspective.
arXiv Detail & Related papers (2022-03-27T00:47:00Z)
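For reference, here are two pairwise AUROC surrogates of the kind such benchmarks typically compare (squared hinge and logistic). The margin value and the uniform pairing over all positive-negative pairs are assumptions for this sketch; the benchmark's other choices (sampling rate, regularization, normalization) are not modeled here.

```python
import numpy as np

def pairwise_auc_surrogate(scores_pos, scores_neg, loss="squared_hinge", margin=1.0):
    """Pairwise surrogates for AUROC: penalize positive-negative score pairs
    that violate the desired ordering s_pos > s_neg (+ margin)."""
    diffs = scores_pos[:, None] - scores_neg[None, :]
    if loss == "squared_hinge":
        return np.mean(np.maximum(0.0, margin - diffs) ** 2)
    if loss == "logistic":
        return np.mean(np.log1p(np.exp(-diffs)))
    raise ValueError(loss)

pos = np.array([1.5, 0.2, 0.9])   # scores of positive examples
neg = np.array([0.1, -0.4])       # scores of negative examples
print(pairwise_auc_surrogate(pos, neg, "squared_hinge"),
      pairwise_auc_surrogate(pos, neg, "logistic"))
```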
- PACMAN: PAC-style bounds accounting for the Mismatch between Accuracy
and Negative log-loss [28.166066663983674]
The ultimate performance of machine learning algorithms for classification tasks is usually measured in terms of the empirical error probability (or accuracy) based on a testing dataset.
Training, however, typically minimizes a surrogate loss function, which for classification is often the negative log-loss that leads to the well-known cross-entropy risk.
We introduce a pointwise PAC-style analysis of the generalization gap that accounts for the mismatch between testing with the accuracy metric and training on the negative log-loss.
arXiv Detail & Related papers (2021-12-10T14:00:22Z)
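A toy illustration of that mismatch, assuming two hypothetical models A and B: A is better under the 0-1 error used at test time, while B is better under the negative log-loss used for training.

```python
import numpy as np

# Two hypothetical models on four test points whose true class is 0.
probs_a = np.array([[0.51, 0.49]] * 4)                     # barely right every time
probs_b = np.array([[0.99, 0.01]] * 3 + [[0.30, 0.70]])    # confident, but wrong once
labels = np.zeros(4, dtype=int)

for name, p in (("A", probs_a), ("B", probs_b)):
    nll = -np.mean(np.log(p[np.arange(len(labels)), labels]))   # negative log-loss
    err = np.mean(p.argmax(axis=1) != labels)                   # 0-1 error
    print(name, "negative log-loss:", round(nll, 3), "0-1 error:", round(err, 3))
# A wins on accuracy (0 vs 0.25) while B wins on log-loss (~0.31 vs ~0.67):
# the training and testing metrics can rank models differently.
```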
- Understanding the Generalization of Adam in Learning Neural Networks
with Proper Regularization [118.50301177912381]
We show that Adam can converge to different solutions of the objective with provably different errors, even with weight decay regularization.
We also show that if the objective is convex and weight decay regularization is employed, any optimization algorithm, including Adam, will converge to the same solution.
arXiv Detail & Related papers (2021-08-25T17:58:21Z)
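A small numerical check of the convex claim, using a hand-rolled Adam (a decaying step size is assumed) on an L2-regularized least-squares objective and comparing against the closed-form ridge solution.

```python
import numpy as np

rng = np.random.default_rng(0)
X, y, lam = rng.normal(size=(100, 5)), rng.normal(size=100), 0.1
n = len(y)

# Adam on f(w) = (1/n)||Xw - y||^2 + lam * ||w||^2  (convex, with weight decay)
w = np.zeros(5)
m, v = np.zeros(5), np.zeros(5)
beta1, beta2, eps = 0.9, 0.999, 1e-8
for t in range(1, 20001):
    grad = 2 * X.T @ (X @ w - y) / n + 2 * lam * w
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat, v_hat = m / (1 - beta1 ** t), v / (1 - beta2 ** t)
    lr = 0.05 / np.sqrt(t)                      # decaying step size
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)

# Closed-form minimizer: (X^T X + n*lam*I)^{-1} X^T y
w_star = np.linalg.solve(X.T @ X + n * lam * np.eye(5), X.T @ y)
print(np.abs(w - w_star).max())                 # should shrink toward 0
```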
- Learning with Multiclass AUC: Theory and Algorithms [141.63211412386283]
Area under the ROC curve (AUC) is a well-known ranking metric for problems such as imbalanced learning and recommender systems.
In this paper, we make an early attempt at the problem of learning multiclass scoring functions by optimizing multiclass AUC metrics.
arXiv Detail & Related papers (2021-07-28T05:18:10Z)
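One common definition used in this setting is the one-vs-rest average of per-class AUCs; the sketch below computes that empirical metric, though the paper also considers other multiclass AUC variants.

```python
import numpy as np

def binary_auc(scores, labels):
    """Empirical AUC: fraction of (positive, negative) pairs ranked correctly,
    counting ties as one half."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diffs = pos[:, None] - neg[None, :]
    return (diffs > 0).mean() + 0.5 * (diffs == 0).mean()

def multiclass_auc_ovr(score_matrix, labels):
    """One-vs-rest multiclass AUC: average the binary AUC of each class's
    scoring column against all remaining classes."""
    return float(np.mean([binary_auc(score_matrix[:, k], (labels == k).astype(int))
                          for k in range(score_matrix.shape[1])]))

rng = np.random.default_rng(0)
scores = rng.normal(size=(9, 3))                  # 9 examples, 3 class scores each
labels = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])    # every class appears
print(multiclass_auc_ovr(scores, labels))
```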
- Generalized Negative Correlation Learning for Deep Ensembling [7.569288952340753]
Ensemble algorithms offer state-of-the-art performance in many machine learning applications.
We formulate a generalized bias-variance decomposition for arbitrary twice differentiable loss functions.
We derive a Generalized Negative Correlation Learning algorithm which offers explicit control over the ensemble's diversity.
arXiv Detail & Related papers (2020-11-05T16:29:22Z)
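A sketch of the idea for the squared-error case, where the classical ambiguity decomposition holds exactly: the ensemble objective is the average member loss minus lam times a diversity term, and lam trades off independent training (lam = 0) against training the ensemble mean directly (lam = 1). The paper generalizes this to arbitrary twice-differentiable losses; the function below is only the squared-error special case.

```python
import numpy as np

def ncl_ensemble_loss(member_preds, target, lam=0.5):
    """Negative-correlation-style objective for the squared error: average
    member loss minus lam times the members' spread around the ensemble mean."""
    ensemble = member_preds.mean(axis=0)
    member_loss = np.mean((member_preds - target) ** 2)
    diversity = np.mean((member_preds - ensemble) ** 2)
    return member_loss - lam * diversity

preds = np.array([[1.2, 0.8],        # member 1
                  [0.9, 1.1],        # member 2
                  [1.5, 0.7]])       # member 3
target = np.array([1.0, 1.0])
for lam in (0.0, 0.5, 1.0):
    print(lam, ncl_ensemble_loss(preds, target, lam))
print(np.mean((preds.mean(axis=0) - target) ** 2))   # equals the lam = 1 value
```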
- Learning by Minimizing the Sum of Ranked Range [58.24935359348289]
We introduce the sum of ranked range (SoRR) as a general approach to form learning objectives.
A ranked range is a consecutive sequence of sorted values of a set of real numbers.
We explore two machine learning applications of minimizing the SoRR framework, namely the AoRR aggregate loss for binary classification and the TKML individual loss for multi-label/multi-class classification.
arXiv Detail & Related papers (2020-10-05T01:58:32Z)
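A minimal sketch of the SoRR itself: sort the values, then sum the ones ranked k+1 through m. The toy numbers below stand in for individual example losses, as aggregated by the AoRR loss.

```python
import numpy as np

def sorr(values, k, m):
    """Sum of ranked range: add up the values ranked k+1 through m after
    sorting in non-increasing order (i.e., top-m sum minus top-k sum)."""
    v = np.sort(values)[::-1]
    return v[k:m].sum()

losses = np.array([3.0, 0.5, 7.0, 1.0, 2.5, 9.0])   # individual example losses
print(sorr(losses, 0, len(losses)))   # k=0, m=n: the ordinary sum of all losses
print(sorr(losses, 0, 3))             # top-3 sum: focuses on the hardest examples
print(sorr(losses, 1, 4))             # ranks 2-4: additionally drops the single
                                      # largest loss, e.g. a likely outlier
```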
- Approximation Algorithms for Sparse Principal Component Analysis [57.5357874512594]
Principal component analysis (PCA) is a widely used dimension reduction technique in machine learning and statistics.
Various approaches to obtain sparse principal direction loadings have been proposed, which are termed Sparse Principal Component Analysis.
We present thresholding as a provably accurate, polynomial-time approximation algorithm for the SPCA problem.
arXiv Detail & Related papers (2020-06-23T04:25:36Z)
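A rough sketch of one such thresholding scheme (keep the k largest-magnitude entries of the leading eigenvector and renormalize); the exact algorithm and accuracy guarantees in the paper may differ.

```python
import numpy as np

def thresholded_spca_direction(A, k):
    """Thresholding sketch for sparse PCA: take the leading eigenvector of the
    (symmetric PSD) matrix A, keep its k largest-magnitude entries, zero the
    rest, and renormalize."""
    eigvals, eigvecs = np.linalg.eigh(A)          # eigenvalues in ascending order
    v = eigvecs[:, -1]                            # leading eigenvector
    keep = np.argsort(np.abs(v))[-k:]             # indices of k largest entries
    x = np.zeros_like(v)
    x[keep] = v[keep]
    return x / np.linalg.norm(x)

rng = np.random.default_rng(0)
B = rng.normal(size=(6, 6))
A = B @ B.T                                       # a PSD "covariance" matrix
x = thresholded_spca_direction(A, k=2)
print(x, float(x @ A @ x))                        # sparse direction, explained variance
```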
- Optimization and Generalization of Regularization-Based Continual
Learning: a Loss Approximation Viewpoint [35.5156045701898]
We provide a novel viewpoint of regularization-based continual learning by formulating it as a second-order Taylor approximation of the loss function of each task.
Based on this viewpoint, we study the optimization aspects (i.e., convergence) as well as generalization properties (i.e., finite-sample guarantees) of regularization-based continual learning.
arXiv Detail & Related papers (2020-06-19T06:08:40Z)
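To make the loss-approximation viewpoint concrete, here is a minimal sketch: the old task's loss is replaced by its second-order Taylor expansion around the previous solution, which yields a quadratic penalty with some curvature proxy (a diagonal Fisher-style estimate is assumed below, as in EWC-like methods; the paper studies this family in general).

```python
import numpy as np

def quadratic_penalty(w, w_prev, H_diag):
    """Second-order Taylor view of regularization-based continual learning:
    near the previous task's solution w_prev, that task's loss is approximately
    constant + 1/2 (w - w_prev)^T H (w - w_prev), so methods in this family add
    a quadratic penalty with a (often diagonal) curvature proxy H_diag."""
    d = w - w_prev
    return 0.5 * np.sum(H_diag * d * d)

def continual_objective(loss_new, w, w_prev, H_diag, lam=1.0):
    """New-task loss plus the quadratic approximation of the old-task loss."""
    return loss_new(w) + lam * quadratic_penalty(w, w_prev, H_diag)

# toy usage: a quadratic "new task" loss and a diagonal curvature estimate
w_prev = np.array([1.0, -0.5])                    # solution of the previous task
H_diag = np.array([4.0, 0.1])                     # parameter 0 mattered more before
loss_new = lambda w: np.sum((w - np.array([0.0, 1.0])) ** 2)
print(continual_objective(loss_new, np.array([0.5, 0.5]), w_prev, H_diag))
```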