Rethinking and Reweighting the Univariate Losses for Multi-Label
Ranking: Consistency and Generalization
- URL: http://arxiv.org/abs/2105.05026v1
- Date: Mon, 10 May 2021 09:23:27 GMT
- Title: Rethinking and Reweighting the Univariate Losses for Multi-Label
Ranking: Consistency and Generalization
- Authors: Guoqiang Wu, Chongxuan Li, Kun Xu, Jun Zhu
- Abstract summary: (Partial) ranking loss is a commonly used evaluation measure for multi-label classification.
There is a gap between existing theory and practice -- some pairwise losses can lead to promising performance but lack consistency.
- Score: 44.73295800450414
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: (Partial) ranking loss is a commonly used evaluation measure for multi-label
classification, which is usually optimized with convex surrogates for
computational efficiency. Prior theoretical work on multi-label ranking mainly
focuses on (Fisher) consistency analyses. However, there is a gap between
existing theory and practice -- some pairwise losses can lead to promising
performance but lack consistency, while some univariate losses are consistent
but usually have no clear superiority in practice. In this paper, we attempt to
fill this gap through a systematic study from two complementary perspectives of
consistency and generalization error bounds of learning algorithms. Our results
show that learning algorithms with the consistent univariate loss have an error
bound of $O(c)$ ($c$ is the number of labels), while algorithms with the
inconsistent pairwise loss have an error bound of $O(\sqrt{c})$, as shown in prior
work. This explains why the latter can achieve better performance than the former in
practice. Moreover, we present an inconsistent reweighted univariate loss-based
learning algorithm that enjoys an error bound of $O(\sqrt{c})$ for promising
performance as well as the computational efficiency of univariate losses.
Finally, experimental results validate our theoretical analyses.
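To make the abstract's distinction concrete, here is a minimal numpy sketch contrasting a pairwise surrogate of the ranking loss with a plain univariate surrogate and an illustrative reweighted univariate variant. The logistic surrogate, the particular weighting by label counts, and all function names are assumptions for illustration; the abstract does not spell out the exact reweighting used in the paper.

```python
import numpy as np

def logistic(z):
    # Convex surrogate phi(z) = log(1 + exp(-z)), numerically stabilized.
    return np.logaddexp(0.0, -z)

def pairwise_surrogate(scores, y):
    """Pairwise surrogate for the ranking loss: penalize every
    (relevant, irrelevant) label pair whose scores are not well separated.
    Costs O(c^2) per instance in the worst case."""
    pos, neg = scores[y == 1], scores[y == 0]
    if len(pos) == 0 or len(neg) == 0:
        return 0.0
    diffs = pos[:, None] - neg[None, :]          # all pairwise score gaps
    return logistic(diffs).mean()

def univariate_surrogate(scores, y):
    """Plain univariate (binary-relevance style) surrogate: one term per
    label, O(c) per instance."""
    signs = np.where(y == 1, 1.0, -1.0)
    return logistic(signs * scores).mean()

def reweighted_univariate_surrogate(scores, y):
    """Illustrative reweighted univariate surrogate: positive-label terms are
    weighted by the number of irrelevant labels and vice versa, so the
    weighting mimics the pairwise loss while keeping O(c) cost.
    (The exact weights in the paper may differ.)"""
    pos_mask, neg_mask = (y == 1), (y == 0)
    n_pos, n_neg = pos_mask.sum(), neg_mask.sum()
    if n_pos == 0 or n_neg == 0:
        return 0.0
    w = np.where(pos_mask, n_neg, n_pos).astype(float)
    signs = np.where(pos_mask, 1.0, -1.0)
    return (w * logistic(signs * scores)).sum() / (n_pos * n_neg)

# Tiny usage example with c = 5 labels.
scores = np.array([2.1, -0.3, 0.7, -1.2, 0.1])
y      = np.array([1, 0, 1, 0, 0])
print(pairwise_surrogate(scores, y),
      univariate_surrogate(scores, y),
      reweighted_univariate_surrogate(scores, y))
```

The pairwise surrogate touches every (relevant, irrelevant) pair and thus costs O(c^2) per instance, whereas both univariate variants remain O(c), which is the computational advantage the abstract refers to.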
Related papers
- Balancing the Scales: A Theoretical and Algorithmic Framework for Learning from Imbalanced Data [35.03888101803088]
This paper introduces a novel theoretical framework for analyzing generalization in imbalanced classification.
We propose a new class-imbalanced margin loss function for both binary and multi-class settings, prove its strong $H$-consistency, and derive corresponding learning guarantees.
We devise novel and general learning algorithms, IMMAX, which incorporate confidence margins and are applicable to various hypothesis sets.
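As a rough illustration of a class-imbalanced margin loss (not the paper's IMMAX algorithm; the margin schedule below is an LDAM-style assumption), one can subtract a class-dependent margin from the true-class logit before the softmax cross-entropy, so rarer classes must be separated by a larger margin.

```python
import numpy as np

def class_margins(class_counts, C=1.0):
    # Larger margins for rarer classes, e.g. proportional to n_k^{-1/4}
    # (an LDAM-style heuristic; the paper's margins may be defined differently).
    return C / np.power(np.asarray(class_counts, dtype=float), 0.25)

def imbalanced_margin_loss(logits, label, margins):
    """Cross-entropy with a class-dependent margin subtracted from the
    logit of the true class."""
    z = logits.copy()
    z[label] -= margins[label]
    z = z - z.max()                          # numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

# Usage: a 3-class problem with a heavily under-represented class 2.
margins = class_margins([1000, 500, 20])
print(imbalanced_margin_loss(np.array([2.0, 0.5, 1.8]), 2, margins))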
arXiv Detail & Related papers (2025-02-14T18:57:16Z)
- Of Dice and Games: A Theory of Generalized Boosting [61.752303337418475]
We extend the celebrated theory of boosting to incorporate both cost-sensitive and multi-objective losses.
We develop a comprehensive theory of cost-sensitive and multi-objective boosting, providing a taxonomy of weak learning guarantees.
Our characterization relies on a geometric interpretation of boosting, revealing a surprising equivalence between cost-sensitive and multi-objective losses.
arXiv Detail & Related papers (2024-12-11T01:38:32Z)
- A Unified Generalization Analysis of Re-Weighting and Logit-Adjustment for Imbalanced Learning [129.63326990812234]
We propose a technique named data-dependent contraction to capture how modified losses handle different classes.
On top of this technique, a fine-grained generalization bound is established for imbalanced learning, which helps reveal the mystery of re-weighting and logit-adjustment.
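The two loss modifications this entry analyzes, re-weighting and logit adjustment, can be sketched on a softmax cross-entropy as below; the inverse-frequency weights and the temperature `tau` are illustrative choices, not the ones studied in the paper.

```python
import numpy as np

def cross_entropy(logits, label):
    z = logits - logits.max()
    return -(z[label] - np.log(np.exp(z).sum()))

def reweighted_loss(logits, label, class_counts):
    # Re-weighting: scale each example's loss by the inverse frequency
    # of its class (normalized to mean 1 here).
    w = 1.0 / np.asarray(class_counts, dtype=float)
    w = w / w.mean()
    return w[label] * cross_entropy(logits, label)

def logit_adjusted_loss(logits, label, class_counts, tau=1.0):
    # Logit adjustment: shift each logit by tau * log(class prior) before
    # applying the softmax cross-entropy.
    priors = np.asarray(class_counts, dtype=float) / np.sum(class_counts)
    return cross_entropy(logits + tau * np.log(priors), label)

logits, counts = np.array([1.2, 0.1, -0.4]), [900, 90, 10]
print(reweighted_loss(logits, 2, counts), logit_adjusted_loss(logits, 2, counts))
```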
arXiv Detail & Related papers (2023-10-07T09:15:08Z)
- Cross-Entropy Loss Functions: Theoretical Analysis and Applications [27.3569897539488]
We present a theoretical analysis of a broad family of loss functions that includes cross-entropy (or logistic loss), generalized cross-entropy, the mean absolute error, and other cross-entropy-like loss functions.
We show that these loss functions are beneficial in the adversarial setting by proving that they admit $H$-consistency bounds.
This leads to new adversarial robustness algorithms that consist of minimizing a regularized smooth adversarial comp-sum loss.
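One simple member of this cross-entropy-like family is the generalized cross-entropy (L_q) loss, which interpolates between cross-entropy and, up to scaling, the mean absolute error. The sketch below is purely illustrative and does not cover the comp-sum construction analyzed in the paper.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def generalized_cross_entropy(logits, label, q=0.7):
    """L_q loss (1 - p_y^q) / q: recovers cross-entropy as q -> 0 and
    (up to scaling) the mean absolute error at q = 1."""
    p_y = softmax(logits)[label]
    if q == 0.0:
        return -np.log(p_y)      # cross-entropy limit
    return (1.0 - p_y ** q) / q

logits = np.array([2.0, 0.3, -1.0])
for q in (0.0, 0.5, 1.0):
    print(q, generalized_cross_entropy(logits, 0, q))
```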
arXiv Detail & Related papers (2023-04-14T17:58:23Z)
- Benchmarking Deep AUROC Optimization: Loss Functions and Algorithmic Choices [37.559461866831754]
We benchmark a variety of loss functions with different algorithmic choices for the deep AUROC optimization problem.
We highlight the essential choices such as positive sampling rate, regularization, normalization/activation, and weights.
Our findings show that although the Adam-type method is more competitive from the training perspective, it does not outperform the others from the testing perspective.
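For intuition, pairwise surrogates of the kind benchmarked for AUROC optimization penalize every (positive, negative) score pair that is not separated by a margin. The squared and hinge variants below are standard examples; the margin value and function names are illustrative assumptions.

```python
import numpy as np

def auc_pairwise_surrogate(pos_scores, neg_scores, kind="squared", margin=1.0):
    """Pairwise surrogates commonly used for AUROC optimization: each
    (positive, negative) pair should be separated by at least `margin`."""
    gaps = margin - (pos_scores[:, None] - neg_scores[None, :])
    if kind == "squared":
        return np.mean(np.maximum(gaps, 0.0) ** 2)
    if kind == "hinge":
        return np.mean(np.maximum(gaps, 0.0))
    raise ValueError(kind)

pos = np.array([1.5, 0.9, 2.2])
neg = np.array([0.1, -0.5])
print(auc_pairwise_surrogate(pos, neg, "squared"),
      auc_pairwise_surrogate(pos, neg, "hinge"))
```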
arXiv Detail & Related papers (2022-03-27T00:47:00Z)
- Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization [118.50301177912381]
We show that Adam can converge to different solutions of the objective with provably different errors, even with weight decay regularization.
We also show that if the objective is convex and weight decay regularization is employed, any optimization algorithm, including Adam, will converge to the same solution.
arXiv Detail & Related papers (2021-08-25T17:58:21Z)
- Learning with Multiclass AUC: Theory and Algorithms [141.63211412386283]
Area under the ROC curve (AUC) is a well-known ranking metric for problems such as imbalanced learning and recommender systems.
In this paper, we make an early attempt at learning multiclass scoring functions by optimizing multiclass AUC metrics.
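One common multiclass extension of AUC is the macro-averaged one-vs-all AUC sketched below; this is an illustrative metric computation, not necessarily the exact formulation optimized in the paper.

```python
import numpy as np

def binary_auc(scores, labels):
    # Empirical AUC: fraction of (positive, negative) pairs ranked correctly,
    # counting ties as 1/2.
    pos, neg = scores[labels == 1], scores[labels == 0]
    diffs = pos[:, None] - neg[None, :]
    return (diffs > 0).mean() + 0.5 * (diffs == 0).mean()

def ova_multiclass_auc(scores, labels):
    """Macro-averaged one-vs-all AUC: treat each class k as 'positive'
    against the rest and average the resulting binary AUCs."""
    n_classes = scores.shape[1]
    aucs = [binary_auc(scores[:, k], (labels == k).astype(int))
            for k in range(n_classes)]
    return float(np.mean(aucs))

scores = np.array([[2.0, 0.1, 0.3],
                   [0.2, 1.5, 0.1],
                   [0.3, 0.2, 1.1],
                   [1.2, 0.9, 0.1]])
labels = np.array([0, 1, 2, 0])
print(ova_multiclass_auc(scores, labels))
```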
arXiv Detail & Related papers (2021-07-28T05:18:10Z)
- Generalized Negative Correlation Learning for Deep Ensembling [7.569288952340753]
Ensemble algorithms offer state-of-the-art performance in many machine learning applications.
We formulate a generalized bias-variance decomposition for arbitrary twice differentiable loss functions.
We derive a Generalized Negative Correlation Learning algorithm which offers explicit control over the ensemble's diversity.
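The classic squared-error special case of negative correlation learning makes the diversity control explicit: each member's error is penalized while its deviation from the ensemble mean is rewarded by a factor lam. The generalized version in the paper extends this to arbitrary twice-differentiable losses; the sketch below only shows the classic form.

```python
import numpy as np

def ncl_loss(member_preds, target, lam=0.5):
    """Negative correlation learning objective for a regression ensemble:
    each member's squared error minus lam times its squared deviation from
    the ensemble mean, so lam trades accuracy against diversity."""
    member_preds = np.asarray(member_preds, dtype=float)
    ens = member_preds.mean()
    per_member = (member_preds - target) ** 2 - lam * (member_preds - ens) ** 2
    return per_member.mean()

print(ncl_loss([1.2, 0.8, 1.5], target=1.0, lam=0.0),   # plain avg. squared error
      ncl_loss([1.2, 0.8, 1.5], target=1.0, lam=0.5))   # diversity-encouraged
```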
arXiv Detail & Related papers (2020-11-05T16:29:22Z)
- Approximation Algorithms for Sparse Principal Component Analysis [57.5357874512594]
Principal component analysis (PCA) is a widely used dimension reduction technique in machine learning and statistics.
Various approaches to obtain sparse principal direction loadings have been proposed, which are termed Sparse Principal Component Analysis.
We present thresholding as a provably accurate, polynomial-time approximation algorithm for the SPCA problem.
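A bare-bones version of the thresholding idea, keeping only the k largest-magnitude coordinates of the leading principal direction, looks as follows; the paper's algorithm and its approximation guarantees are more refined than this sketch, and the data generation here is purely illustrative.

```python
import numpy as np

def thresholded_spca(X, k):
    """Simple thresholding heuristic for sparse PCA: take the leading
    eigenvector of the sample covariance, keep its k largest-magnitude
    coordinates, zero out the rest, and renormalize."""
    X = X - X.mean(axis=0)
    cov = X.T @ X / len(X)
    eigvals, eigvecs = np.linalg.eigh(cov)
    v = eigvecs[:, -1]                        # leading principal direction
    keep = np.argsort(np.abs(v))[-k:]         # indices of the k largest entries
    sparse_v = np.zeros_like(v)
    sparse_v[keep] = v[keep]
    return sparse_v / np.linalg.norm(sparse_v)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X[:, :3] += rng.normal(size=(200, 1)) * 2.0   # planted correlated block
print(thresholded_spca(X, k=3))
```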
arXiv Detail & Related papers (2020-06-23T04:25:36Z)
- Optimization and Generalization of Regularization-Based Continual Learning: a Loss Approximation Viewpoint [35.5156045701898]
We provide a novel viewpoint of regularization-based continual learning by formulating it as a second-order Taylor approximation of the loss function of each task.
Based on this viewpoint, we study the optimization aspects (i.e., convergence) as well as generalization properties (i.e., finite-sample guarantees) of regularization-based continual learning.
arXiv Detail & Related papers (2020-06-19T06:08:40Z)
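The second-order Taylor viewpoint replaces each old task's loss with a quadratic penalty around that task's optimum. The diagonal (Fisher-style) curvature estimate and the helper names in the sketch below are assumptions for illustration; the paper treats regularization-based methods more generally.

```python
import numpy as np

def quadratic_penalty(theta, theta_star, h_diag):
    """Second-order Taylor surrogate of an old task's loss around its
    optimum theta_star: 0.5 * (theta - theta_star)^T H (theta - theta_star),
    with a diagonal curvature estimate h_diag (e.g. a Fisher diagonal)."""
    d = theta - theta_star
    return 0.5 * np.sum(h_diag * d * d)

def continual_objective(theta, new_task_loss, old_tasks, lam=1.0):
    # old_tasks: list of (theta_star, h_diag) pairs for previously learned tasks.
    reg = sum(quadratic_penalty(theta, ts, h) for ts, h in old_tasks)
    return new_task_loss(theta) + lam * reg

# Toy usage: a quadratic "new task" loss plus one remembered task.
new_loss = lambda th: np.sum((th - np.array([1.0, -1.0])) ** 2)
old = [(np.array([0.0, 0.5]), np.array([2.0, 0.1]))]
print(continual_objective(np.array([0.5, 0.0]), new_loss, old, lam=0.5))
```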