Rethinking and Reweighting the Univariate Losses for Multi-Label
Ranking: Consistency and Generalization
- URL: http://arxiv.org/abs/2105.05026v1
- Date: Mon, 10 May 2021 09:23:27 GMT
- Title: Rethinking and Reweighting the Univariate Losses for Multi-Label
Ranking: Consistency and Generalization
- Authors: Guoqiang Wu, Chongxuan Li, Kun Xu, Jun Zhu
- Abstract summary: (Partial) ranking loss is a commonly used evaluation measure for multi-label classification.
There is a gap between existing theory and practice -- some pairwise losses can lead to promising performance but lack consistency.
- Score: 44.73295800450414
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: (Partial) ranking loss is a commonly used evaluation measure for multi-label
classification, which is usually optimized with convex surrogates for
computational efficiency. Prior theoretical work on multi-label ranking mainly
focuses on (Fisher) consistency analyses. However, there is a gap between
existing theory and practice -- some pairwise losses can lead to promising
performance but lack consistency, while some univariate losses are consistent
but usually have no clear superiority in practice. In this paper, we attempt to
fill this gap through a systematic study from two complementary perspectives of
consistency and generalization error bounds of learning algorithms. Our results
show that learning algorithms with the consistent univariate loss have an error
bound of $O(c)$ ($c$ is the number of labels), while algorithms with the
inconsistent pairwise loss have a dependence of $O(\sqrt{c})$, as shown in prior work. This
explains why the latter can achieve better performance than the former in
practice. Moreover, we present an inconsistent reweighted univariate loss-based
learning algorithm that enjoys an error bound of $O(\sqrt{c})$ for promising
performance as well as the computational efficiency of univariate losses.
Finally, experimental results validate our theoretical analyses.
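To make the comparison concrete, below is a minimal NumPy sketch of the three kinds of surrogates discussed in the abstract: a consistent univariate (label-wise) loss, an inconsistent pairwise ranking loss, and a reweighted univariate loss. The logistic base loss and the specific weights are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def univariate_loss(scores, labels):
    """Consistent univariate surrogate: a logistic loss applied to each of the
    c labels independently (labels are 0/1 relevance indicators)."""
    margins = (2 * labels - 1) * scores           # positive margin = correct side
    return np.log1p(np.exp(-margins)).sum()

def pairwise_loss(scores, labels):
    """Pairwise surrogate for the (partial) ranking loss: penalize every
    (relevant, irrelevant) label pair whose scores are close to mis-ordered."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diffs = pos[:, None] - neg[None, :]           # s_relevant - s_irrelevant
    return np.log1p(np.exp(-diffs)).sum()

def reweighted_univariate_loss(scores, labels):
    """Illustrative reweighted univariate surrogate: each label's loss is scaled
    by the size of the opposite label set, mimicking how many pairs it appears
    in (the paper's exact weighting may differ)."""
    n_pos, n_neg = labels.sum(), (1 - labels).sum()
    weights = np.where(labels == 1, n_neg, n_pos)
    margins = (2 * labels - 1) * scores
    return (weights * np.log1p(np.exp(-margins))).sum()

# toy instance with c = 5 labels
scores = np.array([2.0, -1.0, 0.5, -0.3, 1.2])
labels = np.array([1, 0, 1, 0, 0])
print(univariate_loss(scores, labels),
      pairwise_loss(scores, labels),
      reweighted_univariate_loss(scores, labels))
```

Note that both univariate variants touch each of the c labels once, while the pairwise loss iterates over all relevant-irrelevant pairs, which is where the computational advantage of univariate losses comes from.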
Related papers
- A Unified Generalization Analysis of Re-Weighting and Logit-Adjustment
for Imbalanced Learning [129.63326990812234]
We propose a technique named data-dependent contraction to capture how modified losses handle different classes.
On top of this technique, a fine-grained generalization bound is established for imbalanced learning, which helps reveal the mystery of re-weighting and logit-adjustment.
arXiv Detail & Related papers (2023-10-07T09:15:08Z)
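As a rough illustration of the two loss modifications analyzed there, the sketch below implements per-class re-weighting and logit adjustment on top of a softmax cross-entropy; the inverse-frequency weights and the tau parameter are common choices assumed here, not necessarily the paper's.

```python
import numpy as np

def softmax_ce(logits, y):
    """Plain cross-entropy for one sample with integer class label y."""
    z = logits - logits.max()
    return -(z[y] - np.log(np.exp(z).sum()))

def reweighted_ce(logits, y, class_freq):
    """Re-weighting: rescale each sample's loss by the inverse class frequency."""
    return softmax_ce(logits, y) / class_freq[y]

def logit_adjusted_ce(logits, y, class_priors, tau=1.0):
    """Logit adjustment: add tau * log-prior to the logits before the softmax,
    which penalizes head classes and favors tail classes during training."""
    return softmax_ce(logits + tau * np.log(class_priors), y)

# toy 3-class imbalanced setting (90% / 9% / 1% class priors), true class = 2 (tail)
priors = np.array([0.90, 0.09, 0.01])
logits = np.array([2.0, 1.0, 0.5])
print(softmax_ce(logits, 2),
      reweighted_ce(logits, 2, priors),
      logit_adjusted_ce(logits, 2, priors))
```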
- Cross-Entropy Loss Functions: Theoretical Analysis and Applications [27.3569897539488]
We present a theoretical analysis of a broad family of loss functions, that includes cross-entropy (or logistic loss), generalized cross-entropy, the mean absolute error and other cross-entropy-like loss functions.
We show that these loss functions are beneficial in the adversarial setting by proving that they admit $H$-consistency bounds.
This leads to new adversarial robustness algorithms that consist of minimizing a regularized smooth adversarial comp-sum loss.
arXiv Detail & Related papers (2023-04-14T17:58:23Z)
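A minimal sketch of one member of this family, the generalized cross-entropy L_q loss, which interpolates between the usual cross-entropy (q -> 0) and the mean absolute error (q = 1); the particular q values below are arbitrary.

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def generalized_ce(logits, y, q=0.7):
    """Generalized cross-entropy L_q(p_y) = (1 - p_y**q) / q.
    As q -> 0 this recovers the usual cross-entropy -log p_y,
    and at q = 1 it equals the mean absolute error 1 - p_y."""
    p_y = softmax(logits)[y]
    return (1.0 - p_y ** q) / q

logits, y = np.array([2.0, 0.5, -1.0]), 0
for q in (1e-6, 0.5, 1.0):            # near-CE, intermediate, MAE
    print(q, generalized_ce(logits, y, q))
```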
- Benchmarking Deep AUROC Optimization: Loss Functions and Algorithmic
Choices [37.559461866831754]
We benchmark a variety of loss functions with different algorithmic choices for deep AUROC optimization problem.
We highlight the essential choices such as positive sampling rate, regularization, normalization/activation, and weights.
Our findings show that although the Adam-type method is more competitive from the training perspective, it does not outperform others from the testing perspective.
arXiv Detail & Related papers (2022-03-27T00:47:00Z)
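For reference, here are two pairwise AUROC surrogates of the kind such benchmarks typically compare (squared hinge and logistic). The margin value and the uniform pairing over all positive-negative pairs are assumptions for this sketch; the benchmark's other choices (sampling rate, regularization, normalization) are not modeled here.

```python
import numpy as np

def pairwise_auc_surrogate(scores_pos, scores_neg, loss="squared_hinge", margin=1.0):
    """Pairwise surrogates for AUROC: penalize positive-negative score pairs
    that violate the desired ordering s_pos > s_neg (+ margin)."""
    diffs = scores_pos[:, None] - scores_neg[None, :]
    if loss == "squared_hinge":
        return np.mean(np.maximum(0.0, margin - diffs) ** 2)
    if loss == "logistic":
        return np.mean(np.log1p(np.exp(-diffs)))
    raise ValueError(loss)

pos = np.array([1.5, 0.2, 0.9])   # scores of positive examples
neg = np.array([0.1, -0.4])       # scores of negative examples
print(pairwise_auc_surrogate(pos, neg, "squared_hinge"),
      pairwise_auc_surrogate(pos, neg, "logistic"))
```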
- PACMAN: PAC-style bounds accounting for the Mismatch between Accuracy
and Negative log-loss [28.166066663983674]
The ultimate performance of machine learning algorithms for classification tasks is usually measured in terms of the empirical error probability (or accuracy) based on a testing dataset.
Training, however, typically minimizes a surrogate loss function, which for classification is often the negative log-loss that leads to the well-known cross-entropy risk.
We introduce a pointwise PAC-style analysis of the generalization gap that accounts for the mismatch between testing with the accuracy metric and training on the negative log-loss.
arXiv Detail & Related papers (2021-12-10T14:00:22Z)
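A toy illustration of that mismatch, assuming two hypothetical models A and B: A is better under the 0-1 error used at test time, while B is better under the negative log-loss used for training.

```python
import numpy as np

# Two hypothetical models on four test points whose true class is 0.
probs_a = np.array([[0.51, 0.49]] * 4)                     # barely right every time
probs_b = np.array([[0.99, 0.01]] * 3 + [[0.30, 0.70]])    # confident, but wrong once
labels = np.zeros(4, dtype=int)

for name, p in (("A", probs_a), ("B", probs_b)):
    nll = -np.mean(np.log(p[np.arange(len(labels)), labels]))   # negative log-loss
    err = np.mean(p.argmax(axis=1) != labels)                   # 0-1 error
    print(name, "negative log-loss:", round(nll, 3), "0-1 error:", round(err, 3))
# A wins on accuracy (0 vs 0.25) while B wins on log-loss (~0.31 vs ~0.67):
# the training and testing metrics can rank models differently.
```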
- Understanding the Generalization of Adam in Learning Neural Networks
with Proper Regularization [118.50301177912381]
We show that Adam can converge to different solutions of the objective with provably different errors, even with weight decay regularization.
We also show that if the objective is convex and weight decay regularization is employed, any optimization algorithm, including Adam, will converge to the same solution.
arXiv Detail & Related papers (2021-08-25T17:58:21Z)
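A small numerical check of the convex claim, using a hand-rolled Adam (a decaying step size is assumed) on an L2-regularized least-squares objective and comparing against the closed-form ridge solution.

```python
import numpy as np

rng = np.random.default_rng(0)
X, y, lam = rng.normal(size=(100, 5)), rng.normal(size=100), 0.1
n = len(y)

# Adam on f(w) = (1/n)||Xw - y||^2 + lam * ||w||^2  (convex, with weight decay)
w = np.zeros(5)
m, v = np.zeros(5), np.zeros(5)
beta1, beta2, eps = 0.9, 0.999, 1e-8
for t in range(1, 20001):
    grad = 2 * X.T @ (X @ w - y) / n + 2 * lam * w
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat, v_hat = m / (1 - beta1 ** t), v / (1 - beta2 ** t)
    lr = 0.05 / np.sqrt(t)                      # decaying step size
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)

# Closed-form minimizer: (X^T X + n*lam*I)^{-1} X^T y
w_star = np.linalg.solve(X.T @ X + n * lam * np.eye(5), X.T @ y)
print(np.abs(w - w_star).max())                 # should shrink toward 0
```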
- Learning with Multiclass AUC: Theory and Algorithms [141.63211412386283]
Area under the ROC curve (AUC) is a well-known ranking metric for problems such as imbalanced learning and recommender systems.
In this paper, we make an early attempt at the problem of learning multiclass scoring functions by optimizing multiclass AUC metrics.
arXiv Detail & Related papers (2021-07-28T05:18:10Z)
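One common definition used in this setting is the one-vs-rest average of per-class AUCs; the sketch below computes that empirical metric, though the paper also considers other multiclass AUC variants.

```python
import numpy as np

def binary_auc(scores, labels):
    """Empirical AUC: fraction of (positive, negative) pairs ranked correctly,
    counting ties as one half."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diffs = pos[:, None] - neg[None, :]
    return (diffs > 0).mean() + 0.5 * (diffs == 0).mean()

def multiclass_auc_ovr(score_matrix, labels):
    """One-vs-rest multiclass AUC: average the binary AUC of each class's
    scoring column against all remaining classes."""
    return float(np.mean([binary_auc(score_matrix[:, k], (labels == k).astype(int))
                          for k in range(score_matrix.shape[1])]))

rng = np.random.default_rng(0)
scores = rng.normal(size=(9, 3))                  # 9 examples, 3 class scores each
labels = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])    # every class appears
print(multiclass_auc_ovr(scores, labels))
```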
- Generalized Negative Correlation Learning for Deep Ensembling [7.569288952340753]
Ensemble algorithms offer state-of-the-art performance in many machine learning applications.
We formulate a generalized bias-variance decomposition for arbitrary twice differentiable loss functions.
We derive a Generalized Negative Correlation Learning algorithm which offers explicit control over the ensemble's diversity.
arXiv Detail & Related papers (2020-11-05T16:29:22Z)
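A sketch of the idea for the squared-error case, where the classical ambiguity decomposition holds exactly: the ensemble objective is the average member loss minus lam times a diversity term, and lam trades off independent training (lam = 0) against training the ensemble mean directly (lam = 1). The paper generalizes this to arbitrary twice-differentiable losses; the function below is only the squared-error special case.

```python
import numpy as np

def ncl_ensemble_loss(member_preds, target, lam=0.5):
    """Negative-correlation-style objective for the squared error: average
    member loss minus lam times the members' spread around the ensemble mean."""
    ensemble = member_preds.mean(axis=0)
    member_loss = np.mean((member_preds - target) ** 2)
    diversity = np.mean((member_preds - ensemble) ** 2)
    return member_loss - lam * diversity

preds = np.array([[1.2, 0.8],        # member 1
                  [0.9, 1.1],        # member 2
                  [1.5, 0.7]])       # member 3
target = np.array([1.0, 1.0])
for lam in (0.0, 0.5, 1.0):
    print(lam, ncl_ensemble_loss(preds, target, lam))
print(np.mean((preds.mean(axis=0) - target) ** 2))   # equals the lam = 1 value
```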
- Learning by Minimizing the Sum of Ranked Range [58.24935359348289]
We introduce the sum of ranked range (SoRR) as a general approach to form learning objectives.
A ranked range is a consecutive sequence of sorted values of a set of real numbers.
We explore two machine learning applications of minimizing the SoRR framework, namely the AoRR aggregate loss for binary classification and the TKML individual loss for multi-label/multi-class classification.
arXiv Detail & Related papers (2020-10-05T01:58:32Z)
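A minimal sketch of the SoRR itself: sort the values, then sum the ones ranked k+1 through m. The toy numbers below stand in for individual example losses, as aggregated by the AoRR loss.

```python
import numpy as np

def sorr(values, k, m):
    """Sum of ranked range: add up the values ranked k+1 through m after
    sorting in non-increasing order (i.e., top-m sum minus top-k sum)."""
    v = np.sort(values)[::-1]
    return v[k:m].sum()

losses = np.array([3.0, 0.5, 7.0, 1.0, 2.5, 9.0])   # individual example losses
print(sorr(losses, 0, len(losses)))   # k=0, m=n: the ordinary sum of all losses
print(sorr(losses, 0, 3))             # top-3 sum: focuses on the hardest examples
print(sorr(losses, 1, 4))             # ranks 2-4: additionally drops the single
                                      # largest loss, e.g. a likely outlier
```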
- Approximation Algorithms for Sparse Principal Component Analysis [57.5357874512594]
Principal component analysis (PCA) is a widely used dimension reduction technique in machine learning and statistics.
Various approaches to obtain sparse principal direction loadings have been proposed, which are termed Sparse Principal Component Analysis.
We present thresholding as a provably accurate, polynomial-time approximation algorithm for the SPCA problem.
arXiv Detail & Related papers (2020-06-23T04:25:36Z)
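A rough sketch of one such thresholding scheme (keep the k largest-magnitude entries of the leading eigenvector and renormalize); the exact algorithm and accuracy guarantees in the paper may differ.

```python
import numpy as np

def thresholded_spca_direction(A, k):
    """Thresholding sketch for sparse PCA: take the leading eigenvector of the
    (symmetric PSD) matrix A, keep its k largest-magnitude entries, zero the
    rest, and renormalize."""
    eigvals, eigvecs = np.linalg.eigh(A)          # eigenvalues in ascending order
    v = eigvecs[:, -1]                            # leading eigenvector
    keep = np.argsort(np.abs(v))[-k:]             # indices of k largest entries
    x = np.zeros_like(v)
    x[keep] = v[keep]
    return x / np.linalg.norm(x)

rng = np.random.default_rng(0)
B = rng.normal(size=(6, 6))
A = B @ B.T                                       # a PSD "covariance" matrix
x = thresholded_spca_direction(A, k=2)
print(x, float(x @ A @ x))                        # sparse direction, explained variance
```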
- Optimization and Generalization of Regularization-Based Continual
Learning: a Loss Approximation Viewpoint [35.5156045701898]
We provide a novel viewpoint of regularization-based continual learning by formulating it as a second-order Taylor approximation of the loss function of each task.
Based on this viewpoint, we study the optimization aspects (i.e., convergence) as well as generalization properties (i.e., finite-sample guarantees) of regularization-based continual learning.
arXiv Detail & Related papers (2020-06-19T06:08:40Z)
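To make the loss-approximation viewpoint concrete, here is a minimal sketch: the old task's loss is replaced by its second-order Taylor expansion around the previous solution, which yields a quadratic penalty with some curvature proxy (a diagonal Fisher-style estimate is assumed below, as in EWC-like methods; the paper studies this family in general).

```python
import numpy as np

def quadratic_penalty(w, w_prev, H_diag):
    """Second-order Taylor view of regularization-based continual learning:
    near the previous task's solution w_prev, that task's loss is approximately
    constant + 1/2 (w - w_prev)^T H (w - w_prev), so methods in this family add
    a quadratic penalty with a (often diagonal) curvature proxy H_diag."""
    d = w - w_prev
    return 0.5 * np.sum(H_diag * d * d)

def continual_objective(loss_new, w, w_prev, H_diag, lam=1.0):
    """New-task loss plus the quadratic approximation of the old-task loss."""
    return loss_new(w) + lam * quadratic_penalty(w, w_prev, H_diag)

# toy usage: a quadratic "new task" loss and a diagonal curvature estimate
w_prev = np.array([1.0, -0.5])                    # solution of the previous task
H_diag = np.array([4.0, 0.1])                     # parameter 0 mattered more before
loss_new = lambda w: np.sum((w - np.array([0.0, 1.0])) ** 2)
print(continual_objective(loss_new, np.array([0.5, 0.5]), w_prev, H_diag))
```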