Bipartite Ranking From Multiple Labels: On Loss Versus Label Aggregation
- URL: http://arxiv.org/abs/2504.11284v1
- Date: Tue, 15 Apr 2025 15:25:27 GMT
- Title: Bipartite Ranking From Multiple Labels: On Loss Versus Label Aggregation
- Authors: Michal Lukasik, Lin Chen, Harikrishna Narasimhan, Aditya Krishna Menon, Wittawat Jitkrittum, Felix X. Yu, Sashank J. Reddi, Gang Fu, Mohammadhossein Bateni, Sanjiv Kumar,
- Abstract summary: Bipartite ranking is a fundamental supervised learning problem, with the goal of learning a ranking over instances with maximal area under the ROC curve (AUC) against a single binary target label.<n>How can one synthesize such labels into a single coherent ranking?<n>We analyze two approaches to this problem -- loss aggregation and label aggregation -- by characterizing their Bayes-optimal solutions.
- Score: 66.28528968249255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bipartite ranking is a fundamental supervised learning problem, with the goal of learning a ranking over instances with maximal area under the ROC curve (AUC) against a single binary target label. However, one may often observe multiple binary target labels, e.g., from distinct human annotators. How can one synthesize such labels into a single coherent ranking? In this work, we formally analyze two approaches to this problem -- loss aggregation and label aggregation -- by characterizing their Bayes-optimal solutions. Based on this, we show that while both methods can yield Pareto-optimal solutions, loss aggregation can exhibit label dictatorship: one can inadvertently (and undesirably) favor one label over others. This suggests that label aggregation can be preferable to loss aggregation, which we empirically verify.
Related papers
- Generating Unbiased Pseudo-labels via a Theoretically Guaranteed
Chebyshev Constraint to Unify Semi-supervised Classification and Regression [57.17120203327993]
threshold-to-pseudo label process (T2L) in classification uses confidence to determine the quality of label.
In nature, regression also requires unbiased methods to generate high-quality labels.
We propose a theoretically guaranteed constraint for generating unbiased labels based on Chebyshev's inequality.
arXiv Detail & Related papers (2023-11-03T08:39:35Z) - Learning from Label Proportions: Bootstrapping Supervised Learners via Belief Propagation [18.57840057487926]
Learning from Label Proportions (LLP) is a learning problem where only aggregate level labels are available for groups of instances, called bags, during training.
This setting arises in domains like advertising and medicine due to privacy considerations.
We propose a novel algorithmic framework for this problem that iteratively performs two main steps.
arXiv Detail & Related papers (2023-10-12T06:09:26Z) - Partial-Label Regression [54.74984751371617]
Partial-label learning is a weakly supervised learning setting that allows each training example to be annotated with a set of candidate labels.
Previous studies on partial-label learning only focused on the classification setting where candidate labels are all discrete.
In this paper, we provide the first attempt to investigate partial-label regression, where each training example is annotated with a set of real-valued candidate labels.
arXiv Detail & Related papers (2023-06-15T09:02:24Z) - Complementary to Multiple Labels: A Correlation-Aware Correction
Approach [65.59584909436259]
We show theoretically how the estimated transition matrix in multi-class CLL could be distorted in multi-labeled cases.
We propose a two-step method to estimate the transition matrix from candidate labels.
arXiv Detail & Related papers (2023-02-25T04:48:48Z) - To Aggregate or Not? Learning with Separate Noisy Labels [28.14966756980763]
This paper addresses the question of whether one should aggregate separate noisy labels into single ones or use them separately as given.
We theoretically analyze the performance of both approaches under the empirical risk minimization framework.
Our theorems conclude that label separation is preferred over label aggregation when the noise rates are high, or the number of labelers/annotations is insufficient.
arXiv Detail & Related papers (2022-06-14T21:32:26Z) - Acknowledging the Unknown for Multi-label Learning with Single Positive
Labels [65.5889334964149]
Traditionally, all unannotated labels are assumed as negative labels in single positive multi-label learning (SPML)
We propose entropy-maximization (EM) loss to maximize the entropy of predicted probabilities for all unannotated labels.
Considering the positive-negative label imbalance of unannotated labels, we propose asymmetric pseudo-labeling (APL) with asymmetric-tolerance strategies and a self-paced procedure to provide more precise supervision.
arXiv Detail & Related papers (2022-03-30T11:43:59Z) - Optimizing Diffusion Rate and Label Reliability in a Graph-Based
Semi-supervised Classifier [2.4366811507669124]
The Local and Global Consistency (LGC) algorithm is one of the most well-known graph-based semi-supervised (GSSL) classifiers.
We discuss how removing the self-influence of a labeled instance may be beneficial, and how it relates to leave-one-out error.
Within this framework, we propose methods to estimate label reliability and diffusion rate.
arXiv Detail & Related papers (2022-01-10T16:58:52Z) - General Partial Label Learning via Dual Bipartite Graph Autoencoder [81.78871072599607]
We formulate a practical yet challenging problem: General Partial Label Learning (GPLL)
We propose a novel graph autoencoder called Dual Bipartite Graph Autoencoder (DB-GAE)
arXiv Detail & Related papers (2020-01-05T19:00:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.