Multi-Label Quantification
- URL: http://arxiv.org/abs/2211.08063v1
- Date: Tue, 15 Nov 2022 11:29:59 GMT
- Title: Multi-Label Quantification
- Authors: Alejandro Moreo and Manuel Francisco and Fabrizio Sebastiani
- Abstract summary: Quantification, variously called "supervised prevalence estimation" or "learning to quantify", is the supervised learning task of generating predictors of the relative frequencies of the classes of interest in unlabelled data samples.
We propose methods for inferring estimators of class prevalence values that strive to leverage the dependencies among the classes of interest in order to predict their relative frequencies more accurately.
- Score: 78.83284164605473
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Quantification, variously called "supervised prevalence estimation" or
"learning to quantify", is the supervised learning task of generating
predictors of the relative frequencies (a.k.a. "prevalence values") of the
classes of interest in unlabelled data samples. While many quantification
methods have been proposed in the past for binary problems and, to a lesser
extent, single-label multiclass problems, the multi-label setting (i.e., the
scenario in which the classes of interest are not mutually exclusive) remains
by and large unexplored. A straightforward solution to the multi-label
quantification problem could simply consist of recasting the problem as a set
of independent binary quantification problems. Such a solution is simple but
na\"ive, since the independence assumption upon which it rests is, in most
cases, not satisfied. In these cases, knowing the relative frequency of one
class could be of help in determining the prevalence of other related classes.
We propose the first truly multi-label quantification methods, i.e., methods
for inferring estimators of class prevalence values that strive to leverage the
stochastic dependencies among the classes of interest in order to predict their
relative frequencies more accurately. We show empirical evidence that natively
multi-label solutions outperform the naïve approaches by a large margin. The
code to reproduce all our experiments is available online.
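To make the abstract's contrast concrete, here is a minimal sketch (not the authors' released code) of the naïve baseline it describes, one independent binary classify-and-count (CC) quantifier per class, together with one generic regression-based correction that illustrates how cross-label dependencies could be exploited. Function names, data shapes, and the scikit-learn models are illustrative assumptions.

```python
# Minimal sketch (not the authors' released code) contrasting the naive
# label-independent baseline with a generic dependency-aware correction.
# Assumed data: numpy feature matrices X_*, and binary indicator matrices
# Y_* of shape (n_samples, n_classes), in which classes may co-occur and
# each class occurs at least once in the training set.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LogisticRegression

def naive_binary_cc(X_train, Y_train, X_test):
    """Naive baseline: one independent binary classify-and-count per class."""
    classifiers = [
        LogisticRegression(max_iter=1000).fit(X_train, Y_train[:, c])
        for c in range(Y_train.shape[1])
    ]
    # CC estimate: the fraction of test items predicted positive, per class.
    prevalences = np.array([clf.predict(X_test).mean() for clf in classifiers])
    return classifiers, prevalences  # ignores any dependence among classes

def fit_dependency_corrector(classifiers, X_val, Y_val,
                             n_bags=500, bag_size=100, seed=0):
    """Learn a map from the vector of naive CC estimates to true prevalences.

    Random validation subsamples ("bags") supply (CC estimates, true
    prevalences) training pairs; a multi-output regressor fitted on them can
    use every class's estimate when predicting each class's prevalence,
    which is one simple way to exploit dependencies among classes.
    """
    rng = np.random.default_rng(seed)
    cc_inputs, true_prevs = [], []
    for _ in range(n_bags):
        idx = rng.choice(len(X_val), size=bag_size, replace=True)
        cc_inputs.append([clf.predict(X_val[idx]).mean() for clf in classifiers])
        true_prevs.append(Y_val[idx].mean(axis=0))
    return RandomForestRegressor(random_state=seed).fit(cc_inputs, true_prevs)

# Usage sketch:
#   clfs, naive_prevs = naive_binary_cc(X_tr, Y_tr, X_te)
#   corrector = fit_dependency_corrector(clfs, X_val, Y_val)
#   corrected_prevs = corrector.predict([naive_prevs])[0]
```

The regression step is only meant to show why letting one class's estimate inform another's can beat the independent treatment; the paper's native multi-label methods are evaluated against naïve baselines of this kind.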
Related papers
- Probably Approximately Precision and Recall Learning [62.912015491907994]
Precision and Recall are foundational metrics in machine learning.
One-sided feedback, where only positive examples are observed during training, is inherent in many practical problems.
We introduce a PAC learning framework where each hypothesis is represented by a graph, with edges indicating positive interactions.
arXiv Detail & Related papers (2024-11-20T04:21:07Z) - Trusted Multi-view Learning with Label Noise [17.458306450909316]
Multi-view learning methods often focus on improving decision accuracy while neglecting the decision uncertainty.
We propose a trusted multi-view noise refining method (TMNR) to solve this problem.
We empirically compare TMNR with state-of-the-art trusted multi-view learning and label noise learning baselines on 5 publicly available datasets.
arXiv Detail & Related papers (2024-04-18T06:47:30Z) - Multi-Label Noise Transition Matrix Estimation with Label Correlations:
Theory and Algorithm [73.94839250910977]
Noisy multi-label learning has garnered increasing attention due to the challenges posed by collecting large-scale accurate labels.
The introduction of transition matrices can help model multi-label noise and enable the development of statistically consistent algorithms.
We propose a novel estimator that leverages label correlations without the need for anchor points or precise fitting of noisy class posteriors (a generic sketch of transition-matrix loss correction appears after this list).
arXiv Detail & Related papers (2023-09-22T08:35:38Z) - Class-Distribution-Aware Pseudo Labeling for Semi-Supervised Multi-Label
Learning [97.88458953075205]
Pseudo-labeling has emerged as a popular and effective approach for utilizing unlabeled data.
This paper proposes a novel solution called Class-Aware Pseudo-Labeling (CAP) that performs pseudo-labeling in a class-aware manner.
arXiv Detail & Related papers (2023-05-04T12:52:18Z) - Centrality and Consistency: Two-Stage Clean Samples Identification for
Learning with Instance-Dependent Noisy Labels [87.48541631675889]
We propose a two-stage clean samples identification method.
First, we employ a class-level feature clustering procedure for the early identification of clean samples.
Second, for the remaining clean samples that are close to the ground truth class boundary, we propose a novel consistency-based classification method.
arXiv Detail & Related papers (2022-07-29T04:54:57Z) - Learning from Multiple Unlabeled Datasets with Partial Risk
Regularization [80.54710259664698]
In this paper, we aim to learn an accurate classifier without any class labels.
We first derive an unbiased estimator of the classification risk that can be estimated from the given unlabeled sets.
We then find that the classifier obtained this way tends to overfit, as its empirical risk goes negative during training.
Experiments demonstrate that our method effectively mitigates overfitting and outperforms state-of-the-art methods for learning from multiple unlabeled sets.
arXiv Detail & Related papers (2022-07-04T16:22:44Z) - Multi-class Probabilistic Bounds for Self-learning [13.875239300089861]
Pseudo-labeling is prone to error and runs the risk of adding noisy labels into unlabeled training data.
We present a probabilistic framework for analyzing self-learning in the multi-class classification scenario with partially labeled data.
arXiv Detail & Related papers (2021-09-29T13:57:37Z) - Unbiased Loss Functions for Multilabel Classification with Missing
Labels [2.1549398927094874]
Missing labels are a ubiquitous phenomenon in extreme multi-label classification (XMC) tasks.
This paper derives the unique unbiased estimators for the different multilabel reductions.
arXiv Detail & Related papers (2021-09-23T10:39:02Z) - CCMN: A General Framework for Learning with Class-Conditional
Multi-Label Noise [40.46921277898713]
Class-conditional noise commonly exists in machine learning tasks, where the class label is corrupted with a probability that depends on its ground-truth value.
In this paper, we formalize this problem as a general framework of learning with Class-Conditional Multi-label Noise (CCMN for short).
We establish two unbiased estimators with error bounds for solving the CCMN problems, and prove that they are consistent with commonly used multi-label loss functions.
arXiv Detail & Related papers (2021-05-16T03:24:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.