Quantifying With Only Positive Training Data
- URL: http://arxiv.org/abs/2004.10356v2
- Date: Tue, 12 Oct 2021 22:40:52 GMT
- Title: Quantifying With Only Positive Training Data
- Authors: Denis dos Reis, Marcílio de Souto, Elaine de Sousa, Gustavo Batista
- Abstract summary: Quantification is the research field that studies methods for counting the number of data points that belong to each class in an unlabeled sample.
This article closes the gap between Positive and Unlabeled Learning (PUL) and One-class Quantification (OCQ).
We compare our method, Passive Aggressive Threshold (PAT), against PUL methods and show that PAT is generally the fastest and most accurate algorithm.
- Score: 0.5735035463793008
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Quantification is the research field that studies methods for counting the
number of data points that belong to each class in an unlabeled sample.
Traditionally, researchers in this field assume the availability of labelled
observations for all classes to induce a quantification model. However, we
often face situations where the number of classes is large or even unknown, or
we have reliable data for a single class. When inducing a multi-class
quantifier is infeasible, we are often concerned with estimates for a specific
class of interest. In this context, we have proposed a novel setting known as
One-class Quantification (OCQ). In parallel, Positive and Unlabeled Learning
(PUL), another branch of Machine Learning, has offered solutions to OCQ,
despite quantification not being the focal point of PUL. This article closes
the gap between PUL and OCQ and brings both areas together under a unified
view. We compare our method, Passive Aggressive Threshold (PAT), against PUL
methods and show that PAT is generally the fastest and most accurate algorithm.
PAT induces quantification models that can be reused to quantify different
samples of data. We additionally introduce Exhaustive TIcE (ExTIcE), an
improved version of the PUL algorithm Tree Induction for c Estimation (TIcE).
We show that ExTIcE quantifies more accurately than PAT and the other assessed
algorithms in scenarios where several negative observations are identical to
the positive ones.
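The abstract does not spell out PAT's update rule, so the following is only a minimal sketch of threshold-based one-class quantification under assumed mechanics: fit a one-class scorer on positives, choose a threshold that retains a target fraction of held-out positives, then divide the exceedance count on the unlabeled sample by that fraction. The `IsolationForest` scorer and the toy data are illustrative choices, not the paper's.

```python
# Hypothetical sketch of threshold-based one-class quantification; PAT's
# actual mechanics are not given in the abstract, and IsolationForest is
# an illustrative stand-in for the one-class scorer.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
pos_train = rng.normal(2.0, 1.0, size=(500, 2))   # positive training data
pos_valid = rng.normal(2.0, 1.0, size=(200, 2))   # held-out positives

# Unlabeled mixture: 30% positives, 70% negatives (true prevalence = 0.3)
unlabeled = np.vstack([rng.normal(2.0, 1.0, size=(300, 2)),
                       rng.normal(-2.0, 1.0, size=(700, 2))])

scorer = IsolationForest(random_state=0).fit(pos_train)
tpr = 0.9  # target fraction of positives retained above the threshold
threshold = np.quantile(scorer.score_samples(pos_valid), 1.0 - tpr)

raw_count = np.mean(scorer.score_samples(unlabeled) >= threshold)
estimate = min(raw_count / tpr, 1.0)  # correct for the ~10% missed positives
print(f"estimated positive prevalence: {estimate:.3f}")  # near 0.3 here
```

Note that this correction ignores negatives that score above the threshold, which is exactly the overlapping-class scenario where the abstract reports ExTIcE quantifying more accurately.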
Related papers
- Prediction Error-based Classification for Class-Incremental Learning [39.91805363069707]
We introduce Prediction Error-based Classification (PEC).
PEC computes a class score by measuring the prediction error of a model trained to replicate the outputs of a frozen random neural network on data from that class.
PEC offers several practical advantages, including sample efficiency, ease of tuning, and effectiveness even when data are presented one class at a time.
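As a rough illustration of the mechanism described above, the sketch below uses a fixed random projection as the frozen network and one ridge-regression "student" per class; all model choices are assumptions for exposition, not PEC's actual architecture.

```python
# Rough illustration of prediction-error-based class scoring; the frozen
# random projection and ridge "students" are assumptions for exposition,
# not PEC's actual architecture.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 16))                 # frozen random "teacher" net
teacher = lambda X: np.tanh(X @ W)

X0 = rng.normal(-2.0, 1.0, size=(300, 2))    # toy data for class 0
X1 = rng.normal(+2.0, 1.0, size=(300, 2))    # toy data for class 1

# One student per class, trained to replicate the teacher on that class only
students = {c: Ridge(alpha=1.0).fit(Xc, teacher(Xc))
            for c, Xc in {0: X0, 1: X1}.items()}

def pec_scores(x):
    """Class score = negative replication error of that class's student."""
    x = x.reshape(1, -1)
    return {c: -np.sum((s.predict(x) - teacher(x)) ** 2)
            for c, s in students.items()}

print(pec_scores(np.array([2.1, 1.8])))      # class 1 should score higher
```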
arXiv Detail & Related papers (2023-05-30T07:43:35Z)
- Accounting for multiplicity in machine learning benchmark performance [0.0]
Using the highest-ranked performance as an estimate of state-of-the-art (SOTA) performance is biased, giving overly optimistic results.
In this article, we provide a probability distribution for the case of multiple classifiers so that known analysis methods can be applied and a better SOTA estimate can be provided.
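A small simulation with made-up numbers shows the bias: when many equally skilled classifiers are each evaluated on a test set of the same size, the maximum observed accuracy systematically overshoots the shared true accuracy.

```python
# Toy simulation of the multiplicity effect with made-up numbers: 50
# equally skilled classifiers, each scored on the same-sized test set.
import numpy as np

rng = np.random.default_rng(0)
true_acc, n_test, n_classifiers = 0.80, 1000, 50

# Measured accuracies are binomial draws around the shared true accuracy
measured = rng.binomial(n_test, true_acc,
                        size=(10_000, n_classifiers)) / n_test
print("single classifier:", measured[:, 0].mean())        # ~0.80, unbiased
print("max over 50:      ", measured.max(axis=1).mean())  # clearly above 0.80
```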
arXiv Detail & Related papers (2023-03-10T10:32:18Z)
- Multi-Label Quantification [78.83284164605473]
Quantification, variously called "labelled prevalence estimation" or "learning to quantify", is the supervised learning task of generating predictors of the relative frequencies of the classes of interest in unsupervised data samples.
We propose methods for inferring estimators of class prevalence values that strive to leverage the dependencies among the classes of interest in order to predict their relative frequencies more accurately.
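For context, the baseline such methods improve on quantifies each label independently; below is a minimal probabilistic classify-and-count (PCC) sketch on synthetic multi-label data (all names and data are illustrative, not the paper's method).

```python
# Baseline that ignores label dependencies: per-label probabilistic
# classify-and-count (PCC) on synthetic multi-label data (all illustrative).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
Y = (X[:, :3] + rng.normal(scale=0.5, size=(1000, 3)) > 0).astype(int)

X_tr, Y_tr, X_te, Y_te = X[:600], Y[:600], X[600:], Y[600:]
for j in range(Y.shape[1]):
    clf = LogisticRegression().fit(X_tr, Y_tr[:, j])
    est = clf.predict_proba(X_te)[:, 1].mean()   # PCC prevalence estimate
    print(f"label {j}: estimated {est:.3f}, true {Y_te[:, j].mean():.3f}")
```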
arXiv Detail & Related papers (2022-11-15T11:29:59Z)
- A Semiparametric Efficient Approach To Label Shift Estimation and Quantification [0.0]
We present a new procedure, SELSE, which estimates the shift in the response variable's distribution.
We prove that SELSE's normalized error has the smallest possible variance matrix among all algorithms in that family.
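SELSE's estimator is not detailed in this summary; purely as background on the label-shift task, here is a generic confusion-matrix-inversion (BBSE-style) estimator, which is explicitly not the paper's procedure.

```python
# Generic label-shift estimation by confusion-matrix inversion (BBSE-style),
# shown only to illustrate the task; this is not SELSE's procedure.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample(n, prevalence):
    y = (rng.random(n) < prevalence).astype(int)
    X = rng.normal(loc=(2 * y - 1)[:, None], scale=1.0, size=(n, 2))
    return X, y

X_tr, y_tr = sample(2000, 0.5)
X_te, y_te = sample(2000, 0.2)                   # class prior has shifted

clf = LogisticRegression().fit(X_tr, y_tr)
pred_tr = clf.predict(X_tr)
# C[i, j] = P(predict i | true j), estimated on training data
C = np.array([[np.mean(pred_tr[y_tr == j] == i) for j in (0, 1)]
              for i in (0, 1)])
mu = np.array([np.mean(clf.predict(X_te) == i) for i in (0, 1)])
print("estimated test priors:", np.linalg.solve(C, mu))  # roughly [0.8, 0.2]
```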
arXiv Detail & Related papers (2022-11-07T07:49:29Z)
- Positive-Unlabeled Classification under Class-Prior Shift: A Prior-invariant Approach Based on Density Ratio Estimation [85.75352990739154]
We propose a novel PU classification method based on density ratio estimation.
A notable advantage of our proposed method is that it does not require the class-priors in the training phase.
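The summary does not give the estimator's form, so the sketch below shows one generic density-ratio construction often used in PU learning: train a probabilistic classifier to separate labeled positives from the unlabeled pool and rescale its odds. This is illustrative, not necessarily the paper's method.

```python
# Generic density-ratio construction common in PU learning: separate
# labeled positives from the unlabeled pool and rescale the classifier's
# odds; an illustrative sketch, not necessarily the paper's estimator.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
pos = rng.normal(2.0, 1.0, size=(500, 2))                 # labeled positives
unl = np.vstack([rng.normal(2.0, 1.0, size=(300, 2)),     # hidden positives
                 rng.normal(-2.0, 1.0, size=(700, 2))])   # hidden negatives

X = np.vstack([pos, unl])
s = np.r_[np.ones(len(pos)), np.zeros(len(unl))]          # 1 = labeled
clf = LogisticRegression().fit(X, s)

# p(s=1|x)/p(s=0|x) * n_unl/n_pos estimates p_pos(x)/p_unl(x)
p = clf.predict_proba(unl)[:, 1]
ratio = p / (1.0 - p) * (len(unl) / len(pos))
print("mean ratio over hidden positives:", ratio[:300].mean())
print("mean ratio over hidden negatives:", ratio[300:].mean())
```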
arXiv Detail & Related papers (2021-07-11T13:36:53Z)
- QuaPy: A Python-Based Framework for Quantification [76.22817970624875]
QuaPy is an open-source framework for performing quantification (a.k.a. supervised prevalence estimation).
It is written in Python and can be installed via pip.
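A typical usage pattern based on QuaPy's documented aggregative API; the dataset and class names here follow the project's documentation and should be treated as assumptions rather than verified code.

```python
# Illustrative QuaPy usage; class and dataset names follow the project's
# documentation and should be treated as assumptions, not verified here.
import quapy as qp
from quapy.method.aggregative import ACC          # Adjusted Classify & Count
from sklearn.linear_model import LogisticRegression

dataset = qp.datasets.fetch_reviews('kindle', tfidf=True)  # bundled dataset
quantifier = ACC(LogisticRegression())
quantifier.fit(dataset.training)                  # classifier + correction
estimated = quantifier.quantify(dataset.test.instances)
print("estimated prevalences:", estimated)
print("true prevalences:     ", dataset.test.prevalence())
```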
arXiv Detail & Related papers (2021-06-18T13:57:11Z)
- An Empirical Comparison of Instance Attribution Methods for NLP [62.63504976810927]
We evaluate the degree to which different instance attribution methods agree with respect to the importance of training samples.
We find that simple retrieval methods yield training instances that differ from those identified via gradient-based methods.
arXiv Detail & Related papers (2021-04-09T01:03:17Z)
- Flexible Model Aggregation for Quantile Regression [92.63075261170302]
Quantile regression is a fundamental problem in statistical learning motivated by a need to quantify uncertainty in predictions.
We investigate methods for aggregating any number of conditional quantile models.
All of the models we consider in this paper can be fit using modern deep learning toolkits.
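For context on the objective such conditional quantile models target, here is a small self-contained check of the pinball (quantile) loss; the helper is illustrative and not taken from the paper.

```python
# The pinball (quantile) loss that conditional quantile models minimize;
# its expectation is minimized by the tau-th quantile (illustrative check).
import numpy as np

def pinball_loss(y, q, tau):
    diff = y - q
    # under-prediction weighted by tau, over-prediction by (1 - tau)
    return np.mean(np.maximum(tau * diff, (tau - 1.0) * diff))

y = np.random.default_rng(0).normal(size=10_000)
for tau in (0.1, 0.5, 0.9):
    q_hat = np.quantile(y, tau)        # empirical tau-th quantile
    print(f"tau={tau}: loss {pinball_loss(y, q_hat, tau):.4f} at the quantile")
```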
arXiv Detail & Related papers (2021-02-26T23:21:16Z)
- Tweet Sentiment Quantification: An Experimental Re-Evaluation [88.60021378715636]
Sentiment quantification is the task of training, by means of supervised learning, estimators of the relative frequency (also called "prevalence") of sentiment-related classes.
We re-evaluate those quantification methods following a now consolidated and much more robust experimental protocol.
Results are dramatically different from those obtained by Gao and Sebastiani, and they provide a different, much more solid understanding of the relative strengths and weaknesses of different sentiment quantification methods.
arXiv Detail & Related papers (2020-11-04T21:41:34Z)
- Improving Positive Unlabeled Learning: Practical AUL Estimation and New Training Method for Extremely Imbalanced Data Sets [10.870831090350402]
We improve Positive Unlabeled (PU) learning over the state of the art in two aspects.
First, we propose an unbiased practical AUL estimation method, which makes use of raw PU data without prior knowledge of unlabeled samples.
Second, we propose ProbTagging, a new training method for extremely imbalanced data sets.
arXiv Detail & Related papers (2020-04-21T08:32:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.