Improving State-of-the-Art in One-Class Classification by Leveraging
Unlabeled Data
- URL: http://arxiv.org/abs/2203.07206v1
- Date: Mon, 14 Mar 2022 15:44:40 GMT
- Authors: Farid Bagirov, Dmitry Ivanov, Aleksei Shpilman
- Abstract summary: One-Class (OC) classification and Positive Unlabeled (PU) learning are the two main approaches to binary classification when only one class is labeled.
We study a wide range of state-of-the-art OC and PU algorithms under varying degrees of unlabeled data reliability.
Our main practical recommendation is to use state-of-the-art PU algorithms when unlabeled data is reliable and the proposed modifications of state-of-the-art OC algorithms otherwise.
- Score: 5.331436239493893
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When dealing with binary classification of data with only one labeled class,
data scientists employ two main approaches, namely One-Class (OC)
classification and Positive Unlabeled (PU) learning. The former only learns
from labeled positive data, whereas the latter also utilizes unlabeled data to
improve the overall performance. Since PU learning utilizes more data, we might
be prone to think that when unlabeled data is available, the go-to algorithms
should always come from the PU group. However, we find that this is not always
the case if the unlabeled data is unreliable, i.e., contains limited or biased
latent negative data. We perform an extensive experimental study of a wide
range of state-of-the-art OC and PU algorithms under varying degrees of
unlabeled data reliability. Furthermore, we propose PU
modifications of state-of-the-art OC algorithms that are robust to unreliable
unlabeled data, as well as a guideline to similarly modify other OC algorithms.
Our main practical recommendation is to use state-of-the-art PU algorithms when
unlabeled data is reliable and to use the proposed modifications of
state-of-the-art OC algorithms otherwise. Additionally, we outline procedures
to distinguish the cases of reliable and unreliable unlabeled data using
statistical tests.
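The contrast drawn in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm; it is a generic, hedged example using scikit-learn, where `OneClassSVM` stands in for an OC method that learns from labeled positives only, and a "naive PU" baseline treats the entire unlabeled pool as negative (the synthetic data, parameter choices, and variable names are all illustrative assumptions):

```python
# Illustrative sketch only, not the paper's method: contrast One-Class (OC)
# classification, which trains on labeled positives alone, with a naive
# Positive-Unlabeled (PU) baseline that labels the unlabeled pool as negative.
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
pos = rng.normal(loc=2.0, scale=0.5, size=(200, 2))       # labeled positives
neg = rng.normal(loc=-2.0, scale=0.5, size=(200, 2))      # latent negatives
unl_pos = rng.normal(loc=2.0, scale=0.5, size=(100, 2))   # latent positives
unlabeled = np.vstack([unl_pos, neg])                     # mixed unlabeled pool

# OC approach: fit a boundary around the positive class, ignoring unlabeled data.
oc = OneClassSVM(gamma="scale", nu=0.1).fit(pos)

# Naive PU baseline: positives labeled 1, everything unlabeled labeled 0,
# then train an ordinary binary classifier.
X = np.vstack([pos, unlabeled])
y = np.concatenate([np.ones(len(pos)), np.zeros(len(unlabeled))])
pu = LogisticRegression().fit(X, y)

test_pos = rng.normal(loc=2.0, scale=0.5, size=(50, 2))   # held-out positives
oc_recall = float((oc.predict(test_pos) == 1).mean())
pu_recall = float((pu.predict(test_pos) == 1).mean())
print(f"OC recall on held-out positives:       {oc_recall:.2f}")
print(f"naive PU recall on held-out positives: {pu_recall:.2f}")
```

When the unlabeled pool is a faithful mixture, as here, the PU route can exploit the extra data; the paper's point is that when the latent negatives in that pool are limited or biased, this advantage can vanish.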
Related papers
- Contrastive Approach to Prior Free Positive Unlabeled Learning [15.269090018352875]
We propose a novel PU learning framework, that starts by learning a feature space through pretext-invariant representation learning.
Our proposed approach handily outperforms state-of-the-art PU learning methods across several standard PU benchmark datasets.
arXiv Detail & Related papers (2024-02-08T20:20:54Z)
- FlatMatch: Bridging Labeled Data and Unlabeled Data with Cross-Sharpness for Semi-Supervised Learning [73.13448439554497]
Semi-Supervised Learning (SSL) has been an effective way to leverage abundant unlabeled data with extremely scarce labeled data.
Most SSL methods are commonly based on instance-wise consistency between different data transformations.
We propose FlatMatch which minimizes a cross-sharpness measure to ensure consistent learning performance between the two datasets.
arXiv Detail & Related papers (2023-10-25T06:57:59Z)
- Positive Unlabeled Contrastive Learning [14.975173394072053]
We extend the self-supervised pretraining paradigm to the classical positive unlabeled (PU) setting.
We develop a simple methodology to pseudo-label the unlabeled samples using a new PU-specific clustering scheme.
Our method handily outperforms state-of-the-art PU methods over several standard PU benchmark datasets.
arXiv Detail & Related papers (2022-06-01T20:16:32Z)
- Positive-Unlabeled Classification under Class-Prior Shift: A Prior-invariant Approach Based on Density Ratio Estimation [85.75352990739154]
We propose a novel PU classification method based on density ratio estimation.
A notable advantage of our proposed method is that it does not require the class-priors in the training phase.
arXiv Detail & Related papers (2021-07-11T13:36:53Z)
- OpenCoS: Contrastive Semi-supervised Learning for Handling Open-set Unlabeled Data [65.19205979542305]
Unlabeled data may include out-of-class samples in practice.
OpenCoS is a method for handling this realistic semi-supervised learning scenario.
arXiv Detail & Related papers (2021-06-29T06:10:05Z)
- A Survey on Semi-Supervised Learning for Delayed Partially Labelled Data Streams [10.370629574634092]
This survey pays special attention to methods that leverage unlabelled data in a semi-supervised setting.
We discuss the delayed labelling issue, which impacts both fully supervised and semi-supervised methods.
arXiv Detail & Related papers (2021-06-16T23:14:20Z)
- Self-Trained One-class Classification for Unsupervised Anomaly Detection [56.35424872736276]
Anomaly detection (AD) has various applications across domains, from manufacturing to healthcare.
In this work, we focus on unsupervised AD problems whose entire training data are unlabeled and may contain both normal and anomalous samples.
To tackle this problem, we build a robust one-class classification framework via data refinement.
We show that our method outperforms a state-of-the-art one-class classification method by 6.3 AUC points and 12.5 average precision points.
arXiv Detail & Related papers (2021-06-11T01:36:08Z)
- Confident in the Crowd: Bayesian Inference to Improve Data Labelling in Crowdsourcing [0.30458514384586394]
We present new techniques to improve the quality of the labels while attempting to reduce the cost.
This paper investigates the use of more sophisticated methods, such as Bayesian inference, to measure the performance of the labellers.
Our methods outperform the standard voting methods in both cost and accuracy while maintaining higher reliability when there is disagreement within the crowd.
arXiv Detail & Related papers (2021-05-28T17:09:45Z)
- A Novel Perspective for Positive-Unlabeled Learning via Noisy Labels [49.990938653249415]
This research presents a methodology that assigns initial pseudo-labels to unlabeled data, treats them as noisy labels, and trains a deep neural network on the resulting noisy-labeled data.
Experimental results demonstrate that the proposed method significantly outperforms the state-of-the-art methods on several benchmark datasets.
arXiv Detail & Related papers (2021-03-08T11:46:02Z)
- Classify and Generate Reciprocally: Simultaneous Positive-Unlabelled Learning and Conditional Generation with Extra Data [77.31213472792088]
The scarcity of class-labeled data is a ubiquitous bottleneck in many machine learning problems.
We address this problem by leveraging Positive-Unlabeled (PU) classification and conditional generation with extra unlabeled data.
We present a novel training framework to jointly target both PU classification and conditional generation when exposed to extra data.
arXiv Detail & Related papers (2020-06-14T08:27:40Z)
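Several entries above describe pseudo-labeling unlabeled samples via clustering. The papers' specific schemes are not detailed here, so the following is only a generic sketch of the underlying idea: cluster the unlabeled pool, then mark the cluster whose center lies closest to the labeled positives' centroid as pseudo-positive (the data, cluster count, and distance criterion are all illustrative assumptions):

```python
# Generic pseudo-labeling sketch, not any cited paper's exact scheme:
# cluster the unlabeled pool with k-means, then treat the cluster nearest
# the labeled positives' centroid as pseudo-positive.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
pos = rng.normal(loc=2.0, scale=0.5, size=(100, 2))        # labeled positives
unlabeled = np.vstack([
    rng.normal(loc=2.0, scale=0.5, size=(80, 2)),          # latent positives
    rng.normal(loc=-2.0, scale=0.5, size=(120, 2)),        # latent negatives
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(unlabeled)

# Pick the cluster whose center is nearest the positive centroid.
pos_centroid = pos.mean(axis=0)
dists = np.linalg.norm(km.cluster_centers_ - pos_centroid, axis=1)
pos_cluster = int(dists.argmin())
pseudo_labels = (km.labels_ == pos_cluster).astype(int)    # 1 = pseudo-positive
print("pseudo-positives:", int(pseudo_labels.sum()), "of", len(unlabeled))
```

The resulting pseudo-labels can then feed an ordinary binary classifier, which is the general pattern the pseudo-labeling entries above build on in more sophisticated ways.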
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated summaries (including all information) and is not responsible for any consequences of their use.