On Leveraging Unlabeled Data for Concurrent Positive-Unlabeled Classification and Robust Generation
- URL: http://arxiv.org/abs/2006.07841v3
- Date: Thu, 24 Jul 2025 01:29:43 GMT
- Title: On Leveraging Unlabeled Data for Concurrent Positive-Unlabeled Classification and Robust Generation
- Authors: Bing Yu, Ke Sun, He Wang, Zhouchen Lin, Zhanxing Zhu
- Abstract summary: We present a novel training framework to jointly target PU classification and conditional generation when exposed to extra data. We prove the optimal condition of CNI-CGAN theoretically and conduct extensive evaluations on diverse datasets.
- Score: 72.062661402124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The scarcity of class-labeled data is a ubiquitous bottleneck in many machine learning problems. While abundant unlabeled data typically exist and provide a potential solution, it is highly challenging to exploit them. In this paper, we address this problem by leveraging Positive-Unlabeled (PU) classification and conditional generation with extra unlabeled data simultaneously. We present a novel training framework to jointly target both PU classification and conditional generation when exposed to extra data, especially out-of-distribution unlabeled data, by exploring the interplay between them: 1) enhancing the performance of PU classifiers with the assistance of a novel Classifier-Noise-Invariant Conditional GAN (CNI-CGAN) that is robust to noisy labels, and 2) leveraging extra data with predicted labels from a PU classifier to help the generation. Theoretically, we prove the optimal condition of CNI-CGAN; experimentally, we conduct extensive evaluations on diverse datasets.
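For readers unfamiliar with the PU half of this interplay, the sketch below shows one standard way to train a classifier from positive and unlabeled data alone: the non-negative PU (nnPU) risk estimator of Kiryo et al. This is a generic illustration of the PU family the paper builds on, not the paper's exact objective; the logistic surrogate loss and a known class prior are assumptions here.

```python
import torch
import torch.nn.functional as F

def nnpu_risk(scores_pos, scores_unl, prior, loss_fn=F.softplus):
    """Non-negative PU risk estimator (nnPU).

    scores_pos: raw model outputs on labeled-positive samples.
    scores_unl: raw model outputs on unlabeled samples.
    prior:      assumed class prior pi = P(y = +1), treated as known.
    loss_fn:    surrogate loss; softplus(-z) is the logistic loss.
    """
    # Risk of classifying the labeled positives as positive.
    r_pos = loss_fn(-scores_pos).mean()
    # Risk of classifying the positives as negative (correction term).
    r_pos_neg = loss_fn(scores_pos).mean()
    # Risk of classifying unlabeled samples as negative.
    r_unl_neg = loss_fn(scores_unl).mean()
    # Unlabeled = pi * positive + (1 - pi) * negative, so the negative
    # risk is r_unl_neg - pi * r_pos_neg; clamping at zero is the
    # "non-negative" correction that prevents severe overfitting.
    r_neg = torch.clamp(r_unl_neg - prior * r_pos_neg, min=0.0)
    return prior * r_pos + r_neg
```

A classifier trained this way is the kind that can then supply predicted labels for the conditional generator.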
Related papers
- Improving realistic semi-supervised learning with doubly robust estimation [8.828699635463265]
A major challenge in Semi-Supervised Learning (SSL) is the limited information available about the class distribution in the unlabeled data. We propose to explicitly estimate the unlabeled class distribution, which is a finite-dimensional parameter, as an initial step, using a doubly robust estimator with a strong theoretical guarantee. This estimate can then be integrated into existing methods to pseudo-label the unlabeled data during training more accurately.
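As a rough illustration of the idea (not the paper's exact estimator), a doubly robust estimate of a class proportion combines an outcome model (classifier probabilities) with an inverse-propensity correction computed on the labeled samples; both nuisance models below are assumed given.

```python
import numpy as np

def dr_class_proportion(probs_all, labeled_mask, labels, propensity):
    """Doubly robust estimate of P(Y = k) over a partially labeled pool.

    probs_all:    (n, K) classifier probabilities (the outcome model).
    labeled_mask: (n,) bool, True where the label was observed.
    labels:       (n,) int, valid only where labeled_mask is True.
    propensity:   (n,) estimated probability that each sample was labeled.

    The estimate stays consistent if *either* nuisance model is
    correct, which is the doubly robust property.
    """
    n, K = probs_all.shape
    onehot = np.zeros((n, K))
    onehot[labeled_mask, labels[labeled_mask]] = 1.0
    # Outcome-model term: average predicted probabilities over the pool.
    outcome = probs_all.mean(axis=0)
    # Correction term: inverse-propensity-weighted residuals on the
    # labeled samples only (unlabeled samples contribute zero).
    resid = np.zeros((n, K))
    resid[labeled_mask] = ((onehot[labeled_mask] - probs_all[labeled_mask])
                           / propensity[labeled_mask][:, None])
    return outcome + resid.mean(axis=0)
```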
arXiv Detail & Related papers (2025-02-01T02:34:12Z) - Continuous Contrastive Learning for Long-Tailed Semi-Supervised Recognition [50.61991746981703]
Current state-of-the-art long-tailed semi-supervised learning (LTSSL) approaches rely on high-quality pseudo-labels for large-scale unlabeled data.
This paper introduces a novel probabilistic framework that unifies various recent proposals in long-tail learning.
We introduce a continuous contrastive learning method, CCL, extending our framework to unlabeled data using reliable and smoothed pseudo-labels.
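A minimal sketch of the soft-positive idea behind such a method: instead of a hard same-class/different-class split, pairs are pulled together in proportion to how much their smoothed pseudo-label distributions agree. The exact CCL objective may differ; the weighting scheme below is an assumption for illustration.

```python
import torch
import torch.nn.functional as F

def soft_contrastive_loss(z, pseudo_probs, tau=0.1):
    """Contrastive loss with smoothed pseudo-labels.

    z:            (n, d) embeddings, L2-normalized inside.
    pseudo_probs: (n, K) smoothed pseudo-label distributions.
    """
    z = F.normalize(z, dim=1)
    n = z.size(0)
    sim = z @ z.t() / tau                                   # pairwise similarities
    mask = ~torch.eye(n, dtype=torch.bool, device=z.device) # drop self-pairs
    log_p = sim.masked_fill(~mask, float('-inf')).log_softmax(dim=1)
    # Soft positive weights: agreement between pseudo-label distributions.
    w = (pseudo_probs @ pseudo_probs.t()) * mask
    w = w / w.sum(dim=1, keepdim=True).clamp_min(1e-8)
    return -(w * log_p.masked_fill(~mask, 0.0)).sum(dim=1).mean()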
arXiv Detail & Related papers (2024-10-08T15:06:10Z) - Bridging the Gap: Learning Pace Synchronization for Open-World Semi-Supervised Learning [44.91863420044712]
In open-world semi-supervised learning, a machine learning model is tasked with uncovering novel categories from unlabeled data.
We introduce 1) the adaptive synchronizing marginal loss which imposes class-specific negative margins to alleviate the model bias towards seen classes, and 2) the pseudo-label contrastive clustering which exploits pseudo-labels predicted by the model to group unlabeled data from the same category together.
Our method balances the learning pace between seen and novel classes, achieving a remarkable 3% average accuracy increase on the ImageNet dataset.
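The margin mechanism can be sketched as cross entropy on margin-adjusted logits: subtracting a negative margin from a class raises its effective logits and counteracts the bias toward seen classes. Here the margins are fixed inputs for illustration, whereas the paper adapts them during training (the "synchronizing" part is omitted).

```python
import torch
import torch.nn.functional as F

def margin_adjusted_ce(logits, targets, margins):
    """Cross entropy with class-specific additive margins.

    logits:  (n, K) raw scores.
    targets: (n,) int labels (pseudo-labels for unlabeled samples).
    margins: (K,) per-class margins; negative values for novel or
             under-learned classes boost those classes.
    """
    adjusted = logits - margins.unsqueeze(0)  # per-class shift, broadcast over batch
    return F.cross_entropy(adjusted, targets)
```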
arXiv Detail & Related papers (2023-09-21T09:44:39Z) - Soft Curriculum for Learning Conditional GANs with Noisy-Labeled and Uncurated Unlabeled Data [70.25049762295193]
We introduce a novel conditional image generation framework that accepts noisy-labeled and uncurated data during training.
We propose soft curriculum learning, which assigns instance-wise weights for adversarial training while assigning new labels to unlabeled data.
Our experiments show that our approach outperforms existing semi-supervised and label-noise robust methods in terms of both quantitative and qualitative performance.
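As a hedged sketch of instance-wise weighting in adversarial training, the discriminator loss below downweights unreliable real samples by a per-instance confidence; the label-reassignment half of the method is omitted, and the confidence source is an assumption.

```python
import torch
import torch.nn.functional as F

def weighted_d_loss(d_real_logits, d_fake_logits, weights):
    """Instance-weighted GAN discriminator loss.

    d_real_logits: (n,) discriminator outputs on real samples.
    d_fake_logits: (m,) discriminator outputs on generated samples.
    weights:       (n,) per-instance weights in [0, 1], e.g. the
                   classifier's confidence in each sample's (pseudo-)label,
                   so noisy real samples contribute less.
    """
    real_loss = F.binary_cross_entropy_with_logits(
        d_real_logits, torch.ones_like(d_real_logits), reduction='none')
    fake_loss = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.zeros_like(d_fake_logits))
    return (weights * real_loss).mean() + fake_loss
```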
arXiv Detail & Related papers (2023-07-17T08:31:59Z) - Dist-PU: Positive-Unlabeled Learning from a Label Distribution Perspective [89.5370481649529]
This paper proposes a label distribution perspective for PU learning.
Motivated by this perspective, we pursue consistency between the predicted and ground-truth label distributions.
Experiments on three benchmark datasets validate the effectiveness of the proposed method.
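A minimal sketch of a label-distribution consistency term, assuming a known class prior pi: the batch-averaged positive probability is pushed toward pi via a binary KL divergence. The full Dist-PU objective includes further regularizers not shown here.

```python
import torch

def label_dist_consistency(probs, prior):
    """Label-distribution consistency penalty for PU learning.

    probs: (n,) predicted positive-class probabilities on a batch.
    prior: scalar class prior pi = P(y = +1) in (0, 1), assumed known.
    Penalizes KL([1 - pi, pi] || [1 - mean(p), mean(p)]).
    """
    p_hat = probs.mean().clamp(1e-6, 1 - 1e-6)
    pi = torch.as_tensor(float(prior))
    return pi * (pi / p_hat).log() + (1 - pi) * ((1 - pi) / (1 - p_hat)).log()
```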
arXiv Detail & Related papers (2022-12-06T07:38:29Z) - Rethinking Data Heterogeneity in Federated Learning: Introducing a New Notion and Standard Benchmarks [65.34113135080105]
We show that data heterogeneity in current setups is not necessarily a problem; in fact, it can be beneficial for the FL participants.
Our observations are intuitive.
Our code is available at https://github.com/MMorafah/FL-SC-NIID.
arXiv Detail & Related papers (2022-09-30T17:15:19Z) - Mutual Information Learned Classifiers: an Information-theoretic Viewpoint of Training Deep Learning Classification Systems [9.660129425150926]
We show that the existing cross entropy loss minimization problem essentially learns the label conditional entropy of the underlying data distribution.
We propose a mutual information learning framework where we train deep neural network classifiers via learning the mutual information between the label and the input.
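Using the identity I(Y; X) = H(Y) - H(Y | X), a plug-in estimate from a batch of softmax outputs looks as follows; this is a generic estimate of the quantity discussed, not necessarily the paper's exact training objective.

```python
import torch

def mutual_information(probs, eps=1e-8):
    """Estimate I(Y; X) from a batch of softmax outputs.

    probs: (n, K) predicted label distributions p(y | x_i).
    H(Y) is computed from the batch-averaged marginal and H(Y | X)
    is averaged per sample; training maximizes this estimate
    (i.e. minimizes its negative as a loss).
    """
    marginal = probs.mean(dim=0)
    h_y = -(marginal * (marginal + eps).log()).sum()
    h_y_given_x = -(probs * (probs + eps).log()).sum(dim=1).mean()
    return h_y - h_y_given_x
```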
arXiv Detail & Related papers (2022-09-21T01:06:30Z) - Improving State-of-the-Art in One-Class Classification by Leveraging Unlabeled Data [5.331436239493893]
One-Class (OC) classification and Positive-Unlabeled (PU) learning are two approaches to binary classification when labeled examples are available for only one class.
We study a broad set of state-of-the-art OC and PU algorithms across scenarios that vary in how reliable the unlabeled data are.
Our main practical recommendation is to use state-of-the-art PU algorithms when unlabeled data is reliable and to use the proposed modifications of state-of-the-art OC algorithms otherwise.
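For context, a plain OC baseline fits a model to the positives alone and ignores the unlabeled data entirely, unlike PU methods; one standard choice is a one-class SVM. The data below are synthetic placeholders, and the paper's proposed OC modifications go beyond this baseline.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_pos = rng.normal(loc=0.0, scale=1.0, size=(200, 5))  # labeled positives
X_unl = rng.normal(loc=0.5, scale=1.5, size=(500, 5))  # unlabeled mixture

# Fit on positives only; unlabeled data plays no role in training.
oc = OneClassSVM(kernel='rbf', gamma='scale', nu=0.1).fit(X_pos)
pred = oc.predict(X_unl)   # +1 = looks positive, -1 = outlier/negative
print((pred == 1).mean())  # fraction of unlabeled flagged as positive
```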
arXiv Detail & Related papers (2022-03-14T15:44:40Z) - Semi-supervised Long-tailed Recognition using Alternate Sampling [95.93760490301395]
The main challenges in long-tailed recognition come from the imbalanced data distribution and the scarcity of samples in tail classes.
We propose a new recognition setting, namely semi-supervised long-tailed recognition.
We demonstrate significant accuracy improvements over other competitive methods on two datasets.
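A toy sketch of one plausible reading of "alternate sampling": the training loop alternates between instance-uniform sampling (head classes dominate) and class-balanced sampling (each class equally likely). The paper's actual schedule may differ.

```python
import numpy as np

def sample_indices(labels, batch_size, class_balanced, rng):
    """Draw a batch uniformly over instances, or uniformly over classes
    and then uniformly within the chosen class."""
    labels = np.asarray(labels)
    if not class_balanced:
        return rng.choice(len(labels), size=batch_size)
    classes = np.unique(labels)
    picked = rng.choice(classes, size=batch_size)
    return np.array([rng.choice(np.flatnonzero(labels == c)) for c in picked])

rng = np.random.default_rng(0)
labels = [0] * 900 + [1] * 90 + [2] * 10   # long-tailed toy labels
for epoch in range(4):
    # Alternate the sampling scheme across epochs.
    idx = sample_indices(labels, 64, class_balanced=(epoch % 2 == 1), rng=rng)
    # ...train on the batch indexed by idx here...
```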
arXiv Detail & Related papers (2021-05-01T00:43:38Z) - A Novel Perspective for Positive-Unlabeled Learning via Noisy Labels [49.990938653249415]
This research presents a methodology that assigns initial pseudo-labels to the unlabeled data, treats them as noisy labels, and trains a deep neural network on the resulting noisy-labeled data.
Experimental results demonstrate that the proposed method significantly outperforms the state-of-the-art methods on several benchmark datasets.
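The first step of such a pipeline can be sketched as follows; the noise-robust training on the resulting labels, which carries the method, is omitted.

```python
import torch

@torch.no_grad()
def assign_pseudo_labels(model, unlabeled_loader, device='cpu'):
    """Assign initial pseudo-labels to unlabeled data.

    The returned labels are then treated as *noisy* labels: a
    downstream network is trained on them with a noise-robust
    procedure (not shown), per the noisy-label view of PU learning.
    """
    model.eval()
    all_labels = []
    for x in unlabeled_loader:            # loader yields input batches
        logits = model(x.to(device))
        all_labels.append(logits.argmax(dim=1).cpu())
    return torch.cat(all_labels)
```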
arXiv Detail & Related papers (2021-03-08T11:46:02Z) - ORDisCo: Effective and Efficient Usage of Incremental Unlabeled Data for Semi-supervised Continual Learning [52.831894583501395]
Continual learning typically assumes the incoming data are fully labeled, which may not hold in real applications.
We propose deep Online Replay with Discriminator Consistency (ORDisCo) to interdependently learn a classifier with a conditional generative adversarial network (GAN).
We show ORDisCo achieves significant performance improvement on various semi-supervised learning benchmark datasets for SSCL.
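A minimal sketch of the generative-replay half of such a setup, assuming a hypothetical conditional generator interface generator(z, y); the consistency losses that couple the classifier and the GAN are omitted.

```python
import torch

def replay_batch(generator, classes_seen, n, z_dim=128, device='cpu'):
    """Draw a replay batch from a conditional GAN for continual learning.

    Generated samples of previously seen classes are mixed into the
    classifier's training batches so old classes are not forgotten,
    while the classifier's predictions in turn supervise the GAN on
    unlabeled data (that feedback loop is not shown here).
    """
    y = torch.randint(0, classes_seen, (n,), device=device)  # past-class labels
    z = torch.randn(n, z_dim, device=device)                 # latent noise
    with torch.no_grad():
        x_fake = generator(z, y)   # hypothetical cGAN call signature
    return x_fake, y
```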
arXiv Detail & Related papers (2021-01-02T09:04:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.