Leveraging Contaminated Datasets to Learn Clean-Data Distribution with
Purified Generative Adversarial Networks
- URL: http://arxiv.org/abs/2302.01722v1
- Date: Fri, 3 Feb 2023 13:18:52 GMT
- Title: Leveraging Contaminated Datasets to Learn Clean-Data Distribution with
Purified Generative Adversarial Networks
- Authors: Bowen Tian, Qinliang Su, Jianxing Yu
- Abstract summary: Generative adversarial networks (GANs) are known for their ability to capture the underlying distribution of training instances.
Existing GANs are almost all built on the assumption that the training dataset is clean.
In many real-world applications, however, this may not hold: the training dataset may be contaminated by a proportion of undesired instances.
Two purified generative adversarial networks (PuriGAN) are developed, in which the discriminators are augmented with the capability to distinguish between target and contaminated instances.
- Score: 15.932410447038697
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative adversarial networks (GANs) are known for their strong
ability to capture the underlying distribution of training instances. Since
the seminal work on GANs, many variants have been proposed. However, almost
all existing GANs are built on the assumption that the training dataset is
clean. In many real-world applications this may not hold: the training dataset
may be contaminated by a proportion of undesired instances. When trained on
such datasets, existing GANs will learn a mixture distribution of desired and
contaminated instances, rather than the distribution of the desired data alone
(the target distribution). To learn the target distribution from contaminated
datasets, two purified generative adversarial networks (PuriGAN) are
developed, in which the discriminators are augmented with the capability to
distinguish between target and contaminated instances by leveraging an extra
dataset composed solely of contaminated instances. We prove that under some
mild conditions, the proposed PuriGANs are guaranteed to converge to the
distribution of desired instances. Experimental results on several datasets
demonstrate that the proposed PuriGANs generate much better images from the
desired distribution than comparable baselines when trained on contaminated
datasets. We also demonstrate the usefulness of PuriGAN in downstream
applications by applying it to semi-supervised anomaly detection on
contaminated datasets and to PU-learning. Experimental results show that
PuriGAN delivers the best performance among comparable baselines on both
tasks.
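As a rough PyTorch-style sketch of the core idea (hypothetical names and loss
weighting; the paper's exact objectives and convergence analysis are in the
arXiv entry above), the discriminator sees three data streams, and the batch
drawn from the contamination-only dataset is pushed to the "reject" side so
that the generator is steered away from undesired instances:

    import torch
    import torch.nn.functional as F

    def purigan_d_loss(D, x_mix, x_contam, x_fake, lam=1.0):
        # x_mix:    batch from the contaminated training set (target + undesired)
        # x_contam: batch from the extra dataset of contaminated instances only
        # x_fake:   batch produced by the generator
        # lam is a hypothetical weighting, not taken from the paper.
        logits_mix = D(x_mix)
        logits_con = D(x_contam)
        logits_fak = D(x_fake.detach())
        # The contaminated training batch is labeled "real" while both the
        # generated batch and the contamination-only batch are labeled
        # "fake", so D learns to separate target from undesired instances.
        return (F.binary_cross_entropy_with_logits(logits_mix, torch.ones_like(logits_mix))
                + F.binary_cross_entropy_with_logits(logits_fak, torch.zeros_like(logits_fak))
                + lam * F.binary_cross_entropy_with_logits(logits_con, torch.zeros_like(logits_con)))

    def purigan_g_loss(D, x_fake):
        # Non-saturating generator loss: make fakes score as "real target".
        logits = D(x_fake)
        return F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))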
Related papers
- Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers [49.97755400231656]
We present the first performance guarantees, with explicit dimensional dependence, for general score-mismatched diffusion samplers.
We show that score mismatches result in a distributional bias between the target and sampling distributions, proportional to the accumulated mismatch between the target and training distributions.
This result can be directly applied to zero-shot conditional samplers for any conditional model, irrespective of measurement noise.
arXiv Detail & Related papers (2024-10-17T16:42:12Z)
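Read schematically (our paraphrase in LaTeX, not the paper's exact statement),
the bias claim in the summary above has the shape

    \mathrm{bias}\left(p_{\mathrm{target}},\, p_{\mathrm{sample}}\right)
        \;\propto\; \int_0^T \mathbb{E}_{x \sim p_t}
        \left\| \nabla_x \log p_t^{\mathrm{target}}(x)
              - \nabla_x \log p_t^{\mathrm{train}}(x) \right\|^2 \mathrm{d}t,

i.e. the sampling distribution drifts from the target in proportion to the
score mismatch accumulated along the diffusion trajectory.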
- Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples.
Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance.
We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
arXiv Detail & Related papers (2024-03-11T13:44:49Z)
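As a loose illustration of "estimating each class's feature distribution"
(ProCo itself models L2-normalized features with von Mises-Fisher
distributions; the simplified stand-in below only tracks running per-class
mean directions, and all names are ours):

    import torch
    import torch.nn.functional as F

    class PerClassFeatureEstimator:
        # Running mean direction per class for L2-normalized features.
        def __init__(self, num_classes, dim, momentum=0.9):
            self.mu = torch.zeros(num_classes, dim)
            self.m = momentum

        @torch.no_grad()
        def update(self, feats, labels):
            feats = F.normalize(feats, dim=1)
            for c in labels.unique():
                batch_mean = feats[labels == c].mean(dim=0)
                self.mu[c] = self.m * self.mu[c] + (1 - self.m) * batch_mean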
- Reward-Directed Conditional Diffusion: Provable Distribution Estimation and Reward Improvement [42.45888600367566]
Reward-directed generation aims to produce samples with desired properties as measured by a reward function.
We consider the common learning scenario where the data set consists of unlabeled data along with a smaller set of data with noisy reward labels.
arXiv Detail & Related papers (2023-07-13T20:20:40Z)
- Probabilistic Matching of Real and Generated Data Statistics in Generative Adversarial Networks [0.6906005491572401]
We propose a method to ensure that the distributions of certain generated data statistics coincide with the respective distributions of the real data.
We evaluate the method on a synthetic dataset and a real-world dataset and demonstrate improved performance of our approach.
arXiv Detail & Related papers (2023-06-19T14:03:27Z)
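One plausible way to realize "matching the distribution of a generated
statistic to the real one" (a guess at the mechanics, not necessarily the
paper's construction) is a penalty comparing the two empirical distributions
of a chosen per-sample statistic via sorted values, which for equal batch
sizes is the empirical 1-D Wasserstein-2 distance:

    import torch

    def statistic_matching_penalty(x_real, x_fake, statistic):
        # statistic: callable mapping a batch (N, ...) to one scalar per
        # sample, e.g. per-image mean intensity.
        s_real = torch.sort(statistic(x_real)).values
        s_fake = torch.sort(statistic(x_fake)).values
        return ((s_real - s_fake) ** 2).mean()

    # Usage sketch: add to the generator loss with a hypothetical weight beta.
    # g_loss = gan_loss + beta * statistic_matching_penalty(
    #     x_real, x_fake, statistic=lambda x: x.flatten(1).mean(dim=1))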
- Chasing Fairness Under Distribution Shift: A Model Weight Perturbation Approach [72.19525160912943]
We first theoretically demonstrate the inherent connection between distribution shift, data perturbation, and model weight perturbation.
We then analyze the sufficient conditions to guarantee fairness for the target dataset.
Motivated by these sufficient conditions, we propose robust fairness regularization (RFR).
arXiv Detail & Related papers (2023-03-06T17:19:23Z)
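A sketch in the spirit of weight-perturbation-based regularization (the
paper's exact RFR term may differ; `fairness_loss` is a placeholder for any
differentiable fairness gap): evaluate the fairness loss at an adversarially
perturbed copy of the weights, so fairness survives the small weight changes
that stand in for distribution shift.

    import torch

    def robust_fairness_regularizer(model, fairness_loss, rho=0.05):
        # 1) Gradient of the fairness loss at the current weights.
        loss = fairness_loss(model)
        grads = torch.autograd.grad(loss, list(model.parameters()))
        # 2) Worst-case (gradient-ascent) weight perturbation of radius rho.
        norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
        with torch.no_grad():
            for p, g in zip(model.parameters(), grads):
                p.add_(rho * g / (norm + 1e-12))
        # 3) The fairness loss at the perturbed weights is the regularizer;
        #    the caller backpropagates it and then undoes the perturbation.
        return fairness_loss(model)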
- MaGNET: Uniform Sampling from Deep Generative Network Manifolds Without Retraining [9.294580808320534]
We develop a differential geometry based sampler -- coined MaGNET -- that, given any trained DGN, produces samples that are uniformly distributed on the learned manifold.
We prove theoretically and empirically that our technique produces a uniform distribution on the manifold regardless of the training set distribution.
arXiv Detail & Related papers (2021-10-15T11:12:56Z)
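The mechanism, condensed (our reading; see the paper for the exact sampler):
re-weight latent samples by the generator's local volume change
sqrt(det(J^T J)), with J the Jacobian of the generator, so regions the
network compresses or stretches end up covered uniformly on the manifold.

    import torch
    from torch.autograd.functional import jacobian

    def volume_weight(G, z):
        # G maps one latent vector to one sample; J has shape (out_dim, latent_dim).
        J = jacobian(lambda v: G(v).flatten(), z)
        return torch.sqrt(torch.det(J.T @ J))  # local volume element at z

    def magnet_style_sample(G, n, latent_dim, pool=10):
        # Importance resampling: draw a candidate pool, keep n samples with
        # probability proportional to volume weight (slow but simple).
        zs = torch.randn(n * pool, latent_dim)
        w = torch.stack([volume_weight(G, z) for z in zs])
        idx = torch.multinomial(w, n, replacement=False)
        return torch.stack([G(zs[i]) for i in idx])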
- Negative Data Augmentation [127.28042046152954]
We show that negative data augmentation (NDA) samples provide information on the support of the data distribution.
We introduce a new GAN training objective where we use NDA as an additional source of synthetic data for the discriminator.
Empirically, models trained with our method achieve improved conditional/unconditional image generation along with improved anomaly detection capabilities.
arXiv Detail & Related papers (2021-02-09T20:28:35Z)
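The stated change to the objective is small and easy to picture (a sketch
with hypothetical names; `negative_augment` stands for any NDA transform,
such as jigsaw-shuffling image patches, that yields out-of-support samples):

    import torch
    import torch.nn.functional as F

    def nda_d_loss(D, x_real, x_fake, negative_augment, lam=1.0):
        lr = D(x_real)
        lf = D(x_fake.detach())
        # NDA samples act as an extra source of "fake" data, telling D (and
        # through it, G) where the support of the data distribution is NOT.
        ln = D(negative_augment(x_real))
        return (F.binary_cross_entropy_with_logits(lr, torch.ones_like(lr))
                + F.binary_cross_entropy_with_logits(lf, torch.zeros_like(lf))
                + lam * F.binary_cross_entropy_with_logits(ln, torch.zeros_like(ln)))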
- Lessons Learned from the Training of GANs on Artificial Datasets [0.0]
Generative Adversarial Networks (GANs) have made great progress in synthesizing realistic images in recent years.
However, GANs are prone to underfitting or overfitting, which makes their analysis difficult and constrained.
We therefore train them on artificial datasets with infinitely many samples and simple real data distributions.
We find that training mixtures of GANs yields a larger performance gain than increasing the network depth or width.
arXiv Detail & Related papers (2020-07-13T14:51:02Z)
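A "mixture of GANs" here amounts to sampling from several independently
trained generators; a minimal sketch (uniform mixture weights are our
assumption):

    import torch

    def sample_from_mixture(generators, n, latent_dim):
        # Pick a generator uniformly at random per sample; the ensemble can
        # cover modes that any single GAN drops.
        choices = torch.randint(len(generators), (n,))
        zs = torch.randn(n, latent_dim)
        return torch.stack([generators[int(c)](z) for c, z in zip(choices, zs)])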
- Synthetic Learning: Learn From Distributed Asynchronized Discriminator GAN Without Sharing Medical Image Data [21.725983290877753]
We propose a privacy-preserving and communication-efficient distributed GAN learning framework named Distributed Asynchronized Discriminator GAN (AsynDGAN).
arXiv Detail & Related papers (2020-05-29T21:05:49Z)
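The architecture reads naturally as code (a synchronous schematic of the
asynchronous protocol, with our own names): each site trains a discriminator
on its private images, and only the central generator's synthetic samples and
the resulting losses cross the network.

    import torch
    import torch.nn.functional as F

    def central_generator_step(G, site_discriminators, opt_g, latent_dim, batch=64):
        opt_g.zero_grad()
        x_fake = G(torch.randn(batch, latent_dim))
        loss = 0.0
        for D in site_discriminators:
            # In AsynDGAN this evaluation happens remotely at each site;
            # private medical images never leave their institution.
            logits = D(x_fake)
            loss = loss + F.binary_cross_entropy_with_logits(
                logits, torch.ones_like(logits))
        loss.backward()
        opt_g.step()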
- When Relation Networks meet GANs: Relation GANs with Triplet Loss [110.7572918636599]
Training stability is still a lingering concern for generative adversarial networks (GANs).
In this paper, we explore a relation network architecture for the discriminator and design a triplet loss which performs better generalization and stability.
Experiments on benchmark datasets show that the proposed relation discriminator and new loss provide significant improvements on various vision tasks.
arXiv Detail & Related papers (2020-02-24T11:35:28Z)
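One plausible form of the triplet loss on discriminator features (the paper's
exact pairing scheme may differ; here real samples serve as anchor and
positive, fakes as negatives):

    import torch
    import torch.nn.functional as F

    def relation_triplet_loss(embed, x_real_a, x_real_b, x_fake, margin=1.0):
        # embed: the relation/discriminator network mapping inputs to features.
        anchor, positive, negative = embed(x_real_a), embed(x_real_b), embed(x_fake)
        # Pull two real samples together, push real away from fake.
        return F.triplet_margin_loss(anchor, positive, negative, margin=margin)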
- Brainstorming Generative Adversarial Networks (BGANs): Towards Multi-Agent Generative Models with Distributed Private Datasets [70.62568022925971]
Generative adversarial networks (GANs) must be fed large datasets that adequately represent the data space.
In many scenarios, the available datasets may be limited and distributed across multiple agents, each of which is seeking to learn the distribution of the data on its own.
In this paper, a novel brainstorming GAN (BGAN) architecture is proposed, with which multiple agents can generate realistic data samples while operating in a fully distributed manner.
arXiv Detail & Related papers (2020-02-02T02:58:32Z)
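Schematically (our reading of "fully distributed": agents exchange only
generated samples, never private data; the neighbor topology is an
assumption):

    import torch
    import torch.nn.functional as F

    def bgan_local_d_loss(D, x_private, x_fake_own, x_fake_neighbors):
        # Each agent's discriminator treats its own private data as "real"
        # and both its own fakes and the "brainstormed" fakes received from
        # neighboring agents as "fake"; private samples never leave the agent.
        lr = D(x_private)
        lf = D(torch.cat([x_fake_own.detach(), x_fake_neighbors], dim=0))
        return (F.binary_cross_entropy_with_logits(lr, torch.ones_like(lr))
                + F.binary_cross_entropy_with_logits(lf, torch.zeros_like(lf)))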