Instance Selection for GANs
- URL: http://arxiv.org/abs/2007.15255v2
- Date: Fri, 23 Oct 2020 04:43:07 GMT
- Title: Instance Selection for GANs
- Authors: Terrance DeVries, Michal Drozdzal and Graham W. Taylor
- Abstract summary: Recent advances in Generative Adversarial Networks (GANs) have led to their widespread adoption for generating high-quality synthetic imagery.
GANs often produce unrealistic samples which fall outside of the data manifold.
We propose a novel approach to improve sample quality: altering the training dataset via instance selection before model training has taken place.
- Score: 25.196177369030146
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in Generative Adversarial Networks (GANs) have led to their
widespread adoption for the purposes of generating high quality synthetic
imagery. While capable of generating photo-realistic images, these models often
produce unrealistic samples which fall outside of the data manifold. Several
recently proposed techniques attempt to avoid spurious samples, either by
rejecting them after generation, or by truncating the model's latent space.
While effective, these methods are inefficient, as a large fraction of training
time and model capacity are dedicated towards samples that will ultimately go
unused. In this work we propose a novel approach to improve sample quality:
altering the training dataset via instance selection before model training has
taken place. By refining the empirical data distribution before training, we
redirect model capacity towards high-density regions, which ultimately improves
sample fidelity, lowers model capacity requirements, and significantly reduces
training time. Code is available at
https://github.com/uoguelph-mlrg/instance_selection_for_gans.
Related papers
- Assessing Sample Quality via the Latent Space of Generative Models [44.59115390303591]
We propose to examine the latent space of a trained generative model to infer generated sample quality.
This is feasible because the quality of a generated sample directly relates to the amount of training data resembling it.
We show that the proposed score correlates highly with the sample quality for various generative models including VAEs, GANs and Latent Diffusion Models.
arXiv Detail & Related papers (2024-07-21T14:05:06Z) - Self-Consuming Generative Models with Curated Data Provably Optimize Human Preferences [20.629333587044012]
We study the impact of data curation on iterated retraining of generative models.
We prove that, if the data is curated according to a reward model, the expected reward of the iterative retraining procedure is maximized.
arXiv Detail & Related papers (2024-06-12T21:28:28Z) - Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to only perform well on similar data, while underperforming on real world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z) - On the Stability of Iterative Retraining of Generative Models on their own Data [56.153542044045224]
We study the impact of training generative models on mixed datasets.
We first prove the stability of iterative training under the condition that the initial generative models approximate the data distribution well enough.
We empirically validate our theory on both synthetic and natural images by iteratively training normalizing flows and state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-09-30T16:41:04Z) - Dual Student Networks for Data-Free Model Stealing [79.67498803845059]
Two main challenges are estimating gradients of the target model without access to its parameters, and generating a diverse set of training samples.
We propose a Dual Student method where two students are symmetrically trained in order to provide the generator a criterion to generate samples that the two students disagree on.
We show that our new optimization framework provides more accurate gradient estimation of the target model and better accuracies on benchmark classification datasets.
arXiv Detail & Related papers (2023-09-18T18:11:31Z) - Decision-based iterative fragile watermarking for model integrity verification [33.42076236847454]
Foundation models are typically hosted on cloud servers to meet the high demand for their services.
This exposes them to security risks, as attackers can modify them after uploading to the cloud or transferring from a local system.
We propose an iterative decision-based fragile watermarking algorithm that transforms normal training samples into fragile samples that are sensitive to model changes.
arXiv Detail & Related papers (2023-05-13T10:36:11Z) - Reducing Training Sample Memorization in GANs by Training with Memorization Rejection [80.0916819303573]
We propose memorization rejection, a training scheme that rejects generated samples that are near-duplicates of training samples during training.
Our scheme is simple, generic and can be directly applied to any GAN architecture.
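The rejection test at the heart of this scheme can be sketched as a nearest-neighbor check: a generated sample is discarded if it lies within a threshold distance of any training sample. This is a standalone illustration with hypothetical names and an illustrative threshold; the actual scheme applies the test inside the GAN training loop.

```python
import numpy as np

def reject_memorized(generated, train_data, threshold):
    """Keep only generated samples whose nearest-neighbor distance to the
    training set is at least `threshold` (i.e. drop near-duplicates)."""
    # pairwise Euclidean distances, shape (n_generated, n_train)
    dists = np.linalg.norm(generated[:, None, :] - train_data[None, :, :], axis=-1)
    nearest = dists.min(axis=1)          # distance to closest training sample
    keep_mask = nearest >= threshold
    return generated[keep_mask], keep_mask

# Usage: genuinely novel samples pass, near-copies of training data are rejected.
rng = np.random.default_rng(1)
train = rng.normal(size=(50, 4))
fresh = rng.normal(size=(5, 4)) + 3.0    # far from the training data
copies = train[:3] + 1e-4                # near-duplicates of training samples
gen = np.vstack([fresh, copies])
kept, mask = reject_memorized(gen, train, threshold=0.5)
```

The threshold controls the trade-off between suppressing memorization and rejecting legitimate samples that merely resemble the training data.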
arXiv Detail & Related papers (2022-10-21T20:17:50Z) - Forgetting Data from Pre-trained GANs [28.326418377665345]
We investigate how to post-edit a model after training so that it forgets certain kinds of samples.
We provide three different algorithms for GANs that differ on how the samples to be forgotten are described.
Our algorithms are capable of forgetting data while retaining high generation quality at a fraction of the cost of full re-training.
arXiv Detail & Related papers (2022-06-29T03:46:16Z) - Anytime Sampling for Autoregressive Models via Ordered Autoencoding [88.01906682843618]
Autoregressive models are widely used for tasks such as image and audio generation.
The sampling process of these models does not allow interruptions and cannot adapt to real-time computational resources.
We propose a new family of autoregressive models that enables anytime sampling.
arXiv Detail & Related papers (2021-02-23T05:13:16Z) - One for More: Selecting Generalizable Samples for Generalizable ReID Model [92.40951770273972]
This paper proposes a one-for-more training objective that takes the generalization ability of selected samples as a loss function.
Our proposed one-for-more based sampler can be seamlessly integrated into the ReID training framework.
arXiv Detail & Related papers (2020-12-10T06:37:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.