Forgetting Data from Pre-trained GANs
- URL: http://arxiv.org/abs/2206.14389v1
- Date: Wed, 29 Jun 2022 03:46:16 GMT
- Title: Forgetting Data from Pre-trained GANs
- Authors: Zhifeng Kong and Kamalika Chaudhuri
- Abstract summary: We investigate how to post-edit a model after training so that it forgets certain kinds of samples.
We provide three different algorithms for GANs that differ on how the samples to be forgotten are described.
Our algorithms are capable of forgetting data while retaining high generation quality at a fraction of the cost of full re-training.
- Score: 28.326418377665345
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large pre-trained generative models are known to occasionally provide samples
that may be undesirable for various reasons. The standard way to mitigate this
is to re-train the models differently. In this work, we take a different, more
compute-friendly approach and investigate how to post-edit a model after
training so that it forgets certain kinds of samples. We provide three
different algorithms for GANs that differ on how the samples to be forgotten
are described. Extensive evaluations on real-world image datasets show that our
algorithms are capable of forgetting data while retaining high generation
quality at a fraction of the cost of full re-training.
Related papers
- Pre-Trained Vision-Language Models as Partial Annotators [40.89255396643592]
Pre-trained vision-language models learn from massive data to model unified representations of images and natural language.
In this paper, we investigate a novel "pre-trained annotating - weakly-supervised learning" paradigm for pre-trained model application and experiment on image classification tasks.
arXiv Detail & Related papers (2024-05-23T17:17:27Z)
- Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models [115.501751261878]
Fine-tuning language models (LMs) on human-generated data remains a prevalent practice.
We investigate whether we can go beyond human data on tasks where we have access to scalar feedback.
We find that ReST$^{EM}$ scales favorably with model size and significantly surpasses fine-tuning only on human data.
arXiv Detail & Related papers (2023-12-11T18:17:43Z)
- The Journey, Not the Destination: How Data Guides Diffusion Models [75.19694584942623]
Diffusion models trained on large datasets can synthesize photo-realistic images of remarkable quality and diversity.
We propose a framework that: (i) provides a formal notion of data attribution in the context of diffusion models, and (ii) allows us to counterfactually validate such attributions.
arXiv Detail & Related papers (2023-12-11T08:39:43Z)
- Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to perform well only on similar data, while underperforming on real-world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z)
- Reducing Training Sample Memorization in GANs by Training with Memorization Rejection [80.0916819303573]
We propose memorization rejection, a training scheme that rejects generated samples that are near-duplicates of training samples during training.
Our scheme is simple, generic and can be directly applied to any GAN architecture.
arXiv Detail & Related papers (2022-10-21T20:17:50Z)
- Generating Representative Samples for Few-Shot Classification [8.62483598990205]
Few-shot learning aims to learn new categories with a few visual samples per class.
Few-shot class representations are often biased due to data scarcity.
We generate visual samples based on semantic embeddings using a conditional variational autoencoder model.
arXiv Detail & Related papers (2022-05-05T20:58:33Z)
- Anytime Sampling for Autoregressive Models via Ordered Autoencoding [88.01906682843618]
Autoregressive models are widely used for tasks such as image and audio generation.
The sampling process of these models does not allow interruptions and cannot adapt to real-time computational resources.
We propose a new family of autoregressive models that enables anytime sampling.
arXiv Detail & Related papers (2021-02-23T05:13:16Z)
- One for More: Selecting Generalizable Samples for Generalizable ReID Model [92.40951770273972]
This paper proposes a one-for-more training objective that takes the generalization ability of selected samples as a loss function.
Our proposed one-for-more based sampler can be seamlessly integrated into the ReID training framework.
arXiv Detail & Related papers (2020-12-10T06:37:09Z)
- Instance Selection for GANs [25.196177369030146]
Recent advances in Generative Adversarial Networks (GANs) have led to their widespread adoption for generating high-quality synthetic imagery.
GANs often produce unrealistic samples which fall outside of the data manifold.
We propose a novel approach to improve sample quality: altering the training dataset via instance selection before model training has taken place.
arXiv Detail & Related papers (2020-07-30T06:33:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.