Data Redaction from Conditional Generative Models
- URL: http://arxiv.org/abs/2305.11351v2
- Date: Tue, 20 Feb 2024 22:35:32 GMT
- Title: Data Redaction from Conditional Generative Models
- Authors: Zhifeng Kong and Kamalika Chaudhuri
- Abstract summary: We study how to post-edit an already-trained conditional generative model so that it redacts certain conditionals that will, with high probability, lead to undesirable content.
We conduct experiments on redacting prompts in text-to-image models and redacting voices in text-to-speech models.
- Score: 38.479256505860825
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep generative models are known to produce undesirable samples such as
harmful content. Traditional mitigation methods include re-training from
scratch, filtering, or editing; however, these are either computationally
expensive or can be circumvented by third parties. In this paper, we take a
different approach and study how to post-edit an already-trained conditional
generative model so that it redacts certain conditionals that will, with high
probability, lead to undesirable content. This is done by distilling the
conditioning network in the models, giving a solution that is effective,
efficient, controllable, and universal for a class of deep generative models.
We conduct experiments on redacting prompts in text-to-image models and
redacting voices in text-to-speech models. Our method is computationally light,
achieves better redaction quality and robustness than baseline methods, and
still retains high generation quality.
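The abstract describes the edit as distilling the conditioning network so that redacted conditionals no longer steer the generator. Below is a minimal sketch of that idea in PyTorch, assuming a prompt-conditioned model whose conditioning network maps tokenized prompts to embeddings; `cond_net`, `tokenize`, the prompt lists, and the loss weighting are hypothetical placeholders, and the paper's actual objective may differ.

```python
import copy
import random
import torch
import torch.nn.functional as F

def distill_redaction(cond_net, tokenize, normal_prompts, redacted_prompts,
                      safe_prompt, steps=1000, lr=1e-4):
    teacher = copy.deepcopy(cond_net).eval()          # frozen pre-trained copy
    for p in teacher.parameters():
        p.requires_grad_(False)
    opt = torch.optim.Adam(cond_net.parameters(), lr=lr)

    with torch.no_grad():
        safe_target = teacher(tokenize(safe_prompt))  # reference conditioning

    for _ in range(steps):
        # Match the teacher on ordinary prompts so generation is preserved ...
        x = tokenize(random.choice(normal_prompts))
        loss_keep = F.mse_loss(cond_net(x), teacher(x))
        # ... while mapping redacted prompts onto the safe reference embedding.
        r = tokenize(random.choice(redacted_prompts))
        loss_redact = F.mse_loss(cond_net(r), safe_target)
        loss = loss_keep + loss_redact
        opt.zero_grad()
        loss.backward()
        opt.step()
    return cond_net
```

Because only the conditioning network is retrained, the generator itself stays untouched, which is what keeps the edit computationally light.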
Related papers
- Better Call SAUL: Fluent and Consistent Language Model Editing with Generation Regularization [48.07144492109635]
Large language models need to be updated regularly.
Model editing is challenging as it might also affect knowledge that is unrelated to the new data.
We propose SAUL, a streamlined model editing method that uses sentence concatenation with augmented random facts for generation regularization.
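As a rough sketch of how such concatenation-based regularization data might be constructed (the function and facts below are hypothetical and the paper's construction may differ):

```python
import random

def build_edit_example(new_fact, unrelated_facts, k=2):
    """Concatenate the edit target with k randomly sampled unrelated facts,
    so the model learns the new fact without drifting on other knowledge."""
    sampled = random.sample(unrelated_facts, k)
    return " ".join([new_fact] + sampled)

example = build_edit_example(
    "The Eiffel Tower is in Paris.",
    ["Water boils at 100 C at sea level.",
     "The Great Wall is in China.",
     "Mount Everest is the tallest mountain."],
)
```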
arXiv Detail & Related papers (2024-10-03T12:28:13Z)
- Heat Death of Generative Models in Closed-Loop Learning [63.83608300361159]
We study the learning dynamics of generative models that are fed back their own produced content in addition to their original training dataset.
We show that, unless a sufficient amount of external data is introduced at each iteration, any non-trivial temperature leads the model to degenerate.
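A toy one-dimensional illustration of this degeneration (not the paper's analysis): a Gaussian "model" is repeatedly refit to its own samples drawn at temperature t, and its variance collapses geometrically unless external data is mixed back in.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, t = 0.0, 1.0, 0.9          # initial model and sampling temperature
external = rng.normal(0.0, 1.0, 200)  # pool of "real" training data

for it in range(20):
    generated = rng.normal(mu, t * sigma, 1000)  # sample own output at temp t
    mix_ratio = 0.0                              # set > 0 to inject real data
    n_ext = int(mix_ratio * len(generated))
    data = np.concatenate([generated[: len(generated) - n_ext],
                           rng.choice(external, n_ext)])
    mu, sigma = data.mean(), data.std()          # "retrain" = refit Gaussian
    print(f"iter {it:2d}  sigma = {sigma:.4f}")  # sigma -> 0 when mix_ratio = 0
```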
arXiv Detail & Related papers (2024-04-02T21:51:39Z)
- Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model [47.722856876213946]
Reward-Augmented Decoding (RAD) is a text generation procedure that uses a small unidirectional reward model to encourage a language model to generate text that has certain properties.
By using a unidirectional reward model, RAD can cache activations from prior generation steps to decrease computational overhead.
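As a rough illustration of the decoding loop this describes, the sketch below scores top-k candidate next tokens with a reward model and adds the scaled rewards to the logits before sampling. `lm` and `reward_model` are assumed stand-ins (an autoregressive LM returning `.logits` and a callable returning a scalar score); the caching of past activations that RAD exploits is noted but not implemented here.

```python
import torch

@torch.no_grad()
def rad_step(lm, reward_model, input_ids, k=20, beta=1.0):
    logits = lm(input_ids).logits[:, -1, :]        # next-token logits
    topk = torch.topk(logits, k, dim=-1)           # restrict to top-k tokens
    rewards = []
    for tok in topk.indices[0]:                    # score each candidate
        cand = torch.cat([input_ids, tok.view(1, 1)], dim=1)
        rewards.append(reward_model(cand))         # assumed scalar reward
    rewards = torch.tensor(rewards)
    adjusted = topk.values[0] + beta * rewards     # reward-augmented logits
    probs = torch.softmax(adjusted, dim=-1)
    choice = torch.multinomial(probs, 1)
    # A unidirectional reward model would let `cand` scoring reuse cached
    # activations for the shared prefix instead of re-encoding it.
    return topk.indices[0][choice]
```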
arXiv Detail & Related papers (2023-10-14T07:19:47Z)
- Error Norm Truncation: Robust Training in the Presence of Data Noise for Text Generation Models [39.37532848489779]
We propose Error Norm Truncation (ENT), a robust enhancement of the standard training objective that truncates noisy data.
We show that ENT improves generation quality over standard training and previous soft and hard truncation methods.
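A minimal sketch of what such per-token truncation could look like, assuming the error norm is the L2 distance between the predicted distribution and the one-hot target; the paper's exact norm and thresholding schedule may differ.

```python
import torch
import torch.nn.functional as F

def ent_loss(logits, targets, threshold=1.0):
    # logits: (batch, seq, vocab); targets: (batch, seq)
    probs = F.softmax(logits, dim=-1)
    one_hot = F.one_hot(targets, logits.size(-1)).float()
    err_norm = (probs - one_hot).norm(dim=-1)      # per-token L2 error norm
    mask = (err_norm < threshold).float()          # drop likely-noisy tokens
    ce = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")
    return (ce * mask).sum() / mask.sum().clamp(min=1.0)
```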
arXiv Detail & Related papers (2023-10-02T01:30:27Z)
- PLANNER: Generating Diversified Paragraph via Latent Language Diffusion Model [37.2192243883707]
We propose PLANNER, a model that combines latent semantic diffusion with autoregressive generation to generate fluent text.
Results on semantic generation, text completion and summarization show its effectiveness in generating high-quality long-form text.
arXiv Detail & Related papers (2023-06-05T01:36:39Z)
- DiffusER: Discrete Diffusion via Edit-based Reconstruction [88.62707047517914]
DiffusER is an edit-based generative model for text built on denoising diffusion models.
It can rival autoregressive models on several tasks spanning machine translation, summarization, and style transfer.
It can also perform other varieties of generation that standard autoregressive models are not well-suited for.
arXiv Detail & Related papers (2022-10-30T16:55:23Z)
- Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP [10.936043362876651]
We propose a decoding algorithm that reduces the probability of a model producing problematic text.
While our approach by no means eliminates the issue of language models generating biased text, we believe it to be an important step in this direction.
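A schematic of the decoding adjustment this suggests: compare next-token probabilities with and without a textual description of the undesired attribute prepended, and suppress tokens that the "biased" context makes more likely. This is a sketch of the general idea, not the authors' implementation; the decay function and its scale are assumptions.

```python
import torch

@torch.no_grad()
def debiased_logits(lm, input_ids, biased_input_ids, decay=50.0):
    p = torch.softmax(lm(input_ids).logits[:, -1, :], dim=-1)
    p_biased = torch.softmax(lm(biased_input_ids).logits[:, -1, :], dim=-1)
    delta = (p_biased - p).clamp(min=0.0)  # how much the bias boosts each token
    scale = torch.exp(-decay * delta)      # suppress bias-boosted tokens
    q = p * scale
    return torch.log(q / q.sum(dim=-1, keepdim=True))
```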
arXiv Detail & Related papers (2021-02-28T11:07:37Z)
- Distilling Interpretable Models into Human-Readable Code [71.11328360614479]
Human-readability is an important and desirable standard for machine-learned model interpretability.
We propose to train interpretable models using conventional methods, and then distill them into concise, human-readable code.
We describe a piecewise-linear curve-fitting algorithm that produces high-quality results efficiently and reliably across a broad range of use cases.
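As a small illustration of the distillation step, the sketch below fits a continuous piecewise-linear curve to samples of a model's one-dimensional response and emits it as short, readable Python; the paper's knot selection and fitting algorithm are more sophisticated, and the names here are illustrative.

```python
import numpy as np

def fit_pwl(x, y, knots):
    # Hat (linear-interpolation) basis: the value at each knot is a parameter.
    basis = np.stack([np.interp(x, knots, np.eye(len(knots))[i])
                      for i in range(len(knots))], axis=1)
    vals, *_ = np.linalg.lstsq(basis, y, rcond=None)
    return vals

def emit_code(knots, vals, name="score"):
    # Render the fitted curve as a plain, human-readable Python function.
    lines = [f"def {name}(x):"]
    for (x0, y0), (x1, y1) in zip(zip(knots, vals), zip(knots[1:], vals[1:])):
        slope = (y1 - y0) / (x1 - x0)
        lines.append(f"    if x <= {x1:.3g}: "
                     f"return {y0:.3g} + {slope:.3g} * (x - {x0:.3g})")
    lines.append(f"    return {vals[-1]:.3g}")
    return "\n".join(lines)

x = np.linspace(0, 10, 200)
y = np.sin(x) + 0.1 * np.random.randn(200)   # stand-in for a model's output
knots = np.linspace(0, 10, 6)
print(emit_code(knots, fit_pwl(x, y, knots)))
```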
arXiv Detail & Related papers (2021-01-21T01:46:36Z)
- Posterior Control of Blackbox Generation [126.33511630879713]
We consider augmenting neural generation models with discrete control states learned through a structured latent-variable approach.
We find that this method improves over standard benchmarks, while also providing fine-grained control.
arXiv Detail & Related papers (2020-05-10T03:22:45Z)