CopyrightShield: Enhancing Diffusion Model Security against Copyright Infringement Attacks
- URL: http://arxiv.org/abs/2412.01528v2
- Date: Thu, 21 Aug 2025 05:56:47 GMT
- Title: CopyrightShield: Enhancing Diffusion Model Security against Copyright Infringement Attacks
- Authors: Zhixiang Guo, Siyuan Liang, Aishan Liu, Dacheng Tao
- Abstract summary: Diffusion models are vulnerable to copyright infringement attacks, where attackers inject strategically modified non-infringing images into the training set. We propose a defense framework, CopyrightShield, to defend against this attack. Experimental results demonstrate that CopyrightShield significantly improves poisoned sample detection performance across two attack scenarios.
- Score: 61.06621533874629
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have attracted significant attention due to their exceptional data generation capabilities in fields such as image synthesis. However, recent studies have shown that diffusion models are vulnerable to copyright infringement attacks, where attackers inject strategically modified non-infringing images into the training set, inducing the model to generate infringing content under the prompt of specific poisoned captions. To address this issue, we propose a defense framework, CopyrightShield, to defend against this attack. Specifically, we analyze the memorization mechanism of diffusion models and find that attacks exploit the model's overfitting to specific spatial positions and prompts, causing it to reproduce poisoned samples under backdoor triggers. Based on this, we propose a poisoned sample detection method using spatial masking and data attribution to quantify poisoning risk and accurately identify hidden backdoor samples. To further mitigate memorization of poisoned features, we introduce an adaptive optimization strategy that integrates a dynamic penalty term into the training loss, reducing reliance on infringing features while preserving generative performance. Experimental results demonstrate that CopyrightShield significantly improves poisoned sample detection performance across two attack scenarios, achieving an average F1-score of 0.665, delaying the First-Attack Epoch (FAE) by 115.2%, and decreasing the Copyright Infringement Rate (CIR) by 56.7%. Compared to the state-of-the-art backdoor defense for diffusion models, the defense effect is improved by about 25%, showcasing its superiority and practicality in enhancing the security of diffusion models.
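To make the adaptive optimization concrete, below is a minimal sketch of a DDPM-style noise-prediction loss with a dynamic per-sample penalty. It assumes standard diffusion training; `eps_model`, `alphas_bar`, and `poison_risk` are hypothetical names, and the weighting is one plausible reading of the abstract, not the paper's exact formulation.

```python
import torch

def shielded_loss(eps_model, x0, cond, alphas_bar, poison_risk, lam=0.5):
    """Sketch: denoising loss with a dynamic per-sample penalty.
    `poison_risk` (hypothetical) is a per-sample score in [0, 1] produced by
    a detection stage such as spatial masking plus data attribution."""
    b = x0.size(0)
    t = torch.randint(0, alphas_bar.numel(), (b,), device=x0.device)
    a_bar = alphas_bar[t].view(b, 1, 1, 1)
    noise = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise  # forward q(x_t | x_0)
    per_sample = ((eps_model(x_t, t, cond) - noise) ** 2).flatten(1).mean(dim=1)
    # Dynamic penalty: down-weight samples flagged as likely poisoned, so the
    # model relies less on their potentially infringing features.
    return ((1.0 - lam * poison_risk) * per_sample).mean()
```

Down-weighting suspected poisons trades a small amount of fit on flagged samples for reduced memorization of infringing features, which is the trade-off the abstract describes between penalizing poisoned features and preserving generative performance.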
Related papers
- Latent Diffusion Unlearning: Protecting Against Unauthorized Personalization Through Trajectory Shifted Perturbations [18.024767641200064]
We propose a model-based perturbation strategy that operates within the latent space of diffusion models. Our method alternates between denoising and inversion while modifying the starting point of the denoising trajectory. We validate our approach on four benchmark datasets to demonstrate robustness against state-of-the-art inversion attacks.
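A loose sketch of the alternating loop described above, assuming DDIM-style `invert` and `denoise` helpers (hypothetical names, not the authors' code):

```python
def trajectory_shifted_perturb(x, invert, denoise, shift, rounds=3):
    """Alternate inversion and denoising while shifting the start of the
    denoising trajectory; an illustrative sketch of the stated idea only.
    `invert` maps an image to its starting latent (e.g. DDIM inversion);
    `denoise` runs the reverse process from a given latent."""
    for _ in range(rounds):
        z = invert(x)           # image -> starting latent
        x = denoise(z + shift)  # restart denoising from a shifted point
    return x                    # perturbed image resists personalization
```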
arXiv Detail & Related papers (2025-10-03T15:18:45Z) - Poisoning Bayesian Inference via Data Deletion and Replication [0.4915744683251149]
We focus on extending the white-box model poisoning paradigm to attack generic Bayesian inference.
A suite of attacks are developed that allow an attacker to steer the Bayesian posterior toward a target distribution.
With relatively little effort, the attacker is able to substantively alter the Bayesian's beliefs and, by accepting more risk, they can mold these beliefs to their will.
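One way to formalize the attack surface: if each datum enters the likelihood with an integer multiplicity, deletion and replication give the attacker direct control over the posterior (a generic formulation, not necessarily the paper's notation):

```latex
% Posterior after a deletion/replication attack: m_i = 0 deletes x_i,
% m_i >= 2 replicates it, and m_i = 1 leaves it untouched.
\pi(\theta \mid D, m) \;\propto\; \pi(\theta)\prod_{i=1}^{n} p(x_i \mid \theta)^{m_i},
\qquad m_i \in \{0, 1, 2, \dots\}.
```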
arXiv Detail & Related papers (2025-03-06T14:30:15Z) - Swallowing the Poison Pills: Insights from Vulnerability Disparity Among LLMs [3.7913442178940318]
Modern large language models (LLMs) exhibit critical vulnerabilities to poison pill attacks. We demonstrate that these attacks exploit inherent architectural properties of LLMs. Our work establishes poison pills as both a security threat and a diagnostic tool.
arXiv Detail & Related papers (2025-02-23T06:34:55Z) - CopyJudge: Automated Copyright Infringement Identification and Mitigation in Text-to-Image Diffusion Models [58.58208005178676]
We propose CopyJudge, an automated copyright infringement identification framework.
We employ an abstraction-filtration-comparison test framework with multi-LVLM debate to assess the likelihood of infringement.
Based on the judgments, we introduce a general LVLM-based mitigation strategy.
arXiv Detail & Related papers (2025-02-21T08:09:07Z) - CAT: Contrastive Adversarial Training for Evaluating the Robustness of Protective Perturbations in Latent Diffusion Models [15.363134355805764]
Adversarial examples as protective perturbations have been developed to defend against unauthorized data usage. We propose Contrastive Adversarial Training (CAT) as an adaptive attack against these protection methods, highlighting their lack of robustness.
arXiv Detail & Related papers (2025-02-11T03:35:35Z) - SoK: Benchmarking Poisoning Attacks and Defenses in Federated Learning [21.73177249075515]
Federated learning (FL) enables collaborative model training while preserving data privacy, but its decentralized nature exposes it to client-side data poisoning attacks (DPAs) and model poisoning attacks (MPAs).
This paper aims to provide a unified benchmark and analysis of defenses against DPAs and MPAs, clarifying the distinction between these two similar but slightly distinct domains.
arXiv Detail & Related papers (2025-02-06T06:05:00Z) - Evaluating Copyright Takedown Methods for Language Models [100.38129820325497]
Language models (LMs) derive their capabilities from extensive training on diverse data, including potentially copyrighted material.
This paper introduces the first evaluation of the feasibility and side effects of copyright takedowns for LMs.
We examine several strategies, including adding system prompts, decoding-time filtering interventions, and unlearning approaches.
arXiv Detail & Related papers (2024-06-26T18:09:46Z) - EnTruth: Enhancing the Traceability of Unauthorized Dataset Usage in Text-to-image Diffusion Models with Minimal and Robust Alterations [73.94175015918059]
We introduce a novel approach, EnTruth, which Enhances Traceability of unauthorized dataset usage.
By strategically incorporating the template memorization, EnTruth can trigger the specific behavior in unauthorized models as the evidence of infringement.
Our method is the first to investigate the positive application of memorization and use it for copyright protection, which turns a curse into a blessing.
arXiv Detail & Related papers (2024-06-20T02:02:44Z) - Watch the Watcher! Backdoor Attacks on Security-Enhancing Diffusion Models [65.30406788716104]
This work investigates the vulnerabilities of security-enhancing diffusion models.
We demonstrate that these models are highly susceptible to DIFF2, a simple yet effective backdoor attack.
Case studies show that DIFF2 can significantly reduce both post-purification and certified accuracy across benchmark datasets and models.
arXiv Detail & Related papers (2024-06-14T02:39:43Z) - CPR: Retrieval Augmented Generation for Copyright Protection [101.15323302062562]
We introduce CopyProtected generation with Retrieval (CPR), a new method for RAG with strong copyright protection guarantees.
CPR allows conditioning the output of diffusion models on a set of retrieved images.
We prove that CPR satisfies Near Access-Freeness (NAF), which bounds the amount of information an attacker can extract from the generated images.
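For reference, the NAF condition (after Vyas, Kakade & Barak, 2023; the CPR paper's exact notation may differ) asks the deployed model to stay close to a "safe" model trained without access to each copyrighted work:

```latex
% p is k_x-NAF if, for every copyrighted work C and every prompt x, its output
% distribution stays within k_x of a safe model that never saw C, under a
% divergence \Delta such as max-KL.
\Delta\bigl(p(\cdot \mid x)\,\big\|\,\mathrm{safe}_C(\cdot \mid x)\bigr) \le k_x
\qquad \text{for all } C \text{ and } x.
```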
arXiv Detail & Related papers (2024-03-27T18:09:55Z) - Have You Poisoned My Data? Defending Neural Networks against Data Poisoning [0.393259574660092]
We propose a novel approach to detect and filter poisoned datapoints in the transfer learning setting.
We show that effective poisons can be successfully differentiated from clean points in the characteristic vector space.
Our evaluation shows that our proposal outperforms existing approaches in defense rate and final trained model performance.
arXiv Detail & Related papers (2024-03-20T11:50:16Z) - Protecting Model Adaptation from Trojans in the Unlabeled Data [120.42853706967188]
This paper explores potential trojan attacks on model adaptation launched by well-designed poisoning target data. We propose a plug-and-play method named DiffAdapt, which can be seamlessly integrated with existing adaptation algorithms.
arXiv Detail & Related papers (2024-01-11T16:42:10Z) - The Stronger the Diffusion Model, the Easier the Backdoor: Data Poisoning to Induce Copyright Breaches Without Adjusting Finetuning Pipeline [30.80691226540351]
We formalize the Copyright Infringement Attack on generative AI models and propose a backdoor attack method, SilentBadDiffusion.
Our method strategically embeds connections between pieces of copyrighted information and text references in poisoning data.
Our experiments show the stealth and efficacy of the poisoning data.
arXiv Detail & Related papers (2024-01-07T08:37:29Z) - DataElixir: Purifying Poisoned Dataset to Mitigate Backdoor Attacks via Diffusion Models [12.42597979026873]
We propose DataElixir, a novel sanitization approach tailored to purify poisoned datasets.
We leverage diffusion models to eliminate trigger features and restore benign features, thereby turning the poisoned samples into benign ones.
Experiments conducted on 9 popular attacks demonstrate that DataElixir effectively mitigates various complex attacks while exerting minimal impact on benign accuracy.
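The general diffuse-then-denoise recipe behind such sanitization looks roughly like this (a generic sketch, not DataElixir's exact pipeline; `reverse_step` is an assumed one-step denoiser):

```python
import torch

@torch.no_grad()
def purify(x, reverse_step, alphas_bar, t_star=200):
    """Generic diffusion purification: add enough forward noise to wash out
    trigger features, then run the reverse process to restore benign content."""
    a_bar = alphas_bar[t_star]
    x_t = a_bar.sqrt() * x + (1.0 - a_bar).sqrt() * torch.randn_like(x)
    for t in range(t_star, 0, -1):   # reverse diffusion back to t = 0
        x_t = reverse_step(x_t, t)   # one assumed reverse-denoising step
    return x_t
```

Choosing `t_star` trades off trigger removal (more noise) against fidelity to the original benign content.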
arXiv Detail & Related papers (2023-12-18T09:40:38Z) - DiffAttack: Evasion Attacks Against Diffusion-Based Adversarial Purification [63.65630243675792]
Diffusion-based purification defenses leverage diffusion models to remove crafted perturbations of adversarial examples.
Recent studies show that even advanced attacks cannot break such defenses effectively.
We propose a unified framework DiffAttack to perform effective and efficient attacks against diffusion-based purification defenses.
arXiv Detail & Related papers (2023-10-27T15:17:50Z) - Enhancing Adversarial Robustness via Score-Based Optimization [22.87882885963586]
Adversarial attacks have the potential to mislead deep neural network classifiers by introducing slight perturbations.
We introduce a novel adversarial defense scheme named ScoreOpt, which optimizes adversarial samples at test time.
Our experimental results demonstrate that our approach outperforms existing adversarial defenses in terms of both robustness and speed.
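In spirit, score-based test-time optimization nudges an input toward high-density regions of the data distribution using a learned score function (a sketch of the general idea, not ScoreOpt's exact objective):

```python
def score_based_cleanup(x, score_fn, steps=50, eta=1e-2):
    """Gradient ascent on log p(x) via a learned score s(x) ~ grad_x log p(x);
    illustrative only: `score_fn` is an assumed pretrained score model."""
    for _ in range(steps):
        x = x + eta * score_fn(x)  # step toward higher model density
    return x
```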
arXiv Detail & Related papers (2023-07-10T03:59:42Z) - On Practical Aspects of Aggregation Defenses against Data Poisoning Attacks [58.718697580177356]
Attacks on deep learning models with malicious training samples are known as data poisoning.
Recent advances in defense strategies against data poisoning have highlighted the effectiveness of aggregation schemes in achieving certified poisoning robustness.
Here we focus on Deep Partition Aggregation, a representative aggregation defense, and assess its practical aspects, including efficiency, performance, and robustness.
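Deep Partition Aggregation itself is easy to state: deterministically partition the training set, train one base model per partition, and predict by majority vote, so any single poisoned sample can influence at most one base model. A sketch, assuming hashable inputs (not the paper's evaluation code):

```python
from collections import Counter

def dpa_train(dataset, k, train_fn):
    """Split data into k disjoint partitions by a deterministic hash and train
    one base model per partition."""
    parts = [[] for _ in range(k)]
    for x, y in dataset:
        parts[hash(x) % k].append((x, y))  # assumes hashable inputs (e.g. bytes)
    return [train_fn(part) for part in parts]

def dpa_predict(models, x):
    """Majority vote; the vote margin yields a certified bound on how many
    poisoned training samples the prediction can tolerate."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]
```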
arXiv Detail & Related papers (2023-06-28T17:59:35Z) - Exploring Model Dynamics for Accumulative Poisoning Discovery [62.08553134316483]
We propose a novel information measure, namely, Memorization Discrepancy, to explore the defense via the model-level information.
By implicitly transferring the changes in the data manipulation to that in the model outputs, Memorization Discrepancy can discover the imperceptible poison samples.
We thoroughly explore its properties and propose Discrepancy-aware Sample Correction (DSC) to defend against accumulative poisoning attacks.
arXiv Detail & Related papers (2023-06-06T14:45:24Z) - FedCC: Robust Federated Learning against Model Poisoning Attacks [0.0]
Federated learning is a distributed framework designed to address privacy concerns.
It introduces new attack surfaces, which are especially exposed when data is non-independently and identically distributed (non-IID).
We present FedCC, a simple yet effective novel defense algorithm against model poisoning attacks.
arXiv Detail & Related papers (2022-12-05T01:52:32Z) - Improving Adversarial Robustness to Sensitivity and Invariance Attacks with Deep Metric Learning [80.21709045433096]
A standard method in adversarial robustness defends against samples crafted by minimally perturbing a clean input.
We use metric learning to frame adversarial regularization as an optimal transport problem.
Our preliminary results indicate that regularizing over invariant perturbations in our framework improves defense against both invariance and sensitivity attacks.
arXiv Detail & Related papers (2022-11-04T13:54:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.