PoisonedParrot: Subtle Data Poisoning Attacks to Elicit Copyright-Infringing Content from Large Language Models
- URL: http://arxiv.org/abs/2503.07697v1
- Date: Mon, 10 Mar 2025 17:13:30 GMT
- Title: PoisonedParrot: Subtle Data Poisoning Attacks to Elicit Copyright-Infringing Content from Large Language Models
- Authors: Michael-Andrei Panaitescu-Liess, Pankayaraj Pathmanathan, Yigitcan Kaya, Zora Che, Bang An, Sicheng Zhu, Aakriti Agrawal, Furong Huang,
- Abstract summary: We introduce PoisonedParrot: the first stealthy data poisoning attack that induces an LLM to generate copyrighted content. Despite its simplicity, PoisonedParrot is surprisingly effective at priming the model to generate copyrighted content with no discernible side effects. We make the first attempt at mitigating copyright-infringement attacks by proposing a defense: ParrotTrap.
- Score: 31.384367168115503
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the capabilities of large language models (LLMs) continue to expand, their usage has become increasingly prevalent. However, as reflected in numerous ongoing lawsuits regarding LLM-generated content, addressing copyright infringement remains a significant challenge. In this paper, we introduce PoisonedParrot: the first stealthy data poisoning attack that induces an LLM to generate copyrighted content even when the model has not been directly trained on the specific copyrighted material. PoisonedParrot integrates small fragments of copyrighted text into the poison samples using an off-the-shelf LLM. Despite its simplicity, evaluated in a wide range of experiments, PoisonedParrot is surprisingly effective at priming the model to generate copyrighted content with no discernible side effects. Moreover, we discover that existing defenses are largely ineffective against our attack. Finally, we make the first attempt at mitigating copyright-infringement poisoning attacks by proposing a defense: ParrotTrap. We encourage the community to explore this emerging threat model further.
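The abstract describes embedding small fragments of copyrighted text into poison samples generated with an off-the-shelf LLM. The sketch below illustrates that general idea only; the fragment length, prompt wording, and the `generate` helper are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of the poison-sample idea described in the abstract:
# embed short fragments of a copyrighted passage into otherwise ordinary
# paragraphs produced by an off-the-shelf LLM. Fragment length, prompts,
# and the `generate` helper are assumptions, not the paper's method.

def split_into_fragments(text: str, fragment_len: int = 6) -> list[str]:
    """Split the copyrighted passage into short word-level fragments."""
    words = text.split()
    return [
        " ".join(words[i:i + fragment_len])
        for i in range(0, len(words), fragment_len)
    ]

def make_poison_samples(copyrighted_text: str, generate, n_per_fragment: int = 4) -> list[str]:
    """For each fragment, ask an off-the-shelf LLM to write a benign-looking
    paragraph that contains the fragment verbatim."""
    poison = []
    for fragment in split_into_fragments(copyrighted_text):
        for _ in range(n_per_fragment):
            prompt = (
                "Write a short, natural paragraph that includes the phrase "
                f'"{fragment}" verbatim.'
            )
            poison.append(generate(prompt))  # `generate` wraps any LLM API
    return poison
```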
Related papers
- Certified Mitigation of Worst-Case LLM Copyright Infringement [46.571805194176825]
"copyright takedown" methods are aimed at preventing models from generating content substantially similar to copyrighted ones.
We propose BloomScrub, a remarkably simple yet highly effective inference-time approach that provides certified copyright takedown.
Our results suggest that lightweight, inference-time methods can be surprisingly effective for copyright prevention.
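As a concrete illustration of an inference-time takedown check in this spirit, the sketch below flags generated text whose n-grams overlap a corpus of copyrighted material. The plain Python set (standing in for a space-efficient filter), the n-gram length, and the screening interface are assumptions, not necessarily BloomScrub's exact mechanism.

```python
# Hypothetical sketch of an inference-time copyright screen: flag generations
# that reproduce verbatim n-grams from a copyrighted corpus. A real system
# might use a Bloom filter instead of a Python set; all parameters here are
# assumptions for illustration.

def ngrams(text: str, n: int = 8) -> set[str]:
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def build_index(copyrighted_docs: list[str], n: int = 8) -> set[str]:
    index: set[str] = set()
    for doc in copyrighted_docs:
        index |= ngrams(doc, n)
    return index

def contains_verbatim_overlap(generated: str, index: set[str], n: int = 8) -> bool:
    """Return True if the generation reproduces any indexed n-gram verbatim."""
    return bool(ngrams(generated, n) & index)
```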
arXiv Detail & Related papers (2025-04-22T17:16:53Z)
- CopyrightShield: Spatial Similarity Guided Backdoor Defense against Copyright Infringement in Diffusion Models [61.06621533874629]
Diffusion models are a prime target for copyright infringement attacks. This paper provides an in-depth analysis of the spatial similarity of replication in diffusion models. We propose a novel defense method specifically targeting copyright infringement attacks.
arXiv Detail & Related papers (2024-12-02T14:19:44Z)
- SHIELD: Evaluation and Defense Strategies for Copyright Compliance in LLM Text Generation [24.644101178288476]
Large Language Models (LLMs) have transformed machine learning but raised significant legal concerns.
LLMs may infringe on copyrights or overly restrict non-copyrighted texts.
We propose a lightweight, real-time defense to prevent the generation of copyrighted text.
arXiv Detail & Related papers (2024-06-18T18:00:03Z)
- Defending LLMs against Jailbreaking Attacks via Backtranslation [61.878363293735624]
We propose a new method for defending LLMs against jailbreaking attacks by "backtranslation".
The prompt inferred from the model's initial response is called the backtranslated prompt, which tends to reveal the actual intent of the original prompt.
We empirically demonstrate that our defense significantly outperforms the baselines.
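A minimal sketch of the backtranslation idea summarized above: infer what prompt the model's initial response answers, then refuse if the model itself refuses that inferred prompt. The `llm` and `is_refusal` helpers and the inference prompt are hypothetical, not the paper's implementation.

```python
# Hypothetical sketch of a backtranslation-style defense. `llm` and
# `is_refusal` are assumed helpers, not the paper's implementation.

def defend_with_backtranslation(user_prompt: str, llm, is_refusal) -> str:
    response = llm(user_prompt)
    if is_refusal(response):
        return response  # the model already refused

    # Infer the "backtranslated" prompt that this response answers.
    backtranslated = llm(
        "Infer the question or instruction that the following text answers:\n"
        + response
    )

    # If the model refuses the backtranslated prompt, the original intent is likely harmful.
    if is_refusal(llm(backtranslated)):
        return "I can't help with that."
    return response
```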
arXiv Detail & Related papers (2024-02-26T10:03:33Z)
- Round Trip Translation Defence against Large Language Model Jailbreaking Attacks [11.593052831056841]
We propose the first algorithm specifically designed to defend against social-engineered attacks on large language models.
Our defense successfully mitigated over 70% of Prompt Automatic Iterative Refinement (PAIR) attacks.
We are also the first to attempt mitigating MathsAttack, reducing its attack success rate by almost 40%.
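A minimal sketch of the round-trip translation idea: paraphrase the incoming prompt by translating it through pivot languages and back before it reaches the model, which tends to disrupt carefully engineered adversarial phrasing. The `translate` helper and pivot languages are assumptions, not the paper's exact setup.

```python
# Hypothetical sketch of a round-trip translation defense. `translate` stands
# in for any machine-translation API; pivot languages are assumptions.

def round_trip(prompt: str, translate, pivots=("fr", "de")) -> str:
    text = prompt
    for lang in pivots:
        text = translate(text, target=lang)
    return translate(text, target="en")

def defended_query(prompt: str, llm, translate) -> str:
    paraphrased = round_trip(prompt, translate)
    return llm(paraphrased)  # the model only ever sees the paraphrase
```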
arXiv Detail & Related papers (2024-02-21T03:59:52Z)
- The Stronger the Diffusion Model, the Easier the Backdoor: Data Poisoning to Induce Copyright Breaches Without Adjusting Finetuning Pipeline [30.80691226540351]
We formalize the Copyright Infringement Attack on generative AI models and propose a backdoor attack method, SilentBadDiffusion.
Our method strategically embeds connections between pieces of copyrighted information and text references in poisoning data.
Our experiments show the stealth and efficacy of the poisoning data.
arXiv Detail & Related papers (2024-01-07T08:37:29Z)
- SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks [99.23352758320945]
We propose SmoothLLM, the first algorithm designed to mitigate jailbreaking attacks on large language models (LLMs).
Based on our finding that adversarially-generated prompts are brittle to character-level changes, our defense first randomly perturbs multiple copies of a given input prompt, and then aggregates the corresponding predictions to detect adversarial inputs.
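A minimal sketch of the perturb-and-aggregate idea described above: query the model on several randomly character-perturbed copies of the prompt and take a majority vote over whether the replies are refusals. The perturbation rate and the `llm` / `is_refusal` helpers are assumptions, not the paper's exact algorithm.

```python
import random
import string

# Hypothetical sketch of a perturb-and-aggregate defense: adversarial suffixes
# are brittle to character-level noise, so we query several perturbed copies
# of the prompt and majority-vote. Parameters and helpers are assumptions.

def perturb(prompt: str, rate: float = 0.1) -> str:
    chars = list(prompt)
    for i in range(len(chars)):
        if random.random() < rate:
            chars[i] = random.choice(string.printable)
    return "".join(chars)

def smooth_query(prompt: str, llm, is_refusal, n_copies: int = 8) -> str:
    replies = [llm(perturb(prompt)) for _ in range(n_copies)]
    refusals = sum(is_refusal(r) for r in replies)
    if refusals > n_copies // 2:
        return "I can't help with that."
    # Return a reply consistent with the (non-refusing) majority outcome.
    return next(r for r in replies if not is_refusal(r))
```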
arXiv Detail & Related papers (2023-10-05T17:01:53Z)
- Breaking the De-Pois Poisoning Defense [0.0]
We show that the attack-agnostic De-Pois defense is hardly an exception to the rule that poisoning defenses can themselves be circumvented.
In our work, we break this poison-protection layer by replicating the critic model and then performing a composed gradient-sign attack on both the critic and target models simultaneously.
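A minimal sketch of the composed gradient-sign idea described above: combine the losses of a replicated critic model and the target model and take a single sign step against both at once. The loss weighting, step size, and model interfaces are assumptions, not the paper's exact attack.

```python
import torch

# Hypothetical sketch of a composed gradient-sign step against both a
# replicated critic and the target model. Loss weighting, step size, and
# the model interfaces are assumptions for illustration only.

def composed_fgsm_step(x, y, target_model, critic_model, loss_fn,
                       eps: float = 0.03, alpha: float = 0.5):
    x = x.clone().detach().requires_grad_(True)

    # Loss on the target model (the model we ultimately want to affect) ...
    target_loss = loss_fn(target_model(x), y)
    # ... combined with loss on the replicated critic (the screening model).
    critic_loss = loss_fn(critic_model(x), y)
    combined = alpha * target_loss + (1 - alpha) * critic_loss

    combined.backward()
    # One sign step computed from the combined gradient of both models.
    return (x + eps * x.grad.sign()).detach()
```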
arXiv Detail & Related papers (2022-04-03T15:17:47Z)
- MultAV: Multiplicative Adversarial Videos [71.94264837503135]
We propose a novel attack method against video recognition models, Multiplicative Adversarial Videos (MultAV).
MultAV imposes perturbation on video data by multiplication.
Experimental results show that models adversarially trained against additive attacks are less robust to MultAV.
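To make the contrast concrete, the sketch below compares an additive perturbation with a multiplicative one applied to a video tensor. The epsilon bounds and clipping range are assumptions for illustration, not the paper's configuration.

```python
import torch

# Hypothetical sketch contrasting additive and multiplicative perturbations.
# Epsilon values and the [0, 1] clipping range are assumptions.

def additive_perturb(video: torch.Tensor, delta: torch.Tensor, eps: float = 0.03):
    """Classic additive perturbation: x' = x + delta, with delta bounded by eps."""
    return (video + delta.clamp(-eps, eps)).clamp(0.0, 1.0)

def multiplicative_perturb(video: torch.Tensor, ratio: torch.Tensor, eps: float = 0.03):
    """MultAV-style multiplicative perturbation: x' = x * ratio,
    with the ratio kept close to 1 (here within [1 - eps, 1 + eps])."""
    return (video * ratio.clamp(1.0 - eps, 1.0 + eps)).clamp(0.0, 1.0)
```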
arXiv Detail & Related papers (2020-09-17T04:34:39Z)