Related papers: Fantastic Copyrighted Beasts and How (Not) to Generate Them

Fantastic Copyrighted Beasts and How (Not) to Generate Them

URL: http://arxiv.org/abs/2406.14526v2
Date: Wed, 26 Mar 2025 12:21:42 GMT
Title: Fantastic Copyrighted Beasts and How (Not) to Generate Them
Authors: Luxi He, Yangsibo Huang, Weijia Shi, Tinghao Xie, Haotian Liu, Yue Wang, Luke Zettlemoyer, Chiyuan Zhang, Danqi Chen, Peter Henderson,
Abstract summary: We show that state-of-the-art image and video generation models can still generate copyrighted characters even if characters' names are not explicitly mentioned.<n>We introduce semi-automatic techniques to identify such keywords or descriptions that trigger character generation.<n>Our findings reveal that common methods, such as DALL-E's prompt rewriting, are insufficient alone and require supplementary strategies like negative prompting.
Score: 83.77348858322523
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent studies show that image and video generation models can be prompted to reproduce copyrighted content from their training data, raising serious legal concerns about copyright infringement. Copyrighted characters (e.g., Mario, Batman) present a significant challenge: at least one lawsuit has already awarded damages based on the generation of such characters. Consequently, commercial services like DALL-E have started deploying interventions. However, little research has systematically examined these problems: (1) Can users easily prompt models to generate copyrighted characters, even if it is unintentional?; (2) How effective are the existing mitigation strategies? To address these questions, we introduce a novel evaluation framework with metrics that assess both the generated image's similarity to copyrighted characters and its consistency with user intent, grounded in a set of popular copyrighted characters from diverse studios and regions. We show that state-of-the-art image and video generation models can still generate characters even if characters' names are not explicitly mentioned, sometimes with only two generic keywords (e.g., prompting with "videogame, plumber" consistently generates Nintendo's Mario character). We also introduce semi-automatic techniques to identify such keywords or descriptions that trigger character generation. Using this framework, we evaluate mitigation strategies, including prompt rewriting and new approaches we propose. Our findings reveal that common methods, such as DALL-E's prompt rewriting, are insufficient alone and require supplementary strategies like negative prompting. Our work provides empirical grounding for discussions on copyright mitigation strategies and offers actionable insights for model deployers implementing these safeguards.

Related papers

Certified Mitigation of Worst-Case LLM Copyright Infringement [46.571805194176825]
"copyright takedown" methods are aimed at preventing models from generating content substantially similar to copyrighted ones. We propose BloomScrub, a remarkably simple yet highly effective inference-time approach that provides certified copyright takedown. Our results suggest that lightweight, inference-time methods can be surprisingly effective for copyright prevention.
arXiv Detail & Related papers (2025-04-22T17:16:53Z)
CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation [132.00910067533982]
We introduce CopyBench, a benchmark designed to measure both literal and non-literal copying in LM generations. We find that, although literal copying is relatively rare, two types of non-literal copying -- event copying and character copying -- occur even in models as small as 7B parameters.
arXiv Detail & Related papers (2024-07-09T17:58:18Z)
Evaluating Copyright Takedown Methods for Language Models [100.38129820325497]
Language models (LMs) derive their capabilities from extensive training on diverse data, including potentially copyrighted material. This paper introduces the first evaluation of the feasibility and side effects of copyright takedowns for LMs. We examine several strategies, including adding system prompts, decoding-time filtering interventions, and unlearning approaches.
arXiv Detail & Related papers (2024-06-26T18:09:46Z)
SHIELD: Evaluation and Defense Strategies for Copyright Compliance in LLM Text Generation [24.644101178288476]
Large Language Models (LLMs) have transformed machine learning but raised significant legal concerns. LLMs may infringe on copyrights or overly restrict non-copyrighted texts. We propose lightweight, real-time defense to prevent the generation of copyrighted text.
arXiv Detail & Related papers (2024-06-18T18:00:03Z)
Evaluating and Mitigating IP Infringement in Visual Generative AI [54.24196167576133]
State-of-the-art visual generative models can generate content that bears a striking resemblance to characters protected by intellectual property rights. This happens when the input prompt contains the character's name or even just descriptive details about their characteristics. We develop a revised generation paradigm that can identify potentially infringing generated content and prevent IP infringement.
arXiv Detail & Related papers (2024-06-07T06:14:18Z)
Tackling GenAI Copyright Issues: Originality Estimation and Genericization [25.703494724823756]
We propose a genericization method that modifies the outputs of a generative model to make them more generic and less likely to infringe copyright. As a practical implementation, we introduce PREGen, which combines our genericization method with an existing mitigation technique.
arXiv Detail & Related papers (2024-06-05T14:58:32Z)
LLMs and Memorization: On Quality and Specificity of Copyright Compliance [0.0]
Memorization in large language models (LLMs) is a growing concern. LLMs have been shown to easily reproduce parts of their training data, including copyrighted work. This is an important problem to solve, as it may violate existing copyright laws as well as the European AI Act.
arXiv Detail & Related papers (2024-05-28T18:01:52Z)
©Plug-in Authorization for Human Content Copyright Protection in Text-to-Image Model [71.47762442337948]
State-of-the-art models create high-quality content without crediting original creators. We propose the copyright Plug-in Authorization framework, introducing three operations: addition, extraction, and combination. Extraction allows creators to reclaim copyright from infringing models, and combination enables users to merge different copyright plug-ins.
arXiv Detail & Related papers (2024-04-18T07:48:00Z)
Copyright Protection in Generative AI: A Technical Perspective [58.84343394349887]
Generative AI has witnessed rapid advancement in recent years, expanding their capabilities to create synthesized content such as text, images, audio, and code. The high fidelity and authenticity of contents generated by these Deep Generative Models (DGMs) have sparked significant copyright concerns. This work delves into this issue by providing a comprehensive overview of copyright protection from a technical perspective.
arXiv Detail & Related papers (2024-02-04T04:00:33Z)
A Dataset and Benchmark for Copyright Infringement Unlearning from Text-to-Image Diffusion Models [52.49582606341111]
Copyright law confers creators the exclusive rights to reproduce, distribute, and monetize their creative works. Recent progress in text-to-image generation has introduced formidable challenges to copyright enforcement. We introduce a novel pipeline that harmonizes CLIP, ChatGPT, and diffusion models to curate a dataset.
arXiv Detail & Related papers (2024-01-04T11:14:01Z)
Foundation Models and Fair Use [96.04664748698103]
In the U.S. and other countries, copyrighted content may be used to build foundation models without incurring liability due to the fair use doctrine. In this work, we survey the potential risks of developing and deploying foundation models based on copyrighted content. We discuss technical mitigations that can help foundation models stay in line with fair use.
arXiv Detail & Related papers (2023-03-28T03:58:40Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.