Can Large Vision-Language Models Detect Images Copyright Infringement from GenAI?
- URL: http://arxiv.org/abs/2502.16618v1
- Date: Sun, 23 Feb 2025 15:41:12 GMT
- Title: Can Large Vision-Language Models Detect Images Copyright Infringement from GenAI?
- Authors: Qipan Xu, Zhenting Wang, Xiaoxiao He, Ligong Han, Ruixiang Tang
- Abstract summary: We focus on evaluating the copyright detection abilities of state-of-the-art LVLMs using a diverse set of image samples. We construct a benchmark dataset comprising positive samples that violate the copyright protection of well-known IP figures, as well as negative samples that resemble these figures but do not raise copyright concerns. Our experimental results reveal that LVLMs are prone to overfitting, leading to the misclassification of some negative samples as IP-infringement cases.
- Score: 22.898606027486593
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Generative AI models, renowned for their ability to synthesize high-quality content, have sparked growing concerns over the improper generation of copyright-protected material. While recent studies have proposed various approaches to address copyright issues, the capability of large vision-language models (LVLMs) to detect copyright infringements remains largely unexplored. In this work, we focus on evaluating the copyright detection abilities of state-of-the-art LVLMs using a diverse set of image samples. Recognizing the absence of a comprehensive dataset that includes both IP-infringement samples and ambiguous non-infringement negative samples, we construct a benchmark dataset comprising positive samples that violate the copyright protection of well-known IP figures, as well as negative samples that resemble these figures but do not raise copyright concerns. This dataset is created using advanced prompt engineering techniques. We then evaluate leading LVLMs using our benchmark dataset. Our experimental results reveal that LVLMs are prone to overfitting, leading to the misclassification of some negative samples as IP-infringement cases. In the final section, we analyze these failure cases and propose potential solutions to mitigate the overfitting problem.
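To make the evaluation setup concrete, the following is a minimal sketch of how such a benchmark probe might be run against an LVLM through an OpenAI-compatible vision API. The model name, prompt wording, and yes/no parsing are illustrative assumptions, not the authors' exact protocol.

```python
import base64
from openai import OpenAI  # assumes an OpenAI-compatible vision endpoint

client = OpenAI()

# Illustrative wording; the paper's actual prompts are not reproduced here.
PROMPT = ("Does this image depict a well-known copyright-protected character "
          "or IP figure? Answer strictly 'yes' or 'no'.")

def ask_lvlm(image_path: str) -> bool:
    """Query the LVLM with one benchmark image; True = flagged as infringing."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder; the paper evaluates several leading LVLMs
        messages=[{"role": "user", "content": [
            {"type": "text", "text": PROMPT},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ]}],
    )
    return resp.choices[0].message.content.strip().lower().startswith("yes")

def false_positive_rate(benchmark):
    """benchmark: list of (image_path, is_infringing) pairs."""
    negatives = [p for p, is_pos in benchmark if not is_pos]
    return sum(ask_lvlm(p) for p in negatives) / max(len(negatives), 1)
```

The false-positive rate on look-alike negatives is the quantity of interest here, since the paper's central finding is that overfitted LVLMs over-flag exactly those samples.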
Related papers
- CopyJudge: Automated Copyright Infringement Identification and Mitigation in Text-to-Image Diffusion Models [58.58208005178676]
We propose CopyJudge, an automated copyright infringement identification framework. We employ an abstraction-filtration-comparison test framework with multi-LVLM debate to assess the likelihood of infringement. Based on the judgments, we introduce a general LVLM-based mitigation strategy.
arXiv Detail & Related papers (2025-02-21T08:09:07Z)
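As a rough illustration of the multi-LVLM debate idea named above, the sketch below has several judge models score the same image and then revise their scores after seeing each other's rationales. The judge pool, prompts, two-round structure, and mean aggregation are assumptions for illustration, not CopyJudge's actual procedure.

```python
from statistics import mean
from openai import OpenAI  # assumes OpenAI-compatible endpoints for every judge

client = OpenAI()
JUDGES = ["gpt-4o", "gpt-4o-mini"]  # placeholder judge pool

def judge_once(model: str, image_b64: str, context: str = "") -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": [
            {"type": "text", "text":
                "Rate the likelihood (0 to 1) that this image infringes a known "
                "IP, starting your reply with the number, then give a "
                "one-sentence rationale. " + context},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ]}],
    )
    return resp.choices[0].message.content

def debate(image_b64: str, rounds: int = 2) -> float:
    opinions = [judge_once(m, image_b64) for m in JUDGES]
    for _ in range(rounds - 1):  # each judge revises after reading the others
        ctx = "Other judges said: " + " | ".join(opinions)
        opinions = [judge_once(m, image_b64, ctx) for m in JUDGES]
    scores = []
    for o in opinions:  # naive extraction of the leading number (illustrative)
        try:
            scores.append(float(o.split()[0]))
        except (ValueError, IndexError):
            pass
    return mean(scores) if scores else 0.5
```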
- CAP: Detecting Unauthorized Data Usage in Generative Models via Prompt Generation [1.6141139250981018]
Copyright Audit via Prompts generation (CAP) is a framework for automatically testing whether an ML model has been trained with unauthorized data.
Specifically, we devise an approach to generate suitable keys inducing the model to reveal copyrighted contents.
To prove its effectiveness, we conducted an extensive evaluation campaign on measurements collected in four IoT scenarios.
arXiv Detail & Related papers (2024-10-08T08:49:41Z)
- RLCP: A Reinforcement Learning-based Copyright Protection Method for Text-to-Image Diffusion Model [42.77851688874563]
We propose a Reinforcement Learning-based Copyright Protection (RLCP) method for text-to-image diffusion models.
Our approach minimizes the generation of copyright-infringing content while maintaining the quality of the model-generated dataset.
arXiv Detail & Related papers (2024-08-29T15:39:33Z)
- Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data? [62.72729485995075]
We investigate the effectiveness of watermarking as a deterrent against the generation of copyrighted texts.
We find that watermarking adversely affects the success rate of Membership Inference Attacks (MIAs).
We propose an adaptive technique to improve the success rate of a recent MIA under watermarking.
arXiv Detail & Related papers (2024-07-24T16:53:09Z)
- Evaluating Copyright Takedown Methods for Language Models [100.38129820325497]
Language models (LMs) derive their capabilities from extensive training on diverse data, including potentially copyrighted material.
This paper introduces the first evaluation of the feasibility and side effects of copyright takedowns for LMs.
We examine several strategies, including adding system prompts, decoding-time filtering interventions, and unlearning approaches.
arXiv Detail & Related papers (2024-06-26T18:09:46Z)
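Of the strategies the entry above lists, decoding-time filtering is the most mechanical to picture. Below is a minimal sketch assuming a Hugging Face transformers model and a pre-tokenized blocklist of copyrighted passages; both the blocklist format and the prefix length are assumptions, not the paper's implementation.

```python
import torch
from transformers import LogitsProcessor, LogitsProcessorList

class BlocklistProcessor(LogitsProcessor):
    """Bans any next token that would extend a known copyrighted passage."""

    def __init__(self, banned_token_sequences, prefix_len: int = 4):
        self.prefix_len = prefix_len
        self.banned = {}  # (last prefix_len token ids) -> banned next-token ids
        for seq in banned_token_sequences:
            for i in range(len(seq) - prefix_len):
                key = tuple(seq[i:i + prefix_len])
                self.banned.setdefault(key, set()).add(seq[i + prefix_len])

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor):
        for b in range(input_ids.shape[0]):
            key = tuple(input_ids[b, -self.prefix_len:].tolist())
            for tok in self.banned.get(key, ()):
                scores[b, tok] = float("-inf")  # suppress the continuation
        return scores

# usage with any transformers causal LM (blocklist is hypothetical):
# model.generate(inputs, logits_processor=LogitsProcessorList(
#     [BlocklistProcessor(blocklist)]))
```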
- EnTruth: Enhancing the Traceability of Unauthorized Dataset Usage in Text-to-image Diffusion Models with Minimal and Robust Alterations [73.94175015918059]
We introduce a novel approach, EnTruth, which Enhances Traceability of unauthorized dataset usage.
By strategically incorporating the template memorization, EnTruth can trigger the specific behavior in unauthorized models as the evidence of infringement.
Our method is the first to investigate the positive application of memorization and use it for copyright protection, which turns a curse into a blessing.
arXiv Detail & Related papers (2024-06-20T02:02:44Z)
- Tackling GenAI Copyright Issues: Originality Estimation and Genericization [25.703494724823756]
We propose a genericization method that modifies the outputs of a generative model to make them more generic and less likely to imitate copyrighted materials. As a practical implementation, we introduce PREGen, which combines our genericization method with an existing mitigation technique.
arXiv Detail & Related papers (2024-06-05T14:58:32Z)
- A Dataset and Benchmark for Copyright Infringement Unlearning from Text-to-Image Diffusion Models [52.49582606341111]
Copyright law confers upon creators the exclusive rights to reproduce, distribute, and monetize their creative works.
Recent progress in text-to-image generation has introduced formidable challenges to copyright enforcement.
We introduce a novel pipeline that harmonizes CLIP, ChatGPT, and diffusion models to curate a dataset.
arXiv Detail & Related papers (2024-01-04T11:14:01Z)
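The CLIP stage of such a pipeline can be pictured as a similarity filter over diffusion outputs. In the sketch below, the model checkpoints, the ChatGPT-style indirect prompt, the named IP figure, and the similarity threshold are all illustrative assumptions, not the authors' exact curation pipeline.

```python
import torch
from diffusers import StableDiffusionPipeline
from transformers import CLIPModel, CLIPProcessor

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# a ChatGPT-style indirect description of the target IP figure (illustrative)
prompt = "a cheerful cartoon mouse with large round black ears and red shorts"
images = pipe(prompt, num_images_per_prompt=4).images

# keep only generations that CLIP scores as close to the named IP figure
inputs = proc(text=["Mickey Mouse"], images=images,
              return_tensors="pt", padding=True)
with torch.no_grad():
    sims = clip(**inputs).logits_per_image.squeeze(-1)  # one score per image
kept = [img for img, s in zip(images, sims) if s.item() > 25.0]  # threshold is a guess
```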
- Model Stealing Attack against Graph Classification with Authenticity, Uncertainty and Diversity [80.16488817177182]
GNNs are vulnerable to the model stealing attack, a nefarious endeavor geared towards duplicating the target model via query permissions.
We introduce three model stealing attacks to adapt to different actual scenarios.
arXiv Detail & Related papers (2023-12-18T05:42:31Z)
- Can Copyright be Reduced to Privacy? [23.639303165101385]
We argue that while algorithmic stability may be perceived as a practical tool to detect copying, such copying does not necessarily constitute copyright infringement.
If adopted as a standard for detecting and establishing copyright infringement, algorithmic stability may undermine the intended objectives of copyright law.
arXiv Detail & Related papers (2023-05-24T07:22:41Z)