Echoes of Ownership: Adversarial-Guided Dual Injection for Copyright Protection in MLLMs
- URL: http://arxiv.org/abs/2602.18845v1
- Date: Sat, 21 Feb 2026 14:22:07 GMT
- Title: Echoes of Ownership: Adversarial-Guided Dual Injection for Copyright Protection in MLLMs
- Authors: Chengwei Xia, Fan Ma, Ruijie Quan, Yunqiu Xu, Kun Zhan, Yi Yang,
- Abstract summary: Disputes regarding model version attribution and ownership have become increasingly frequent. We propose a framework for generating copyright triggers for MLLMs, enabling model publishers to embed verifiable ownership information into the model. Our method constructs a tracking trigger image by treating the image as a learnable tensor, performing adversarial optimization with dual-injection of ownership-relevant semantic information.
- Score: 43.88758733745951
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the rapid deployment and widespread adoption of multimodal large language models (MLLMs), disputes regarding model version attribution and ownership have become increasingly frequent, raising significant concerns about intellectual property protection. In this paper, we propose a framework for generating copyright triggers for MLLMs, enabling model publishers to embed verifiable ownership information into the model. The goal is to construct trigger images that elicit ownership-related textual responses exclusively in fine-tuned derivatives of the original model, while remaining inert in other non-derivative models. Our method constructs a tracking trigger image by treating the image as a learnable tensor, performing adversarial optimization with dual-injection of ownership-relevant semantic information. The first injection is achieved by enforcing textual consistency between the output of an auxiliary MLLM and a predefined ownership-relevant target text; the consistency loss is backpropagated to inject this ownership-related information into the image. The second injection is performed at the semantic-level by minimizing the distance between the CLIP features of the image and those of the target text. Furthermore, we introduce an additional adversarial training stage involving the auxiliary model derived from the original model itself. This auxiliary model is specifically trained to resist generating ownership-relevant target text, thereby enhancing robustness in heavily fine-tuned derivative models. Extensive experiments demonstrate the effectiveness of our dual-injection approach in tracking model lineage under various fine-tuning and domain-shift scenarios.
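The dual-injection optimization described in the abstract can be sketched as a toy gradient-descent loop. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: the frozen auxiliary MLLM and the CLIP encoder are replaced by fixed random linear maps (`mllm_head`, `clip_proj`, both hypothetical stand-ins), the ownership target text is represented by fixed target vectors, and the trigger image is a learnable vector updated on the sum of a text-consistency term and a CLIP-space distance term.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the frozen models (assumptions, not the paper's code):
# mllm_head maps the image to a text-output space, clip_proj to CLIP space.
mllm_head = rng.normal(size=(8, 16))   # auxiliary MLLM text head (frozen)
clip_proj = rng.normal(size=(4, 16))   # CLIP image encoder (frozen)

t_text = rng.normal(size=8)            # target: ownership-relevant text output
t_clip = rng.normal(size=4)            # target: CLIP embedding of that text

x = rng.normal(size=16)                # trigger image as a learnable tensor
lam, lr = 0.5, 5e-3                    # CLIP-loss weight, step size

def dual_injection_loss(x):
    # Injection 1: textual consistency with the ownership target text.
    l_text = np.sum((mllm_head @ x - t_text) ** 2)
    # Injection 2: semantic alignment in CLIP feature space.
    l_clip = np.sum((clip_proj @ x - t_clip) ** 2)
    return l_text + lam * l_clip

losses = []
for _ in range(300):
    # Closed-form gradient of the two quadratic terms w.r.t. the image;
    # in the real setting this gradient is obtained by backpropagation
    # through the frozen auxiliary MLLM and CLIP encoders.
    grad = 2 * mllm_head.T @ (mllm_head @ x - t_text) \
         + 2 * lam * clip_proj.T @ (clip_proj @ x - t_clip)
    x -= lr * grad
    losses.append(dual_injection_loss(x))

print(losses[0] > losses[-1])  # the trigger moves toward the target semantics
```

The paper's additional adversarial stage would wrap this loop in an outer alternation, retraining the auxiliary model to resist emitting the target text while the trigger is re-optimized against it; that min-max step is omitted here for brevity.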
Related papers
- Semantic Mismatch and Perceptual Degradation: A New Perspective on Image Editing Immunity [79.10998560865444]
We argue that immunization success should be defined by the edited output either semantically mismatching the prompt or suffering substantial perceptual degradation. We introduce the Immunization Success Rate (ISR), a novel metric designed to rigorously quantify true immunization efficacy.
arXiv Detail & Related papers (2025-12-16T11:34:48Z)
- DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models [55.30555646945055]
Text-to-Image (T2I) models are vulnerable to semantic leakage. We introduce DeLeaker, a lightweight approach that mitigates leakage by directly intervening on the model's attention maps. SLIM is the first dataset dedicated to semantic leakage.
arXiv Detail & Related papers (2025-10-16T17:39:21Z)
- Every Step Counts: Decoding Trajectories as Authorship Fingerprints of dLLMs [63.82840470917859]
We show that the decoding mechanism of dLLMs can be used as a powerful tool for model attribution. We propose a novel information extraction scheme called the Directed Decoding Map (DDM), which captures structural relationships between decoding steps and better reveals model-specific behaviors.
arXiv Detail & Related papers (2025-10-02T06:25:10Z)
- AGATE: Stealthy Black-box Watermarking for Multimodal Model Copyright Protection [26.066755429896926]
Existing methods select Out-of-Distribution (OoD) data as backdoor watermarks and retrain the original model for copyright protection. These methods are susceptible to malicious detection and forgery by adversaries, resulting in watermark evasion. We propose a Model-agnostic Black-box Backdoor Watermarking Framework (AGATE) to address stealthiness and robustness challenges in multimodal model copyright protection.
arXiv Detail & Related papers (2025-04-28T14:52:01Z)
- Tracking the Copyright of Large Vision-Language Models through Parameter Learning Adversarial Images [9.351260848685229]
Large vision-language models (LVLMs) have demonstrated remarkable image understanding and dialogue capabilities. Their widespread availability raises concerns about unauthorized usage and copyright infringement. We propose a novel method called Parameter Learning Attack (PLA) for tracking the copyright of LVLMs without modifying the original model.
arXiv Detail & Related papers (2025-02-23T14:49:34Z)
- CopyJudge: Automated Copyright Infringement Identification and Mitigation in Text-to-Image Diffusion Models [58.58208005178676]
We propose CopyJudge, a novel automated infringement identification framework. We employ an abstraction-filtration-comparison test framework to assess the likelihood of infringement. We introduce a general LVLM-based mitigation strategy that automatically optimizes infringing prompts.
arXiv Detail & Related papers (2025-02-21T08:09:07Z)
- PromptLA: Towards Integrity Verification of Black-box Text-to-Image Diffusion Models [17.12906933388337]
Malicious actors can fine-tune text-to-image (T2I) diffusion models to generate illegal content. We propose a novel prompt selection algorithm based on a learning automaton (PromptLA) for efficient and accurate verification.
arXiv Detail & Related papers (2024-12-20T07:24:32Z)
- A Fingerprint for Large Language Models [10.63985246068255]
We propose a novel black-box fingerprinting technique for large language models (LLMs).
Experimental results indicate that the proposed technique achieves superior performance in ownership verification and robustness against PEFT attacks.
arXiv Detail & Related papers (2024-07-01T12:25:42Z)
- Improving Diversity in Zero-Shot GAN Adaptation with Semantic Variations [61.132408427908175]
Zero-shot GAN adaptation aims to reuse well-trained generators to synthesize images of an unseen target domain.
With only a single representative text feature instead of real images, the synthesized images gradually lose diversity.
We propose a novel method to find semantic variations of the target text in the CLIP space.
arXiv Detail & Related papers (2023-08-21T08:12:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.