RAP-SM: Robust Adversarial Prompt via Shadow Models for Copyright Verification of Large Language Models
- URL: http://arxiv.org/abs/2505.06304v1
- Date: Thu, 08 May 2025 03:21:58 GMT
- Title: RAP-SM: Robust Adversarial Prompt via Shadow Models for Copyright Verification of Large Language Models
- Authors: Zhenhua Xu, Zhebo Wang, Maike Li, Wenpeng Xing, Chunqiang Hu, Chen Zhi, Meng Han,
- Abstract summary: RAP-SM is a novel framework that extracts a public fingerprint for an entire series of large language models.<n> Experimental results demonstrate that RAP-SM effectively captures the intrinsic commonalities among different models.
- Score: 12.459241957411669
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in large language models (LLMs) have underscored the importance of safeguarding intellectual property rights through robust fingerprinting techniques. Traditional fingerprint verification approaches typically focus on a single model, seeking to improve the robustness of its fingerprint.However, these single-model methods often struggle to capture intrinsic commonalities across multiple related models. In this paper, we propose RAP-SM (Robust Adversarial Prompt via Shadow Models), a novel framework that extracts a public fingerprint for an entire series of LLMs. Experimental results demonstrate that RAP-SM effectively captures the intrinsic commonalities among different models while exhibiting strong adversarial robustness. Our findings suggest that RAP-SM presents a valuable avenue for scalable fingerprint verification, offering enhanced protection against potential model breaches in the era of increasingly prevalent LLMs.
Related papers
- A Behavioral Fingerprint for Large Language Models: Provenance Tracking via Refusal Vectors [43.11304710234668]
We introduce a novel fingerprinting framework that leverages the behavioral patterns induced by safety alignment.<n>In a large-scale identification task across 76 offspring models, our method achieves 100% accuracy in identifying the correct base model family.<n>We propose a theoretical framework to transform this private fingerprint into a publicly verifiable, privacy-preserving artifact.
arXiv Detail & Related papers (2026-02-10T05:57:35Z) - Are Robust LLM Fingerprints Adversarially Robust? [31.998822577243867]
We first define a concrete, practical threat model against model fingerprinting.<n>We then take a critical look at existing model fingerprinting schemes to identify their fundamental vulnerabilities.<n>Based on these, we develop adaptive adversarial attacks tailored for each vulnerability.
arXiv Detail & Related papers (2025-09-30T17:47:09Z) - SoK: Large Language Model Copyright Auditing via Fingerprinting [69.14570598973195]
We introduce a unified framework and formal taxonomy that categorizes existing methods into white-box and black-box approaches.<n>We propose LeaFBench, the first systematic benchmark for evaluating LLM fingerprinting under realistic deployment scenarios.
arXiv Detail & Related papers (2025-08-27T12:56:57Z) - One Token to Fool LLM-as-a-Judge [52.45386385722788]
Large language models (LLMs) are increasingly trusted as automated judges, assisting evaluation and providing reward signals for training other models.<n>We uncover a critical vulnerability even in this reference-based paradigm: generative reward models are systematically susceptible to reward hacking.
arXiv Detail & Related papers (2025-07-11T17:55:22Z) - Intrinsic Fingerprint of LLMs: Continue Training is NOT All You Need to Steal A Model! [1.8824463630667776]
Large language models (LLMs) face significant copyright and intellectual property challenges as the cost of training increases and model reuse becomes prevalent.<n>This work introduces a simple yet effective approach for robust fingerprinting based on intrinsic model characteristics.
arXiv Detail & Related papers (2025-07-02T12:29:38Z) - MEraser: An Effective Fingerprint Erasure Approach for Large Language Models [19.8112399985437]
Large Language Models (LLMs) have become increasingly prevalent across various sectors, raising critical concerns about model ownership and intellectual property protection.<n>We present Mismatched Eraser (MEraser), a novel method for effectively removing backdoor-based fingerprints from LLMs while maintaining model performance.
arXiv Detail & Related papers (2025-06-14T15:48:53Z) - Investigating and Enhancing the Robustness of Large Multimodal Models Against Temporal Inconsistency [59.05753942719665]
We propose a novel temporal robustness benchmark (TemRobBench) to assess the robustness of models.<n>We evaluate 16 mainstream LMMs and find that they exhibit over-reliance on prior knowledge and textual context in adversarial environments.<n>We design panoramic direct preference optimization (PanoDPO) to encourage LMMs to incorporate both visual and linguistic feature preferences simultaneously.
arXiv Detail & Related papers (2025-05-20T14:18:56Z) - ImF: Implicit Fingerprint for Large Language Models [0.0]
We propose a novel injected fingerprint paradigm called Implicit Fingerprints (ImF)<n>ImF constructs fingerprint pairs with strong semantic correlations, disguising them as natural question-answer pairs within large language models (LLMs)<n>Our experiment on multiple LLMs demonstrates that ImF retains high verification success rates under adversarial conditions.
arXiv Detail & Related papers (2025-03-25T05:47:34Z) - MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks [85.3303135160762]
MIRAGE is a novel framework that exploits narrative-driven context and role immersion to circumvent safety mechanisms in Multimodal Large Language Models.<n>It achieves state-of-the-art performance, improving attack success rates by up to 17.5% over the best baselines.<n>We demonstrate that role immersion and structured semantic reconstruction can activate inherent model biases, facilitating the model's spontaneous violation of ethical safeguards.
arXiv Detail & Related papers (2025-03-24T20:38:42Z) - FIT-Print: Towards False-claim-resistant Model Ownership Verification via Targeted Fingerprint [29.015707553430442]
Model fingerprinting is a widely adopted approach to safeguard the intellectual property rights of open-source models.<n>In this paper, we reveal that they are vulnerable to false claim attacks where adversaries falsely assert ownership of any third-party model.<n>Motivated by these findings, we propose a targeted fingerprinting paradigm (i.e., FIT-Print) to counteract false claim attacks.
arXiv Detail & Related papers (2025-01-26T13:00:58Z) - Sample Correlation for Fingerprinting Deep Face Recognition [83.53005932513156]
We propose a novel model stealing detection method based on SA Corremplelation (SAC)<n>SAC successfully defends against various model stealing attacks in deep face recognition, encompassing face verification and face emotion recognition, exhibiting the highest performance in terms of AUC, p-value and F1 score.<n>We extend our evaluation of SAC-JC to object recognition including Tiny-ImageNet and CIFAR10, which also demonstrates the superior performance of SAC-JC to previous methods.
arXiv Detail & Related papers (2024-12-30T07:37:06Z) - MergePrint: Merge-Resistant Fingerprints for Robust Black-box Ownership Verification of Large Language Models [1.9249287163937978]
We propose a novel fingerprinting method, MergePrint, to embed robust fingerprints capable of surviving model merging.<n> MergePrint enables black-box ownership verification, where owners only need to check if a model produces target outputs for specific fingerprint inputs.
arXiv Detail & Related papers (2024-10-11T08:00:49Z) - Black-box Adversarial Attacks against Dense Retrieval Models: A
Multi-view Contrastive Learning Method [115.29382166356478]
We introduce the adversarial retrieval attack (AREA) task.
It is meant to trick DR models into retrieving a target document that is outside the initial set of candidate documents retrieved by the DR model.
We find that the promising results that have previously been reported on attacking NRMs, do not generalize to DR models.
We propose to formalize attacks on DR models as a contrastive learning problem in a multi-view representation space.
arXiv Detail & Related papers (2023-08-19T00:24:59Z) - In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT, against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
arXiv Detail & Related papers (2022-12-20T14:06:50Z) - Fingerprinting Image-to-Image Generative Adversarial Networks [53.02510603622128]
Generative Adversarial Networks (GANs) have been widely used in various application scenarios.
This paper presents a novel fingerprinting scheme for the Intellectual Property protection of image-to-image GANs based on a trusted third party.
arXiv Detail & Related papers (2021-06-19T06:25:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.