ForgetMark: Stealthy Fingerprint Embedding via Targeted Unlearning in Language Models
- URL: http://arxiv.org/abs/2601.08189v2
- Date: Tue, 20 Jan 2026 06:49:45 GMT
- Title: ForgetMark: Stealthy Fingerprint Embedding via Targeted Unlearning in Language Models
- Authors: Zhenhua Xu, Haobo Zhang, Zhebo Wang, Qichen Liu, Haitao Xu, Wenpeng Xing, Meng Han
- Abstract summary: ForgetMark is a stealthy fingerprinting framework that encodes provenance via targeted unlearning. It builds a compact, human-readable key-value set with an assistant model and predictive-entropy ranking, then trains lightweight LoRA adapters to suppress the original values on their keys while preserving general capabilities.
- Score: 21.330293192156596
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing invasive (backdoor) fingerprints suffer from high-perplexity triggers that are easily filtered, fixed response patterns exposed by heuristic detectors, and spurious activations on benign inputs. We introduce ForgetMark, a stealthy fingerprinting framework that encodes provenance via targeted unlearning. It builds a compact, human-readable key-value set with an assistant model and predictive-entropy ranking, then trains lightweight LoRA adapters to suppress the original values on their keys while preserving general capabilities. Ownership is verified under black/gray-box access by aggregating likelihood and semantic evidence into a fingerprint success rate. By relying on probabilistic forgetting traces rather than fixed trigger-response patterns, ForgetMark avoids high-perplexity triggers, reduces detectability, and lowers false triggers. Across diverse architectures and settings, it achieves 100% ownership verification on fingerprinted models while maintaining standard performance, surpasses backdoor baselines in stealthiness and robustness to model merging, and remains effective under moderate incremental fine-tuning. Our code and data are available at https://github.com/Xuzhenhua55/ForgetMark.
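As an illustration of the pipeline the abstract describes, here is a minimal Python sketch of predictive-entropy ranking of candidate key-value pairs and of aggregating per-key evidence into a fingerprint success rate. The function names, the low-entropy-first ordering, and the 0.5 decision threshold are our assumptions for exposition, not details taken from the paper.

```python
import math

def predictive_entropy(token_dists):
    """Mean entropy (in nats) of the model's next-token distributions over a
    generated value; token_dists is a list of probability vectors."""
    ents = [-sum(p * math.log(p) for p in dist if p > 0.0) for dist in token_dists]
    return sum(ents) / max(len(ents), 1)

def rank_key_values(candidates):
    """candidates: list of (key, value, token_dists) triples proposed by an
    assistant model.  Keys the base model answers most confidently (lowest
    entropy) are ranked first, so that suppressing those values via LoRA
    unlearning leaves a measurable forgetting trace (an assumed heuristic)."""
    return sorted(candidates, key=lambda c: predictive_entropy(c[2]))

def fingerprint_success_rate(per_key_evidence, threshold=0.5):
    """Aggregate per-key evidence scores (e.g. a normalised combination of
    likelihood drop and semantic divergence on the suspect model) into a
    single fingerprint success rate in [0, 1]."""
    hits = sum(1 for score in per_key_evidence if score >= threshold)
    return hits / max(len(per_key_evidence), 1)
```

Verification would then declare ownership when the fingerprint success rate over the embedded key set exceeds a calibrated threshold.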
Related papers
- A Behavioral Fingerprint for Large Language Models: Provenance Tracking via Refusal Vectors [43.11304710234668]
We introduce a novel fingerprinting framework that leverages the behavioral patterns induced by safety alignment. In a large-scale identification task across 76 offspring models, our method achieves 100% accuracy in identifying the correct base model family. We propose a theoretical framework to transform this private fingerprint into a publicly verifiable, privacy-preserving artifact.
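For intuition, a minimal sketch of how such a behavioral fingerprint might be compared. The difference-of-means refusal direction and the 0.9 cosine threshold are illustrative assumptions, not that paper's actual procedure.

```python
import numpy as np

def refusal_vector(refusal_acts, compliant_acts):
    """Illustrative refusal direction: difference of mean hidden states on
    prompts the model refuses vs. prompts it answers, normalised to unit
    length.  Both inputs are (n_prompts, hidden_dim) arrays."""
    v = refusal_acts.mean(axis=0) - compliant_acts.mean(axis=0)
    return v / (np.linalg.norm(v) + 1e-8)

def same_family(suspect_vec, base_vec, tau=0.9):
    """Flag the suspect model as derived from the base model when the cosine
    similarity of their (unit-norm) refusal vectors exceeds a threshold tau."""
    return float(suspect_vec @ base_vec) >= tau
```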
arXiv Detail & Related papers (2026-02-10T05:57:35Z)
- Lossless Copyright Protection via Intrinsic Model Fingerprinting [21.898748690761874]
Existing protection methods modify the model to embed watermarks, which impairs performance. We propose TrajPrint, a completely lossless and training-free framework that verifies model copyright by extracting unique manifold fingerprints.
arXiv Detail & Related papers (2026-01-29T04:18:07Z)
- DNF: Dual-Layer Nested Fingerprinting for Large Language Model Intellectual Property Protection [21.422855789542695]
We propose a black-box method that embeds a hierarchical backdoor by coupling domain-specific stylistic cues with implicit semantic triggers. Across Mistral-7B, LLaMA-3-8B-Instruct, and Falcon3-7B-Instruct, DNF achieves perfect fingerprint activation while preserving downstream utility.
arXiv Detail & Related papers (2026-01-13T05:05:37Z)
- Assimilation Matters: Model-level Backdoor Detection in Vision-Language Pretrained Models [71.44858461725893]
Given a model fine-tuned by an untrusted third party, determining whether the model has been injected with a backdoor is a critical and challenging problem. Existing detection methods usually rely on prior knowledge of the training dataset, backdoor triggers, and targets. We introduce Assimilation Matters in DETection (AMDET), a novel model-level detection framework that operates without any such prior knowledge.
arXiv Detail & Related papers (2025-11-29T06:20:00Z)
- EverTracer: Hunting Stolen Large Language Models via Stealthy and Robust Probabilistic Fingerprint [22.154946163092117]
We propose EverTracer, a novel gray-box fingerprinting framework that ensures stealthy and robust model provenance tracing. EverTracer is the first to repurpose Membership Inference Attacks (MIAs) for defensive use. It consists of Fingerprint Injection, which fine-tunes the model on any natural language data without detectable artifacts, and Verification, which leverages a calibrated probability variation signal to distinguish fingerprinted models.
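A rough illustration of an MIA-style verification signal of this kind; calibrating against a reference model and using a zero decision margin are our assumptions, not that paper's exact procedure.

```python
import numpy as np

def calibrated_variation(suspect_nll, reference_nll):
    """Per-sample calibrated signal on the fingerprint data: how much lower
    the suspect model's negative log-likelihood is than a reference model's,
    in the spirit of a calibrated membership-inference test."""
    return np.asarray(reference_nll) - np.asarray(suspect_nll)

def looks_fingerprinted(suspect_nll, reference_nll, margin=0.0):
    """Declare the suspect model fingerprinted when the mean calibrated
    variation over the fingerprint set exceeds a decision margin."""
    return bool(calibrated_variation(suspect_nll, reference_nll).mean() > margin)
```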
arXiv Detail & Related papers (2025-09-03T06:40:57Z)
- CBW: Towards Dataset Ownership Verification for Speaker Verification via Clustering-based Backdoor Watermarking [85.68235482145091]
Large-scale speech datasets have become valuable intellectual property. We propose a novel dataset ownership verification method. Our approach introduces a clustering-based backdoor watermark (CBW). We conduct extensive experiments on benchmark datasets, verifying the effectiveness and robustness of our method against potential adaptive attacks.
arXiv Detail & Related papers (2025-03-02T02:02:57Z)
- Scalable Fingerprinting of Large Language Models [42.65365809809273]
We introduce a new method, dubbed Perinucleus sampling, to generate scalable, persistent, and harmless fingerprints. We demonstrate that this scheme can add 24,576 fingerprints to a Llama-3.1-8B model without degrading the model's utility.
arXiv Detail & Related papers (2025-02-11T18:43:07Z)
- Hey, That's My Model! Introducing Chain & Hash, An LLM Fingerprinting Technique [2.7174461714624805]
Growing concerns over the theft and misuse of Large Language Models (LLMs) have heightened the need for effective fingerprinting. We define five key properties for a successful fingerprint: Transparency, Efficiency, Persistence, Robustness, and Unforgeability. We introduce a novel fingerprinting framework that provides verifiable proof of ownership while maintaining fingerprint integrity.
arXiv Detail & Related papers (2024-07-15T16:38:56Z)
- Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable [70.77600345240867]
A novel arbitrary-in-arbitrary-out (AIAO) strategy makes watermarks resilient to fine-tuning-based removal.
Unlike existing methods that design a backdoor for the input/output space of diffusion models, our method embeds the backdoor into the feature space of sampled subpaths.
Our empirical studies on the MS-COCO, AFHQ, LSUN, CUB-200, and DreamBooth datasets confirm the robustness of AIAO.
arXiv Detail & Related papers (2024-05-01T12:03:39Z)
- Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks [63.269788236474234]
We propose to use model pairs on open-set classification tasks for detecting backdoors.
We show that this score can indicate the presence of a backdoor even when the paired models have different architectures. This technique allows for the detection of backdoors on models designed for open-set classification tasks, a setting that has received little attention in the literature.
arXiv Detail & Related papers (2024-02-28T21:29:16Z)
- Towards Robust Model Watermark via Reducing Parametric Vulnerability [57.66709830576457]
Backdoor-based ownership verification has recently become popular, in which the model owner watermarks the model.
We propose a mini-max formulation to find these watermark-removed models and recover their watermark behavior.
Our method improves the robustness of the model watermarking against parametric changes and numerous watermark-removal attacks.
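One plausible way to write such a mini-max objective, in our own notation rather than that paper's: the owner minimises the worst-case watermark loss over small parameter perturbations that emulate watermark removal, while a task-loss term preserves utility.

```latex
\min_{\theta}\; \max_{\|\delta\|\le\epsilon}\;
  \mathcal{L}_{\mathrm{wm}}(\theta+\delta)
  \;+\; \lambda\,\mathcal{L}_{\mathrm{task}}(\theta)
```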
arXiv Detail & Related papers (2023-09-09T12:46:08Z)
- Mask and Restore: Blind Backdoor Defense at Test Time with Masked Autoencoder [50.1394620328318]
Existing backdoor defense methods often require access to a small amount of validation data and to model parameters. We propose a blind backdoor defense with Masked AutoEncoder (BDMAE). BDMAE detects possible local triggers using image structural similarity and label consistency between the test image and its MAE restorations.
arXiv Detail & Related papers (2023-03-27T19:23:33Z)
- Neural network fragile watermarking with no model performance degradation [28.68910526223425]
We propose a novel fragile watermarking scheme for neural networks that causes no degradation of model performance. Experiments show that the proposed method can effectively detect malicious fine-tuning of the model without degrading its performance.
arXiv Detail & Related papers (2022-08-16T07:55:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.