ForgetMark: Stealthy Fingerprint Embedding via Targeted Unlearning in Language Models
- URL: http://arxiv.org/abs/2601.08189v2
- Date: Tue, 20 Jan 2026 06:49:45 GMT
- Title: ForgetMark: Stealthy Fingerprint Embedding via Targeted Unlearning in Language Models
- Authors: Zhenhua Xu, Haobo Zhang, Zhebo Wang, Qichen Liu, Haitao Xu, Wenpeng Xing, Meng Han
- Abstract summary: ForgetMark is a stealthy fingerprinting framework that encodes provenance via targeted unlearning. It builds a compact, human-readable key-value set with an assistant model and predictive-entropy ranking, then trains lightweight LoRA adapters to suppress the original values on their keys while preserving general capabilities.
- Score: 21.330293192156596
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing invasive (backdoor) fingerprints suffer from high-perplexity triggers that are easily filtered, fixed response patterns exposed by heuristic detectors, and spurious activations on benign inputs. We introduce ForgetMark, a stealthy fingerprinting framework that encodes provenance via targeted unlearning. It builds a compact, human-readable key-value set with an assistant model and predictive-entropy ranking, then trains lightweight LoRA adapters to suppress the original values on their keys while preserving general capabilities. Ownership is verified under black/gray-box access by aggregating likelihood and semantic evidence into a fingerprint success rate. By relying on probabilistic forgetting traces rather than fixed trigger-response patterns, ForgetMark avoids high-perplexity triggers, reduces detectability, and lowers false triggers. Across diverse architectures and settings, it achieves 100% ownership verification on fingerprinted models while maintaining standard performance, surpasses backdoor baselines in stealthiness and robustness to model merging, and remains effective under moderate incremental fine-tuning. Our code and data are available at https://github.com/Xuzhenhua55/ForgetMark.
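As an illustration of the pipeline the abstract describes, here is a minimal Python sketch of predictive-entropy ranking of candidate key-value pairs and of aggregating per-key evidence into a fingerprint success rate. The function names, the low-entropy-first ordering, and the 0.5 decision threshold are our assumptions for exposition, not details taken from the paper.

```python
import math

def predictive_entropy(token_dists):
    """Mean entropy (in nats) of the model's next-token distributions over a
    generated value; token_dists is a list of probability vectors."""
    ents = [-sum(p * math.log(p) for p in dist if p > 0.0) for dist in token_dists]
    return sum(ents) / max(len(ents), 1)

def rank_key_values(candidates):
    """candidates: list of (key, value, token_dists) triples proposed by an
    assistant model.  Keys the base model answers most confidently (lowest
    entropy) are ranked first, so that suppressing those values via LoRA
    unlearning leaves a measurable forgetting trace (an assumed heuristic)."""
    return sorted(candidates, key=lambda c: predictive_entropy(c[2]))

def fingerprint_success_rate(per_key_evidence, threshold=0.5):
    """Aggregate per-key evidence scores (e.g. a normalised combination of
    likelihood drop and semantic divergence on the suspect model) into a
    single fingerprint success rate in [0, 1]."""
    hits = sum(1 for score in per_key_evidence if score >= threshold)
    return hits / max(len(per_key_evidence), 1)
```

Verification would then declare ownership when the fingerprint success rate over the embedded key set exceeds a calibrated threshold.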
Related papers
- A Behavioral Fingerprint for Large Language Models: Provenance Tracking via Refusal Vectors [43.11304710234668]
We introduce a novel fingerprinting framework that leverages the behavioral patterns induced by safety alignment. In a large-scale identification task across 76 offspring models, our method achieves 100% accuracy in identifying the correct base model family. We propose a theoretical framework to transform this private fingerprint into a publicly verifiable, privacy-preserving artifact.
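For intuition, a minimal sketch of how such a behavioral fingerprint might be compared. The difference-of-means refusal direction and the 0.9 cosine threshold are illustrative assumptions, not that paper's actual procedure.

```python
import numpy as np

def refusal_vector(refusal_acts, compliant_acts):
    """Illustrative refusal direction: difference of mean hidden states on
    prompts the model refuses vs. prompts it answers, normalised to unit
    length.  Both inputs are (n_prompts, hidden_dim) arrays."""
    v = refusal_acts.mean(axis=0) - compliant_acts.mean(axis=0)
    return v / (np.linalg.norm(v) + 1e-8)

def same_family(suspect_vec, base_vec, tau=0.9):
    """Flag the suspect model as derived from the base model when the cosine
    similarity of their (unit-norm) refusal vectors exceeds a threshold tau."""
    return float(suspect_vec @ base_vec) >= tau
```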
arXiv Detail & Related papers (2026-02-10T05:57:35Z)
- Lossless Copyright Protection via Intrinsic Model Fingerprinting [21.898748690761874]
Existing protection methods modify the model to embed watermarks, which impairs performance. We propose TrajPrint, a completely lossless and training-free framework that verifies model copyright by extracting unique manifold fingerprints.
arXiv Detail & Related papers (2026-01-29T04:18:07Z)
- DNF: Dual-Layer Nested Fingerprinting for Large Language Model Intellectual Property Protection [21.422855789542695]
We propose a black-box method that embeds a hierarchical backdoor by coupling domain-specific stylistic cues with implicit semantic triggers. Across Mistral-7B, LLaMA-3-8B-Instruct, and Falcon3-7B-Instruct, DNF achieves perfect fingerprint activation while preserving downstream utility.
arXiv Detail & Related papers (2026-01-13T05:05:37Z)
- Assimilation Matters: Model-level Backdoor Detection in Vision-Language Pretrained Models [71.44858461725893]
Given a model fine-tuned by an untrusted third party, determining whether the model has been injected with a backdoor is a critical and challenging problem. Existing detection methods usually rely on prior knowledge of the training dataset, backdoor triggers, and targets. We introduce Assimilation Matters in DETection (AMDET), a novel model-level detection framework that operates without any such prior knowledge.
arXiv Detail & Related papers (2025-11-29T06:20:00Z)
- EverTracer: Hunting Stolen Large Language Models via Stealthy and Robust Probabilistic Fingerprint [22.154946163092117]
We propose EverTracer, a novel gray-box fingerprinting framework that ensures stealthy and robust model provenance tracing. EverTracer is the first to repurpose Membership Inference Attacks (MIAs) for defensive use. It consists of Fingerprint Injection, which fine-tunes the model on any natural language data without detectable artifacts, and Verification, which leverages a calibrated probability variation signal to distinguish fingerprinted models.
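A rough illustration of an MIA-style verification signal of this kind; calibrating against a reference model and using a zero decision margin are our assumptions, not that paper's exact procedure.

```python
import numpy as np

def calibrated_variation(suspect_nll, reference_nll):
    """Per-sample calibrated signal on the fingerprint data: how much lower
    the suspect model's negative log-likelihood is than a reference model's,
    in the spirit of a calibrated membership-inference test."""
    return np.asarray(reference_nll) - np.asarray(suspect_nll)

def looks_fingerprinted(suspect_nll, reference_nll, margin=0.0):
    """Declare the suspect model fingerprinted when the mean calibrated
    variation over the fingerprint set exceeds a decision margin."""
    return bool(calibrated_variation(suspect_nll, reference_nll).mean() > margin)
```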
arXiv Detail & Related papers (2025-09-03T06:40:57Z)
- CBW: Towards Dataset Ownership Verification for Speaker Verification via Clustering-based Backdoor Watermarking [85.68235482145091]
Large-scale speech datasets have become valuable intellectual property. We propose a novel dataset ownership verification method. Our approach introduces a clustering-based backdoor watermark (CBW). We conduct extensive experiments on benchmark datasets, verifying the effectiveness and robustness of our method against potential adaptive attacks.
arXiv Detail & Related papers (2025-03-02T02:02:57Z)
- Scalable Fingerprinting of Large Language Models [42.65365809809273]
We introduce a new method, dubbed Perinucleus sampling, to generate scalable, persistent, and harmless fingerprints. We demonstrate that this scheme can add 24,576 fingerprints to a Llama-3.1-8B model without degrading the model's utility.
arXiv Detail & Related papers (2025-02-11T18:43:07Z)
- Hey, That's My Model! Introducing Chain & Hash, An LLM Fingerprinting Technique [2.7174461714624805]
Growing concerns over the theft and misuse of Large Language Models (LLMs) have heightened the need for effective fingerprinting. We define five key properties for a successful fingerprint: Transparency, Efficiency, Persistence, Robustness, and Unforgeability. We introduce a novel fingerprinting framework that provides verifiable proof of ownership while maintaining fingerprint integrity.
arXiv Detail & Related papers (2024-07-15T16:38:56Z)
- Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable [70.77600345240867]
A novel arbitrary-in-arbitrary-out (AIAO) strategy makes watermarks resilient to fine-tuning-based removal.
Unlike existing methods that design a backdoor for the input/output space of diffusion models, our method embeds the backdoor into the feature space of sampled subpaths.
Our empirical studies on the MS-COCO, AFHQ, LSUN, CUB-200, and DreamBooth datasets confirm the robustness of AIAO.
arXiv Detail & Related papers (2024-05-01T12:03:39Z)
- Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks [63.269788236474234]
We propose to use model pairs on open-set classification tasks for detecting backdoors.
We show that this score can indicate the presence of a backdoor even when the paired models have different architectures. This technique allows for the detection of backdoors on models designed for open-set classification tasks, a setting that has received little attention in the literature.
arXiv Detail & Related papers (2024-02-28T21:29:16Z)
- Towards Robust Model Watermark via Reducing Parametric Vulnerability [57.66709830576457]
Backdoor-based ownership verification has recently become popular, in which the model owner watermarks the model.
We propose a mini-max formulation to find these watermark-removed models and recover their watermark behavior.
Our method improves the robustness of the model watermarking against parametric changes and numerous watermark-removal attacks.
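One plausible way to write such a mini-max objective, in our own notation rather than that paper's: the owner minimises the worst-case watermark loss over small parameter perturbations that emulate watermark removal, while a task-loss term preserves utility.

```latex
\min_{\theta}\; \max_{\|\delta\|\le\epsilon}\;
  \mathcal{L}_{\mathrm{wm}}(\theta+\delta)
  \;+\; \lambda\,\mathcal{L}_{\mathrm{task}}(\theta)
```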
arXiv Detail & Related papers (2023-09-09T12:46:08Z)
- Mask and Restore: Blind Backdoor Defense at Test Time with Masked Autoencoder [50.1394620328318]
Existing backdoor defense methods often require access to a small amount of validation data and to model parameters. We propose a blind backdoor defense with Masked AutoEncoder (BDMAE). BDMAE detects possible local triggers using image structural similarity and label consistency between the test image and its MAE restorations.
arXiv Detail & Related papers (2023-03-27T19:23:33Z)
- Neural network fragile watermarking with no model performance degradation [28.68910526223425]
We propose a novel fragile watermarking scheme for neural networks that causes no degradation of model performance. Experiments show that the proposed method can effectively detect malicious fine-tuning of the model without degrading its performance.
arXiv Detail & Related papers (2022-08-16T07:55:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.