Do Not Merge My Model! Safeguarding Open-Source LLMs Against Unauthorized Model Merging
- URL: http://arxiv.org/abs/2511.10712v1
- Date: Thu, 13 Nov 2025 09:45:47 GMT
- Title: Do Not Merge My Model! Safeguarding Open-Source LLMs Against Unauthorized Model Merging
- Authors: Qinfeng Li, Miao Pan, Jintao Chen, Fu Teng, Zhiqiang Shen, Ge Su, Hao Peng, Xuhong Zhang,
- Abstract summary: We propose MergeBarrier, a plug-and-play defense that proactively prevents unauthorized merging. Experiments show that MergeBarrier effectively prevents model merging stealing with negligible accuracy loss.
- Score: 42.917732897026276
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Model merging has emerged as an efficient technique for expanding large language models (LLMs) by integrating specialized expert models. However, it also introduces a new threat: model merging stealing, where free-riders exploit models through unauthorized model merging. Unfortunately, existing defense mechanisms fail to provide effective protection. Specifically, we identify three critical protection properties that existing methods fail to simultaneously satisfy: (1) proactively preventing unauthorized merging; (2) ensuring compatibility with general open-source settings; (3) achieving high security with negligible performance loss. To address the above issues, we propose MergeBarrier, a plug-and-play defense that proactively prevents unauthorized merging. The core design of MergeBarrier is to disrupt the Linear Mode Connectivity (LMC) between the protected model and its homologous counterparts, thereby eliminating the low-loss path required for effective model merging. Extensive experiments show that MergeBarrier effectively prevents model merging stealing with negligible accuracy loss.
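The merging threat the abstract describes typically works by interpolating the weights of homologous fine-tuned models, and MergeBarrier's stated goal is to break the Linear Mode Connectivity (LMC) that makes such interpolation land in a low-loss region. The sketch below is not from the paper; it is a minimal illustration, using a toy double-well loss, of how linear weight-space merging is performed and how a mid-path loss barrier (broken LMC) would make the merged model useless.

```python
import numpy as np

def merge_linear(theta_a, theta_b, alpha=0.5):
    """Weight-space interpolation merge: theta = (1 - alpha) * A + alpha * B.

    theta_a / theta_b are dicts mapping parameter names to arrays,
    standing in for two homologous fine-tuned checkpoints.
    """
    return {k: (1 - alpha) * theta_a[k] + alpha * theta_b[k] for k in theta_a}

def loss_along_path(theta_a, theta_b, loss_fn, steps=11):
    """Evaluate the loss at evenly spaced points on the linear path A -> B.

    A flat, low profile indicates Linear Mode Connectivity (LMC), the
    condition merging relies on; a high barrier in the middle of the
    path means interpolation-based merging fails.
    """
    alphas = np.linspace(0.0, 1.0, steps)
    return [loss_fn(merge_linear(theta_a, theta_b, a)) for a in alphas]

# Toy double-well loss with minima at w = -1 and w = +1: both endpoint
# "models" are optimal, but the straight-line midpoint (w = 0) sits on
# a loss barrier, so their naive merge performs poorly.
loss_fn = lambda theta: float((theta["w"] ** 2 - 1.0) ** 2)
theta_a = {"w": np.array(-1.0)}
theta_b = {"w": np.array(1.0)}

profile = loss_along_path(theta_a, theta_b, loss_fn)
```

Here `profile` starts and ends near zero but peaks at the midpoint, which is exactly the loss-landscape shape a defense like MergeBarrier aims to induce between the protected model and its homologous counterparts.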
Related papers
- Making Models Unmergeable via Scaling-Sensitive Loss Landscape [27.034832184399992]
Trap$^2$ encodes protection into the update during fine-tuning, regardless of whether models are released as adapters or full models. Trap$^2$ uses weight re-scaling as a simple proxy for the merging process.
arXiv Detail & Related papers (2026-01-29T15:56:55Z) - Defending Unauthorized Model Merging via Dual-Stage Weight Protection [7.855764642324112]
Free-riders combine fine-tuned models into a new multi-capability model without authorization. We present MergeGuard, a framework that disrupts merging compatibility while maintaining task fidelity. We show that MergeGuard reduces merged model accuracy by up to 90% with less than 1.5% performance loss on the protected model.
arXiv Detail & Related papers (2025-11-14T20:16:00Z) - Model Unmerging: Making Your Models Unmergeable for Secure Model Sharing [47.204542615541364]
Unauthorized merging may infringe on developers' rights and risk leaking sensitive personal information. We propose MergeLock, an active protection mechanism that disrupts model parameters to render them unmergeable. Experiments demonstrate that MergeLock can degrade the performance of merged models by over 95% when a protected model is involved.
arXiv Detail & Related papers (2025-09-01T15:24:41Z) - MISLEADER: Defending against Model Extraction with Ensembles of Distilled Models [56.09354775405601]
Model extraction attacks aim to replicate the functionality of a black-box model through query access. Most existing defenses presume that attacker queries contain out-of-distribution (OOD) samples, enabling them to detect and disrupt suspicious inputs. We propose MISLEADER, a novel defense strategy that does not rely on OOD assumptions.
arXiv Detail & Related papers (2025-06-03T01:37:09Z) - Merge Hijacking: Backdoor Attacks to Model Merging of Large Language Models [48.36985844329255]
Model merging for Large Language Models (LLMs) directly fuses the parameters of different models fine-tuned on various tasks. Due to potential vulnerabilities in models available on open-source platforms, model merging is susceptible to backdoor attacks. We propose Merge Hijacking, the first backdoor attack targeting model merging in LLMs.
arXiv Detail & Related papers (2025-05-29T15:37:23Z) - Disrupting Model Merging: A Parameter-Level Defense Without Sacrificing Accuracy [0.0]
Model merging is a technique that combines multiple fine-tuned models into a single model without additional training. Existing methods such as model watermarking or fingerprinting can only detect merging in hindsight. We propose the first proactive defense against model merging.
arXiv Detail & Related papers (2025-03-08T06:08:47Z) - Merger-as-a-Stealer: Stealing Targeted PII from Aligned LLMs with Model Merging [49.270050440553575]
We propose Merger-as-a-Stealer, a two-stage framework to achieve this attack. First, the attacker fine-tunes a malicious model to force it to respond to any PII-related queries. Second, the attacker inputs direct PII-related queries to the merged model to extract targeted PII.
arXiv Detail & Related papers (2025-02-22T05:34:53Z) - ModelShield: Adaptive and Robust Watermark against Model Extraction Attack [58.46326901858431]
Large language models (LLMs) demonstrate general intelligence across a variety of machine learning tasks. Adversaries can still use model extraction attacks to steal the model intelligence encoded in model generations. Watermarking technology offers a promising solution for defending against such attacks by embedding unique identifiers into the model-generated content.
arXiv Detail & Related papers (2024-05-03T06:41:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.