Making Models Unmergeable via Scaling-Sensitive Loss Landscape
- URL: http://arxiv.org/abs/2601.21898v1
- Date: Thu, 29 Jan 2026 15:56:55 GMT
- Title: Making Models Unmergeable via Scaling-Sensitive Loss Landscape
- Authors: Minwoo Jang, Hoyoung Kim, Jabin Koo, Jungseul Ok
- Abstract summary: \textsc{Trap}$^{2}$ encodes protection into the update during fine-tuning, regardless of whether the weights are released as adapters or full models. \textsc{Trap}$^{2}$ uses weight re-scaling as a simple proxy for the merging process.
- Score: 27.034832184399992
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rise of model hubs has made it easier to access reusable model components, making model merging a practical tool for combining capabilities. Yet, this modularity also creates a \emph{governance gap}: downstream users can recompose released weights into unauthorized mixtures that bypass safety alignment or licensing terms. Because existing defenses are largely post-hoc and architecture-specific, they provide inconsistent protection across diverse architectures and release formats in practice. To close this gap, we propose \textsc{Trap}$^{2}$, an architecture-agnostic protection framework that encodes protection into the update during fine-tuning, regardless of whether the weights are released as adapters or full models. Instead of relying on architecture-dependent approaches, \textsc{Trap}$^{2}$ uses weight re-scaling as a simple proxy for the merging process. It keeps released weights effective in standalone use, but degrades them under the re-scaling that often arises in merging, thereby undermining unauthorized merging.
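The abstract describes the core idea at a high level: keep the task loss low at the released weights while making it rise sharply when the weights are re-scaled, as merging methods typically do when they linearly combine task vectors. The paper's exact objective is not given here, so the following is only a minimal illustrative sketch, for a toy linear model, of one way such a scaling-sensitive loss could be shaped; the function names, the hinge-style penalty, and the scale factors `alphas` are all hypothetical choices, not the authors' method.

```python
import numpy as np

def task_loss(w, X, y):
    """Mean squared error of a linear model y ~ X @ w (standalone use)."""
    return float(np.mean((X @ w - y) ** 2))

def scaling_sensitive_loss(w, X, y, alphas=(0.5, 0.8), margin=1.0, lam=0.1):
    """Hypothetical proxy objective (not the paper's exact formulation):
    keep the task loss low at the released scale (alpha = 1) while
    encouraging the loss to be at least `margin` higher at down-scaled
    copies of the weights, mimicking the re-scaling that merging applies.
    """
    base = task_loss(w, X, y)
    penalty = 0.0
    for a in alphas:
        # Hinge term: nonzero only when the re-scaled loss is NOT at
        # least `margin` worse than the standalone loss, i.e. only when
        # the weights would still work well after merging-style scaling.
        penalty += max(0.0, margin - (task_loss(a * w, X, y) - base))
    return base + lam * penalty
```

Minimizing such an objective during fine-tuning would, under these assumptions, trade off standalone accuracy against sensitivity to re-scaling: the penalty is zero exactly when every down-scaled copy of the weights is already degraded by the margin, so a protected model pays no cost once merging-style re-scaling reliably breaks it.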
Related papers
- Rethinking Transferable Adversarial Attacks on Point Clouds from a Compact Subspace Perspective [55.919842734983156]
CoSA is a transferable attack framework that operates within a shared low-dimensional semantic space. CoSA consistently outperforms state-of-the-art transferable attacks.
arXiv Detail & Related papers (2026-01-30T15:48:11Z) - The Trojan in the Vocabulary: Stealthy Sabotage of LLM Composition [31.827344197678126]
Tokenizer transplant introduces a supply-chain vulnerability. By exploiting the geometry of coefficient reuse, our attack creates an asymmetric realizability gap. Empirically, the attack is training-free and achieves spectral mimicry to evade outlier detection.
arXiv Detail & Related papers (2025-12-31T19:00:03Z) - Unleashing Degradation-Carrying Features in Symmetric U-Net: Simpler and Stronger Baselines for All-in-One Image Restoration [52.82397287366076]
All-in-one image restoration aims to handle diverse degradations (e.g., noise, blur, adverse weather) within a unified framework. In this work, we reveal a critical insight: well-crafted feature extraction inherently encodes degradation-carrying information. Our symmetric design robustly preserves intrinsic degradation signals, enabling simple additive fusion in skip connections.
arXiv Detail & Related papers (2025-12-11T12:20:31Z) - Defending Unauthorized Model Merging via Dual-Stage Weight Protection [7.855764642324112]
Free-riders combine fine-tuned models into a new multi-capability model without authorization. We present MergeGuard, a framework that disrupts merging compatibility while maintaining task fidelity. We show that MergeGuard reduces merged model accuracy by up to 90% with less than 1.5% performance loss on the protected model.
arXiv Detail & Related papers (2025-11-14T20:16:00Z) - Do Not Merge My Model! Safeguarding Open-Source LLMs Against Unauthorized Model Merging [42.917732897026276]
We propose MergeBarrier, a plug-and-play defense that proactively prevents unauthorized merging. Experiments show that MergeBarrier effectively prevents model-merging theft with negligible accuracy loss.
arXiv Detail & Related papers (2025-11-13T09:45:47Z) - Patching LLM Like Software: A Lightweight Method for Improving Safety Policy in Large Language Models [63.54707418559388]
We propose patching large language models (LLMs) like software versions. Our method enables rapid remediation by prepending a compact, learnable prefix to an existing model.
arXiv Detail & Related papers (2025-11-11T17:25:44Z) - Model Unmerging: Making Your Models Unmergeable for Secure Model Sharing [47.204542615541364]
Unauthorized merging may infringe on developers' rights and risk leaking sensitive personal information. We propose MergeLock, an active protection mechanism that disrupts model parameters to render them unmergeable. Experiments demonstrate that MergeLock can degrade the performance of merged models by over 95% when a protected model is involved.
arXiv Detail & Related papers (2025-09-01T15:24:41Z) - DELMAN: Dynamic Defense Against Large Language Model Jailbreaking with Model Editing [62.43110639295449]
Large Language Models (LLMs) are widely applied in decision making, but their deployment is threatened by jailbreak attacks. DELMAN is a novel approach leveraging direct model editing for precise, dynamic protection against jailbreak attacks. DELMAN directly updates a minimal set of relevant parameters to neutralize harmful behaviors while preserving the model's utility.
arXiv Detail & Related papers (2025-02-17T10:39:21Z) - AlignGuard: Scalable Safety Alignment for Text-to-Image Generation [68.07258248467309]
Text-to-image (T2I) models are widespread, but their limited safety guardrails expose end users to harmful content and potentially allow for model misuse. In this work, we introduce AlignGuard, a method for safety alignment of T2I models.
arXiv Detail & Related papers (2024-12-13T18:59:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.