Making Models Unmergeable via Scaling-Sensitive Loss Landscape
- URL: http://arxiv.org/abs/2601.21898v1
- Date: Thu, 29 Jan 2026 15:56:55 GMT
- Title: Making Models Unmergeable via Scaling-Sensitive Loss Landscape
- Authors: Minwoo Jang, Hoyoung Kim, Jabin Koo, Jungseul Ok
- Abstract summary: \textsc{Trap}$^{2}$ encodes protection into the update during fine-tuning, regardless of whether the weights are released as adapters or full models. \textsc{Trap}$^{2}$ uses weight re-scaling as a simple proxy for the merging process.
- Score: 27.034832184399992
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rise of model hubs has made it easier to access reusable model components, making model merging a practical tool for combining capabilities. Yet, this modularity also creates a \emph{governance gap}: downstream users can recompose released weights into unauthorized mixtures that bypass safety alignment or licensing terms. Because existing defenses are largely post-hoc and architecture-specific, they provide inconsistent protection across diverse architectures and release formats in practice. To close this gap, we propose \textsc{Trap}$^{2}$, an architecture-agnostic protection framework that encodes protection into the update during fine-tuning, regardless of whether the weights are released as adapters or full models. Instead of relying on architecture-dependent approaches, \textsc{Trap}$^{2}$ uses weight re-scaling as a simple proxy for the merging process. It keeps released weights effective in standalone use, but degrades them under the re-scaling that often arises in merging, thereby undermining unauthorized merging.
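The abstract describes the core idea at a high level: keep the task loss low at the released weights while making it rise sharply when the weights are re-scaled, as merging methods typically do when they linearly combine task vectors. The paper's exact objective is not given here, so the following is only a minimal illustrative sketch, for a toy linear model, of one way such a scaling-sensitive loss could be shaped; the function names, the hinge-style penalty, and the scale factors `alphas` are all hypothetical choices, not the authors' method.

```python
import numpy as np

def task_loss(w, X, y):
    """Mean squared error of a linear model y ~ X @ w (standalone use)."""
    return float(np.mean((X @ w - y) ** 2))

def scaling_sensitive_loss(w, X, y, alphas=(0.5, 0.8), margin=1.0, lam=0.1):
    """Hypothetical proxy objective (not the paper's exact formulation):
    keep the task loss low at the released scale (alpha = 1) while
    encouraging the loss to be at least `margin` higher at down-scaled
    copies of the weights, mimicking the re-scaling that merging applies.
    """
    base = task_loss(w, X, y)
    penalty = 0.0
    for a in alphas:
        # Hinge term: nonzero only when the re-scaled loss is NOT at
        # least `margin` worse than the standalone loss, i.e. only when
        # the weights would still work well after merging-style scaling.
        penalty += max(0.0, margin - (task_loss(a * w, X, y) - base))
    return base + lam * penalty
```

Minimizing such an objective during fine-tuning would, under these assumptions, trade off standalone accuracy against sensitivity to re-scaling: the penalty is zero exactly when every down-scaled copy of the weights is already degraded by the margin, so a protected model pays no cost once merging-style re-scaling reliably breaks it.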
Related papers
- Rethinking Transferable Adversarial Attacks on Point Clouds from a Compact Subspace Perspective [55.919842734983156]
CoSA is a transferable attack framework that operates within a shared low-dimensional semantic space. CoSA consistently outperforms state-of-the-art transferable attacks.
arXiv Detail & Related papers (2026-01-30T15:48:11Z) - The Trojan in the Vocabulary: Stealthy Sabotage of LLM Composition [31.827344197678126]
Tokenizer transplant introduces a supply-chain vulnerability. By exploiting the geometry of coefficient reuse, our attack creates an asymmetric realizability gap. Empirically, the attack is training-free and achieves spectral mimicry to evade outlier detection.
arXiv Detail & Related papers (2025-12-31T19:00:03Z) - Unleashing Degradation-Carrying Features in Symmetric U-Net: Simpler and Stronger Baselines for All-in-One Image Restoration [52.82397287366076]
All-in-one image restoration aims to handle diverse degradations (e.g., noise, blur, adverse weather) within a unified framework. In this work, we reveal a critical insight: well-crafted feature extraction inherently encodes degradation-carrying information. Our symmetric design robustly preserves intrinsic degradation signals, enabling simple additive fusion in skip connections.
arXiv Detail & Related papers (2025-12-11T12:20:31Z) - Defending Unauthorized Model Merging via Dual-Stage Weight Protection [7.855764642324112]
Free-riders combine fine-tuned models into a new multi-capability model without authorization. We present MergeGuard, a framework that disrupts merging compatibility while maintaining task fidelity. We show that MergeGuard reduces merged model accuracy by up to 90% with less than 1.5% performance loss on the protected model.
arXiv Detail & Related papers (2025-11-14T20:16:00Z) - Do Not Merge My Model! Safeguarding Open-Source LLMs Against Unauthorized Model Merging [42.917732897026276]
We propose MergeBarrier, a plug-and-play defense that proactively prevents unauthorized merging. Experiments show that MergeBarrier effectively prevents model-merging theft with negligible accuracy loss.
arXiv Detail & Related papers (2025-11-13T09:45:47Z) - Patching LLM Like Software: A Lightweight Method for Improving Safety Policy in Large Language Models [63.54707418559388]
We propose patching large language models (LLMs) like software versions. Our method enables rapid remediation by prepending a compact, learnable prefix to an existing model.
arXiv Detail & Related papers (2025-11-11T17:25:44Z) - Model Unmerging: Making Your Models Unmergeable for Secure Model Sharing [47.204542615541364]
Unauthorized merging may infringe on developers' rights and risk leaking sensitive personal information. We propose MergeLock, an active protection mechanism that disrupts model parameters to render them unmergeable. Experiments demonstrate that MergeLock can degrade the performance of merged models by over 95% when a protected model is involved.
arXiv Detail & Related papers (2025-09-01T15:24:41Z) - DELMAN: Dynamic Defense Against Large Language Model Jailbreaking with Model Editing [62.43110639295449]
Large Language Models (LLMs) are widely applied in decision making, but their deployment is threatened by jailbreak attacks. DELMAN is a novel approach leveraging direct model editing for precise, dynamic protection against jailbreak attacks. DELMAN directly updates a minimal set of relevant parameters to neutralize harmful behaviors while preserving the model's utility.
arXiv Detail & Related papers (2025-02-17T10:39:21Z) - AlignGuard: Scalable Safety Alignment for Text-to-Image Generation [68.07258248467309]
Text-to-image (T2I) models are widespread, but their limited safety guardrails expose end users to harmful content and potentially allow for model misuse. In this work, we introduce AlignGuard, a method for safety alignment of T2I models.
arXiv Detail & Related papers (2024-12-13T18:59:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.