Fugu-MT Paper Translation (Summary): The Forgotten Shield: Safety Grafting in Parameter-Space for Medical MLLMs

Paper summary: The Forgotten Shield: Safety Grafting in Parameter-Space for Medical MLLMs

arxiv url: http://arxiv.org/abs/2601.04199v1
Date: Fri, 05 Dec 2025 06:52:06 GMT
Status: Translation complete
Last updated in system: 2026-01-11 18:48:17.595647
Title: The Forgotten Shield: Safety Grafting in Parameter-Space for Medical MLLMs
Title (reference translation): Safety grafting in the parameter space of medical MLLMs
Authors: Jiale Zhao, Xing Mou, Jinlin Wu, Hongyuan Yu, Mingrui Sun, Yang Shi, Xuanwu Yin, Zhen Chen, Zhen Lei, Yaohua Wang,
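Abstract summary: Medical multimodal large language models (Medical MLLMs) have made remarkable progress on specialized medical tasks. However, research on their safety has lagged behind, creating potential risks for real-world deployment. We first build a multidimensional evaluation framework to systematically benchmark the safety of current SOTA medical MLLMs.
Reference score (independently computed attention score): 23.79442915729949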
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Medical Multimodal Large Language Models (Medical MLLMs) have achieved remarkable progress in specialized medical tasks; however, research into their safety has lagged, posing potential risks for real-world deployment. In this paper, we first establish a multidimensional evaluation framework to systematically benchmark the safety of current SOTA Medical MLLMs. Our empirical analysis reveals pervasive vulnerabilities across both general and medical-specific safety dimensions in existing models, particularly highlighting their fragility against cross-modality jailbreak attacks. Furthermore, we find that the medical fine-tuning process frequently induces catastrophic forgetting of the model's original safety alignment. To address this challenge, we propose a novel "Parameter-Space Intervention" approach for efficient safety re-alignment. This method extracts intrinsic safety knowledge representations from original base models and concurrently injects them into the target model during the construction of medical capabilities. Additionally, we design a fine-grained parameter search algorithm to achieve an optimal trade-off between safety and medical performance. Experimental results demonstrate that our approach significantly bolsters the safety guardrails of Medical MLLMs without relying on additional domain-specific safety data, while minimizing degradation to core medical performance.
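Abstract (reference translation): Medical multimodal large language models (Medical MLLMs) have achieved remarkable progress in specialized medical tasks, but research on their safety has lagged, posing potential risks for real-world deployment. In this paper, we first establish a multidimensional evaluation framework that systematically assesses the safety of current SOTA medical MLLMs. Our empirical analysis reveals widespread vulnerabilities in existing models across both general and medical-specific safety dimensions, and in particular highlights their fragility against cross-modality jailbreak attacks. We further find that medical fine-tuning often causes catastrophic forgetting of the model's original safety alignment. To address this, we propose a novel "Parameter-Space Intervention" method for safety re-alignment. The method extracts intrinsic safety knowledge representations from the original base model and injects them into the target model while its medical capabilities are being built. We also design a fine-grained parameter search algorithm to achieve an optimal trade-off between safety and medical performance. Experimental results show that the method substantially strengthens the safety guardrails of medical MLLMs without relying on additional domain-specific safety data, while minimizing degradation of core medical performance.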
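The abstract does not give the exact formulation of "Parameter-Space Intervention" or of its parameter search. The sketch below is a minimal illustration under an explicitly assumed, task-arithmetic-style reading, not the authors' method. A "safety vector" is taken as the element-wise difference between a safety-aligned model and its base model. This vector is added to the medically fine-tuned model with one scaling coefficient per parameter group, and a greedy per-group search picks the coefficients that maximize a safety score while keeping a medical score above a floor. All names (`safety_vector`, `graft`, `search_alphas`, the toy models, and the scoring proxies) are hypothetical, and models are reduced to dicts of float lists.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Toy stand-in for a model: parameter-group name (e.g. a layer) -> flat weights.
Params = Dict[str, List[float]]


def safety_vector(aligned: Params, base: Params) -> Params:
    """Hypothetical 'safety direction': element-wise aligned - base."""
    return {k: [a - b for a, b in zip(aligned[k], base[k])] for k in aligned}


def graft(target: Params, vec: Params, alphas: Dict[str, float]) -> Params:
    """Add each group of `vec` to `target`, scaled by that group's alpha."""
    return {
        k: [t + alphas.get(k, 0.0) * v for t, v in zip(target[k], vec[k])]
        for k in target
    }


def search_alphas(
    target: Params,
    vec: Params,
    candidates: Sequence[float],
    safety_fn: Callable[[Params], float],
    medical_fn: Callable[[Params], float],
    min_medical: float,
) -> Tuple[Dict[str, float], Params]:
    """Greedy coordinate search, one parameter group at a time: keep the alpha
    that most increases safety while the medical score stays >= min_medical.
    Starts from alpha = 0 everywhere, i.e. the unmodified target model."""
    alphas = {k: 0.0 for k in target}
    for k in target:
        best_alpha = alphas[k]
        best_safety = safety_fn(graft(target, vec, alphas))
        for a in candidates:
            model = graft(target, vec, {**alphas, k: a})
            if medical_fn(model) >= min_medical and safety_fn(model) > best_safety:
                best_alpha, best_safety = a, safety_fn(model)
        alphas[k] = best_alpha
    return alphas, graft(target, vec, alphas)


# Toy usage: three hypothetical models sharing one architecture.
base = {"layer0": [0.0, 0.0], "layer1": [0.0, 0.0]}
aligned = {"layer0": [1.0, 0.0], "layer1": [0.0, 1.0]}  # base + safety tuning
medical = {"layer0": [0.0, 0.0], "layer1": [2.0, 0.0]}  # base + medical tuning


def safety(m: Params) -> float:
    # Toy proxy: safety is carried by layer0[0] and layer1[1].
    return m["layer0"][0] + m["layer1"][1]


def med(m: Params) -> float:
    # Toy proxy: medical skill lives in layer1[0] and is hurt by layer1[1].
    return m["layer1"][0] - 0.5 * m["layer1"][1]


vec = safety_vector(aligned, base)
alphas, grafted = search_alphas(medical, vec, [0.0, 0.5, 1.0], safety, med, 1.6)
print(alphas)   # → {'layer0': 1.0, 'layer1': 0.5}
print(grafted)  # → {'layer0': [1.0, 0.0], 'layer1': [2.0, 0.5]}
```

In this toy run the search grafts the full safety direction into `layer0`, which does not affect the medical proxy. For `layer1` it stops at half strength, because a full graft would push the medical score below the floor. This mirrors, in miniature, the safety versus medical-performance trade-off the abstract describes. The per-group coefficients stand in for the "fine-grained" search only as an assumption.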


Related papers