Fugu-MT Paper Translation (Abstract): Robust Multimodal Safety via Conditional Decoding

Paper overview: Robust Multimodal Safety via Conditional Decoding

arxiv url: http://arxiv.org/abs/2604.00310v1
Date: Tue, 31 Mar 2026 23:19:50 GMT
Status: Translation complete
Last updated in system: 2026-04-02 16:44:31.762174
Title: Robust Multimodal Safety via Conditional Decoding
Authors: Anurag Kumar, Raghuveer Peri, Jon Burnsky, Alexandru Nelus, Rohit Paturi, Srikanth Vishnubhotla, Yanjun Qi
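Abstract summary: Multimodal large language models (MLLMs) often experience degraded safety alignment when harmful queries exploit cross-modal interactions. This paper proposes CASA, a simple conditional decoding strategy that uses the internal representations of MLLMs to predict a binary safety token before response generation.
Reference score (independently computed attention measure): 52.92816441364308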
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Multimodal large language models (MLLMs) often experience degraded safety alignment when harmful queries exploit cross-modal interactions. Models aligned on text alone show a higher rate of successful attacks when extended to two or more modalities. In this work, we propose CASA (Classification Augmented with Safety Attention), a simple conditional decoding strategy that uses the internal representations of MLLMs to predict a binary safety token before response generation. We introduce a novel safety attention module designed to enhance the model's ability to detect malicious queries. Our design ensures robust safety alignment without relying on any external classifier or auxiliary head, and without modality-specific safety fine-tuning. On diverse benchmarks such as MM-SafetyBench, JailbreakV-28k, and adversarial audio tests, CASA lowers the average attack success rate by more than 97% across modalities and attack types. Our empirical evaluations also show that CASA maintains strong utility on benign inputs, a result validated through both automated and human evaluations (with 13 trained annotators). Together, these results highlight CASA as a simple and generalizable framework for improving multimodal LLM safety.
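
The conditional decoding idea in the abstract (predict a binary safety token first, then generate the response conditioned on it) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the token strings `<safe>` and `<unsafe>`, the refusal text, the function names, and the toy scoring rule are all hypothetical stand-ins, and the safety attention module is not modeled. In CASA the safety decision comes from the model's own internal representations rather than an external classifier; here the `safety_logits` callable merely stands in for that step.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical token names and refusal text; none of these come from the paper.
SAFE_TOKEN = "<safe>"
UNSAFE_TOKEN = "<unsafe>"
REFUSAL = "I can't help with that request."


def conditional_decode(
    safety_logits: Callable[[Sequence[float]], Tuple[float, float]],
    generate: Callable[[Sequence[float]], str],
    hidden_state: Sequence[float],
) -> str:
    """Emit a binary safety token first, then condition the response on it."""
    safe_score, unsafe_score = safety_logits(hidden_state)
    if unsafe_score > safe_score:
        # Unsafe branch: the response is a refusal, no normal generation runs.
        return f"{UNSAFE_TOKEN} {REFUSAL}"
    # Safe branch: continue with ordinary response generation.
    return f"{SAFE_TOKEN} {generate(hidden_state)}"


def toy_safety_logits(hidden_state: Sequence[float]) -> Tuple[float, float]:
    # Toy assumption: the first feature encodes how harmful the query looks.
    harm = hidden_state[0]
    return (1.0 - harm, harm)


def toy_generate(hidden_state: Sequence[float]) -> str:
    # Placeholder for the model's normal decoding loop.
    return "Here is a helpful answer."


benign = conditional_decode(toy_safety_logits, toy_generate, [0.1, 0.5])
harmful = conditional_decode(toy_safety_logits, toy_generate, [0.9, 0.5])
print(benign)
print(harmful)
```

The point of the sketch is only the control flow: the safety token is decided before any response text is produced, so a query judged unsafe never reaches the ordinary generation path.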
