Fugu-MT Paper Translation (Abstract): The Side Effects of Being Smart: Safety Risks in MLLMs' Multi-Image Reasoning

Paper overview: The Side Effects of Being Smart: Safety Risks in MLLMs' Multi-Image Reasoning

arxiv url: http://arxiv.org/abs/2601.14127v1
Date: Tue, 20 Jan 2026 16:24:18 GMT
Status: Translation complete
Last updated in system: 2026-01-21 22:47:23.412029
Title: The Side Effects of Being Smart: Safety Risks in MLLMs' Multi-Image Reasoning
Authors: Renmiao Chen, Yida Lu, Shiyao Cui, Xuan Ouyang, Victor Shea-Jay Huang, Shumin Zhang, Chengwei Pan, Han Qiu, Minlie Huang
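Abstract summary: MIR-SafetyBench is the first benchmark focused on multi-image reasoning safety. Models with more advanced multi-image reasoning can be more vulnerable on MIR-SafetyBench. On average, unsafe generations exhibit lower attention entropy than safe ones.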
Reference score (attention level, computed independently by this site): 46.156246746700894
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As Multimodal Large Language Models (MLLMs) acquire stronger reasoning capabilities to handle complex, multi-image instructions, this advancement may pose new safety risks. We study this problem by introducing MIR-SafetyBench, the first benchmark focused on multi-image reasoning safety, which consists of 2,676 instances across a taxonomy of 9 multi-image relations. Our extensive evaluations on 19 MLLMs reveal a troubling trend: models with more advanced multi-image reasoning can be more vulnerable on MIR-SafetyBench. Beyond attack success rates, we find that many responses labeled as safe are superficial, often driven by misunderstanding or evasive, non-committal replies. We further observe that unsafe generations exhibit lower attention entropy than safe ones on average. This internal signature suggests a possible risk that models may over-focus on task solving while neglecting safety constraints. Our code and data are available at https://github.com/thu-coai/MIR-SafetyBench.
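The abstract does not say how attention entropy is computed, so the following is only a minimal sketch under stated assumptions: it treats one row of attention weights as a probability distribution and takes its Shannon entropy in nats. The function name `attention_entropy` and the example weights are hypothetical and are not taken from the paper or its repository.

```python
import math


def attention_entropy(weights):
    """Shannon entropy (in nats) of a single attention distribution.

    Assumption: `weights` is one row of non-negative attention weights.
    It is renormalized here so that the values sum to 1.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("attention weights must have a positive sum")
    entropy = 0.0
    for w in weights:
        p = w / total
        if p > 0:  # terms with p == 0 contribute nothing to the sum
            entropy -= p * math.log(p)
    return entropy


# Hypothetical example rows: one sharply focused, one evenly spread.
focused = [0.97, 0.01, 0.01, 0.01]
diffuse = [0.25, 0.25, 0.25, 0.25]

print(attention_entropy(focused) < attention_entropy(diffuse))
```

Under this definition a sharply peaked distribution yields a lower value than an evenly spread one, which is the sense in which lower entropy can be read as more concentrated attention.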

Related papers