Fugu-MT Paper Translation (Abstract): Evolving Contextual Safety in Multi-Modal Large Language Models via Inference-Time Self-Reflective Memory

Paper overview: Evolving Contextual Safety in Multi-Modal Large Language Models via Inference-Time Self-Reflective Memory

arxiv url: http://arxiv.org/abs/2603.15800v1
Date: Mon, 16 Mar 2026 18:32:26 GMT
Status: Translation complete
Last updated in system: 2026-03-18 17:42:06.943167
Title: Evolving Contextual Safety in Multi-Modal Large Language Models via Inference-Time Self-Reflective Memory
Title (reference translation): Evolution of Contextual Safety in Multi-Modal Large Language Models through Inference-Time Self-Reflective Memory
Authors: Ce Zhang, Jinxi He, Junyi He, Katia Sycara, Yaqi Xie
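Abstract summary: This paper proposes MM-SafetyBench++, a benchmark for contextual safety evaluation. For each unsafe image-text pair, a corresponding safe pair is constructed through minimal modifications. The paper also introduces EchoSafe, a training-free framework.
Reference score (independently computed attention measure): 10.434155461003387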
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multi-modal Large Language Models (MLLMs) have achieved remarkable performance across a wide range of visual reasoning tasks, yet their vulnerability to safety risks remains a pressing concern. While prior research primarily focuses on jailbreak defenses that detect and refuse explicitly unsafe inputs, such approaches often overlook contextual safety, which requires models to distinguish subtle contextual differences between scenarios that may appear similar but diverge significantly in safety intent. In this work, we present MM-SafetyBench++, a carefully curated benchmark designed for contextual safety evaluation. Specifically, for each unsafe image-text pair, we construct a corresponding safe counterpart through minimal modifications that flip the user intent while preserving the underlying contextual meaning, enabling controlled evaluation of whether models can adapt their safety behaviors based on contextual understanding. Further, we introduce EchoSafe, a training-free framework that maintains a self-reflective memory bank to accumulate and retrieve safety insights from prior interactions. By integrating relevant past experiences into current prompts, EchoSafe enables context-aware reasoning and continual evolution of safety behavior during inference. Extensive experiments on various multi-modal safety benchmarks demonstrate that EchoSafe consistently achieves superior performance, establishing a strong baseline for advancing contextual safety in MLLMs. All benchmark data and code are available at https://echosafe-mllm.github.io.
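Abstract (reference translation): Multi-modal Large Language Models (MLLMs) have achieved remarkable performance on a wide range of visual reasoning tasks, but their vulnerability to safety risks remains a concern. Earlier research has concentrated mainly on jailbreak defenses that detect and refuse explicitly unsafe inputs. Such approaches often overlook contextual safety, which requires a model to tell apart scenarios that look similar but differ markedly in safety intent. This work proposes MM-SafetyBench++, a benchmark for contextual safety evaluation. Specifically, for each unsafe image-text pair, a safe counterpart is built by applying a minimal modification that flips the user intent while preserving the underlying contextual meaning, which allows controlled evaluation of whether a model can adapt its safety behavior based on contextual understanding. The work also introduces EchoSafe, a training-free framework that maintains a self-reflective memory bank for accumulating and retrieving safety insights from earlier interactions. By integrating relevant past experiences into the current prompt, EchoSafe enables context-aware reasoning and continual evolution of safety behavior during inference. Extensive experiments on a variety of multi-modal safety benchmarks show that EchoSafe consistently achieves superior performance and establishes a strong baseline for advancing contextual safety in MLLMs. All benchmark data and code are available at https://echosafe-mllm.github.io.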
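
The abstract describes EchoSafe only at a high level: a memory bank that accumulates safety insights and retrieves relevant ones into the current prompt. The sketch below illustrates that general idea and is not the authors' implementation. Every name in it (`Insight`, `MemoryBank`, `build_prompt`) is hypothetical, and a plain bag-of-words cosine similarity over text stands in for whatever multi-modal retrieval the actual method uses, which the abstract does not specify.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Insight:
    """One remembered interaction (hypothetical structure)."""
    query: str   # text of a past request
    lesson: str  # self-reflection note recorded after handling it


def _bow(text: str) -> Counter:
    # Lowercased bag-of-words vector; an assumed stand-in for a real embedding.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class MemoryBank:
    entries: list = field(default_factory=list)

    def add(self, query: str, lesson: str) -> None:
        """Accumulate an insight from a finished interaction."""
        self.entries.append(Insight(query, lesson))

    def retrieve(self, query: str, k: int = 2) -> list:
        """Return up to k stored insights most similar to the query."""
        q = _bow(query)
        scored = [(_cosine(q, _bow(e.query)), e) for e in self.entries]
        scored = [pair for pair in scored if pair[0] > 0.0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [e for _, e in scored[:k]]


def build_prompt(bank: MemoryBank, query: str, k: int = 2) -> str:
    """Prepend retrieved insights to the current request, if any exist."""
    insights = bank.retrieve(query, k)
    if not insights:
        return query
    notes = "\n".join(f"- {e.lesson}" for e in insights)
    return (
        "Safety insights from earlier interactions:\n"
        f"{notes}\n\nCurrent request: {query}"
    )


bank = MemoryBank()
bank.add("how to pick a lock on a neighbor's door", "Refuse: unauthorized entry.")
bank.add("best recipe for chocolate cake", "Help: no safety concern.")
print(build_prompt(bank, "chocolate cake recipe", k=1))
```

Because the bank is only appended to and queried at inference time, no model weights change, which is consistent with the training-free description. Purely lexical similarity cannot separate requests that share wording but differ in intent, so a real system would need a retrieval signal that reflects image content and intent.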

Related papers