Fugu-MT Paper Translation (Abstract): Towards Harmless Multimodal Assistants with Blind Preference Optimization

Paper overview: Towards Harmless Multimodal Assistants with Blind Preference Optimization

arxiv url: http://arxiv.org/abs/2503.14189v1
Date: Tue, 18 Mar 2025 12:02:38 GMT
Status: translation complete
Last updated in system: 2025-03-19 16:29:12.857684
Title: Towards Harmless Multimodal Assistants with Blind Preference Optimization
Title (reference translation): Toward Harmless Multimodal Assistants Using Blind Preference Optimization
Authors: Yongqi Li, Lu Yang, Jian Wang, Runyang You, Wenjie Li, Liqiang Nie
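Abstract summary: Multimodal Large Language Models (MLLMs) show impressive capabilities in multimodal understanding, reasoning, and interaction. Because preference optimization is effective at aligning MLLMs with human preferences, safety-related preference data for MLLMs is needed. The authors construct the MMSafe-PO preference dataset for harmless multimodal assistants, featuring multimodal instructions, a conversational format, and ranked paired responses from human feedback.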
Reference score (independently computed attention score): 49.044737689613164
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multimodal Large Language Models (MLLMs) have demonstrated impressive capabilities in multimodal understanding, reasoning, and interaction. Given the extensive applications of MLLMs, the associated safety issues have become increasingly critical. Due to the effectiveness of preference optimization in aligning MLLMs with human preferences, there is an urgent need for safety-related preference data for MLLMs. To address this, we construct the MMSafe-PO preference dataset towards harmless multimodal assistants, featuring multimodal instructions, the conversational format, and ranked paired responses from human feedback. We also identify two insightful observations: modality co-defense and modality cheating, which illustrate that MLLMs possess a certain level of inherent defense while still presenting unique safety challenges. Based on these observations, we propose the Blind Preference Optimization (BPO) approach. Comprehensive experiments on three benchmarks show that BPO effectively enhances the safety capabilities of MLLMs. Notably, BPO significantly improves the safety rate of the base MLLM by 45.0%, outperforming the DPO approach. Additionally, applying BPO to the MMSafe-PO dataset greatly reduces the base MLLM's unsafe rate on other safety benchmarks (14.5% on MM-SafetyBench and 82.9% on HarmEval), demonstrating the effectiveness and robustness of both the dataset and the approach. We release code and data at https://lu-yang666.github.io/MMsafe-PO-Web/.
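Abstract (reference translation): Multimodal Large Language Models (MLLMs) show impressive capabilities in multimodal understanding, reasoning, and interaction. Given the wide range of MLLM applications, the associated safety issues are becoming increasingly important. Because preference optimization is effective at aligning MLLMs with human preferences, safety-related preference data for MLLMs is needed. In this work, we construct the MMSafe-PO preference dataset for harmless multimodal assistants, featuring multimodal instructions, a conversational format, and ranked paired responses from human feedback. Two insightful observations, modality co-defense and modality cheating, show that MLLMs have a certain level of inherent defense while still presenting unique safety challenges. Based on these observations, we propose the Blind Preference Optimization (BPO) approach. Comprehensive experiments on three benchmarks show that BPO effectively improves the safety of MLLMs. In particular, BPO improves the safety rate of the base MLLM by 45.0%, outperforming the DPO approach. Furthermore, applying BPO to the MMSafe-PO dataset greatly reduces the base MLLM's unsafe rate on other safety benchmarks (14.5% on MM-SafetyBench and 82.9% on HarmEval), demonstrating the effectiveness and robustness of both the dataset and the approach. Code and data are available at https://lu-yang666.github.io/MMsafe-PO-Web/.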

