Fugu-MT Paper Translation (Summary): ProjLens: Unveiling the Role of Projectors in Multimodal Model Safety

Paper summary: ProjLens: Unveiling the Role of Projectors in Multimodal Model Safety

arxiv url: http://arxiv.org/abs/2604.19083v1
Date: Tue, 21 Apr 2026 04:52:38 GMT
Status: Translation complete
Last updated in system: 2026-04-22 22:41:49.624067
Title: ProjLens: Unveiling the Role of Projectors in Multimodal Model Safety
Title (reference translation): ProjLens: Clarifying the Role of Projectors in Multimodal Model Safety
Authors: Kun Wang, Cheng Qian, Miao Yu, Lilan Peng, Liang Lin, Jiaming Zhang, Tianyu Zhang, Yu Cheng, Yang Wang
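Abstract summary: Multimodal Large Language Models (MLLMs) have achieved great success in cross-modal understanding and generation, but their deployment is threatened by serious safety vulnerabilities. This paper proposes ProjLens, an interpretability framework for demystifying backdoors in MLLMs.
Reference score (independently computed attention measure): 54.4092272526747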
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multimodal Large Language Models (MLLMs) have achieved remarkable success in cross-modal understanding and generation, yet their deployment is threatened by critical safety vulnerabilities. While prior works have demonstrated the feasibility of backdoors in MLLMs via fine-tuning data poisoning to manipulate inference, the underlying mechanisms of backdoor attacks remain opaque, complicating both understanding and mitigation. To bridge this gap, we propose ProjLens, an interpretability framework designed to demystify MLLM backdoors. We first establish that normal downstream task alignment, even when restricted to projector fine-tuning, introduces vulnerability to backdoor injection, whose activation mechanism differs from that observed in text-only LLMs. Through extensive experiments across four backdoor variants, we uncover: (1) Low-Rank Structure: Backdoor injection updates appear overall full-rank and lack dedicated "trigger neurons", but the backdoor-critical parameters are encoded within a low-rank subspace of the projector; (2) Activation Mechanism: Both clean and poisoned embeddings undergo a semantic shift toward a shared direction aligned with the backdoor target, but the shifting magnitude scales linearly with the input norm, resulting in the distinct backdoor activation on poisoned samples. Our code is available at: https://anonymous.4open.science/r/ProjLens-8FD7
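Abstract (reference translation): Multimodal Large Language Models (MLLMs) have achieved great success in cross-modal understanding and generation, but their deployment is threatened by serious safety vulnerabilities. Prior work has shown that backdoors can be planted in MLLMs by poisoning fine-tuning data to manipulate inference, but the underlying mechanisms of backdoor attacks remain opaque, which complicates understanding and mitigation. To bridge this gap, we propose ProjLens, an interpretability framework designed to demystify MLLM backdoors. We first confirm that normal downstream task alignment, even when restricted to projector fine-tuning, introduces a vulnerability to backdoor injection whose activation mechanism differs from that of text-only LLMs. Low-rank structure: backdoor injection updates appear full-rank overall and lack dedicated "trigger neurons", but the backdoor-critical parameters are encoded within a low-rank subspace of the projector. Our code is available at https://anonymous.4open.science/r/ProjLens-8FD7.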
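The activation mechanism described in the abstract can be illustrated with a toy linear projector. The sketch below is not the authors' code: it assumes a hypothetical rank-1 update `delta_w = u v^T` as a stand-in for the low-rank backdoor-critical component, and it assumes that a poisoned input has the same direction as a clean one but a larger norm. All names (`u`, `v`, `delta_w`, `shift`) are made up for illustration. Under these assumptions, the shift added to the projected embedding always points along `u`, and its magnitude grows linearly with the input norm.

```python
import math

def matvec(m, x):
    """Multiply matrix m (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def norm(x):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(a * a for a in x))

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(p * q for p, q in zip(a, b)) / (norm(a) * norm(b))

def outer(u, v):
    """Outer product u v^T, a rank-1 matrix."""
    return [[a * b for b in v] for a in u]

# Hypothetical rank-1 update standing in for the low-rank
# backdoor-critical component of the projector (an assumption).
u = [0.6, 0.8]        # shared shift direction in embedding space
v = [1.0, 0.0, 0.0]   # input direction the update reads from
delta_w = outer(u, v)

def shift(x):
    """Shift that the update adds to the projected embedding of x."""
    return matvec(delta_w, x)

# Assumption: the poisoned input is the clean input scaled to 4x its norm.
clean = [1.0, 2.0, 2.0]
poisoned = [4.0 * c for c in clean]

s_clean = shift(clean)
s_poisoned = shift(poisoned)

print("input norm ratio:", norm(poisoned) / norm(clean))
print("shift norm ratio:", norm(s_poisoned) / norm(s_clean))
print("shift direction cosine:", cosine(s_clean, s_poisoned))
```

Both inputs are shifted in the same direction, so in this toy setting only the magnitude of the shift separates the poisoned sample from the clean one.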

Related papers