Fugu-MT Paper Translation (Summary): BlackMirror: Black-Box Backdoor Detection for Text-to-Image Models via Instruction-Response Deviation

Paper summary: BlackMirror: Black-Box Backdoor Detection for Text-to-Image Models via Instruction-Response Deviation

arxiv url: http://arxiv.org/abs/2603.05921v1
Date: Fri, 06 Mar 2026 05:22:46 GMT
Status: Translation complete
System update date: 2026-03-09 13:17:45.105604
Title: BlackMirror: Black-Box Backdoor Detection for Text-to-Image Models via Instruction-Response Deviation
Title (reference translation): BlackMirror: Black-Box Backdoor Detection for Text-to-Image Models via Instruction-Response Deviation
Authors: Feiran Li, Qianqian Xu, Shilong Bao, Zhiyong Yang, Xilin Zhao, Xiaochun Cao, Qingming Huang
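Abstract summary: This paper investigates the task of detecting backdoored text-to-image models under black-box settings. It introduces BlackMirror, a new detection framework.
Reference score (independently computed attention score): 117.54208768824869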
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper investigates the challenging task of detecting backdoored text-to-image models under black-box settings and introduces a novel detection framework, BlackMirror. Existing approaches typically rely on analyzing image-level similarity, under the assumption that backdoor-triggered generations exhibit strong consistency across samples. However, they struggle to generalize to recently emerging backdoor attacks, where backdoored generations can appear visually diverse. BlackMirror is motivated by an observation: across backdoor attacks, only partial semantic patterns within the generated image are steadily manipulated, while the rest of the content remains diverse or benign. Accordingly, BlackMirror consists of two components: MirrorMatch, which aligns visual patterns with the corresponding instructions to detect semantic deviations; and MirrorVerify, which evaluates the stability of these deviations across varied prompts to distinguish true backdoor behavior from benign responses. BlackMirror is a general, training-free framework that can be deployed as a plug-and-play module in Model-as-a-Service (MaaS) applications. Comprehensive experiments demonstrate that BlackMirror achieves accurate detection across a wide range of attacks. Code is available at https://github.com/Ferry-Li/BlackMirror.
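Abstract (reference translation): This paper examines the challenging task of detecting backdoored text-to-image models under black-box settings and introduces BlackMirror, a new detection framework. Existing approaches rely on analyzing image-level similarity, under the assumption that backdoor-triggered generations show strong consistency across samples. However, they struggle to generalize to recent backdoor attacks in which backdoored generations appear visually diverse. BlackMirror is motivated by the observation that, across backdoor attacks, only partial semantic patterns within the generated image are steadily manipulated, while the remaining content stays diverse or benign. Accordingly, BlackMirror consists of two components: MirrorMatch, which aligns visual patterns with the corresponding instructions to detect semantic deviations; and MirrorVerify, which evaluates the stability of these deviations across varied prompts to distinguish true backdoor behavior from benign responses. BlackMirror is a general, training-free framework that can be deployed as a plug-and-play module in Model-as-a-Service (MaaS) applications. Comprehensive experiments show that BlackMirror achieves accurate detection across a wide range of attacks. Code is available at https://github.com/Ferry-Li/BlackMirror.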
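
The two-stage idea in the abstract (find content the instruction did not ask for, then check whether it recurs across varied prompts) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the set-of-strings representation of "concepts", the hand-written sample data, and the `min_rate` threshold are all assumptions made for illustration. In practice the observed concepts would have to come from some image-understanding model, which is stubbed out here.

```python
from collections import Counter


def semantic_deviations(instructed, observed):
    """Concepts observed in an image that the prompt never asked for.

    Hypothetical stand-in for the matching step: both arguments are
    plain sets of concept strings.
    """
    return set(observed) - set(instructed)


def stable_deviations(samples, min_rate=0.8):
    """Return deviations that recur in at least `min_rate` of the samples.

    `samples` is a list of (instructed_concepts, observed_concepts) pairs,
    one per prompt variation. The 0.8 default is an arbitrary assumption.
    """
    if not samples:
        return {}
    counts = Counter()
    for instructed, observed in samples:
        counts.update(semantic_deviations(instructed, observed))
    n = len(samples)
    return {c: k / n for c, k in counts.items() if k / n >= min_rate}


# Hand-written toy data: "red hat" shows up regardless of the prompt,
# while "ball" appears only once and looks like benign variation.
samples = [
    ({"dog", "park"}, {"dog", "park", "red hat"}),
    ({"dog", "beach"}, {"dog", "beach", "red hat", "ball"}),
    ({"dog", "street"}, {"dog", "street", "red hat"}),
    ({"dog", "sofa"}, {"dog", "sofa", "red hat"}),
]
flagged = stable_deviations(samples)
print(flagged)
```

In this toy data only the recurring, uninstructed concept is flagged; the one-off extra concept falls below the threshold, which mirrors the abstract's distinction between stable deviations and benign responses.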

Related papers