Fugu-MT Paper Translation (Abstract): CLIP-Inspector: Model-Level Backdoor Detection for Prompt-Tuned CLIP via OOD Trigger Inversion

Paper overview: CLIP-Inspector: Model-Level Backdoor Detection for Prompt-Tuned CLIP via OOD Trigger Inversion

arxiv url: http://arxiv.org/abs/2604.09101v1
Date: Fri, 10 Apr 2026 08:33:56 GMT
Status: Translation complete
Last updated in system: 2026-04-13 17:57:53.777556
Title: CLIP-Inspector: Model-Level Backdoor Detection for Prompt-Tuned CLIP via OOD Trigger Inversion
Authors: Akshit Jindal, Saket Anand, Chetan Arora, Vikram Goyal
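Abstract summary: This paper introduces CLIP-Inspector (CI), a backdoor detection method for CLIP models. CI reconstructs possible triggers for each class to determine whether the model exhibits backdoor behaviour. We demonstrate that fine-tuning on correctly labeled triggered inputs, built from CI's reconstructed trigger, can re-align the model.
Reference score (independently computed attention score): 9.120160208679133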
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Organisations with limited data and computational resources increasingly outsource model training to Machine Learning as a Service (MLaaS) providers, who adapt vision-language models (VLMs) such as CLIP to downstream tasks via prompt tuning rather than training from scratch. This semi-honest setting creates a security risk where a malicious provider can follow the prompt-tuning protocol yet implant a backdoor, forcing triggered inputs to be classified into an attacker-chosen class, even for out-of-distribution (OOD) data. Such backdoors leave encoders untouched, making them undetectable to existing methods that focus on encoder corruption. Other data-level methods that sanitize data before training or during inference also fail to answer the critical question, "Is the delivered model backdoored or not?" To address this model-level verification problem, we introduce CLIP-Inspector (CI), a backdoor detection method designed for prompt-tuned CLIP models. Assuming white-box access to the delivered model and a pool of unlabeled OOD images, CI reconstructs possible triggers for each class to determine if the model exhibits backdoor behaviour or not. Additionally, we demonstrate that using CI's reconstructed trigger for fine-tuning on correctly labeled triggered inputs enables us to re-align the model and reduce backdoor effectiveness. Through extensive experiments across ten datasets and four backdoor attacks, we demonstrate that CI can reconstruct effective triggers in a single epoch using only 1,000 OOD images, achieving a 94% detection accuracy (47/50 models). Compared to adapted trigger-inversion baselines, CI yields a markedly higher AUROC score (0.973 vs 0.495/0.687), thus enabling the vetting and post-hoc repair of prompt-tuned CLIP models to ensure safe deployment.
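The abstract describes reconstructing a candidate trigger for each class from unlabeled OOD images and using the result to decide whether the model is backdoored. The following is only a minimal sketch of that general idea (generic per-class trigger inversion with a norm budget), not the CI algorithm itself, whose loss and detection rule are not given here. Everything in it is an assumption made for illustration: the toy linear "model" `W`, the planted backdoor in class 2, the helper names `invert_trigger` and `make_toy_model`, the budget `eps`, and the 0.95 flagging threshold.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def invert_trigger(W, X, target, eps=0.5, steps=50, lr=0.5):
    """Search for one additive trigger (L2 norm <= eps) that pushes every row of X to `target`.

    Hypothetical helper: projected gradient descent on the mean cross-entropy.
    """
    delta = np.zeros(W.shape[1])
    onehot = np.eye(W.shape[0])[target]
    for _ in range(steps):
        p = softmax((X + delta) @ W.T)           # class probabilities with the trigger applied
        grad = ((p - onehot) @ W).mean(axis=0)   # gradient of mean cross-entropy w.r.t. delta
        delta -= lr * grad
        norm = np.linalg.norm(delta)
        if norm > eps:                           # project back onto the L2 ball of radius eps
            delta *= eps / norm
    preds = ((X + delta) @ W.T).argmax(axis=1)
    return delta, float((preds == target).mean())

def make_toy_model(num_classes=4, dim=8, backdoor_class=2, trigger_dim=7):
    """Toy linear classifier (assumption): class c reads feature c; one class also
    reacts strongly to an otherwise unused 'trigger' feature, mimicking a backdoor."""
    W = np.zeros((num_classes, dim))
    W[np.arange(num_classes), np.arange(num_classes)] = 1.0
    W[backdoor_class, trigger_dim] = 10.0
    return W

rng = np.random.default_rng(0)
X_ood = rng.uniform(-1.0, 1.0, size=(1000, 8))  # stand-in for a pool of unlabeled OOD inputs
X_ood[:, 7] = 0.0                               # the trigger feature is absent from clean inputs

W = make_toy_model()
rates = [invert_trigger(W, X_ood, c)[1] for c in range(W.shape[0])]
flagged = [c for c, r in enumerate(rates) if r >= 0.95]  # illustrative threshold, not from the paper
print("success rate per class:", rates)
print("flagged classes:", flagged)
```

In this toy setup a small trigger redirects every input only for the planted class, while the clean classes cannot reach the threshold under the same budget. A real prompt-tuned CLIP model would replace `W` with the white-box model and `X_ood` with actual OOD images, and the trigger would be optimised in image space.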

Related papers