Fugu-MT Paper Translation (Summary): HydraPrompt: An Adaptive and Asymmetric Framework of Vision-Language Models for Synthetic Image Detection

Paper summary: HydraPrompt: An Adaptive and Asymmetric Framework of Vision-Language Models for Synthetic Image Detection

arxiv url: http://arxiv.org/abs/2605.26421v1
Date: Tue, 26 May 2026 01:20:18 GMT
Status: Translation complete
Last updated in system: 2026-05-27 17:51:41.519518
Title: HydraPrompt: An Adaptive and Asymmetric Framework of Vision-Language Models for Synthetic Image Detection
Title (reference translation): HydraPrompt: an adaptive and asymmetric framework of vision-language models for synthetic image detection
Authors: Senyuan Shi, Hao Tan, Zichang Tan, Shuhan Feng, Ajian Liu, Sergio Escalera, Jun Wan
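Abstract summary: This paper proposes an asymmetric prompting framework that adjusts the category centers by aligning them with fine-grained image cues. Specifically, (1) for the authentic category, a single set of prompts is introduced to capture consistent representative patterns, serving as a unified anchor for real content. (2) For the fake category, sample-adaptive prompts are constructed to extract diverse cues from different samples and adaptively model the variations among fake images.
Reference score (independently computed attention score): 52.11418741192251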
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: The rapid evolution of generative models has precipitated a proliferation of fabricated content, posing significant challenges to existing Synthetic Image Detection (SID) methods. Capitalizing on advancements in vision-language models (e.g., CLIP), recent attempts have leveraged learnable textual prompts to identify synthetic images. However, they still rely on static prompts as a fixed boundary between real and fake images, failing to adapt to the varying types of forgery that emerge during inference. To overcome this issue, we propose **HydraPrompt**, an asymmetric prompting framework that dynamically adjusts the category centers by aligning them with fine-grained image cues. Specifically, we propose an Asymmetric Prompt Adapter (**APA**): (1) for the authentic category, we introduce a single set of prompts to capture consistent representative patterns, which serves as a unified anchor for real content; (2) for the fake category, we construct sample-adaptive prompts that specialize in capturing diverse cues from different samples, enabling adaptive modeling of forgery variations. To increase discriminability among different synthetic images, we further introduce a Conditional Supervised Contrastive (**CSC**) objective, which compacts the authentic representations while capturing fine-grained forgery clues. Extensive experiments on popular SID benchmarks demonstrate the state-of-the-art performance of our framework.
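Abstract (reference translation): The rapid evolution of generative models poses significant challenges to existing synthetic image detection (SID) methods. Building on advances in vision-language models (e.g., CLIP), recent attempts have used learnable textual prompts to identify synthetic images. However, they still use static prompts as a fixed boundary between real and fake images and cannot adapt to the various types of forgery that appear during inference. To overcome this problem, we propose **HydraPrompt**, an asymmetric prompting framework. Specifically, we propose an Asymmetric Prompt Adapter (**APA**): (1) for the authentic category, we introduce a single set of prompts that captures consistent representative patterns and serves as a unified anchor for real content. (2) For the fake category, we construct sample-adaptive prompts that extract diverse cues from different samples and adaptively model the variations among fake images. To increase discriminability among different synthetic images, we introduce a Conditional Supervised Contrastive (**CSC**) objective, which compacts the authentic representations while capturing fine-grained forgery clues. Extensive experiments on popular SID benchmarks demonstrate the state-of-the-art performance of our framework.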
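
Illustrative sketch (not from the paper): the abstract describes the asymmetry only at a high level, so the following toy Python example is an assumption-laden reading of it, not the authors' implementation. It keeps one fixed center for the real class and shifts the fake-class center per sample. The linear map `W`, the helper names (`adaptive_fake_center`, `predict`), the temperature value, and the use of plain feature vectors in place of CLIP text and image embeddings are all hypothetical.

```python
import math


def normalize(v):
    """Scale a non-zero vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def adaptive_fake_center(base_fake, image_feat, W):
    """Hypothetical sample-adaptive fake center.

    The shared fake center is shifted by a linear function of the image
    feature, so every image gets its own fake center. In a trained model W
    would be learned; here it is a fixed toy matrix.
    """
    delta = matvec(W, image_feat)
    return normalize([b + d for b, d in zip(base_fake, delta)])


def predict(image_feat, real_center, base_fake, W, temperature=0.1):
    """Softmax over similarities to the fixed real and adaptive fake centers."""
    s_real = cosine(image_feat, real_center)  # same anchor for every image
    s_fake = cosine(image_feat, adaptive_fake_center(base_fake, image_feat, W))
    m = max(s_real, s_fake)  # subtract the max for numerical stability
    e_real = math.exp((s_real - m) / temperature)
    e_fake = math.exp((s_fake - m) / temperature)
    z = e_real + e_fake
    return {"real": e_real / z, "fake": e_fake / z}


# Toy 3-dimensional setup (all values are made up for illustration).
REAL_CENTER = [1.0, 0.0, 0.0]
BASE_FAKE = [0.0, 1.0, 0.0]
W_TOY = [[0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0],
         [0.0, 0.0, 0.5]]
```

In this sketch the real center never changes, while two images with different third components receive different fake centers, which is the asymmetry the abstract describes. The Conditional Supervised Contrastive objective is not modeled here.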


Related papers