Fugu-MT Paper Translation (Abstract): GazeCLIP: Gaze-Guided CLIP with Adaptive-Enhanced Fine-Grained Language Prompt for Deepfake Attribution and Detection

Paper overview: GazeCLIP: Gaze-Guided CLIP with Adaptive-Enhanced Fine-Grained Language Prompt for Deepfake Attribution and Detection

arxiv url: http://arxiv.org/abs/2603.29295v1
Date: Tue, 31 Mar 2026 05:59:59 GMT
Status: Translation complete
Last updated in system: 2026-04-01 15:25:03.182348
Title: GazeCLIP: Gaze-Guided CLIP with Adaptive-Enhanced Fine-Grained Language Prompt for Deepfake Attribution and Detection
Title (reference translation): GazeCLIP: Gaze-Guided CLIP with Adaptive-Enhanced Fine-Grained Language Prompts for Deepfake Attribution and Detection
Authors: Yaning Zhang, Linlin Shen, Zitong Yu, Chunjie Ma, Zan Gao
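Abstract summary: Current deepfake attribution and deepfake detection methods tend to generalize poorly to novel generative methods. The paper proposes a novel gaze-guided CLIP with adaptive-enhanced fine-grained language prompts. A novel, fine-grained benchmark is built to evaluate the DFAD performance of networks on novel generators such as diffusion and flow models. A gaze-aware model based on CLIP is introduced to improve generalization to unseen face forgery attacks.
Reference score (independently computed attention score): 80.12497948980378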
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Current deepfake attribution or deepfake detection works tend to exhibit poor generalization to novel generative methods due to the limited exploration in visual modalities alone. They tend to assess the attribution or detection performance of models on unseen advanced generators, coarsely, and fail to consider the synergy of the two tasks. To this end, we propose a novel gaze-guided CLIP with adaptive-enhanced fine-grained language prompts for fine-grained deepfake attribution and detection (DFAD). Specifically, we conduct a novel and fine-grained benchmark to evaluate the DFAD performance of networks on novel generators like diffusion and flow models. Additionally, we introduce a gaze-aware model based on CLIP, which is devised to enhance the generalization to unseen face forgery attacks. Built upon the novel observation that there are significant distribution differences between pristine and forged gaze vectors, and the preservation of the target gaze in facial images generated by GAN and diffusion varies significantly, we design a visual perception encoder to employ the inherent gaze differences to mine global forgery embeddings across appearance and gaze domains. We propose a gaze-aware image encoder (GIE) that fuses forgery gaze prompts extracted via a gaze encoder with common forged image embeddings to capture general attribution patterns, allowing features to be transformed into a more stable and common DFAD feature space. We build a language refinement encoder (LRE) to generate dynamically enhanced language embeddings via an adaptive-enhanced word selector for precise vision-language matching. Extensive experiments on our benchmark show that our model outperforms the state-of-the-art by 6.56% ACC and 5.32% AUC in average performance under the attribution and detection settings, respectively. Codes will be available on GitHub.
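Abstract (reference translation): Current deepfake attribution and deepfake detection methods tend to generalize poorly to novel generative methods because they explore only the visual modality. They also tend to evaluate attribution or detection performance on unseen advanced generators only coarsely, and they do not consider the synergy between the two tasks. To address this, the paper proposes a novel gaze-guided CLIP with adaptive-enhanced fine-grained language prompts for fine-grained deepfake attribution and detection (DFAD). Specifically, a novel, fine-grained benchmark is built to evaluate the DFAD performance of networks on novel generators such as diffusion and flow models. In addition, a gaze-aware model based on CLIP is introduced to improve generalization to unseen face forgery attacks. Based on the novel observation that pristine and forged gaze vectors differ significantly in distribution, and that how well the target gaze is preserved varies significantly between GAN-generated and diffusion-generated facial images, a visual perception encoder is designed that exploits these inherent gaze differences to mine global forgery embeddings across the appearance and gaze domains. A gaze-aware image encoder (GIE) fuses forgery gaze prompts, extracted by a gaze encoder, with common forged image embeddings to capture general attribution patterns, so that features are transformed into a more stable, common DFAD feature space. A language refinement encoder (LRE) generates dynamically enhanced language embeddings through an adaptive-enhanced word selector for precise vision-language matching. Extensive experiments on the benchmark show that the model outperforms the state of the art by 6.56% ACC and 5.32% AUC in average performance under the attribution and detection settings, respectively. Code will be available on GitHub.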
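
Illustrative sketch (not the authors' implementation): the abstract describes fusing gaze features with image embeddings and then matching the result against language embeddings in a CLIP-style space. The Python below shows only that general idea on toy vectors. The function names `fuse` and `match`, the weighted-sum fusion rule, the `alpha` weight, the temperature value, and the three-dimensional embeddings are all assumptions made for illustration; the paper's GIE and LRE are learned modules whose details the abstract does not give.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def fuse(image_emb, gaze_emb, alpha=0.5):
    """Hypothetical fusion: weighted sum of the normalized embeddings.

    This stands in for a learned fusion module; alpha is an assumed weight.
    """
    img = l2_normalize(image_emb)
    gaze = l2_normalize(gaze_emb)
    mixed = [(1 - alpha) * a + alpha * b for a, b in zip(img, gaze)]
    return l2_normalize(mixed)


def match(fused_emb, text_embs, temperature=0.07):
    """CLIP-style matching: temperature-scaled cosine similarity, then softmax."""
    logits = {}
    for label, text_emb in text_embs.items():
        t = l2_normalize(text_emb)
        cosine = sum(a * b for a, b in zip(fused_emb, t))
        logits[label] = cosine / temperature
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - top) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Toy embeddings, invented for this example only.
image_emb = [0.9, 0.1, 0.0]
gaze_emb = [0.8, 0.2, 0.1]
text_embs = {
    "real": [0.0, 1.0, 0.0],
    "gan": [1.0, 0.0, 0.0],
    "diffusion": [0.0, 0.0, 1.0],
}

probs = match(fuse(image_emb, gaze_emb), text_embs)
predicted = max(probs, key=probs.get)
print(predicted, probs)
```

In this toy setup each label plays the role of a language prompt embedding, and the predicted class is the prompt most similar to the fused visual embedding. Attribution (which generator) and detection (real or fake) can both be read off the same probability table, which is one plausible reading of the "synergy" the abstract mentions.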

Related papers