Fugu-MT Paper Translation (Abstract): Are DeepFakes Realistic Enough? Exploring Semantic Mismatch as a Novel Challenge

Paper overview: Are DeepFakes Realistic Enough? Exploring Semantic Mismatch as a Novel Challenge

arxiv url: http://arxiv.org/abs/2604.28022v1
Date: Thu, 30 Apr 2026 15:40:56 GMT
Status: Translation complete
Last updated in system: 2026-05-01 16:31:54.170947
Title: Are DeepFakes Realistic Enough? Exploring Semantic Mismatch as a Novel Challenge
Title (reference translation): Are DeepFakes Realistic Enough? Exploring Semantic Mismatch as a New Challenge
Authors: Sharayu Nilesh Deshmukh, Kailash A. Hambarde, Joana C. Costa, Hugo Proença, Tiago Roxo
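Abstract summary: Current DeepFake detection scenarios are mostly binary, yet data manipulation can vary across audio, video, or both. This paper proposes a new evaluation setup that extends the four-class formulation by explicitly modeling semantic-level inconsistency between authentic modalities. We assess the robustness of state-of-the-art models in this new realistic DeepFake setting, using the FakeAVCeleb dataset.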
Reference score (independently computed attention score): 3.88230479224633
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Current DeepFake detection scenarios are mostly binary, yet data manipulation can vary across audio, video, or both, and this variability is not captured in binary settings. Four-class audio-visual formulations address this by discriminating manipulation type, but introduce an unresolved problem: models may rely solely on data source integrity to detect DeepFakes without evaluating their semantic consistency. If the DeepFake origin is not in the data source but in its content, can semantic mismatch be assessed by the state of the art? This paper proposes a new evaluation setup, extending the four-class formulation by explicitly modeling semantic-level inconsistency between authentic modalities with the introduction of a new class: Real Audio-Real Video with Semantic Mismatch (RARV-SMM). We assess the robustness of state-of-the-art models in this new realistic DeepFake setting, using the FakeAVCeleb dataset, highlighting the limitations of existing approaches when faced with semantic mismatch data. We further introduce three RARV-SMM variants that expose distinct architectural vulnerabilities as audio-visual divergence increases. We also propose a semantic reinforcement strategy that incorporates the semantic mismatch class and ImageBind embeddings to improve DeepFake detection in both our proposed and state-of-the-art settings, on FakeAVCeleb and LAV-DF, paving the way to more realistic DeepFake detectors. The source code and data are available at https://github.com/.
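Abstract (reference translation): Current DeepFake detection scenarios are mostly binary, yet data manipulation can vary across audio, video, or both. Models may rely solely on data source integrity to detect DeepFakes without evaluating semantic consistency. If the DeepFake origin is not in the data source but in its content, can semantic mismatch be assessed by the state of the art? This paper proposes a new evaluation setup that explicitly models semantic-level inconsistency between authentic modalities by introducing a new class, Real Audio-Real Video with Semantic Mismatch (RARV-SMM). We assess the robustness of state-of-the-art models in this new realistic DeepFake setting, using the FakeAVCeleb dataset, and highlight the limitations of existing approaches when faced with semantic mismatch data. We further introduce three RARV-SMM variants that expose distinct architectural vulnerabilities as audio-visual divergence increases. We also propose a semantic reinforcement strategy that incorporates the semantic mismatch class and ImageBind embeddings for DeepFake detection on FakeAVCeleb and LAV-DF, enabling more realistic DeepFake detection. The source code and data are available at https://github.com/.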
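The five-class setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. Only the name RARV-SMM comes from the abstract; the other four class names (RARV, RAFV, FARV, FAFV), the helper functions, and the cosine-similarity threshold are assumptions made for illustration. The toy vectors stand in for joint audio-visual embeddings, such as those an ImageBind-style encoder might produce.

```python
import math
from enum import Enum


class AVClass(Enum):
    # Four-class audio-visual formulation (class names assumed by analogy).
    RARV = "real audio, real video"
    RAFV = "real audio, fake video"
    FARV = "fake audio, real video"
    FAFV = "fake audio, fake video"
    # Extra class named in the abstract: both streams are authentic,
    # but their content does not agree.
    RARV_SMM = "real audio, real video, semantic mismatch"


def label_clip(audio_is_real, video_is_real, semantically_matched=True):
    """Map per-modality authenticity and a semantic check to a class.

    The semantic flag only matters when both modalities are real.
    """
    if audio_is_real and video_is_real:
        return AVClass.RARV if semantically_matched else AVClass.RARV_SMM
    if audio_is_real:
        return AVClass.RAFV
    if video_is_real:
        return AVClass.FARV
    return AVClass.FAFV


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_semantically_matched(audio_emb, video_emb, threshold=0.5):
    """Hypothetical check: treat low audio-video similarity as a mismatch."""
    return cosine_similarity(audio_emb, video_emb) >= threshold


# Toy example: orthogonal embeddings from two authentic streams.
matched = is_semantically_matched([1.0, 0.0], [0.0, 1.0])
print(label_clip(True, True, matched).name)
```

The point of the sketch is that a detector checking only source integrity would see two real streams and could not tell RARV from RARV-SMM; some cross-modal comparison is needed to separate them.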

Related papers