Fugu-MT Paper Translation (Abstract): Unleashing the Potential of Consistency Learning for Detecting and Grounding Multi-Modal Media Manipulation

Paper abstract: Unleashing the Potential of Consistency Learning for Detecting and Grounding Multi-Modal Media Manipulation

arxiv url: http://arxiv.org/abs/2506.05890v1
Date: Fri, 06 Jun 2025 08:59:07 GMT
Status: Translation complete
Last updated in system: 2025-06-09 17:28:43.393582
Title: Unleashing the Potential of Consistency Learning for Detecting and Grounding Multi-Modal Media Manipulation
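Title (reference translation): The Potential of Consistency Learning for Detecting and Grounding Multi-Modal Media Manipulation
Authors: Yiheng Li, Yang Yang, Zichang Tan, Huan Liu, Weihua Chen, Xu Zhou, Zhen Lei
Abstract summary: This work proposes a new method, Contextual-Semantic Consistency Learning (CSCL), to enhance fine-grained forgery perception for DGM4. Specifically, each module constructs consistency features by leveraging additional supervision from the heterogeneous information of each token pair. Experiments on DGM4 show that CSCL achieves new state-of-the-art performance, especially for grounding manipulated content.
Reference score (site-computed attention measure): 40.97921191007003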
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: To tackle the threat of fake news, the task of detecting and grounding multi-modal media manipulation (DGM4) has received increasing attention. However, most state-of-the-art methods fail to explore the fine-grained consistency within local content, usually resulting in an inadequate perception of detailed forgery and unreliable results. In this paper, we propose a novel approach named Contextual-Semantic Consistency Learning (CSCL) to enhance the fine-grained perception ability of forgery for DGM4. Two branches for image and text modalities are established, each of which contains two cascaded decoders, i.e., Contextual Consistency Decoder (CCD) and Semantic Consistency Decoder (SCD), to capture within-modality contextual consistency and across-modality semantic consistency, respectively. Both CCD and SCD adhere to the same criteria for capturing fine-grained forgery details. To be specific, each module first constructs consistency features by leveraging additional supervision from the heterogeneous information of each token pair. Then, forgery-aware reasoning or aggregating is adopted to deeply seek forgery cues based on the consistency features. Extensive experiments on DGM4 datasets prove that CSCL achieves new state-of-the-art performance, especially for the results of grounding manipulated content. Code and weights are available at https://github.com/liyih/CSCL.
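Abstract (reference translation): To counter the threat of fake news, detecting and grounding multi-modal media manipulation (DGM4) has attracted growing attention. However, most state-of-the-art methods do not explore fine-grained consistency within local content, which usually leads to inadequate perception of detailed forgeries and unreliable results. This paper proposes a new method, Contextual-Semantic Consistency Learning (CSCL), to enhance fine-grained forgery perception for DGM4. Two branches are built for the image and text modalities; each contains two cascaded decoders, the Contextual Consistency Decoder (CCD) and the Semantic Consistency Decoder (SCD), which capture within-modality contextual consistency and cross-modality semantic consistency, respectively. CCD and SCD follow the same criteria for capturing fine-grained forgery details. Specifically, each module first constructs consistency features by leveraging additional supervision from the heterogeneous information of each token pair. Forgery-aware reasoning or aggregation is then applied to these consistency features to seek forgery cues in depth. Extensive experiments on the DGM4 datasets show that CSCL achieves new state-of-the-art performance, especially in grounding manipulated content. Code and weights are available at https://github.com/liyih/CSCL.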
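
The token-pair supervision mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation (their code is in the repository linked above). It assumes that a token pair counts as consistent when both tokens share the same real/manipulated label, and that predicted consistency is the cosine similarity of two token embeddings rescaled to [0, 1]. The function names, the toy embeddings, and the binary cross-entropy loss are all hypothetical choices made for illustration.

```python
import math

def pair_consistency_labels(token_is_fake):
    """Assumed supervision: a token pair is consistent (1.0) when both tokens
    carry the same real/manipulated label, otherwise inconsistent (0.0)."""
    n = len(token_is_fake)
    return [[1.0 if token_is_fake[i] == token_is_fake[j] else 0.0
             for j in range(n)] for i in range(n)]

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def predicted_consistency(tokens):
    """Cosine similarity of every token pair, rescaled from [-1, 1] to [0, 1]."""
    return [[(cosine(u, v) + 1.0) / 2.0 for v in tokens] for u in tokens]

def consistency_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy over all token pairs (hypothetical loss choice)."""
    total, count = 0.0, 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
            total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
            count += 1
    return total / count

# Toy example: three 2-d token embeddings, the last token is manipulated.
tokens = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.2]]
token_is_fake = [False, False, True]

labels = pair_consistency_labels(token_is_fake)
pred = predicted_consistency(tokens)
loss = consistency_loss(pred, labels)
```

In this toy setup the two genuine tokens point in nearly the same direction and the manipulated token points away from them, so the predicted consistency matrix already agrees with the labels and the loss is small. In a trained model the embeddings would be learned so that this agreement emerges from the supervision.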

Related papers

METER: Multi-modal Evidence-based Thinking and Explainable Reasoning -- Algorithm and Benchmark [48.78602579128459]
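This paper introduces METER, a unified benchmark for forgery detection spanning images, video, audio, and audio-visual content. The dataset comprises four tracks, each requiring not only real-vs-fake classification but also explanations based on evidence chains.
Paper reference translation (metadata) (2025-07-22T03:42:51Z)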
CAD: A General Multimodal Framework for Video Deepfake Detection via Cross-Modal Alignment and Distillation [24.952907733127223]
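We propose a general framework for video deepfake detection using Cross-modal Alignment and Distillation (CAD). It comprises 1) cross-modal alignment, which identifies inconsistencies in high-level semantic synthesis (e.g., lip-speech mismatches), and 2) cross-modal distillation, which mitigates mismatches while preserving modality-specific forensic traces (e.g., spectral distortions in synthetic audio).
Paper reference translation (metadata) (2025-05-21T08:11:07Z)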
Crane: Context-Guided Prompt Learning and Attention Refinement for Zero-Shot Anomaly Detections [50.343419243749054]
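Anomaly detection (AD) identifies deviations from the normal data distribution. This paper proposes conditioning the text encoder's prompts on image context extracted from the vision encoder. The proposed method improves performance by 2% to 29% across metrics on 14 datasets.
Paper reference translation (metadata) (2025-04-15T10:42:25Z)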
Meta-Chunking: Learning Text Segmentation and Semantic Completion via Logical Perception [10.614437503578856]
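This paper proposes a Meta-Chunking framework that specifically improves chunking quality. We design two uncertainty-based adaptive chunking techniques, Perplexity Chunking and Margin Sampling Chunking. We establish a global information compensation mechanism comprising a two-stage hierarchical summary generation process and a three-stage text chunk rewriting procedure.
Paper reference translation (metadata) (2024-10-16T17:59:32Z)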
Dynamic Weighted Combiner for Mixed-Modal Image Retrieval [8.683144453481328]
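Mixed-Modal Image Retrieval (MMIR), a flexible search paradigm, has attracted attention. Previous approaches achieve only limited performance because of two key factors. We propose a Dynamic Weighted Combiner (DWC) to address these challenges.
Paper reference translation (metadata) (2023-12-11T07:36:45Z)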
Detecting and Grounding Multi-Modal Media Manipulation and Beyond [93.08116982163804]
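We present a new research problem for multi-modal fake media, Detecting and Grounding Multi-Modal Media Manipulation (DGM4). DGM4 aims not only to detect the authenticity of multi-modal media but also to ground the manipulated content. To fully capture the fine-grained interaction between modalities, we propose a novel HierArchical Multi-modal Manipulation rEasoning tRansformer (HAMMER).
Paper reference translation (metadata) (2023-09-25T15:05:46Z)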
Inconsistent Matters: A Knowledge-guided Dual-consistency Network for Multi-modal Rumor Detection [53.48346699224921]
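We propose a knowledge-guided dual-consistency network to detect rumors with multimedia content. It uses two consistency detectors to capture inconsistencies at the cross-modal level and the content-knowledge level simultaneously. It also enables robust multi-modal representation learning under different visual-modality conditions.
Paper reference translation (metadata) (2023-06-03T15:32:20Z)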
Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
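We propose a method that mines cross-modal semantics to guide the fusion and decoding of multi-modal features. Specifically, we propose a novel network, XMSNet, consisting of (1) all-round attentive fusion (AF), (2) a coarse-to-fine decoder (CFD), and (3) cross-level self-supervision.
Paper reference translation (metadata) (2023-05-17T14:30:11Z)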

The related papers list is generated automatically from the titles and abstracts of papers on this site.
