Fugu-MT Paper Translation (Abstract): ContextGuard-LVLM: Enhancing News Veracity through Fine-grained Cross-modal Contextual Consistency Verification

Paper overview: ContextGuard-LVLM: Enhancing News Veracity through Fine-grained Cross-modal Contextual Consistency Verification

arxiv url: http://arxiv.org/abs/2508.06623v1
Date: Fri, 08 Aug 2025 18:10:24 GMT
Status: Translation complete
Last updated in system: 2025-08-12 21:23:28.476122
Title: ContextGuard-LVLM: Enhancing News Veracity through Fine-grained Cross-modal Contextual Consistency Verification
Title (reference translation): ContextGuard-LVLM: Improving News Veracity through Fine-grained Cross-modal Contextual Consistency Verification
Authors: Sihan Ma, Qiming Wu, Ruotong Jiang, Frank Burns
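Abstract summary: Traditional approaches fall short in addressing the fine-grained cross-modal contextual consistency problem. We propose ContextGuard-LVLM, a novel framework built on advanced Vision-Language Large Models. Our model is uniquely enhanced through reinforced or adversarial learning paradigms.
Reference score (independently computed attention score): 2.012425476229879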
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The proliferation of digital news media necessitates robust methods for verifying content veracity, particularly regarding the consistency between visual and textual information. Traditional approaches often fall short in addressing the fine-grained cross-modal contextual consistency (FCCC) problem, which encompasses deeper alignment of visual narrative, emotional tone, and background information with text, beyond mere entity matching. To address this, we propose ContextGuard-LVLM, a novel framework built upon advanced Vision-Language Large Models (LVLMs) and integrating a multi-stage contextual reasoning mechanism. Our model is uniquely enhanced through reinforced or adversarial learning paradigms, enabling it to detect subtle contextual misalignments that evade zero-shot baselines. We extend and augment three established datasets (TamperedNews-Ent, News400-Ent, MMG-Ent) with new fine-grained contextual annotations, including "contextual sentiment," "visual narrative theme," and "scene-event logical coherence," and introduce a comprehensive CTXT (Contextual Coherence) entity type. Extensive experiments demonstrate that ContextGuard-LVLM consistently outperforms state-of-the-art zero-shot LVLM baselines (InstructBLIP and LLaVA 1.5) across nearly all fine-grained consistency tasks, showing significant improvements in complex logical reasoning and nuanced contextual understanding. Furthermore, our model exhibits superior robustness to subtle perturbations and a higher agreement rate with human expert judgments on challenging samples, affirming its efficacy in discerning sophisticated forms of context detachment.
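Abstract (reference translation): The proliferation of digital news media calls for robust methods of verifying content veracity, particularly the consistency between visual and textual information. Traditional approaches often fall short on the fine-grained cross-modal contextual consistency (FCCC) problem, which goes beyond simple entity matching to cover the deeper alignment of visual narrative, emotional tone, and background information with the text. We therefore propose ContextGuard-LVLM, a novel framework built on advanced Vision-Language Large Models (LVLMs) that integrates a multi-stage contextual reasoning mechanism. The model is uniquely enhanced through reinforced or adversarial learning paradigms, allowing it to detect subtle contextual misalignments that evade zero-shot baselines. We extend and augment three established datasets (TamperedNews-Ent, News400-Ent, MMG-Ent) with new fine-grained contextual annotations, including "contextual sentiment," "visual narrative theme," and "scene-event logical coherence," and introduce a comprehensive CTXT (Contextual Coherence) entity type. Extensive experiments show that ContextGuard-LVLM consistently outperforms state-of-the-art zero-shot LVLM baselines (InstructBLIP and LLaVA 1.5) across nearly all fine-grained consistency tasks, with significant improvements in complex logical reasoning and nuanced contextual understanding. The model is also more robust to subtle perturbations and agrees more often with human expert judgments on challenging samples, confirming its effectiveness at discerning sophisticated forms of context detachment.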
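
The abstract names three fine-grained annotation types and reports a higher agreement rate with human expert judgments. As a minimal sketch of how such an agreement rate could be computed per annotation type, assuming each annotation is a binary consistent/inconsistent label: the record layout, field names, and function names below are hypothetical illustrations, not taken from the paper or its datasets.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class ContextAnnotation:
    """Hypothetical labels for one image-text pair (True = consistent).

    The three fields mirror the annotation types named in the abstract;
    treating each as a boolean is an assumption made for this sketch.
    """
    contextual_sentiment: bool
    visual_narrative_theme: bool
    scene_event_logical_coherence: bool


FIELDS = (
    "contextual_sentiment",
    "visual_narrative_theme",
    "scene_event_logical_coherence",
)


def agreement_rate(model_labels: Sequence[bool],
                   human_labels: Sequence[bool]) -> float:
    """Fraction of samples where the model label equals the human label."""
    if len(model_labels) != len(human_labels):
        raise ValueError("label sequences must have the same length")
    if not model_labels:
        raise ValueError("at least one sample is required")
    matches = sum(m == h for m, h in zip(model_labels, human_labels))
    return matches / len(model_labels)


def agreement_by_field(model: Sequence[ContextAnnotation],
                       human: Sequence[ContextAnnotation]) -> Dict[str, float]:
    """Agreement rate computed separately for each annotation type."""
    return {
        field: agreement_rate(
            [getattr(a, field) for a in model],
            [getattr(a, field) for a in human],
        )
        for field in FIELDS
    }


if __name__ == "__main__":
    # Toy labels, invented purely to exercise the functions.
    model = [ContextAnnotation(True, True, False),
             ContextAnnotation(False, True, True)]
    human = [ContextAnnotation(True, False, False),
             ContextAnnotation(False, True, False)]
    print(agreement_by_field(model, human))
```

Reporting agreement per annotation type rather than as a single number makes it easier to see which kind of contextual judgment a model and human annotators disagree on; how the paper itself aggregates these judgments is not stated in the abstract.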
