FakeNewsGPT4: Advancing Multimodal Fake News Detection through
Knowledge-Augmented LVLMs
- URL: http://arxiv.org/abs/2403.01988v1
- Date: Mon, 4 Mar 2024 12:35:09 GMT
- Title: FakeNewsGPT4: Advancing Multimodal Fake News Detection through
Knowledge-Augmented LVLMs
- Authors: Xuannan Liu and Peipei Li and Huaibo Huang and Zekun Li and Xing Cui
and Jiahao Liang and Lixiong Qin and Weihong Deng and Zhaofeng He
- Abstract summary: We propose a novel framework that augments Large Vision-Language Models with forgery-specific knowledge for manipulation reasoning.
FakeNewsGPT4 achieves superior cross-domain performance compared to previous methods.
- Score: 50.13829380113614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The massive generation of multimodal fake news exhibits substantial
distribution discrepancies, prompting the need for generalized detectors.
However, the insulated nature of training within specific domains restricts the
capability of classical detectors to obtain open-world facts. In this paper, we
propose FakeNewsGPT4, a novel framework that augments Large Vision-Language
Models (LVLMs) with forgery-specific knowledge for manipulation reasoning while
inheriting their extensive world knowledge as a complement. Knowledge augmentation
in FakeNewsGPT4 involves acquiring two types of forgery-specific knowledge,
i.e., semantic correlation and artifact trace, and merging them into LVLMs.
Specifically, we design a multi-level cross-modal reasoning module that
establishes interactions across modalities for extracting semantic
correlations. Concurrently, a dual-branch fine-grained verification module is
presented to comprehend localized details to encode artifact traces. The
generated knowledge is translated into refined embeddings compatible with
LVLMs. We also incorporate candidate answer heuristics and soft prompts to
enhance input informativeness. Extensive experiments on the public benchmark
demonstrate that FakeNewsGPT4 achieves superior cross-domain performance
compared to previous methods. Code will be available.
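The abstract's pipeline, acquiring semantic-correlation and artifact-trace knowledge and merging both into LVLM-compatible embeddings, can be illustrated with a minimal toy sketch. This is not the paper's implementation (its code is not yet released); all function names are hypothetical, cosine similarity stands in for the multi-level cross-modal reasoning module, and per-patch high-frequency variance stands in for the dual-branch fine-grained verification module.

```python
import numpy as np

def cross_modal_correlation(text_feat, image_feat):
    # Toy stand-in for multi-level cross-modal reasoning: cosine
    # similarity between the two modality features.
    t = text_feat / np.linalg.norm(text_feat)
    v = image_feat / np.linalg.norm(image_feat)
    return float(t @ v)

def artifact_trace(image_patches):
    # Toy stand-in for fine-grained verification: variance of first
    # differences per patch as a crude high-frequency artifact statistic.
    return np.array([np.var(np.diff(p)) for p in image_patches])

def build_knowledge_embedding(text_feat, image_feat, image_patches, dim=8):
    # Merge both knowledge types into one fixed-size vector, analogous to
    # the refined embeddings the paper feeds into the LVLM.
    emb = np.zeros(dim)
    emb[0] = cross_modal_correlation(text_feat, image_feat)
    traces = artifact_trace(image_patches)
    n = min(dim - 1, traces.size)
    emb[1:1 + n] = traces[:n]
    return emb
```

In the actual framework these embeddings would be concatenated with soft prompts and candidate-answer heuristics before being passed to the LVLM; the sketch only shows the merge step.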
Related papers
- Detect, Investigate, Judge and Determine: A Novel LLM-based Framework for Few-shot Fake News Detection [47.01850264003063]
Few-Shot Fake News Detection aims to distinguish inaccurate news from real news in extremely low-resource scenarios.
This task has garnered increased attention due to the widespread dissemination and harmful impact of fake news on social media.
We propose a Dual-perspective Augmented Fake News Detection model, designed to enhance Large Language Models.
arXiv Detail & Related papers (2024-07-12T03:15:01Z)
- Dude: Dual Distribution-Aware Context Prompt Learning For Large Vision-Language Model [27.56988000960972]
We introduce a new framework based on a dual context of both domain-shared and class-specific contexts.
Such dual prompt methods enhance the model's feature representation by joining implicit and explicit factors encoded in Large Language Models.
We also formulate the Unbalanced Optimal Transport (UOT) theory to quantify the relationships between constructed prompts and visual tokens.
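Matching prompts to visual tokens with optimal transport can be sketched with standard Sinkhorn iterations. Note this is a simplified balanced entropic-OT stand-in, not the unbalanced formulation the paper uses, and the function name is hypothetical.

```python
import numpy as np

def sinkhorn_plan(cost, a, b, eps=0.1, iters=200):
    # Entropic-regularized (balanced) Sinkhorn: computes a transport plan
    # P = diag(u) K diag(v) whose marginals match a (rows: prompts) and
    # b (columns: visual tokens).
    K = np.exp(-cost / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]
```

Unbalanced OT would relax the hard marginal constraints into KL penalties, which is what lets it down-weight prompts or tokens without a good match.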
arXiv Detail & Related papers (2024-07-05T13:15:29Z)
- Knowledge Graph-Enhanced Large Language Models via Path Selection [58.228392005755026]
Large Language Models (LLMs) have shown unprecedented performance in various real-world applications.
LLMs are known to generate factually inaccurate outputs, a.k.a. the hallucination problem.
We propose a principled framework KELP with three stages to handle the above problems.
arXiv Detail & Related papers (2024-06-19T21:45:20Z)
- COOL: Comprehensive Knowledge Enhanced Prompt Learning for Domain Adaptive Few-shot Fake News Detection [16.478355864072814]
We propose COOL, a Comprehensive knOwledge enhanced prOmpt Learning method for domain adaptive few-shot FND.
Specifically, we propose a comprehensive knowledge extraction module to extract both structured and unstructured knowledge that are positively or negatively correlated with news from external sources.
arXiv Detail & Related papers (2024-06-16T09:41:25Z)
- Distilling Implicit Multimodal Knowledge into LLMs for Zero-Resource Dialogue Generation [22.606764428110566]
We propose the Visual Implicit Knowledge Distillation Framework (VIKDF) for enriched dialogue generation in zero-resource contexts.
VIKDF comprises two main stages: knowledge distillation and knowledge integration.
Our experiments show that VIKDF outperforms existing state-of-the-art models in generating high-quality dialogues.
arXiv Detail & Related papers (2024-05-16T14:21:33Z)
- Knowledge Plugins: Enhancing Large Language Models for Domain-Specific Recommendations [50.81844184210381]
We propose a general paradigm that augments large language models with DOmain-specific KnowledgE to enhance their performance on practical applications, namely DOKE.
This paradigm relies on a domain knowledge extractor, working in three steps: 1) preparing effective knowledge for the task; 2) selecting the knowledge for each specific sample; and 3) expressing the knowledge in an LLM-understandable way.
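The three DOKE steps can be made concrete with a toy sketch. This is not the paper's code; the keyword-lookup retrieval and the function names are illustrative stand-ins for its domain knowledge extractor.

```python
def prepare_knowledge(task_corpus):
    # Step 1: prepare effective knowledge for the task. Here a toy
    # keyword -> fact lookup built from (keyword, fact) pairs.
    return dict(task_corpus)

def select_knowledge(sample, knowledge_base, k=2):
    # Step 2: select the knowledge relevant to this specific sample,
    # via simple keyword overlap as a stand-in for real retrieval.
    hits = [fact for kw, fact in knowledge_base.items()
            if kw in sample.lower()]
    return hits[:k]

def express_knowledge(sample, facts):
    # Step 3: express the selected knowledge in an LLM-understandable
    # way, here by prepending it to the prompt as context.
    context = " ".join(f"Fact: {f}" for f in facts)
    return f"{context}\nQuestion: {sample}"
```

For example, `express_knowledge(sample, select_knowledge(sample, kb))` yields a prompt whose context block carries the retrieved domain facts, so the LLM itself needs no fine-tuning.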
arXiv Detail & Related papers (2023-11-16T07:09:38Z)
- Dual Semantic Knowledge Composed Multimodal Dialog Systems [114.52730430047589]
We propose a novel multimodal task-oriented dialog system named MDS-S2.
It acquires context-related attribute and relation knowledge from the knowledge base.
We also devise a set of latent query variables to distill the semantic information from the composed response representation.
arXiv Detail & Related papers (2023-05-17T06:33:26Z)
- Lifelong Learning Natural Language Processing Approach for Multilingual Data Classification [1.3999481573773074]
We propose a lifelong learning-inspired approach, which allows for fake news detection in multiple languages.
The ability of models to generalize the knowledge acquired between the analyzed languages was also observed.
arXiv Detail & Related papers (2022-05-25T10:34:04Z)
- Visual Relationship Detection with Visual-Linguistic Knowledge from Multimodal Representations [103.00383924074585]
Visual relationship detection aims to reason over relationships among salient objects in images.
We propose a novel approach named Visual-Linguistic Representations from Transformers (RVL-BERT).
RVL-BERT performs spatial reasoning with both visual and language commonsense knowledge learned via self-supervised pre-training.
arXiv Detail & Related papers (2020-09-10T16:15:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.