FakeNewsGPT4: Advancing Multimodal Fake News Detection through
Knowledge-Augmented LVLMs
- URL: http://arxiv.org/abs/2403.01988v1
- Date: Mon, 4 Mar 2024 12:35:09 GMT
- Title: FakeNewsGPT4: Advancing Multimodal Fake News Detection through
Knowledge-Augmented LVLMs
- Authors: Xuannan Liu and Peipei Li and Huaibo Huang and Zekun Li and Xing Cui
and Jiahao Liang and Lixiong Qin and Weihong Deng and Zhaofeng He
- Abstract summary: We propose a novel framework that augments Large Vision-Language Models with forgery-specific knowledge for manipulation reasoning.
FakeNewsGPT4 achieves superior cross-domain performance compared to previous methods.
- Score: 50.13829380113614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The massive generation of multimodal fake news exhibits substantial
distribution discrepancies, prompting the need for generalized detectors.
However, the insulated nature of training within specific domains restricts the
capability of classical detectors to obtain open-world facts. In this paper, we
propose FakeNewsGPT4, a novel framework that augments Large Vision-Language
Models (LVLMs) with forgery-specific knowledge for manipulation reasoning while
inheriting their extensive world knowledge as a complement. Knowledge augmentation
in FakeNewsGPT4 involves acquiring two types of forgery-specific knowledge,
i.e., semantic correlation and artifact trace, and merging them into LVLMs.
Specifically, we design a multi-level cross-modal reasoning module that
establishes interactions across modalities for extracting semantic
correlations. Concurrently, a dual-branch fine-grained verification module is
presented to comprehend localized details to encode artifact traces. The
generated knowledge is translated into refined embeddings compatible with
LVLMs. We also incorporate candidate answer heuristics and soft prompts to
enhance input informativeness. Extensive experiments on the public benchmark
demonstrate that FakeNewsGPT4 achieves superior cross-domain performance
compared to previous methods. Code will be available.
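The abstract's pipeline, acquiring semantic-correlation and artifact-trace knowledge and merging both into LVLM-compatible embeddings, can be illustrated with a minimal toy sketch. This is not the paper's implementation (its code is not yet released); all function names are hypothetical, cosine similarity stands in for the multi-level cross-modal reasoning module, and per-patch high-frequency variance stands in for the dual-branch fine-grained verification module.

```python
import numpy as np

def cross_modal_correlation(text_feat, image_feat):
    # Toy stand-in for multi-level cross-modal reasoning: cosine
    # similarity between the two modality features.
    t = text_feat / np.linalg.norm(text_feat)
    v = image_feat / np.linalg.norm(image_feat)
    return float(t @ v)

def artifact_trace(image_patches):
    # Toy stand-in for fine-grained verification: variance of first
    # differences per patch as a crude high-frequency artifact statistic.
    return np.array([np.var(np.diff(p)) for p in image_patches])

def build_knowledge_embedding(text_feat, image_feat, image_patches, dim=8):
    # Merge both knowledge types into one fixed-size vector, analogous to
    # the refined embeddings the paper feeds into the LVLM.
    emb = np.zeros(dim)
    emb[0] = cross_modal_correlation(text_feat, image_feat)
    traces = artifact_trace(image_patches)
    n = min(dim - 1, traces.size)
    emb[1:1 + n] = traces[:n]
    return emb
```

In the actual framework these embeddings would be concatenated with soft prompts and candidate-answer heuristics before being passed to the LVLM; the sketch only shows the merge step.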
Related papers
- Detect, Investigate, Judge and Determine: A Novel LLM-based Framework for Few-shot Fake News Detection [47.01850264003063]
Few-Shot Fake News Detection aims to distinguish inaccurate news from real news in extremely low-resource scenarios.
This task has garnered increased attention due to the widespread dissemination and harmful impact of fake news on social media.
We propose a Dual-perspective Augmented Fake News Detection model, designed to enhance Large Language Models.
arXiv Detail & Related papers (2024-07-12T03:15:01Z)
- Dude: Dual Distribution-Aware Context Prompt Learning For Large Vision-Language Model [27.56988000960972]
We introduce a new framework based on a dual context of both domain-shared and class-specific contexts.
Such dual prompt methods enhance the model's feature representation by joining implicit and explicit factors encoded in Large Language Models.
We also formulate the Unbalanced Optimal Transport (UOT) theory to quantify the relationships between constructed prompts and visual tokens.
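Matching prompts to visual tokens with optimal transport can be sketched with standard Sinkhorn iterations. Note this is a simplified balanced entropic-OT stand-in, not the unbalanced formulation the paper uses, and the function name is hypothetical.

```python
import numpy as np

def sinkhorn_plan(cost, a, b, eps=0.1, iters=200):
    # Entropic-regularized (balanced) Sinkhorn: computes a transport plan
    # P = diag(u) K diag(v) whose marginals match a (rows: prompts) and
    # b (columns: visual tokens).
    K = np.exp(-cost / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]
```

Unbalanced OT would relax the hard marginal constraints into KL penalties, which is what lets it down-weight prompts or tokens without a good match.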
arXiv Detail & Related papers (2024-07-05T13:15:29Z)
- Knowledge Graph-Enhanced Large Language Models via Path Selection [58.228392005755026]
Large Language Models (LLMs) have shown unprecedented performance in various real-world applications.
LLMs are known to generate factually inaccurate outputs, a.k.a. the hallucination problem.
We propose a principled framework KELP with three stages to handle the above problems.
arXiv Detail & Related papers (2024-06-19T21:45:20Z)
- COOL: Comprehensive Knowledge Enhanced Prompt Learning for Domain Adaptive Few-shot Fake News Detection [16.478355864072814]
We propose COOL, a Comprehensive knOwledge enhanced prOmpt Learning method for domain adaptive few-shot FND.
Specifically, we propose a comprehensive knowledge extraction module to extract both structured and unstructured knowledge that are positively or negatively correlated with news from external sources.
arXiv Detail & Related papers (2024-06-16T09:41:25Z)
- Distilling Implicit Multimodal Knowledge into LLMs for Zero-Resource Dialogue Generation [22.606764428110566]
We propose the Visual Implicit Knowledge Distillation Framework (VIKDF) for enriched dialogue generation in zero-resource contexts.
VIKDF comprises two main stages: knowledge distillation and knowledge integration.
Our experiments show that VIKDF outperforms existing state-of-the-art models in generating high-quality dialogues.
arXiv Detail & Related papers (2024-05-16T14:21:33Z)
- Knowledge Plugins: Enhancing Large Language Models for Domain-Specific Recommendations [50.81844184210381]
We propose a general paradigm that augments large language models with DOmain-specific KnowledgE to enhance their performance on practical applications, namely DOKE.
This paradigm relies on a domain knowledge extractor, working in three steps: 1) preparing effective knowledge for the task; 2) selecting the knowledge for each specific sample; and 3) expressing the knowledge in an LLM-understandable way.
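The three DOKE steps can be made concrete with a toy sketch. This is not the paper's code; the keyword-lookup retrieval and the function names are illustrative stand-ins for its domain knowledge extractor.

```python
def prepare_knowledge(task_corpus):
    # Step 1: prepare effective knowledge for the task. Here a toy
    # keyword -> fact lookup built from (keyword, fact) pairs.
    return dict(task_corpus)

def select_knowledge(sample, knowledge_base, k=2):
    # Step 2: select the knowledge relevant to this specific sample,
    # via simple keyword overlap as a stand-in for real retrieval.
    hits = [fact for kw, fact in knowledge_base.items()
            if kw in sample.lower()]
    return hits[:k]

def express_knowledge(sample, facts):
    # Step 3: express the selected knowledge in an LLM-understandable
    # way, here by prepending it to the prompt as context.
    context = " ".join(f"Fact: {f}" for f in facts)
    return f"{context}\nQuestion: {sample}"
```

For example, `express_knowledge(sample, select_knowledge(sample, kb))` yields a prompt whose context block carries the retrieved domain facts, so the LLM itself needs no fine-tuning.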
arXiv Detail & Related papers (2023-11-16T07:09:38Z)
- Dual Semantic Knowledge Composed Multimodal Dialog Systems [114.52730430047589]
We propose a novel multimodal task-oriented dialog system named MDS-S2.
It acquires context-related attribute and relation knowledge from the knowledge base.
We also devise a set of latent query variables to distill the semantic information from the composed response representation.
arXiv Detail & Related papers (2023-05-17T06:33:26Z)
- Lifelong Learning Natural Language Processing Approach for Multilingual Data Classification [1.3999481573773074]
We propose a lifelong learning-inspired approach, which allows for fake news detection in multiple languages.
The ability of models to generalize the knowledge acquired between the analyzed languages was also observed.
arXiv Detail & Related papers (2022-05-25T10:34:04Z)
- Visual Relationship Detection with Visual-Linguistic Knowledge from Multimodal Representations [103.00383924074585]
Visual relationship detection aims to reason over relationships among salient objects in images.
We propose a novel approach named Visual-Linguistic Representations from Transformers (RVL-BERT).
RVL-BERT performs spatial reasoning with both visual and language commonsense knowledge learned via self-supervised pre-training.
arXiv Detail & Related papers (2020-09-10T16:15:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.