FKA-Owl: Advancing Multimodal Fake News Detection through Knowledge-Augmented LVLMs
- URL: http://arxiv.org/abs/2403.01988v2
- Date: Tue, 6 Aug 2024 07:40:11 GMT
- Title: FKA-Owl: Advancing Multimodal Fake News Detection through Knowledge-Augmented LVLMs
- Authors: Xuannan Liu, Peipei Li, Huaibo Huang, Zekun Li, Xing Cui, Jiahao Liang, Lixiong Qin, Weihong Deng, Zhaofeng He,
- Abstract summary: We propose FKA-Owl, a framework that leverages forgery-specific knowledge to augment Large Vision-Language Models (LVLMs)
Experiments on the public benchmark demonstrate that FKA-Owl achieves superior cross-domain performance compared to previous methods.
- Score: 48.32113486904612
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The massive generation of multimodal fake news involving both text and images exhibits substantial distribution discrepancies, prompting the need for generalized detectors. However, the insulated nature of training restricts the capability of classical detectors to obtain open-world facts. While Large Vision-Language Models (LVLMs) have encoded rich world knowledge, they are not inherently tailored for combating fake news and struggle to comprehend local forgery details. In this paper, we propose FKA-Owl, a novel framework that leverages forgery-specific knowledge to augment LVLMs, enabling them to reason about manipulations effectively. The augmented forgery-specific knowledge includes semantic correlation between text and images, and artifact trace in image manipulation. To inject these two kinds of knowledge into the LVLM, we design two specialized modules to establish their representations, respectively. The encoded knowledge embeddings are then incorporated into LVLMs. Extensive experiments on the public benchmark demonstrate that FKA-Owl achieves superior cross-domain performance compared to previous methods. Code is publicly available at https://liuxuannan.github.io/FKA_Owl.github.io/.
Related papers
- Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance [67.26434607115392]
Large vision-language models (LVLMs) have achieved impressive results in various vision-language tasks.
LVLMs suffer from hallucinations caused by language bias, leading to diminished focus on images and ineffective visual comprehension.
We propose LACING to address the language bias of LVLMs with muLtimodal duAl-attention meChanIsm (MDA) aNd soft-image Guidance (IFG)
arXiv Detail & Related papers (2024-11-21T16:33:30Z) - LLM-GAN: Construct Generative Adversarial Network Through Large Language Models For Explainable Fake News Detection [34.984605500444324]
Large Language Models (LLMs) are known for their powerful natural language understanding and explanation generation abilities.
We propose LLM-GAN, a novel framework that utilizes prompting mechanisms to enable an LLM to become Generator and Detector.
Our results demonstrate LLM-GAN's effectiveness in both prediction performance and explanation quality.
arXiv Detail & Related papers (2024-09-03T11:06:45Z) - Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge [76.45868419402265]
multimodal large language models (MLLMs) have made significant strides by training on vast high-quality image-text datasets.
However, the inherent difficulty in explicitly conveying fine-grained or spatially dense information in text, such as masks, poses a challenge for MLLMs.
This paper proposes a new visual prompt approach to integrate fine-grained external knowledge, gleaned from specialized vision models, into MLLMs.
arXiv Detail & Related papers (2024-07-05T17:43:30Z) - Knowledge Graph-Enhanced Large Language Models via Path Selection [58.228392005755026]
Large Language Models (LLMs) have shown unprecedented performance in various real-world applications.
LLMs are known to generate factually inaccurate outputs, a.k.a. the hallucination problem.
We propose a principled framework KELP with three stages to handle the above problems.
arXiv Detail & Related papers (2024-06-19T21:45:20Z) - Enhancing Contextual Understanding in Large Language Models through Contrastive Decoding [9.2433070542025]
Large language models (LLMs) tend to inadequately integrate input context during text generation.
We introduce a novel approach integrating contrastive decoding with adversarial irrelevant passages as negative samples.
arXiv Detail & Related papers (2024-05-04T20:38:41Z) - Hyperbolic Learning with Synthetic Captions for Open-World Detection [26.77840603264043]
We propose to transfer knowledge from vision-language models (VLMs) to enrich the open-vocabulary descriptions automatically.
Specifically, we bootstrap dense synthetic captions using pre-trained VLMs to provide rich descriptions on different regions in images.
We also propose a novel hyperbolic vision-language learning approach to impose a hierarchy between visual and caption embeddings.
arXiv Detail & Related papers (2024-04-07T17:06:22Z) - SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context
Misinformation Detection [18.356648843815627]
Out-of-context (OOC) misinformation is one of the easiest and most effective ways to mislead audiences.
Current methods focus on assessing image-text consistency but lack convincing explanations for their judgments.
We introduce SNIFFER, a novel multimodal large language model specifically engineered for OOC misinformation detection and explanation.
arXiv Detail & Related papers (2024-03-05T18:04:59Z) - How Large Language Models Encode Context Knowledge? A Layer-Wise Probing
Study [27.23388511249688]
This paper investigates the layer-wise capability of large language models to encode knowledge.
We leverage the powerful generative capability of ChatGPT to construct probing datasets.
Experiments on conflicting and newly acquired knowledge show that LLMs prefer to encode more context knowledge in the upper layers.
arXiv Detail & Related papers (2024-02-25T11:15:42Z) - Harnessing Explanations: LLM-to-LM Interpreter for Enhanced
Text-Attributed Graph Representation Learning [51.90524745663737]
A key innovation is our use of explanations as features, which can be used to boost GNN performance on downstream tasks.
Our method achieves state-of-the-art results on well-established TAG datasets.
Our method significantly speeds up training, achieving a 2.88 times improvement over the closest baseline on ogbn-arxiv.
arXiv Detail & Related papers (2023-05-31T03:18:03Z) - Open-Vocabulary Multi-Label Classification via Multi-modal Knowledge
Transfer [55.885555581039895]
Multi-label zero-shot learning (ML-ZSL) focuses on transferring knowledge by a pre-trained textual label embedding.
We propose a novel open-vocabulary framework, named multimodal knowledge transfer (MKT) for multi-label classification.
arXiv Detail & Related papers (2022-07-05T08:32:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.