LEMMA: Towards LVLM-Enhanced Multimodal Misinformation Detection with External Knowledge Augmentation
- URL: http://arxiv.org/abs/2402.11943v2
- Date: Thu, 20 Jun 2024 20:20:30 GMT
- Title: LEMMA: Towards LVLM-Enhanced Multimodal Misinformation Detection with External Knowledge Augmentation
- Authors: Keyang Xuan, Li Yi, Fan Yang, Ruochen Wu, Yi R. Fung, Heng Ji
- Abstract summary: We propose LEMMA: LVLM-Enhanced Multimodal Misinformation Detection with External Knowledge Augmentation.
Our method improves accuracy over the top baseline LVLM by 7% and 13% on the Twitter and Fakeddit datasets, respectively.
- Score: 58.524237916836164
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rise of multimodal misinformation on social platforms poses significant challenges for individuals and societies. Its greater credibility and broader impact compared to textual misinformation make detection complex, requiring robust reasoning across diverse media types and profound knowledge for accurate verification. The emergence of Large Vision Language Models (LVLMs) offers a potential solution to this problem. Leveraging their proficiency in processing visual and textual information, LVLMs show promising capabilities in recognizing complex information and exhibit strong reasoning skills. In this paper, we first investigate the potential of LVLMs for multimodal misinformation detection. We find that even though LVLMs outperform LLMs, their reasoning offers limited power when supporting evidence is lacking. Based on these observations, we propose LEMMA: LVLM-Enhanced Multimodal Misinformation Detection with External Knowledge Augmentation. LEMMA leverages LVLM intuition and reasoning capabilities while augmenting them with external knowledge to enhance the accuracy of misinformation detection. Our method improves accuracy over the top baseline LVLM by 7% and 13% on the Twitter and Fakeddit datasets, respectively.
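To make the described pipeline concrete, below is a minimal, hypothetical sketch of a LEMMA-style detection flow: the LVLM first gives an evidence-free judgment, external knowledge is then retrieved for the claim, and the LVLM reassesses with that evidence. The `query_lvlm` and `search_web` helpers, the prompts, and all names are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of a LEMMA-style flow: the LVLM first judges the
# image-text post on its own, external evidence is then retrieved for the
# claim, and the LVLM reassesses with that evidence. All helpers, prompts,
# and names here are illustrative assumptions, not the authors' code.
from dataclasses import dataclass


@dataclass
class Post:
    text: str        # the claim accompanying the image
    image_path: str  # path to the attached image


def query_lvlm(prompt: str, image_path: str) -> str:
    """Placeholder for a call to any large vision-language model."""
    raise NotImplementedError


def search_web(query: str, k: int = 5) -> list[str]:
    """Placeholder for an external evidence retriever (e.g., a search API)."""
    raise NotImplementedError


def detect_misinformation(post: Post) -> str:
    # Step 1: evidence-free judgment from the LVLM's own intuition/reasoning.
    initial = query_lvlm(
        "Does the image support this claim? Answer 'real' or 'fake' and "
        f"explain briefly.\nClaim: {post.text}",
        post.image_path,
    )
    # Step 2: augment with external knowledge retrieved for the claim.
    evidence = search_web(post.text)
    # Step 3: reassess the claim against the retrieved evidence.
    return query_lvlm(
        "Reassess the claim using the evidence below and answer 'real' or 'fake'.\n"
        f"Claim: {post.text}\nInitial judgment: {initial}\nEvidence:\n"
        + "\n".join(evidence),
        post.image_path,
    )
```

In practice the placeholders would be backed by a concrete LVLM endpoint and a retrieval or search service; the reported 7% and 13% gains refer to the paper's own pipeline, not this sketch.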
Related papers
- E2LVLM: Evidence-Enhanced Large Vision-Language Model for Multimodal Out-of-Context Misinformation Detection [7.1939657372410375]
We present E2LVLM, a novel evidence-enhanced large vision-language model that adapts textual evidence at two levels.
To address the scarcity of news domain datasets with both judgment and explanation, we generate a novel OOC multimodal instruction-following dataset.
Extensive experiments demonstrate that E2LVLM outperforms state-of-the-art methods.
arXiv Detail & Related papers (2025-02-12T04:25:14Z)
- Understanding LLMs' Fluid Intelligence Deficiency: An Analysis of the ARC Task [71.61879949813998]
In cognitive research, the ability to solve novel problems is referred to as fluid intelligence, which is considered critical for assessing human intelligence.
Recent research on fluid intelligence assessments has highlighted significant deficiencies in LLMs' abilities.
Our study revealed three major limitations in existing LLMs: limited ability for skill composition, unfamiliarity with abstract input formats, and the intrinsic deficiency of left-to-right decoding.
arXiv Detail & Related papers (2025-02-11T02:31:09Z)
- Beyond Sight: Towards Cognitive Alignment in LVLM via Enriched Visual Knowledge [24.538839144639653]
Large Vision-Language Models (LVLMs) integrate separately pre-trained vision and language components.
These models frequently encounter a core issue of "cognitive misalignment" between the vision encoder (VE) and the large language model (LLM).
arXiv Detail & Related papers (2024-11-25T18:33:14Z)
- LLMs Know What They Need: Leveraging a Missing Information Guided Framework to Empower Retrieval-Augmented Generation [6.676337039829463]
We propose a Missing Information Guided Retrieve-Extraction-Solving paradigm (MIGRES).
We leverage the identification of missing information to generate a targeted query that steers the subsequent knowledge retrieval.
Extensive experiments conducted on multiple public datasets reveal the superiority of the proposed MIGRES method (a minimal sketch of the retrieve-extract-solve loop appears after this list).
arXiv Detail & Related papers (2024-04-22T09:56:59Z)
- MMIDR: Teaching Large Language Model to Interpret Multimodal Misinformation via Knowledge Distillation [15.343028838291078]
We propose MMIDR, a framework designed to teach LLMs to provide fluent, high-quality textual explanations for their decision-making process on multimodal misinformation.
To convert multimodal misinformation into an appropriate instruction-following format, we present a data augmentation perspective and pipeline.
Furthermore, we design an efficient knowledge distillation approach to distill the capability of proprietary LLMs in explaining multimodal misinformation into open-source LLMs.
arXiv Detail & Related papers (2024-03-21T06:47:28Z)
- Towards Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs [72.49064988035126]
We propose an approach called MKS2, aimed at enhancing multimodal large language models (MLLMs).
Specifically, we introduce the Modular Visual Memory, a component integrated into the internal blocks of LLMs, designed to store open-world visual information efficiently.
Our experiments demonstrate that MKS2 substantially augments the reasoning capabilities of LLMs in contexts necessitating physical or commonsense knowledge.
arXiv Detail & Related papers (2023-11-27T12:29:20Z)
- Explore the Potential of LLMs in Misinformation Detection: An Empirical Study [24.256183538265525]
Large Language Models (LLMs) have garnered significant attention for their powerful ability in natural language understanding and reasoning.
This study stands as the pioneering investigation into the understanding capabilities of multiple LLMs regarding both content and propagation across social media platforms.
arXiv Detail & Related papers (2023-11-21T16:03:51Z)
- RECALL: A Benchmark for LLMs Robustness against External Counterfactual Knowledge [69.79676144482792]
This study aims to evaluate the ability of LLMs to distinguish reliable information from external knowledge.
Our benchmark consists of two tasks, Question Answering and Text Generation, and for each task, we provide models with a context containing counterfactual information.
arXiv Detail & Related papers (2023-11-14T13:24:19Z)
- Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation [109.8527403904657]
We show that large language models (LLMs) possess unwavering confidence in their knowledge and cannot handle the conflict between internal and external knowledge well.
Retrieval augmentation proves to be an effective approach in enhancing LLMs' awareness of knowledge boundaries.
We propose a simple method to dynamically utilize supporting documents with our judgement strategy.
arXiv Detail & Related papers (2023-07-20T16:46:10Z)
- On the Risk of Misinformation Pollution with Large Language Models [127.1107824751703]
We investigate the potential misuse of modern Large Language Models (LLMs) for generating credible-sounding misinformation.
Our study reveals that LLMs can act as effective misinformation generators, leading to a significant degradation in the performance of Open-Domain Question Answering (ODQA) systems.
arXiv Detail & Related papers (2023-05-23T04:10:26Z)
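The MIGRES entry above describes letting the model's stated information gap drive targeted retrieval. The loop below is a minimal sketch of that idea under assumed placeholder helpers (`ask_llm`, `retrieve`); it is not the paper's implementation.

```python
# Hypothetical missing-information-guided retrieval loop in the spirit of
# MIGRES: the model states what is still missing, that gap becomes the next
# retrieval query, and the loop stops once the question is answerable.
# `ask_llm` and `retrieve` are assumed placeholders, not a real API.
def ask_llm(prompt: str) -> str:
    raise NotImplementedError  # any instruction-following LLM


def retrieve(query: str, k: int = 5) -> list[str]:
    raise NotImplementedError  # any document retriever


def answer_with_missing_info_guidance(question: str, max_rounds: int = 3) -> str:
    evidence: list[str] = []
    for _ in range(max_rounds):
        gap = ask_llm(
            f"Question: {question}\nKnown evidence: {evidence}\n"
            "If the question is answerable from the evidence, reply 'NONE'; "
            "otherwise state the single most important missing piece of information."
        )
        if gap.strip().upper() == "NONE":
            break
        # The identified gap steers a targeted retrieval query.
        evidence.extend(retrieve(gap))
    return ask_llm(f"Question: {question}\nEvidence: {evidence}\nAnswer concisely.")
```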