Related papers: LEMMA: Towards LVLM-Enhanced Multimodal Misinformation Detection with External Knowledge Augmentation

LEMMA: Towards LVLM-Enhanced Multimodal Misinformation Detection with External Knowledge Augmentation

URL: http://arxiv.org/abs/2402.11943v2
Date: Thu, 20 Jun 2024 20:20:30 GMT
Title: LEMMA: Towards LVLM-Enhanced Multimodal Misinformation Detection with External Knowledge Augmentation
Authors: Keyang Xuan, Li Yi, Fan Yang, Ruochen Wu, Yi R. Fung, Heng Ji,
Abstract summary: We propose LEMMA: LVLM-Enhanced Multimodal Misinformation Detection with External Knowledge Augmentation. Our method improves the accuracy over the top baseline LVLM by 7% and 13% on Twitter and Fakeddit datasets respectively.
Score: 58.524237916836164
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The rise of multimodal misinformation on social platforms poses significant challenges for individuals and societies. Its increased credibility and broader impact compared to textual misinformation make detection complex, requiring robust reasoning across diverse media types and profound knowledge for accurate verification. The emergence of Large Vision Language Model (LVLM) offers a potential solution to this problem. Leveraging their proficiency in processing visual and textual information, LVLM demonstrates promising capabilities in recognizing complex information and exhibiting strong reasoning skills. In this paper, we first investigate the potential of LVLM on multimodal misinformation detection. We find that even though LVLM has a superior performance compared to LLMs, its profound reasoning may present limited power with a lack of evidence. Based on these observations, we propose LEMMA: LVLM-Enhanced Multimodal Misinformation Detection with External Knowledge Augmentation. LEMMA leverages LVLM intuition and reasoning capabilities while augmenting them with external knowledge to enhance the accuracy of misinformation detection. Our method improves the accuracy over the top baseline LVLM by 7% and 13% on Twitter and Fakeddit datasets respectively.

Related papers

Accommodate Knowledge Conflicts in Retrieval-augmented LLMs: Towards Reliable Response Generation in the Wild [11.058848731627233]
Large language models (LLMs) have advanced information retrieval systems. LLMs often face knowledge conflicts between internal memory and retrievaled external information. We propose Swin-VIB, a novel framework that integrates a pipeline of variational information bottleneck models into adaptive augmentation of retrieved information.
arXiv Detail & Related papers (2025-04-17T14:40:31Z)
E2LVLM:Evidence-Enhanced Large Vision-Language Model for Multimodal Out-of-Context Misinformation Detection [7.1939657372410375]
We present E2LVLM, a novel evidence-enhanced large vision-language model by adapting textual evidence in two levels. To address the scarcity of news domain datasets with both judgment and explanation, we generate a novel OOC multimodal instruction-following dataset. A multitude of experiments demonstrate that E2LVLM achieves superior performance than state-of-the-art methods.
arXiv Detail & Related papers (2025-02-12T04:25:14Z)
Understanding LLMs' Fluid Intelligence Deficiency: An Analysis of the ARC Task [71.61879949813998]
In cognitive research, the latter ability is referred to as fluid intelligence, which is considered to be critical for assessing human intelligence. Recent research on fluid intelligence assessments has highlighted significant deficiencies in LLMs' abilities. Our study revealed three major limitations in existing LLMs: limited ability for skill composition, unfamiliarity with abstract input formats, and the intrinsic deficiency of left-to-right decoding.
arXiv Detail & Related papers (2025-02-11T02:31:09Z)
Beyond Sight: Towards Cognitive Alignment in LVLM via Enriched Visual Knowledge [24.538839144639653]
Large Vision-Language Models (LVLMs) integrate separately pre-trained vision and language components. These models frequently encounter a core issue of "cognitive misalignment" between the vision encoder (VE) and the large language model (LLM)
arXiv Detail & Related papers (2024-11-25T18:33:14Z)
Understanding the Role of LLMs in Multimodal Evaluation Benchmarks [77.59035801244278]
This paper investigates the role of the Large Language Model (LLM) backbone in Multimodal Large Language Models (MLLMs) evaluation. Our study encompasses four diverse MLLM benchmarks and eight state-of-the-art MLLMs. Key findings reveal that some benchmarks allow high performance even without visual inputs and up to 50% of error rates can be attributed to insufficient world knowledge in the LLM backbone.
arXiv Detail & Related papers (2024-10-16T07:49:13Z)
LLMs Know What They Need: Leveraging a Missing Information Guided Framework to Empower Retrieval-Augmented Generation [6.676337039829463]
We propose a Missing Information Guided Retrieve-Extraction-Solving paradigm (MIGRES) We leverage the identification of missing information to generate a targeted query that steers the subsequent knowledge retrieval. Extensive experiments conducted on multiple public datasets reveal the superiority of the proposed MIGRES method.
arXiv Detail & Related papers (2024-04-22T09:56:59Z)
MMIDR: Teaching Large Language Model to Interpret Multimodal Misinformation via Knowledge Distillation [15.343028838291078]
We propose MMIDR, a framework designed to teach LLMs in providing fluent and high-quality textual explanations for their decision-making process of multimodal misinformation. To convert multimodal misinformation into an appropriate instruction-following format, we present a data augmentation perspective and pipeline. Furthermore, we design an efficient knowledge distillation approach to distill the capability of proprietary LLMs in explaining multimodal misinformation into open-source LLMs.
arXiv Detail & Related papers (2024-03-21T06:47:28Z)
Towards Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs [72.49064988035126]
We propose an approach called MKS2, aimed at enhancing multimodal large language models (MLLMs) Specifically, we introduce the Modular Visual Memory, a component integrated into the internal blocks of LLMs, designed to store open-world visual information efficiently. Our experiments demonstrate that MKS2 substantially augments the reasoning capabilities of LLMs in contexts necessitating physical or commonsense knowledge.
arXiv Detail & Related papers (2023-11-27T12:29:20Z)
Can Large Language Models Understand Content and Propagation for Misinformation Detection: An Empirical Study [26.023148371263012]
Large Language Models (LLMs) have garnered significant attention for their powerful ability in natural language understanding and reasoning. We present a comprehensive empirical study to explore the performance of LLMs on misinformation detection tasks.
arXiv Detail & Related papers (2023-11-21T16:03:51Z)
RECALL: A Benchmark for LLMs Robustness against External Counterfactual Knowledge [69.79676144482792]
This study aims to evaluate the ability of LLMs to distinguish reliable information from external knowledge. Our benchmark consists of two tasks, Question Answering and Text Generation, and for each task, we provide models with a context containing counterfactual information.
arXiv Detail & Related papers (2023-11-14T13:24:19Z)
Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation [109.8527403904657]
We show that large language models (LLMs) possess unwavering confidence in their knowledge and cannot handle the conflict between internal and external knowledge well. Retrieval augmentation proves to be an effective approach in enhancing LLMs' awareness of knowledge boundaries. We propose a simple method to dynamically utilize supporting documents with our judgement strategy.
arXiv Detail & Related papers (2023-07-20T16:46:10Z)
On the Risk of Misinformation Pollution with Large Language Models [127.1107824751703]
We investigate the potential misuse of modern Large Language Models (LLMs) for generating credible-sounding misinformation. Our study reveals that LLMs can act as effective misinformation generators, leading to a significant degradation in the performance of Open-Domain Question Answering (ODQA) systems.
arXiv Detail & Related papers (2023-05-23T04:10:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.