Related papers: CMIE: Combining MLLM Insights with External Evidence for Explainable Out-of-Context Misinformation Detection

URL: http://arxiv.org/abs/2505.23449v2
Date: Fri, 30 May 2025 11:34:43 GMT
Title: CMIE: Combining MLLM Insights with External Evidence for Explainable Out-of-Context Misinformation Detection
Authors: Fanxiao Li, Jiaying Wu, Canyuan He, Wei Zhou
Abstract summary: We propose CMIE, a novel framework for detecting out-of-context (OOC) misinformation. CMIE identifies the underlying coexistence relationships between images and text, and selectively utilizes relevant evidence to enhance misinformation detection.
Score: 4.506980868306549
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multimodal large language models (MLLMs) have demonstrated impressive capabilities in visual reasoning and text generation. While previous studies have explored the application of MLLMs to detecting out-of-context (OOC) misinformation, our empirical analysis reveals two persisting challenges of this paradigm. We evaluate the representative GPT-4o model on direct reasoning and evidence-augmented reasoning, and the results indicate that MLLMs struggle to capture deeper relationships, specifically cases in which the image and text are not directly connected but are associated through underlying semantic links. Moreover, noise in the evidence further impairs detection accuracy. To address these challenges, we propose CMIE, a novel OOC misinformation detection framework that incorporates a Coexistence Relationship Generation (CRG) strategy and an Association Scoring (AS) mechanism. CMIE identifies the underlying coexistence relationships between images and text, and selectively utilizes relevant evidence to enhance misinformation detection. Experimental results demonstrate that our approach outperforms existing methods.
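Illustrative sketch (not from the paper): the abstract says CMIE scores evidence for association and keeps only the relevant items, but it does not describe how the score is computed. The snippet below is a minimal sketch of that filtering idea only. The function names (`association_score`, `select_evidence`), the token-overlap (Jaccard) scorer, and the 0.2 threshold are all assumptions made for illustration; the actual AS mechanism presumably relies on the MLLM rather than word overlap.

```python
from __future__ import annotations

import re


def tokenize(text: str) -> set[str]:
    """Lowercase word tokens; a crude stand-in for real semantic features."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def association_score(relationship: str, evidence: str) -> float:
    """Hypothetical scorer: Jaccard overlap of token sets, in [0, 1]."""
    a, b = tokenize(relationship), tokenize(evidence)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def select_evidence(
    relationship: str, evidence_items: list[str], threshold: float = 0.2
) -> list[str]:
    """Keep only evidence whose score meets the (assumed) threshold."""
    return [
        item
        for item in evidence_items
        if association_score(relationship, item) >= threshold
    ]


# A made-up coexistence relationship and two made-up evidence snippets.
relationship = "flooded street during the 2019 storm in Venice"
evidence = [
    "Venice street flooded in the 2019 storm",
    "Recipe for tomato soup",
]
kept = select_evidence(relationship, evidence)
```

In this toy example only the first evidence item survives the filter, which mirrors the stated goal of discarding noisy evidence before the final judgment.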

Related papers

HiEAG: Evidence-Augmented Generation for Out-of-Context Misinformation Detection [8.609016163081744]
We propose HiEAG, a novel Hierarchical Evidence-Augmented Generation framework to refine external consistency checking. Our approach decomposes external consistency checking into a comprehensive engine pipeline, which integrates reranking and rewriting, apart from retrieval. Our approach enables explanation for judgment, and achieves impressive performance with instruction tuning.
arXiv Detail & Related papers (2025-11-18T01:11:48Z)
Insight-A: Attribution-aware for Multimodal Misinformation Detection [14.02125134424451]
We present Insight-A, exploring attribution with MLLM insights for detecting multimodal misinformation. We devise cross-attribution prompting (CAP) to model the sophisticated correlations between perception and reasoning. We also design image captioning (IC) to achieve visual details for enhancing cross-modal consistency checking.
arXiv Detail & Related papers (2025-11-17T02:33:36Z)
Explaining multimodal LLMs via intra-modal token interactions [55.27436637894534]
Multimodal Large Language Models (MLLMs) have achieved remarkable success across diverse vision-language tasks, yet their internal decision-making mechanisms remain insufficiently understood. We propose enhancing interpretability by leveraging intra-modal interaction.
arXiv Detail & Related papers (2025-09-26T14:39:13Z)
ForenX: Towards Explainable AI-Generated Image Detection with Multimodal Large Language Models [82.04858317800097]
We present ForenX, a novel method that not only identifies the authenticity of images but also provides explanations that resonate with human thoughts. ForenX employs powerful multimodal large language models (MLLMs) to analyze and interpret forensic cues. We introduce ForgReason, a dataset dedicated to descriptions of forgery evidence in AI-generated images.
arXiv Detail & Related papers (2025-08-02T15:21:26Z)
Improving Multilingual Retrieval-Augmented Language Models through Dialectic Reasoning Argumentations [65.11348389219887]
We introduce Dialectic-RAG (DRAG), a modular approach that evaluates retrieved information by comparing, contrasting, and resolving conflicting perspectives. We show the impact of our framework both as an in-context learning strategy and for constructing demonstrations to instruct smaller models.
arXiv Detail & Related papers (2025-04-07T06:55:15Z)
Contradiction Detection in RAG Systems: Evaluating LLMs as Context Validators for Improved Information Consistency [0.6827423171182154]
Retrieval Augmented Generation (RAG) systems have emerged as a powerful method for enhancing large language models (LLMs) with up-to-date information. RAG can sometimes surface documents containing contradictory information, particularly in rapidly evolving domains such as news. This study presents a novel data generation framework to simulate different types of contradictions that may occur in the retrieval stage of a RAG system.
arXiv Detail & Related papers (2025-03-31T19:41:15Z)
Unmasking Digital Falsehoods: A Comparative Analysis of LLM-Based Misinformation Detection Strategies [0.0]
This paper compares text-based, multimodal, and agentic approaches to detecting misinformation. We evaluate the effectiveness of fine-tuned models, zero-shot learning, and systematic fact-checking mechanisms in detecting misinformation across different topic domains.
arXiv Detail & Related papers (2025-03-02T04:31:42Z)
Eliciting Critical Reasoning in Retrieval-Augmented Language Models via Contrastive Explanations [4.697267141773321]
Retrieval-augmented generation (RAG) has emerged as a critical mechanism in contemporary NLP to support Large Language Models (LLMs) in systematically accessing richer factual context. Recent studies have shown that LLMs still struggle to critically analyse RAG-based in-context information, a limitation that may lead to incorrect inferences and hallucinations. In this paper, we investigate how to elicit critical reasoning in RAG via contrastive explanations.
arXiv Detail & Related papers (2024-10-30T10:11:53Z)
LLM-Consensus: Multi-Agent Debate for Visual Misinformation Detection [26.84072878231029]
LLM-Consensus is a novel multi-agent debate system for misinformation detection. Our framework enables explainable detection with state-of-the-art accuracy.
arXiv Detail & Related papers (2024-10-26T10:34:22Z)
Multimodal Misinformation Detection using Large Vision-Language Models [7.505532091249881]
Large language models (LLMs) have shown remarkable performance in various tasks. Few approaches consider evidence retrieval as part of misinformation detection. We propose a novel re-ranking approach for multimodal evidence retrieval.
arXiv Detail & Related papers (2024-07-19T13:57:11Z)
C-ICL: Contrastive In-context Learning for Information Extraction [54.39470114243744]
c-ICL is a novel few-shot technique that leverages both correct and incorrect sample constructions to create in-context learning demonstrations. Our experiments on various datasets indicate that c-ICL outperforms previous few-shot in-context learning methods.
arXiv Detail & Related papers (2024-02-17T11:28:08Z)
Improving Vision Anomaly Detection with the Guidance of Language Modality [64.53005837237754]
This paper tackles the challenges for vision modality from a multimodal point of view. We propose Cross-modal Guidance (CMG) to tackle the redundant information issue and sparse space issue. To learn a more compact latent space for the vision anomaly detector, CMLE learns a correlation structure matrix from the language modality.
arXiv Detail & Related papers (2023-10-04T13:44:56Z)
Improving Open Information Extraction with Large Language Models: A Study on Demonstration Uncertainty [52.72790059506241]
The Open Information Extraction (OIE) task aims at extracting structured facts from unstructured text. Despite the potential of large language models (LLMs) like ChatGPT as general task solvers, they lag behind state-of-the-art (supervised) methods in OIE tasks.
arXiv Detail & Related papers (2023-09-07T01:35:24Z)
Multimodal Relation Extraction with Cross-Modal Retrieval and Synthesis [89.04041100520881]
This research proposes to retrieve textual and visual evidence based on the object, sentence, and whole image. We develop a novel approach to synthesize the object-level, image-level, and sentence-level information for better reasoning between the same and different modalities.
arXiv Detail & Related papers (2023-05-25T15:26:13Z)
SAIS: Supervising and Augmenting Intermediate Steps for Document-Level Relation Extraction [51.27558374091491]
We propose to explicitly teach the model to capture relevant contexts and entity types by supervising and augmenting intermediate steps (SAIS) for relation extraction. Based on a broad spectrum of carefully designed tasks, our proposed SAIS method not only extracts relations of better quality due to more effective supervision, but also retrieves the corresponding supporting evidence more accurately.
arXiv Detail & Related papers (2021-09-24T17:37:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.