Insight-A: Attribution-aware for Multimodal Misinformation Detection

URL: http://arxiv.org/abs/2511.21705v1
Date: Mon, 17 Nov 2025 02:33:36 GMT
Title: Insight-A: Attribution-aware for Multimodal Misinformation Detection
Authors: Junjie Wu, Yumeng Fu, Chen Gong, Guohong Fu
Abstract summary: We present Insight-A, exploring attribution with MLLM insights for detecting multimodal misinformation. We devise cross-attribution prompting (CAP) to model the sophisticated correlations between perception and reasoning. We also design image captioning (IC) to capture visual details for enhancing cross-modal consistency checking.
Score: 14.02125134424451
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: AI-generated content (AIGC) technology has become a prevalent means of creating multimodal misinformation on social media platforms, posing unprecedented threats to societal safety. However, standard prompting, which uses multimodal large language models (MLLMs) to identify this emerging misinformation, ignores misinformation attribution. To address this, we present Insight-A, which explores attribution with MLLM insights for detecting multimodal misinformation. Insight-A makes two contributions: I) it attributes misinformation to forgery sources, and II) it provides an effective pipeline with hierarchical reasoning that detects distortions across modalities. Specifically, to attribute misinformation to forgery traces based on generation patterns, we devise cross-attribution prompting (CAP) to model the sophisticated correlations between perception and reasoning. Meanwhile, to reduce the subjectivity of human-annotated prompts, we use automatic attribution-debiased prompting (ADP) to adapt MLLMs to the task. Additionally, we design image captioning (IC) to capture visual details that enhance cross-modal consistency checking. Extensive experiments demonstrate the superiority of our proposal, which provides a new paradigm for multimodal misinformation detection in the era of AIGC.
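The abstract names three components (IC, CAP, ADP) arranged in a hierarchical pipeline but gives neither their prompts nor their control flow. The Python sketch below shows one hypothetical way to chain captioning, attribution, and consistency checking around a generic MLLM call. Everything in it is an assumption: the `MLLM` callable interface, the function names, the prompt wording, the stage ordering, and the canned `stub_mllm` that stands in for a real model. It is not the authors' implementation, and it omits ADP entirely because the abstract does not describe how the debiased prompts are produced.

```python
from dataclasses import dataclass
from typing import Callable

# Assumed interface (hypothetical): an MLLM is any callable that takes a text
# prompt and an image reference and returns a text response.
MLLM = Callable[[str, str], str]


@dataclass
class Post:
    text: str        # the post's textual claim
    image_path: str  # path or URL of the attached image


def image_captioning(mllm: MLLM, post: Post) -> str:
    """Stage 1 (IC-like, hypothetical prompt): request a detailed caption."""
    return mllm("Describe this image in detail.", post.image_path)


def cross_attribution(mllm: MLLM, post: Post, caption: str) -> str:
    """Stage 2 (CAP-like, hypothetical prompt): ask for a forgery-source
    label, grounding the question in the caption from stage 1."""
    prompt = (
        f"Caption: {caption}\nClaim: {post.text}\n"
        "Is the image or text AI-generated, manipulated, or authentic? "
        "Answer with one word."
    )
    return mllm(prompt, post.image_path).strip().lower()


def consistency_check(mllm: MLLM, post: Post, caption: str, attribution: str) -> str:
    """Stage 3 (hypothetical prompt): final real/fake verdict conditioned on
    the caption, the claim, and the stage-2 attribution."""
    prompt = (
        f"Caption: {caption}\nClaim: {post.text}\nAttribution: {attribution}\n"
        "Are the image and claim consistent? Answer 'real' or 'fake'."
    )
    return mllm(prompt, post.image_path).strip().lower()


def detect(mllm: MLLM, post: Post) -> dict:
    """Run the three hypothetical stages in order and keep every intermediate."""
    caption = image_captioning(mllm, post)
    attribution = cross_attribution(mllm, post, caption)
    verdict = consistency_check(mllm, post, caption, attribution)
    return {"caption": caption, "attribution": attribution, "verdict": verdict}


def stub_mllm(prompt: str, image_path: str) -> str:
    """Canned responses keyed on the prompt text, standing in for a real model."""
    if prompt.startswith("Describe"):
        return "A flooded street with cars submerged."
    if "Answer with one word" in prompt:
        return "AI-generated"
    return "Fake"


result = detect(stub_mllm, Post("Floods hit downtown today.", "post.jpg"))
print(result["verdict"])  # prints: fake
```

Passing the caption and attribution forward as plain text keeps each stage a separate prompt, so intermediate outputs stay inspectable; whether Insight-A structures its reasoning this way is not stated in the abstract.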

Related papers

MMD-Thinker: Adaptive Multi-Dimensional Thinking for Multimodal Misinformation Detection [8.06079393106578]
Multimodal misinformation floods various social media platforms and continues to evolve in the era of AI-generated content (AIGC). Recent studies leverage general-purpose multimodal large language models (MLLMs) to achieve remarkable results in detection. We propose MMD-Thinker, a two-stage framework for multimodal misinformation detection through adaptive multi-dimensional thinking.
arXiv (2025-11-17T11:04:30Z)
IAD-GPT: Advancing Visual Knowledge in Multimodal Large Language Model for Industrial Anomaly Detection [70.02774285130238]
This paper explores combining rich text semantics with both image-level and pixel-level information from images. We propose IAD-GPT, a novel MLLM-based paradigm for Industrial Anomaly Detection. Experiments on the MVTec-AD and VisA datasets demonstrate state-of-the-art performance.
arXiv (2025-10-16T02:48:05Z)
Towards Unified Multimodal Misinformation Detection in Social Media: A Benchmark Dataset and Baseline [56.790045049514326]
Two major forms of deception dominate: human-crafted misinformation and AI-generated content. We propose Unified Multimodal Fake Content Detection (UMFDet), a framework designed to handle both forms of deception. UMFDet achieves robust and consistent performance across both misinformation types, outperforming specialized baselines.
arXiv (2025-09-30T09:26:32Z)
Explaining multimodal LLMs via intra-modal token interactions [55.27436637894534]
Multimodal Large Language Models (MLLMs) have achieved remarkable success across diverse vision-language tasks, yet their internal decision-making mechanisms remain insufficiently understood. We propose enhancing interpretability by leveraging intra-modal interaction.
arXiv (2025-09-26T14:39:13Z)
CMIE: Combining MLLM Insights with External Evidence for Explainable Out-of-Context Misinformation Detection [14.140095146756996]
We propose CMIE, a novel framework for detecting out-of-context (OOC) misinformation. CMIE identifies the underlying coexistence between images and text and selectively uses relevant evidence to enhance misinformation detection.
arXiv (2025-05-29T13:56:21Z)
Unmasking Digital Falsehoods: A Comparative Analysis of LLM-Based Misinformation Detection Strategies [0.0]
This paper compares text-based, multimodal, and agentic approaches to detecting misinformation. We evaluate the effectiveness of fine-tuned models, zero-shot learning, and systematic fact-checking mechanisms in detecting misinformation across different topic domains.
arXiv (2025-03-02T04:31:42Z)
RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, showing emerging potential as general-purpose models for various vision-language tasks. Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs. In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
arXiv (2024-10-18T03:45:19Z)
Detecting Misinformation in Multimedia Content through Cross-Modal Entity Consistency: A Dual Learning Approach [10.376378437321437]
We propose a Multimedia Misinformation Detection (MultiMD) framework for detecting misinformation in video content by leveraging cross-modal entity consistency. Our results demonstrate that MultiMD outperforms state-of-the-art baseline models.
arXiv (2024-08-16T16:14:36Z)
Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks. Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment. We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv (2023-09-22T06:55:41Z)
Correlation Information Bottleneck: Towards Adapting Pretrained Multimodal Models for Robust Visual Question Answering [63.87200781247364]
Correlation Information Bottleneck (CIB) seeks a tradeoff between compression and redundancy in representations. We derive a tight theoretical upper bound for the mutual information between multimodal inputs and representations.
arXiv (2022-09-14T22:04:10Z)
