MIRAGE: Agentic Framework for Multimodal Misinformation Detection with Web-Grounded Reasoning
- URL: http://arxiv.org/abs/2510.17590v1
- Date: Mon, 20 Oct 2025 14:40:26 GMT
- Title: MIRAGE: Agentic Framework for Multimodal Misinformation Detection with Web-Grounded Reasoning
- Authors: Mir Nafis Sharear Shopnil, Sharad Duwal, Abhishek Tyagi, Adiba Mahbub Proma,
- Abstract summary: We present MIRAGE, an inference-time, model-pluggable agentic framework that decomposes multimodal verification into four sequential modules: visual veracity assessment detects AI-generated images, cross-modal consistency analysis identifies out-of-context repurposing, retrieval-augmented factual checking grounds claims in web evidence, and a calibrated judgment module integrates all signals. MIRAGE orchestrates vision-language model reasoning with targeted web retrieval and outputs structured, citation-linked rationales.
- Score: 0.6475163438744868
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Misinformation spreads across web platforms through billions of daily multimodal posts that combine text and images, overwhelming manual fact-checking capacity. Supervised detection models require domain-specific training data and fail to generalize across diverse manipulation tactics. We present MIRAGE, an inference-time, model-pluggable agentic framework that decomposes multimodal verification into four sequential modules: visual veracity assessment detects AI-generated images, cross-modal consistency analysis identifies out-of-context repurposing, retrieval-augmented factual checking grounds claims in web evidence through iterative question generation, and a calibrated judgment module integrates all signals. MIRAGE orchestrates vision-language model reasoning with targeted web retrieval and outputs structured, citation-linked rationales. On the MMFakeBench validation set (1,000 samples), MIRAGE with GPT-4o-mini achieves 81.65% F1 and 75.1% accuracy, outperforming the strongest zero-shot baseline (GPT-4V with MMD-Agent at 74.0% F1) by 7.65 points while maintaining a 34.3% false positive rate versus 97.3% for a judge-only baseline. Test set results (5,000 samples) confirm generalization with 81.44% F1 and 75.08% accuracy. Ablation studies show visual verification contributes 5.18 F1 points and retrieval-augmented reasoning contributes 2.97 points. Our results demonstrate that decomposed agentic reasoning with web retrieval can match supervised detector performance without domain-specific training, enabling misinformation detection across modalities where labeled data remains scarce.
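The four-module decomposition described in the abstract can be sketched as a plain Python pipeline. All function names, signal dictionaries, and the judgment rule below are hypothetical illustrations of the described architecture, not the paper's implementation; each module is stubbed with a placeholder heuristic.

```python
# Hypothetical sketch of MIRAGE's four-module pipeline; module and
# function names are illustrative, not taken from the paper's code.
from dataclasses import dataclass, field


@dataclass
class Verdict:
    label: str                                     # "real" or "misinformation"
    rationale: list = field(default_factory=list)  # citation-linked notes


def assess_visual_veracity(image: bytes) -> dict:
    # Module 1: detect AI-generated imagery (placeholder heuristic).
    return {"ai_generated": False, "note": "no generation artifacts found"}


def check_cross_modal_consistency(image: bytes, text: str) -> dict:
    # Module 2: flag out-of-context repurposing of a real image.
    return {"consistent": True, "note": "caption matches image content"}


def retrieve_and_check_facts(text: str) -> dict:
    # Module 3: iterative question generation + web retrieval (stubbed).
    return {"supported": True, "citations": ["https://example.org/evidence"]}


def judge(signals: dict) -> Verdict:
    # Module 4: calibrated judgment integrating all upstream signals.
    misinfo = (signals["visual"]["ai_generated"]
               or not signals["consistency"]["consistent"]
               or not signals["facts"]["supported"])
    return Verdict(
        label="misinformation" if misinfo else "real",
        rationale=[signals["visual"]["note"],
                   signals["consistency"]["note"],
                   *signals["facts"]["citations"]],
    )


def mirage_verify(image: bytes, text: str) -> Verdict:
    # Run the four modules sequentially and integrate their signals.
    signals = {
        "visual": assess_visual_veracity(image),
        "consistency": check_cross_modal_consistency(image, text),
        "facts": retrieve_and_check_facts(text),
    }
    return judge(signals)


verdict = mirage_verify(image=b"...", text="Example caption")
print(verdict.label)
```

The key design point the sketch captures is that the judgment module never sees raw inputs, only the structured signals emitted by the three upstream modules, which is what makes the framework model-pluggable.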
Related papers
- How well are open sourced AI-generated image detection models out-of-the-box: A comprehensive benchmark study [5.740397289924559]
No universal winner exists, with detector rankings exhibiting substantial instability. Our findings challenge the "one-size-fits-all" detector paradigm.
arXiv Detail & Related papers (2026-02-08T04:36:13Z) - Scaling Trends for Multi-Hop Contextual Reasoning in Mid-Scale Language Models [0.0]
We present a controlled study of multi-hop contextual reasoning in large language models. We find that multi-agent systems exhibit the inverse pattern, achieving up to 80% on reasoning tasks where rule-based methods fail.
arXiv Detail & Related papers (2026-01-06T20:18:55Z) - Eigen-1: Adaptive Multi-Agent Refinement with Monitor-Based RAG for Scientific Reasoning [53.45095336430027]
We develop a unified framework that combines implicit retrieval and structured collaboration. On Humanity's Last Exam (HLE) Bio/Chem Gold, our framework achieves 48.3% accuracy. Results on SuperGPQA and TRQA confirm robustness across domains.
arXiv Detail & Related papers (2025-09-25T14:05:55Z) - Ensemble YOLO Framework for Multi-Domain Mitotic Figure Detection in Histopathology Images [0.7541656202645494]
Two modern one-stage detectors, YOLOv5 and YOLOv8, were trained on the MIDOG++, CMC, and CCMCT datasets. YOLOv5 achieved higher precision (84.3%), while YOLOv8 offered improved recall (82.6%). Our ensemble ranked 5th with an F1 score of 79.2%, precision of 73.6%, and recall of 85.8%, confirming that the proposed strategy generalizes effectively across unseen test data.
arXiv Detail & Related papers (2025-09-03T02:43:02Z) - MCP-Orchestrated Multi-Agent System for Automated Disinformation Detection [84.75972919995398]
This paper presents a multi-agent system that uses relation extraction to detect disinformation in news articles. The proposed agentic AI system combines four agents, including a machine learning agent (logistic regression), a Wikipedia knowledge check agent, and a web-scraped data analyzer. Results demonstrate that the multi-agent ensemble achieves 95.3% accuracy with an F1 score of 0.964, significantly outperforming individual agents and traditional approaches.
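The ensemble-of-agents idea above can be sketched as a simple majority vote over independent detector agents. The agent implementations and the keyword heuristic below are illustrative stand-ins, not the paper's actual components.

```python
# Illustrative majority-vote ensemble over independent detector agents.
from collections import Counter


def ml_agent(article: str) -> str:
    # Stand-in for a trained logistic-regression classifier.
    return "disinformation" if "miracle cure" in article.lower() else "credible"


def wiki_agent(article: str) -> str:
    # Stand-in for a Wikipedia knowledge check.
    return "credible"


def web_agent(article: str) -> str:
    # Stand-in for a web-scraped data analyzer.
    return "disinformation" if "miracle cure" in article.lower() else "credible"


def ensemble_verdict(article: str) -> str:
    # Each agent votes independently; the majority label wins.
    votes = [agent(article) for agent in (ml_agent, wiki_agent, web_agent)]
    return Counter(votes).most_common(1)[0][0]


print(ensemble_verdict("New miracle cure revealed!"))  # prints "disinformation"
```

A majority vote is the simplest way such an ensemble can outperform its individual agents: a single agent's error is outvoted whenever the other agents agree.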
arXiv Detail & Related papers (2025-08-13T19:14:48Z) - OmniEAR: Benchmarking Agent Reasoning in Embodied Tasks [52.87238755666243]
We present OmniEAR, a framework for evaluating how language models reason about physical interactions, tool usage, and multi-agent coordination in embodied tasks. We model continuous physical properties and complex spatial relationships across 1,500 scenarios spanning household and industrial domains. Our systematic evaluation reveals severe performance degradation when models must reason from constraints.
arXiv Detail & Related papers (2025-08-07T17:54:15Z) - MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation [81.26818054877658]
MMMG is a comprehensive benchmark for multimodal generation across 4 modality combinations. It is highly aligned with human evaluation, achieving an average agreement of 94.3%. GPT Image achieves 78.3% accuracy for image generation but falls short on multimodal reasoning and interleaved generation.
arXiv Detail & Related papers (2025-05-23T08:21:28Z) - Structured Reasoning for Fairness: A Multi-Agent Approach to Bias Detection in Textual Data [0.0]
We propose a multi-agent framework that identifies bias by disentangling each statement as fact or opinion. By combining enhanced detection accuracy with interpretable explanations, this approach promotes accountability in modern language models.
arXiv Detail & Related papers (2025-03-01T05:27:54Z) - SelfCheckAgent: Zero-Resource Hallucination Detection in Generative Large Language Models [0.16385815610837165]
SelfCheckAgent is a novel framework integrating three different agents. These agents provide a robust, multi-dimensional approach to hallucination detection. The framework also incorporates a triangulation strategy, which reinforces the strengths of the SelfCheckAgent.
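A consistency-based check in the spirit of such zero-resource hallucination detectors can be sketched as follows: sample several responses to the same prompt and score a claim by how often the samples support it. The substring-matching scorer is a simplifying assumption standing in for an NLI model or LLM judge.

```python
# Minimal sketch of consistency-based hallucination scoring; the naive
# substring check is an assumption, not the method used in the paper.
def consistency_score(claim: str, samples: list) -> float:
    # Fraction of sampled responses that contain the claim verbatim.
    support = sum(claim.lower() in s.lower() for s in samples)
    return support / len(samples)


samples = [
    "Paris is the capital of France.",
    "The capital of France is Paris.",
    "France's capital is Paris.",
]
# The claim matches only the first sample verbatim, so a substring
# scorer gives a low score even though all samples agree semantically.
score = consistency_score("paris is the capital of france", samples)
```

This low score on semantically consistent samples illustrates why practical detectors replace string matching with a semantic comparison: a claim should count as supported whenever a sample entails it, not only when it appears verbatim.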
arXiv Detail & Related papers (2025-02-03T20:42:32Z) - MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization [86.61052121715689]
MatPlotAgent is a model-agnostic framework designed to automate scientific data visualization tasks.
MatPlotBench is a high-quality benchmark consisting of 100 human-verified test cases.
arXiv Detail & Related papers (2024-02-18T04:28:28Z) - Weakly Supervised Veracity Classification with LLM-Predicted Credibility Signals [4.895830603263421]
Pastel is a weakly supervised approach that leverages large language models to extract credibility signals from web content.
We study the association between credibility signals and veracity, and perform a study showing the impact of each signal on model performance.
arXiv Detail & Related papers (2023-09-14T11:06:51Z) - Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations [58.442103936918805]
We show that Attention Mask Consistency produces superior visual grounding results than previous methods.
AMC is effective, easy to implement, and is general as it can be adopted by any vision-language model.
arXiv Detail & Related papers (2022-06-30T17:55:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.