MAD-Sherlock: Multi-Agent Debates for Out-of-Context Misinformation Detection
- URL: http://arxiv.org/abs/2410.20140v1
- Date: Sat, 26 Oct 2024 10:34:22 GMT
- Title: MAD-Sherlock: Multi-Agent Debates for Out-of-Context Misinformation Detection
- Authors: Kumud Lakara, Juil Sock, Christian Rupprecht, Philip Torr, John Collomosse, Christian Schroeder de Witt
- Abstract summary: Out-of-context (OOC) use of images paired with misleading text creates false narratives.
Existing AI-driven detection systems lack explainability and require expensive fine-tuning.
We introduce MAD-Sherlock: a Multi-Agent Debate system for OOC Misinformation Detection.
Our framework enables explainable detection with state-of-the-art accuracy even without domain-specific fine-tuning.
- Score: 25.2179383241339
- Abstract: One of the most challenging forms of misinformation involves the out-of-context (OOC) use of images paired with misleading text, creating false narratives. Existing AI-driven detection systems lack explainability and require expensive fine-tuning. We address these issues with MAD-Sherlock: a Multi-Agent Debate system for OOC Misinformation Detection. MAD-Sherlock introduces a novel multi-agent debate framework where multimodal agents collaborate to assess contextual consistency and request external information to enhance cross-context reasoning and decision-making. Our framework enables explainable detection with state-of-the-art accuracy even without domain-specific fine-tuning. Extensive ablation studies confirm that external retrieval significantly improves detection accuracy, and user studies demonstrate that MAD-Sherlock boosts performance for both experts and non-experts. These results position MAD-Sherlock as a powerful tool for autonomous and citizen intelligence applications.
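To make the debate mechanism concrete, here is a minimal sketch of how a two-agent debate-with-retrieval loop for OOC detection could be structured. The helper functions `query_llm` and `web_search`, the prompts, and the fixed-round stopping rule are illustrative assumptions for this sketch, not the paper's implementation.

```python
# Illustrative sketch of a two-agent debate over an image-caption pair.
# `query_llm` and `web_search` are hypothetical stand-ins for a multimodal
# LLM API and an external retrieval service; they are NOT from the paper.
from dataclasses import dataclass

def query_llm(prompt: str) -> str:
    """Placeholder for a call to a multimodal LLM of your choice."""
    raise NotImplementedError

def web_search(query: str) -> str:
    """Placeholder for external evidence retrieval (e.g. reverse image or text search)."""
    raise NotImplementedError

@dataclass
class DebateTurn:
    agent: str
    argument: str

def debate_ooc(image_desc: str, caption: str, rounds: int = 3) -> list[DebateTurn]:
    """Two agents debate whether the caption matches the image's real context."""
    evidence = web_search(caption)          # external context retrieval
    history: list[DebateTurn] = []
    for _ in range(rounds):
        for agent in ("A", "B"):
            prompt = (
                f"Image: {image_desc}\nCaption: {caption}\n"
                f"Retrieved evidence: {evidence}\n"
                f"Debate so far: {[t.argument for t in history]}\n"
                f"As agent {agent}, argue whether the caption is used "
                f"out of context, citing the evidence."
            )
            history.append(DebateTurn(agent, query_llm(prompt)))
    # A final judgment step produces the explainable verdict.
    verdict = query_llm(
        "Given this debate, answer OOC or NOT-OOC with a short explanation:\n"
        + "\n".join(f"{t.agent}: {t.argument}" for t in history)
    )
    history.append(DebateTurn("judge", verdict))
    return history
```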
Related papers
- CRAT: A Multi-Agent Framework for Causality-Enhanced Reflective and Retrieval-Augmented Translation with Large Language Models [59.8529196670565]
CRAT is a novel multi-agent translation framework that leverages RAG and causality-enhanced self-reflection to address translation challenges.
Our results show that CRAT significantly improves translation accuracy, particularly in handling context-sensitive terms and emerging vocabulary.
arXiv Detail & Related papers (2024-10-28T14:29:11Z) - Audit-LLM: Multi-Agent Collaboration for Log-based Insider Threat Detection [16.154903877808795]
Audit-LLM is a multi-agent log-based insider threat detection framework comprising three collaborative agents.
We propose a pair-wise Evidence-based Multi-agent Debate (EMAD) mechanism, where two independent Executors iteratively refine their conclusions through reasoning exchange to reach a consensus.
arXiv Detail & Related papers (2024-08-12T11:33:45Z) - MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains [54.117238759317004]
The Massive Multitask Agent Understanding (MMAU) benchmark features comprehensive offline tasks that eliminate the need for complex environment setups.
It evaluates models across five domains, including Tool-use, Directed Acyclic Graph (DAG) QA, Data Science and Machine Learning coding, Contest-level programming and Mathematics.
With a total of 20 meticulously designed tasks encompassing over 3K distinct prompts, MMAU provides a comprehensive framework for evaluating the strengths and limitations of LLM agents.
arXiv Detail & Related papers (2024-07-18T00:58:41Z) - RAG-based Crowdsourcing Task Decomposition via Masked Contrastive Learning with Prompts [21.69333828191263]
We propose a retrieval-augmented generation-based crowdsourcing framework that reimagines task decomposition (TD) as event detection from the perspective of natural language understanding.
We present a Prompt-Based Contrastive learning framework for TD (PBCT), which incorporates a prompt-based trigger detector to overcome dependence.
Experiment results demonstrate the competitiveness of our method in both supervised and zero-shot detection.
arXiv Detail & Related papers (2024-06-04T08:34:19Z) - Cantor: Inspiring Multimodal Chain-of-Thought of MLLM [83.6663322930814]
We argue that converging visual context acquisition and logical reasoning is pivotal for tackling visual reasoning tasks.
We propose an innovative multimodal CoT framework, termed Cantor, characterized by a perception-decision architecture.
Our experiments demonstrate the efficacy of the proposed framework, showing significant improvements in multimodal CoT performance.
arXiv Detail & Related papers (2024-04-24T17:59:48Z) - SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection [18.356648843815627]
Out-of-context (OOC) misinformation is one of the easiest and most effective ways to mislead audiences.
Current methods focus on assessing image-text consistency but lack convincing explanations for their judgments.
We introduce SNIFFER, a novel multimodal large language model specifically engineered for OOC misinformation detection and explanation.
arXiv Detail & Related papers (2024-03-05T18:04:59Z) - Learning to Break: Knowledge-Enhanced Reasoning in Multi-Agent Debate System [16.830182915504555]
Multi-agent debate system (MAD) imitates the process of human discussion in pursuit of truth.
It is challenging for the agents to reach correct and consistent conclusions because of their limited and differing knowledge backgrounds.
We propose a novel Multi-Agent Debate with Knowledge-Enhanced reasoning framework that helps the system converge on the correct solution.
arXiv Detail & Related papers (2023-12-08T06:22:12Z) - From Chaos to Clarity: Claim Normalization to Empower Fact-Checking [57.024192702939736]
Claim Normalization (aka ClaimNorm) aims to decompose complex and noisy social media posts into more straightforward and understandable forms.
We propose CACN, a pioneering approach that leverages chain-of-thought and claim check-worthiness estimation.
Our experiments demonstrate that CACN outperforms several baselines across various evaluation measures.
arXiv Detail & Related papers (2023-10-22T16:07:06Z) - Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z) - Building Interpretable and Reliable Open Information Retriever for New Domains Overnight [67.03842581848299]
Information retrieval is a critical component of many downstream tasks, such as open-domain question answering (QA).
We propose an information retrieval pipeline that uses an entity/event linking model and a query decomposition model to focus more accurately on the different information units of the query (a minimal sketch follows after this entry).
We show that, while being more interpretable and reliable, our proposed pipeline significantly improves passage coverages and denotation accuracies across five IR and QA benchmarks.
arXiv Detail & Related papers (2023-08-09T07:47:17Z)
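A minimal sketch of how such a decompose-and-link retrieval pipeline could be wired together; every helper below (`decompose_query`, `link_units`, `retrieve`) is a hypothetical placeholder standing in for the paper's models, not their released code.

```python
# Illustrative sketch of the decompose-then-link retrieval idea.
def decompose_query(query: str) -> list[str]:
    """Placeholder: split a complex question into simpler sub-queries."""
    raise NotImplementedError

def link_units(sub_query: str) -> list[str]:
    """Placeholder: link entity/event mentions in a sub-query to knowledge-base entries."""
    raise NotImplementedError

def retrieve(unit: str, k: int = 5) -> list[str]:
    """Placeholder: fetch top-k passages for one information unit."""
    raise NotImplementedError

def pipeline(query: str) -> list[str]:
    passages: list[str] = []
    for sub in decompose_query(query):          # query decomposition
        for unit in link_units(sub):            # entity/event linking
            passages.extend(retrieve(unit))     # per-unit retrieval
    # De-duplicate while preserving order before handing off to a downstream QA reader.
    return list(dict.fromkeys(passages))
```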