DEFAME: Dynamic Evidence-based FAct-checking with Multimodal Experts
- URL: http://arxiv.org/abs/2412.10510v2
- Date: Thu, 06 Feb 2025 13:27:38 GMT
- Title: DEFAME: Dynamic Evidence-based FAct-checking with Multimodal Experts
- Authors: Tobias Braun, Mark Rothermel, Marcus Rohrbach, Anna Rohrbach
- Abstract summary: Dynamic Evidence-based FAct-checking with Multimodal Experts (DEFAME) is a zero-shot MLLM pipeline for open-domain, text-image claim verification. DEFAME operates in a six-stage process, dynamically selecting the tools and search depth to extract and evaluate textual and visual evidence. Evaluation on the popular benchmarks VERITE, AVeriTeC, and MOCHEG shows that DEFAME surpasses all previous methods.
- Score: 35.952854524873246
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The proliferation of disinformation demands reliable and scalable fact-checking solutions. We present Dynamic Evidence-based FAct-checking with Multimodal Experts (DEFAME), a modular, zero-shot MLLM pipeline for open-domain, text-image claim verification. DEFAME operates in a six-stage process, dynamically selecting the tools and search depth to extract and evaluate textual and visual evidence. Unlike prior approaches that are text-only, lack explainability, or rely solely on parametric knowledge, DEFAME performs end-to-end verification, accounting for images in claims and evidence while generating structured, multimodal reports. Evaluation on the popular benchmarks VERITE, AVeriTeC, and MOCHEG shows that DEFAME surpasses all previous methods, establishing itself as the new state-of-the-art fact-checking system for uni- and multimodal fact-checking. Moreover, we introduce a new benchmark, CLAIMREVIEW24+, featuring claims after the knowledge cutoff of GPT-4o to avoid data leakage. Here, DEFAME drastically outperforms the GPT Chain-of-Thought baseline, demonstrating temporal generalizability and the potential for real-time fact-checking.
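To make the six-stage idea concrete, below is a minimal sketch of the kind of dynamic plan-execute-judge loop the abstract describes. The stage names, tool set, and the `call_mllm` helper are illustrative assumptions, not DEFAME's actual interface.

```python
# Hedged sketch: a dynamic fact-checking loop that plans a tool call,
# executes it, and judges whether to stop. All components are stubs.
from dataclasses import dataclass, field

@dataclass
class Claim:
    text: str
    image_path: str | None = None  # claims may pair text with an image

@dataclass
class Report:
    evidence: list[str] = field(default_factory=list)
    verdict: str = "NEI"  # "not enough information" until decided

def call_mllm(prompt: str) -> str:
    """Placeholder: wire this to a real multimodal LLM client."""
    return "CONTINUE"

# Stub tools standing in for real web search / reverse image search.
TOOLS = {
    "web_search": lambda q: f"search results for {q!r}",
    "reverse_image_search": lambda img: f"pages embedding {img!r}",
}

def verify(claim: Claim, max_depth: int = 3) -> Report:
    report = Report()
    for _ in range(max_depth):
        # Plan: the model picks the next tool and query given evidence so far.
        plan = call_mllm(
            f"Claim: {claim.text}\nEvidence: {report.evidence}\n"
            f"Reply 'tool: query' using one of {list(TOOLS)}.")
        tool, _, query = plan.partition(":")
        # Execute: run the chosen tool, then summarize its raw output.
        raw = TOOLS.get(tool.strip(), TOOLS["web_search"])(query.strip())
        report.evidence.append(call_mllm(f"Summarize w.r.t. the claim: {raw}"))
        # Judge: stop as soon as the model commits to a verdict.
        verdict = call_mllm(
            f"Given {report.evidence}, reply SUPPORTED, REFUTED, or CONTINUE.")
        if verdict != "CONTINUE":
            report.verdict = verdict
            break
    return report
```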
Related papers
- Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs [60.881609323604685]
Large Language Models (LLMs) accessed via black-box APIs introduce a trust challenge.
Users pay for services based on advertised model capabilities, but providers may covertly substitute the specified model with a cheaper, lower-quality alternative to reduce operational costs.
This lack of transparency undermines fairness, erodes trust, and complicates reliable benchmarking.
arXiv Detail & Related papers (2025-04-07T03:57:41Z) - Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing [90.65399476233495]
We introduce RISEBench, the first benchmark for evaluating Reasoning-Informed viSual Editing (RISE).
RISEBench focuses on four key reasoning types: Temporal, Causal, Spatial, and Logical Reasoning.
We propose an evaluation framework that assesses Instruction Reasoning, Appearance Consistency, and Visual Plausibility with both human judges and an LMM-as-a-judge approach.
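As a rough illustration of the LMM-as-a-judge setup, the sketch below scores an edit along the three named axes. The 1-5 rubric, prompt wording, and the `lmm` callable are assumptions, not RISEBench's published protocol.

```python
# Hedged sketch of an LMM-as-a-judge scoring pass over three axes.
AXES = ("instruction_reasoning", "appearance_consistency", "visual_plausibility")

def judge_edit(lmm, instruction: str, source_img: str, edited_img: str) -> dict:
    """Score one edit with an LMM judge; lmm(prompt, images) -> str reply."""
    scores = {}
    for axis in AXES:
        prompt = (f"Instruction: {instruction}\n"
                  f"Rate the edited image (second image) on "
                  f"{axis.replace('_', ' ')} from 1 (poor) to 5 (excellent). "
                  f"Reply with the number only.")
        reply = lmm(prompt, images=[source_img, edited_img])  # hypothetical client
        scores[axis] = int(reply.strip())
    scores["overall"] = sum(scores[a] for a in AXES) / len(AXES)
    return scores
```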
arXiv Detail & Related papers (2025-04-03T17:59:56Z) - Verification with Transparency: The TrendFact Benchmark for Auditable Fact-Checking via Natural Language Explanation [10.449165630417522]
We present TrendFact, the first Chinese fact-checking benchmark incorporating structured natural language explanations.
TrendFact comprises 7,643 carefully curated samples from trending social media content and professional fact-checking repositories.
It supports multiple forms of reasoning, including numerical reasoning, logical reasoning, and commonsense verification.
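A plausible record layout for such a benchmark is sketched below; the field names are guesses for illustration, not TrendFact's actual release format.

```python
# Hypothetical per-sample schema for a claim + structured-explanation benchmark.
from dataclasses import dataclass

@dataclass
class FactCheckSample:
    claim: str            # trending social-media claim (translated or original)
    verdict: str          # e.g. "supported" / "refuted" / "unverifiable"
    evidence: list[str]   # curated or retrieved evidence passages
    explanation: str      # structured natural-language justification
    reasoning_type: str   # e.g. "numerical", "logical", "commonsense"
```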
arXiv Detail & Related papers (2024-10-19T15:25:19Z) - FIRE: Fact-checking with Iterative Retrieval and Verification [63.67320352038525]
FIRE is a novel framework that integrates evidence retrieval and claim verification in an iterative manner.
It achieves slightly better performance than prior methods while reducing large language model (LLM) costs by an average of 7.6 times and search costs by 16.5 times.
These results indicate that FIRE holds promise for application in large-scale fact-checking operations.
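The cost saving comes from interleaving retrieval with verification and stopping early. A minimal sketch of that loop follows, with `search` and `llm_verify` as stand-in components rather than FIRE's actual modules.

```python
# Hedged sketch: iterative retrieve-then-verify with early stopping.
def fire_check(claim: str, search, llm_verify, max_rounds: int = 5) -> dict:
    """search(query) -> list[str]; llm_verify(claim, evidence) -> dict with
    keys 'verdict' and, while undecided, 'next_query' (assumed interfaces)."""
    evidence: list[str] = []
    query = claim
    for _ in range(max_rounds):
        evidence.extend(search(query))        # one retrieval step
        result = llm_verify(claim, evidence)  # one verification step
        if result["verdict"] != "uncertain":  # early stop saves LLM/search calls
            return result
        query = result["next_query"]          # refine the query and iterate
    return {"verdict": "not enough info", "evidence": evidence}
```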
arXiv Detail & Related papers (2024-10-17T06:44:18Z) - Multimodal Misinformation Detection using Large Vision-Language Models [7.505532091249881]
Large language models (LLMs) have shown remarkable performance in various tasks.
Few approaches consider evidence retrieval as part of misinformation detection.
We propose a novel re-ranking approach for multimodal evidence retrieval.
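The re-ranking idea can be pictured as scoring each retrieved item with a vision-language model and keeping the top results; the prompt and `vlm_score` helper below are assumptions, not the paper's exact method.

```python
# Hedged sketch: re-rank multimodal evidence by a VLM relevance score.
def rerank(claim: str, candidates: list[dict], vlm_score, top_k: int = 5):
    """candidates: [{"text": ..., "image": ...}, ...];
    vlm_score(prompt, text, image) -> float (assumed interface)."""
    scored = [
        (vlm_score(f"How relevant is this evidence to: {claim}?",
                   text=c["text"], image=c.get("image")), c)
        for c in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]
```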
arXiv Detail & Related papers (2024-07-19T13:57:11Z) - MetaSumPerceiver: Multimodal Multi-Document Evidence Summarization for Fact-Checking [0.283600654802951]
We present a summarization model designed to generate claim-specific summaries useful for fact-checking from multimodal datasets.
We introduce a dynamic perceiver-based model that can handle inputs from multiple modalities of arbitrary lengths.
Our approach outperforms the SOTA approach by 4.6% in the claim verification task on the MOCHEG dataset.
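A perceiver-based model that handles inputs of arbitrary length typically means a fixed set of learned latents cross-attending over the variable-length input. Below is a generic PyTorch sketch of that pattern, not MetaSumPerceiver's actual architecture.

```python
# Generic Perceiver-style resampler: fixed latents compress variable input.
import torch
import torch.nn as nn

class PerceiverResampler(nn.Module):
    def __init__(self, dim: int = 512, n_latents: int = 64, n_heads: int = 8):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(n_latents, dim))
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim))

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len, dim); seq_len may vary per document set.
        q = self.latents.expand(tokens.size(0), -1, -1)
        out, _ = self.attn(q, tokens, tokens)   # latents attend to all inputs
        return out + self.ff(out)               # (batch, n_latents, dim)

# Usage: a summarization head then consumes the fixed-size representation.
# feats = PerceiverResampler()(torch.randn(2, 1000, 512))  # -> (2, 64, 512)
```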
arXiv Detail & Related papers (2024-07-18T01:33:20Z) - RED-DOT: Multimodal Fact-checking via Relevant Evidence Detection [17.107961913114778]
We introduce a "Relevant Evidence Detection" (RED) module to discern whether each piece of evidence is relevant.
RED-DOT achieves improvements of up to 33.7% over the state-of-the-art (SotA) on the VERITE benchmark.
Our evidence re-ranking and element-wise modality fusion led to RED-DOT surpassing the SotA on NewsCLIPpings+ by up to 3%.
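The "relevant evidence detection" idea can be illustrated as a learned gate that scores each evidence item against the claim and down-weights irrelevant pieces before fusion. The layers below are a generic sketch, not RED-DOT's architecture.

```python
# Hedged sketch: per-evidence relevance gating before fusion.
import torch
import torch.nn as nn

class RelevanceGate(nn.Module):
    def __init__(self, dim: int = 512):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                    nn.Linear(dim, 1))

    def forward(self, claim: torch.Tensor, evidence: torch.Tensor):
        # claim: (batch, dim); evidence: (batch, n_evidence, dim)
        c = claim.unsqueeze(1).expand_as(evidence)
        rel = torch.sigmoid(self.scorer(torch.cat([c, evidence], dim=-1)))
        fused = (rel * evidence).sum(dim=1)     # irrelevant items contribute ~0
        return fused, rel.squeeze(-1)           # fused vector, relevance weights
```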
arXiv Detail & Related papers (2023-11-16T14:43:45Z) - Factcheck-Bench: Fine-Grained Evaluation Benchmark for Automatic Fact-checkers [121.53749383203792]
We present a holistic end-to-end solution for annotating the factuality of responses generated by large language models (LLMs).
We construct an open-domain document-level factuality benchmark at three levels of granularity: claim, sentence, and document.
Preliminary experiments show that FacTool, FactScore, and Perplexity struggle to identify false claims.
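One way to picture the three-level granularity is a nested annotation structure in which document-level factuality aggregates upward from claim labels; the schema and aggregation rule below are illustrative guesses, not the benchmark's released format.

```python
# Hypothetical nested schema for claim / sentence / document annotations.
from dataclasses import dataclass

@dataclass
class ClaimAnn:
    text: str
    label: str                  # e.g. "true" / "false" / "unverifiable"

@dataclass
class SentenceAnn:
    text: str
    claims: list[ClaimAnn]      # atomic claims decomposed from the sentence

@dataclass
class DocumentAnn:
    response: str               # the full LLM-generated response
    sentences: list[SentenceAnn]

    def document_label(self) -> str:
        # Illustrative aggregation: any false claim makes the document false.
        flat = [c.label for s in self.sentences for c in s.claims]
        return "false" if "false" in flat else "true"
```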
arXiv Detail & Related papers (2023-11-15T14:41:57Z) - FactCHD: Benchmarking Fact-Conflicting Hallucination Detection [64.4610684475899]
FactCHD is a benchmark designed for the detection of fact-conflicting hallucinations from LLMs.
FactCHD features a diverse dataset that spans various factuality patterns, including vanilla, multi-hop, comparison, and set operation.
We introduce Truth-Triangulator, which synthesizes reflective considerations from a tool-enhanced ChatGPT and a LoRA-tuned Llama2.
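Reduced to its skeleton, triangulation means obtaining verdicts from both judges and reconciling them. The reconciliation rule below is a guess for illustration, not the paper's method.

```python
# Hedged sketch: reconcile two judges' verdicts on the same claim.
def triangulate(claim: str, tool_judge, tuned_judge) -> dict:
    a = tool_judge(claim)   # e.g. tool-enhanced ChatGPT: {"verdict", "evidence"}
    b = tuned_judge(claim)  # e.g. LoRA-tuned Llama2:     {"verdict", "evidence"}
    if a["verdict"] == b["verdict"]:
        return {"verdict": a["verdict"],
                "evidence": a["evidence"] + b["evidence"]}
    # Disagreement: prefer the judge that cites external evidence (a guess).
    winner = a if a["evidence"] else b
    return {"verdict": winner["verdict"], "evidence": winner["evidence"],
            "note": "judges disagreed; evidence-backed verdict kept"}
```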
arXiv Detail & Related papers (2023-10-18T16:27:49Z) - End-to-End Multimodal Fact-Checking and Explanation Generation: A Challenging Dataset and Models [0.0]
We propose end-to-end multimodal fact-checking and explanation generation.
The goal is to assess the truthfulness of a claim by retrieving relevant evidence and predicting a truthfulness label.
To support this research, we construct MOCHEG, a large-scale dataset consisting of 15,601 claims.
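End-to-end here means chaining three subtasks into one flow: evidence retrieval, truthfulness prediction, and explanation generation. The pipeline below shows that flow with placeholder components, which are assumed rather than taken from the paper.

```python
# Hedged sketch of the three chained subtasks.
def fact_check_pipeline(claim: str, retrieve, classify, explain) -> dict:
    evidence = retrieve(claim)                    # 1. multimodal evidence retrieval
    label = classify(claim, evidence)             # 2. truthfulness prediction
    rationale = explain(claim, evidence, label)   # 3. explanation generation
    return {"label": label, "explanation": rationale, "evidence": evidence}
```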
arXiv Detail & Related papers (2022-05-25T04:36:46Z) - GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
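Retrieving evidence "in a generative fashion" means the model generates document identifiers rather than scoring an index; decoding is usually constrained by a prefix trie so only real corpus titles can be produced. The trie API below is a common pattern assumed for illustration, not GERE's code.

```python
# Hedged sketch: constrain next-token logits to stay inside a title trie.
import math

def mask_to_valid_titles(step_logits: list[float], prefix: tuple[int, ...],
                         title_trie) -> list[float]:
    """Mask logits so the next token must extend a real corpus title."""
    allowed = title_trie.next_tokens(prefix)  # hypothetical trie lookup
    return [logit if tok in allowed else -math.inf
            for tok, logit in enumerate(step_logits)]
```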
arXiv Detail & Related papers (2022-04-12T03:49:35Z) - Logically at the Factify 2022: Multimodal Fact Verification [2.8914815569249823]
This paper describes our participant system for the multi-modal fact verification (Factify) challenge at AAAI 2022.
Two baseline approaches are proposed and explored including an ensemble model and a multi-modal attention network.
Our best model ranked first on the leaderboard, obtaining a weighted average F-measure of 0.77 on both the validation and test sets.
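The ensemble half of the system can be pictured as weighted probability averaging across independently trained classifiers; the interface below is assumed for illustration, not the authors' configuration.

```python
# Hedged sketch: weighted soft-voting ensemble over multimodal classifiers.
from collections import defaultdict

def ensemble_predict(models, text_feats, image_feats, weights=None) -> str:
    """Each member's predict_proba returns {label: probability} (assumed)."""
    weights = weights or [1.0] * len(models)
    votes = defaultdict(float)
    for model, w in zip(models, weights):
        for label, prob in model.predict_proba(text_feats, image_feats).items():
            votes[label] += w * prob
    return max(votes, key=votes.get)
```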
arXiv Detail & Related papers (2021-12-16T23:34:07Z) - Open-Domain, Content-based, Multi-modal Fact-checking of Out-of-Context Images via Online Resources [70.68526820807402]
A real image is re-purposed to support other narratives by misrepresenting its context and/or elements.
Our goal is an inspectable method that automates this time-consuming and reasoning-intensive process by fact-checking the image-context pairing.
Our work offers the first step and benchmark for open-domain, content-based, multi-modal fact-checking.
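The check the abstract outlines can be sketched as: gather the image's prior appearances via reverse search, then test the paired caption against each retrieved context. Both helper functions below are placeholders, not the paper's components.

```python
# Hedged sketch: detect out-of-context image reuse via prior appearances.
def check_image_context(image_path: str, caption: str,
                        reverse_search, textual_entailment) -> dict:
    """reverse_search(img) -> [{"context": str, ...}];
    textual_entailment(premise, hypothesis) -> label (assumed interfaces)."""
    pages = reverse_search(image_path)  # prior online uses of the same image
    verdicts = [textual_entailment(premise=p["context"], hypothesis=caption)
                for p in pages]
    falsified = any(v == "contradiction" for v in verdicts)
    return {"out_of_context": falsified, "checked_pages": len(pages)}
```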
arXiv Detail & Related papers (2021-11-30T19:36:20Z)