RED-DOT: Multimodal Fact-checking via Relevant Evidence Detection
- URL: http://arxiv.org/abs/2311.09939v2
- Date: Thu, 7 Mar 2024 11:13:23 GMT
- Title: RED-DOT: Multimodal Fact-checking via Relevant Evidence Detection
- Authors: Stefanos-Iordanis Papadopoulos, Christos Koutlis, Symeon Papadopoulos,
Panagiotis C. Petrantonakis
- Abstract summary: We introduce a "Relevant Evidence Detection" (RED) module to discern whether each piece of evidence is relevant.
RED-DOT achieves significant improvements over the state-of-the-art (SotA) on the VERITE benchmark by up to 33.7%.
Our evidence re-ranking and element-wise modality fusion led to RED-DOT surpassing the SotA on NewsCLIPings+ by up to 3%.
- Score: 17.107961913114778
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Online misinformation is often multimodal in nature, i.e., it is caused by
misleading associations between texts and accompanying images. To support the
fact-checking process, researchers have been recently developing automatic
multimodal methods that gather and analyze external information, evidence,
related to the image-text pairs under examination. However, prior works assumed
all external information collected from the web to be relevant. In this study,
we introduce a "Relevant Evidence Detection" (RED) module to discern whether
each piece of evidence is relevant, to support or refute the claim.
Specifically, we develop the "Relevant Evidence Detection Directed Transformer"
(RED-DOT) and explore multiple architectural variants (e.g., single or
dual-stage) and mechanisms (e.g., "guided attention"). Extensive ablation and
comparative experiments demonstrate that RED-DOT achieves significant
improvements over the state-of-the-art (SotA) on the VERITE benchmark by up to
33.7%. Furthermore, our evidence re-ranking and element-wise modality fusion
led to RED-DOT surpassing the SotA on NewsCLIPings+ by up to 3% without the
need for numerous evidence items or multiple backbone encoders. We release our code
at: https://github.com/stevejpapad/relevant-evidence-detection
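The abstract's two headline mechanisms, relevance-based evidence re-ranking and element-wise modality fusion, can be sketched roughly as follows. This is a hypothetical NumPy illustration, not the authors' released implementation: the function names, the cosine-similarity ranking criterion, and the concatenation-style fusion are assumptions made for exposition.

```python
import numpy as np

def rank_evidence(claim_emb: np.ndarray, evidence_embs: np.ndarray,
                  top_k: int = 2) -> np.ndarray:
    """Rank retrieved evidence items by cosine similarity to the claim
    embedding and return the indices of the top_k most relevant items.
    (A stand-in for a learned relevance-detection module.)"""
    claim = claim_emb / np.linalg.norm(claim_emb)
    ev = evidence_embs / np.linalg.norm(evidence_embs, axis=1, keepdims=True)
    sims = ev @ claim                      # cosine similarity per evidence item
    return np.argsort(sims)[::-1][:top_k]  # most similar first

def fuse_modalities(img_emb: np.ndarray, txt_emb: np.ndarray) -> np.ndarray:
    """One common element-wise fusion: concatenate the image and text
    embeddings with their element-wise product and absolute difference."""
    return np.concatenate(
        [img_emb, txt_emb, img_emb * txt_emb, np.abs(img_emb - txt_emb)]
    )
```

The fused vector would then feed a classifier that predicts whether the image-text pair is truthful or misleading; the learned RED module in the paper replaces the fixed cosine criterion shown here.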
Related papers
- Enhancing Multimodal Retrieval via Complementary Information Extraction and Alignment [51.96615529872665]
We propose CIEA, a novel multimodal retrieval approach that transforms both text and images in documents into a unified latent space.
We optimize CIEA using two complementary contrastive losses to ensure semantic integrity and effectively capture the complementary information contained in images.
arXiv Detail & Related papers (2026-01-08T04:02:49Z) - HiEAG: Evidence-Augmented Generation for Out-of-Context Misinformation Detection [8.609016163081744]
We propose HiEAG, a novel Hierarchical Evidence-Augmented Generation framework to refine external consistency checking.
Our approach decomposes external consistency checking into a comprehensive engine pipeline that integrates reranking and rewriting in addition to retrieval.
Our approach enables explanation for its judgments and achieves impressive performance with instruction tuning.
arXiv Detail & Related papers (2025-11-18T01:11:48Z) - WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research [73.58638285105971]
This paper tackles open-ended deep research (OEDR), a complex challenge where AI agents must synthesize vast web-scale information into insightful reports.
We introduce WebWeaver, a novel dual-agent framework that emulates the human research process.
Our framework establishes a new state-of-the-art across major OEDR benchmarks, including DeepResearch Bench, DeepConsult, and DeepResearchGym.
arXiv Detail & Related papers (2025-09-16T17:57:21Z) - RAMA: Retrieval-Augmented Multi-Agent Framework for Misinformation Detection in Multimodal Fact-Checking [15.160356035522609]
RAMA is a novel retrieval-augmented multi-agent framework designed for verifying multimedia misinformation.
RAMA incorporates three core innovations: (1) strategic query formulation that transforms multimodal claims into precise web search queries; (2) cross-verification evidence aggregation from diverse, authoritative sources; and (3) a multi-agent ensemble architecture.
arXiv Detail & Related papers (2025-07-12T07:46:51Z) - Fact-R1: Towards Explainable Video Misinformation Detection with Deep Reasoning [54.56271651170667]
Existing methods often overfit to rigid templates and lack deep reasoning over deceptive content.
We introduce FakeVV, a large-scale benchmark comprising over 100,000 video-text pairs with fine-grained, interpretable annotations.
We also propose Fact-R1, a framework that integrates deep reasoning with collaborative rule-based reinforcement learning.
arXiv Detail & Related papers (2025-05-22T16:05:06Z) - Benchmarking Retrieval-Augmented Multimodal Generation for Document Question Answering [60.062194349648195]
Document Visual Question Answering (DocVQA) faces dual challenges in processing lengthy multimodal documents.
Current document retrieval-augmented generation (DocRAG) methods remain limited by their text-centric approaches.
We introduce MMDocRAG, a comprehensive benchmark featuring 4,055 expert-annotated QA pairs with multi-page, cross-modal evidence chains.
arXiv Detail & Related papers (2025-05-22T09:52:57Z) - GEGA: Graph Convolutional Networks and Evidence Retrieval Guided Attention for Enhanced Document-level Relation Extraction [15.246183329778656]
Document-level relation extraction (DocRE) aims to extract relations between entities from unstructured document text.
To overcome these challenges, we propose GEGA, a novel model for DocRE.
We evaluate the GEGA model on three widely used benchmark datasets: DocRED, Re-DocRED, and Revisit-DocRED.
arXiv Detail & Related papers (2024-07-31T07:15:33Z) - Multimodal Misinformation Detection using Large Vision-Language Models [7.505532091249881]
Large language models (LLMs) have shown remarkable performance in various tasks.
Few approaches consider evidence retrieval as part of misinformation detection.
We propose a novel re-ranking approach for multimodal evidence retrieval.
arXiv Detail & Related papers (2024-07-19T13:57:11Z) - Similarity over Factuality: Are we making progress on multimodal out-of-context misinformation detection? [15.66049149213069]
Out-of-context (OOC) misinformation poses a significant challenge in multimodal fact-checking.
Recent research in evidence-based OOC detection has seen a trend towards increasingly complex architectures.
We introduce a simple yet robust baseline, which assesses similarity between image-text pairs and external image and text evidence.
arXiv Detail & Related papers (2024-07-18T13:08:55Z) - Support or Refute: Analyzing the Stance of Evidence to Detect
Out-of-Context Mis- and Disinformation [13.134162427636356]
Mis- and disinformation online have become a major societal problem.
One common form of mis- and disinformation is out-of-context (OOC) information.
We propose a stance extraction network (SEN) that can extract the stances of different pieces of multi-modal evidence.
arXiv Detail & Related papers (2023-11-03T08:05:54Z) - An Interactively Reinforced Paradigm for Joint Infrared-Visible Image
Fusion and Saliency Object Detection [59.02821429555375]
This research focuses on the discovery and localization of hidden objects in the wild and serves unmanned systems.
Through empirical analysis, infrared and visible image fusion (IVIF) makes hard-to-find objects apparent.
Multimodal salient object detection (SOD) accurately delineates the precise spatial location of objects within the picture.
arXiv Detail & Related papers (2023-05-17T06:48:35Z) - On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approaches human-like quality, the sample size needed for reliable detection increases.
We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors, including RoBERTa-Large/Base-Detector and GPTZero.
arXiv Detail & Related papers (2023-04-10T17:47:39Z) - End-to-End Multimodal Fact-Checking and Explanation Generation: A
Challenging Dataset and Models [0.0]
We propose end-to-end multimodal fact-checking and explanation generation.
The goal is to assess the truthfulness of a claim by retrieving relevant evidence and predicting a truthfulness label.
To support this research, we construct Mocheg, a large-scale dataset consisting of 15,601 claims.
arXiv Detail & Related papers (2022-05-25T04:36:46Z) - GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z) - Target-aware Dual Adversarial Learning and a Multi-scenario
Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection [65.30079184700755]
This study addresses the issue of fusing infrared and visible images that appear differently for object detection.
Previous approaches discover commonalities underlying the two modalities and fuse in the common space either by iterative optimization or deep networks.
This paper proposes a bilevel optimization formulation for the joint problem of fusion and detection, and then unrolls to a target-aware Dual Adversarial Learning (TarDAL) network for fusion and a commonly used detection network.
arXiv Detail & Related papers (2022-03-30T11:44:56Z) - Open-Domain, Content-based, Multi-modal Fact-checking of Out-of-Context
Images via Online Resources [70.68526820807402]
A real image is re-purposed to support other narratives by misrepresenting its context and/or elements.
Our goal is an inspectable method that automates this time-consuming and reasoning-intensive process by fact-checking the image-context pairing.
Our work offers the first step and benchmark for open-domain, content-based, multi-modal fact-checking.
arXiv Detail & Related papers (2021-11-30T19:36:20Z) - Multimodal Object Detection via Bayesian Fusion [59.31437166291557]
We study multimodal object detection with RGB and thermal cameras, since the latter can provide much stronger object signatures under poor illumination.
Our key contribution is a non-learned late-fusion method that fuses together bounding box detections from different modalities.
We apply our approach to benchmarks containing both aligned (KAIST) and unaligned (FLIR) multimodal sensor data.
arXiv Detail & Related papers (2021-04-07T04:03:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.