RED-DOT: Multimodal Fact-checking via Relevant Evidence Detection
- URL: http://arxiv.org/abs/2311.09939v2
- Date: Thu, 7 Mar 2024 11:13:23 GMT
- Title: RED-DOT: Multimodal Fact-checking via Relevant Evidence Detection
- Authors: Stefanos-Iordanis Papadopoulos, Christos Koutlis, Symeon Papadopoulos,
Panagiotis C. Petrantonakis
- Abstract summary: We introduce a "Relevant Evidence Detection" (RED) module to discern whether each piece of evidence is relevant.
RED-DOT achieves significant improvements over the state-of-the-art (SotA) on the VERITE benchmark by up to 33.7%.
Our evidence re-ranking and element-wise modality fusion led to RED-DOT surpassing the SotA on NewsCLIPings+ by up to 3%.
- Score: 17.107961913114778
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Online misinformation is often multimodal in nature, i.e., it is
caused by misleading associations between texts and accompanying images. To
support the fact-checking process, researchers have recently been developing
automatic multimodal methods that gather and analyze external information
(evidence) related to the image-text pairs under examination. However, prior
works assumed all external information collected from the web to be relevant.
In this study, we introduce a "Relevant Evidence Detection" (RED) module to
discern whether each piece of evidence is relevant for supporting or refuting
the claim.
Specifically, we develop the "Relevant Evidence Detection Directed Transformer"
(RED-DOT) and explore multiple architectural variants (e.g., single or
dual-stage) and mechanisms (e.g., "guided attention"). Extensive ablation and
comparative experiments demonstrate that RED-DOT achieves significant
improvements over the state-of-the-art (SotA) on the VERITE benchmark by up to
33.7%. Furthermore, our evidence re-ranking and element-wise modality fusion
led to RED-DOT surpassing the SotA on NewsCLIPings+ by up to 3%, without the
need for numerous evidence items or multiple backbone encoders. We release our
code
at: https://github.com/stevejpapad/relevant-evidence-detection
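As a rough illustration of the two ideas named in the abstract, relevance scoring of
retrieved evidence and element-wise fusion of the claim's modalities, the sketch
below combines CLIP-style claim embeddings with a small Transformer encoder and a
per-evidence relevance head. All module names, dimensions, and fusion choices are
illustrative assumptions rather than the released RED-DOT implementation; see the
repository above for the actual code.

```python
# Hypothetical sketch of relevance-weighted evidence fusion (not the authors' code).
# Assumes pre-extracted, same-dimensional claim and evidence embeddings
# (e.g., from a CLIP-like encoder).
import torch
import torch.nn as nn

class RelevanceFusionSketch(nn.Module):
    def __init__(self, dim=768, heads=8, layers=2):
        super().__init__()
        # Element-wise modality fusion: product and absolute difference are one
        # common choice; the paper's exact fusion operators may differ.
        self.fuse = nn.Linear(4 * dim, dim)
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.relevance_head = nn.Linear(dim, 1)  # per-evidence relevance score
        self.verdict_head = nn.Linear(dim, 1)    # truthful vs. misleading logit

    def forward(self, img_emb, txt_emb, evidence_emb):
        # img_emb, txt_emb: (B, D); evidence_emb: (B, N, D)
        claim = self.fuse(torch.cat(
            [img_emb, txt_emb, img_emb * txt_emb, (img_emb - txt_emb).abs()], dim=-1))
        tokens = torch.cat([claim.unsqueeze(1), evidence_emb], dim=1)  # (B, 1+N, D)
        hidden = self.encoder(tokens)
        relevance = self.relevance_head(hidden[:, 1:]).squeeze(-1)     # (B, N)
        weights = relevance.softmax(dim=-1).unsqueeze(-1)              # (B, N, 1)
        pooled = hidden[:, :1] + (weights * hidden[:, 1:]).sum(dim=1, keepdim=True)
        verdict = self.verdict_head(pooled.squeeze(1)).squeeze(-1)     # (B,)
        return verdict, relevance
```

In such a setup the verdict logit would be trained on truthful/misleading labels and
the relevance head on relevance annotations or weak supervision; the single- and
dual-stage variants and "guided attention" mentioned in the abstract are omitted
from this sketch.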
Related papers
- GEGA: Graph Convolutional Networks and Evidence Retrieval Guided Attention for Enhanced Document-level Relation Extraction [15.246183329778656]
Document-level relation extraction (DocRE) aims to extract relations between entities from unstructured document text.
To overcome these challenges, we propose GEGA, a novel model for DocRE.
We evaluate the GEGA model on three widely used benchmark datasets: DocRED, Re-DocRED, and Revisit-DocRED.
arXiv Detail & Related papers (2024-07-31T07:15:33Z) - Multimodal Misinformation Detection using Large Vision-Language Models [7.505532091249881]
Large language models (LLMs) have shown remarkable performance in various tasks.
Few approaches consider evidence retrieval as part of misinformation detection.
We propose a novel re-ranking approach for multimodal evidence retrieval.
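A minimal sketch of evidence re-ranking in this spirit, assuming some external
multimodal scorer (for instance an LVLM prompted to rate claim-evidence relevance);
the function and parameter names are hypothetical and not taken from the paper.

```python
# Hypothetical re-ranking sketch: order retrieved evidence by a relevance score
# produced by any claim-evidence scorer supplied by the caller.
from typing import Callable, List, Tuple

def rerank_evidence(claim: str,
                    evidence: List[str],
                    score_fn: Callable[[str, str], float],
                    top_k: int = 5) -> List[Tuple[str, float]]:
    """Keep the top_k evidence items with the highest claim-evidence scores."""
    scored = [(ev, score_fn(claim, ev)) for ev in evidence]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```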
arXiv Detail & Related papers (2024-07-19T13:57:11Z) - Similarity over Factuality: Are we making progress on multimodal out-of-context misinformation detection? [15.66049149213069]
Out-of-context (OOC) misinformation poses a significant challenge in multimodal fact-checking.
Recent research in evidence-based OOC detection has seen a trend towards increasingly complex architectures.
We introduce a simple yet robust baseline, which assesses similarity between image-text pairs and external image and text evidence.
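A minimal sketch of such a similarity baseline, assuming pre-computed embeddings from
a shared vision-language encoder (e.g., CLIP); the particular feature set is an
illustrative assumption, not the paper's exact design.

```python
# Hypothetical similarity-baseline sketch: compare a claim's image/text embeddings
# with embeddings of retrieved image and text evidence (the encoder is assumed external).
import torch
import torch.nn.functional as F

def similarity_features(claim_img, claim_txt, ev_imgs, ev_txts):
    """claim_img, claim_txt: (D,); ev_imgs: (Ni, D); ev_txts: (Nt, D).
    Returns a small feature vector for a downstream classifier."""
    claim_img = F.normalize(claim_img, dim=-1)
    claim_txt = F.normalize(claim_txt, dim=-1)
    ev_imgs = F.normalize(ev_imgs, dim=-1)
    ev_txts = F.normalize(ev_txts, dim=-1)
    return torch.stack([
        claim_img @ claim_txt,        # cross-modal consistency of the claim itself
        (ev_imgs @ claim_txt).max(),  # best image evidence vs. claim text
        (ev_imgs @ claim_img).max(),  # best image evidence vs. claim image
        (ev_txts @ claim_txt).max(),  # best text evidence vs. claim text
        (ev_txts @ claim_img).max(),  # best text evidence vs. claim image
    ])
```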
arXiv Detail & Related papers (2024-07-18T13:08:55Z) - Support or Refute: Analyzing the Stance of Evidence to Detect
Out-of-Context Mis- and Disinformation [13.134162427636356]
Mis- and disinformation online have become a major societal problem.
One common form of mis- and disinformation is out-of-context (OOC) information.
We propose a stance extraction network (SEN) that can extract the stances of different pieces of multi-modal evidence.
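One way to approximate such stance extraction for textual evidence is to reuse an
off-the-shelf natural language inference model, mapping entailment / contradiction /
neutral to support / refute / neutral. The sketch below does exactly that and is not
the SEN architecture proposed in the paper; it assumes the Hugging Face transformers
text-classification pipeline.

```python
# Hypothetical stance sketch: label each piece of textual evidence with an NLI model.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

def evidence_stances(claim, evidence):
    labels = {"ENTAILMENT": "support", "CONTRADICTION": "refute", "NEUTRAL": "neutral"}
    # Premise = evidence, hypothesis = claim.
    results = nli([{"text": ev, "text_pair": claim} for ev in evidence])
    return [labels[r["label"]] for r in results]
```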
arXiv Detail & Related papers (2023-11-03T08:05:54Z) - An Interactively Reinforced Paradigm for Joint Infrared-Visible Image
Fusion and Saliency Object Detection [59.02821429555375]
This research focuses on the discovery and localization of hidden objects in the wild and serves unmanned systems.
Through empirical analysis, infrared and visible image fusion (IVIF) makes hard-to-find objects apparent.
Multimodal salient object detection (SOD) accurately delineates the precise spatial location of objects within the picture.
arXiv Detail & Related papers (2023-05-17T06:48:35Z) - On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approaches human-like quality, the sample size needed for reliable detection increases.
We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors, including RoBERTa-Large/Base-Detector and GPTZero.
arXiv Detail & Related papers (2023-04-10T17:47:39Z) - End-to-End Multimodal Fact-Checking and Explanation Generation: A
Challenging Dataset and Models [0.0]
We propose end-to-end multimodal fact-checking and explanation generation.
The goal is to assess the truthfulness of a claim by retrieving relevant evidence and predicting a truthfulness label.
To support this research, we construct Mocheg, a large-scale dataset consisting of 15,601 claims.
arXiv Detail & Related papers (2022-05-25T04:36:46Z) - GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
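A minimal sketch of the generative-retrieval idea, assuming a Hugging Face seq2seq
model that would be fine-tuned to map claims to evidence-document titles; the
untuned base checkpoint and the omission of GERE's constrained decoding are
simplifications for illustration only.

```python
# Hypothetical sketch: generate candidate evidence-document titles from a claim
# with beam search (a trained checkpoint would be needed for meaningful output).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("facebook/bart-base")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")

def generate_evidence_titles(claim, num_titles=5):
    inputs = tok(claim, return_tensors="pt")
    outputs = model.generate(**inputs, num_beams=num_titles,
                             num_return_sequences=num_titles, max_new_tokens=32)
    return tok.batch_decode(outputs, skip_special_tokens=True)
```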
arXiv Detail & Related papers (2022-04-12T03:49:35Z) - Target-aware Dual Adversarial Learning and a Multi-scenario
Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection [65.30079184700755]
This study addresses the issue of fusing infrared and visible images, which differ markedly in appearance, for object detection.
Previous approaches discover commonalities underlying the two modalities and fuse in the common space, either by iterative optimization or deep networks.
This paper proposes a bilevel optimization formulation for the joint problem of fusion and detection, and then unrolls to a target-aware Dual Adversarial Learning (TarDAL) network for fusion and a commonly used detection network.
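As a generic template only (the paper's exact objectives and constraints differ), a
bilevel formulation of joint fusion and detection can be written with a fusion
network F, a detector D, and infrared/visible inputs x_ir, x_vis, all symbols
assumed here for illustration:

```latex
\min_{\theta_d}\; \mathcal{L}_{\mathrm{det}}\!\left(D_{\theta_d}\big(F_{\theta_f^{*}}(x_{\mathrm{ir}}, x_{\mathrm{vis}})\big)\right)
\quad \text{s.t.} \quad
\theta_f^{*} \in \arg\min_{\theta_f}\; \mathcal{L}_{\mathrm{fuse}}\!\left(F_{\theta_f}(x_{\mathrm{ir}}, x_{\mathrm{vis}})\right)
```

The lower level fits the fusion network to a fusion objective, while the upper level
selects detector parameters against the fused output; TarDAL unrolls such a
formulation into trainable fusion and detection networks.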
arXiv Detail & Related papers (2022-03-30T11:44:56Z) - Open-Domain, Content-based, Multi-modal Fact-checking of Out-of-Context
Images via Online Resources [70.68526820807402]
A real image is re-purposed to support other narratives by misrepresenting its context and/or elements.
Our goal is an inspectable method that automates this time-consuming and reasoning-intensive process by fact-checking the image-context pairing.
Our work offers the first step and benchmark for open-domain, content-based, multi-modal fact-checking.
arXiv Detail & Related papers (2021-11-30T19:36:20Z) - Multimodal Object Detection via Bayesian Fusion [59.31437166291557]
We study multimodal object detection with RGB and thermal cameras, since the latter can provide much stronger object signatures under poor illumination.
Our key contribution is a non-learned late-fusion method that fuses together bounding box detections from different modalities.
We apply our approach to benchmarks containing both aligned (KAIST) and unaligned (FLIR) multimodal sensor data.
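A hypothetical late-fusion sketch in this spirit: group RGB and thermal detections of
the same class by IoU and combine their confidences under an independence
assumption. The paper's probabilistic treatment differs; all names below are
illustrative.

```python
# Hypothetical late fusion of per-modality detections (not the paper's exact method).
from dataclasses import dataclass
from typing import List

@dataclass
class Det:
    box: tuple    # (x1, y1, x2, y2)
    score: float  # detector confidence in [0, 1]
    label: str

def iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def fuse(rgb: List[Det], thermal: List[Det], thr: float = 0.5) -> List[Det]:
    fused, used = [], set()
    for d in rgb:
        match = next((i for i, t in enumerate(thermal)
                      if i not in used and t.label == d.label and iou(d.box, t.box) > thr),
                     None)
        if match is None:
            fused.append(d)  # seen only by the RGB detector
        else:
            t = thermal[match]
            used.add(match)
            # Probabilistic OR of two independent detections of the same object.
            fused.append(Det(d.box, 1 - (1 - d.score) * (1 - t.score), d.label))
    fused += [t for i, t in enumerate(thermal) if i not in used]  # thermal-only boxes
    return fused
```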
arXiv Detail & Related papers (2021-04-07T04:03:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.