Similarity over Factuality: Are we making progress on multimodal out-of-context misinformation detection?
- URL: http://arxiv.org/abs/2407.13488v1
- Date: Thu, 18 Jul 2024 13:08:55 GMT
- Title: Similarity over Factuality: Are we making progress on multimodal out-of-context misinformation detection?
- Authors: Stefanos-Iordanis Papadopoulos, Christos Koutlis, Symeon Papadopoulos, Panagiotis C. Petrantonakis
- Abstract summary: Out-of-context (OOC) misinformation poses a significant challenge in multimodal fact-checking.
Recent research in evidence-based OOC detection has seen a trend towards increasingly complex architectures.
We introduce a simple yet robust baseline, which assesses similarity between image-text pairs and external image and text evidence.
- Score: 15.66049149213069
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Out-of-context (OOC) misinformation poses a significant challenge in multimodal fact-checking, where images are paired with texts that misrepresent their original context to support false narratives. Recent research in evidence-based OOC detection has seen a trend towards increasingly complex architectures, incorporating Transformers, foundation models, and large language models. In this study, we introduce a simple yet robust baseline, which assesses MUltimodal SimilaritiEs (MUSE), specifically the similarity between image-text pairs and external image and text evidence. Our results demonstrate that MUSE, when used with conventional classifiers like Decision Tree, Random Forest, and Multilayer Perceptron, can compete with and even surpass the state-of-the-art on the NewsCLIPpings and VERITE datasets. Furthermore, integrating MUSE in our proposed "Attentive Intermediate Transformer Representations" (AITR) significantly improved performance, by 3.3% and 7.5% on NewsCLIPpings and VERITE, respectively. Nevertheless, the success of MUSE, relying on surface-level patterns and shortcuts, without examining factuality and logical inconsistencies, raises critical questions about how we define the task, construct datasets, collect external evidence and overall, how we assess progress in the field. We release our code at: https://github.com/stevejpapad/outcontext-misinfo-progress
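The recipe the abstract describes is simple enough to sketch. Below is a minimal, hypothetical illustration of the MUSE idea, not the authors' released code (see their repository for that): embed the image-text pair and the retrieved evidence with a CLIP-style encoder, collect pairwise cosine similarities into a small feature vector, and fit a conventional classifier. The CLIP checkpoint, the max/mean pooling over evidence, and the exact feature layout are assumptions made for illustration.

```python
# Minimal sketch of the MUSE idea: turn an (image, caption) pair plus its
# retrieved evidence into a small vector of cross-modal similarities, then
# train a conventional classifier on those features.
import torch
from transformers import CLIPModel, CLIPProcessor
from sklearn.ensemble import RandomForestClassifier

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(texts=None, images=None):
    """L2-normalised CLIP embeddings for a list of strings or PIL images."""
    with torch.no_grad():
        if texts is not None:
            inputs = processor(text=texts, return_tensors="pt",
                               padding=True, truncation=True)
            feats = model.get_text_features(**inputs)
        else:
            inputs = processor(images=images, return_tensors="pt")
            feats = model.get_image_features(**inputs)
    return torch.nn.functional.normalize(feats, dim=-1)

def pooled_sims(query, evidence):
    """Max and mean cosine similarity between one item and a set of evidence."""
    sims = (query @ evidence.T).flatten()
    return [sims.max().item(), sims.mean().item()]

def muse_features(caption, image, evidence_texts, evidence_images):
    """9-dim similarity vector for one sample (assumes non-empty evidence)."""
    t = embed(texts=[caption])
    v = embed(images=[image])
    et = embed(texts=evidence_texts)
    ev = embed(images=evidence_images)
    feats = [(t @ v.T).item()]                        # caption <-> image
    feats += pooled_sims(t, et) + pooled_sims(t, ev)  # caption <-> evidence
    feats += pooled_sims(v, et) + pooled_sims(v, ev)  # image   <-> evidence
    return feats

# X = [muse_features(...) for each sample]; y = 0 (pristine) / 1 (out-of-context)
clf = RandomForestClassifier(n_estimators=300, random_state=0)
# clf.fit(X_train, y_train); clf.predict(X_test)
```

That a handful of similarity features fed to an off-the-shelf classifier can rival far larger architectures is precisely the paper's point: the benchmarks appear to reward surface-level similarity shortcuts rather than factual reasoning.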
Related papers
- Towards Text-Image Interleaved Retrieval [49.96332254241075]
We introduce the text-image interleaved retrieval (TIIR) task, where the query and document are interleaved text-image sequences.
We construct a TIIR benchmark based on naturally interleaved wikiHow tutorials, where a specific pipeline is designed to generate interleaved queries.
We propose a novel Matryoshka Multimodal Embedder (MME), which compresses the number of visual tokens at different granularity.
arXiv Detail & Related papers (2025-02-18T12:00:47Z)
- A Reality Check on Context Utilisation for Retrieval-Augmented Generation [44.54803681476863]
We introduce DRUID (Dataset of Retrieved Unreliable, Insufficient and Difficult-to-understand contexts) with real-world queries and contexts manually annotated for stance.
The dataset is based on the task of automated claim verification, for which automated retrieval of real-world evidence is crucial.
We show that synthetic datasets exaggerate context characteristics rare in real retrieved data, which leads to inflated context utilisation results.
arXiv Detail & Related papers (2024-12-22T14:16:38Z)
- Enhancing Multimodal Sentiment Analysis for Missing Modality through Self-Distillation and Unified Modality Cross-Attention [45.31956918333587]
In multimodal sentiment analysis, collecting text data is often more challenging than collecting video or audio data.
We have developed a robust model that integrates multimodal sentiment information, even in the absence of text modality.
arXiv Detail & Related papers (2024-10-19T07:59:41Z)
- Knowledge-Aware Reasoning over Multimodal Semi-structured Tables [85.24395216111462]
This study investigates whether current AI models can perform knowledge-aware reasoning on multimodal structured data.
We introduce MMTabQA, a new dataset designed for this purpose.
Our experiments highlight substantial challenges for current AI models in effectively integrating and interpreting multiple text and image inputs.
arXiv Detail & Related papers (2024-08-25T15:17:43Z)
- NativE: Multi-modal Knowledge Graph Completion in the Wild [51.80447197290866]
We propose a comprehensive framework NativE to achieve MMKGC in the wild.
NativE proposes a relation-guided dual adaptive fusion module that enables adaptive fusion for any modalities.
We construct a new benchmark called WildKGC with five datasets to evaluate our method.
arXiv Detail & Related papers (2024-03-28T03:04:00Z)
- Information Screening whilst Exploiting! Multimodal Relation Extraction with Feature Denoising and Multimodal Topic Modeling [96.75821232222201]
Existing research on multimodal relation extraction (MRE) faces two co-existing challenges, internal-information over-utilization and external-information under-exploitation.
We propose a novel framework that simultaneously implements the idea of internal-information screening and external-information exploiting.
arXiv Detail & Related papers (2023-05-19T14:56:57Z)
- VERITE: A Robust Benchmark for Multimodal Misinformation Detection Accounting for Unimodal Bias [17.107961913114778]
Multimodal misinformation is a growing problem on social media platforms.
In this study, we investigate and identify the presence of unimodal bias in widely-used MMD benchmarks.
We introduce a new method -- termed Crossmodal HArd Synthetic MisAlignment (CHASMA) -- for generating realistic synthetic training data.
arXiv Detail & Related papers (2023-04-27T12:28:29Z)
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric, "dR@n,IoU@m", which discounts the basic recall scores to mitigate the inflated evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
- Open-Domain, Content-based, Multi-modal Fact-checking of Out-of-Context Images via Online Resources [70.68526820807402]
A real image is re-purposed to support other narratives by misrepresenting its context and/or elements.
Our goal is an inspectable method that automates this time-consuming and reasoning-intensive process by fact-checking the image-context pairing.
Our work offers the first step and benchmark for open-domain, content-based, multi-modal fact-checking.
arXiv Detail & Related papers (2021-11-30T19:36:20Z)
- The Surprising Performance of Simple Baselines for Misinformation Detection [4.060731229044571]
We examine the performance of a broad set of modern transformer-based language models.
We present our framework as a baseline for creating and evaluating new methods for misinformation detection.
arXiv Detail & Related papers (2021-04-14T16:25:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.