Synthetic Misinformers: Generating and Combating Multimodal
Misinformation
- URL: http://arxiv.org/abs/2303.01217v1
- Date: Thu, 2 Mar 2023 12:59:01 GMT
- Title: Synthetic Misinformers: Generating and Combating Multimodal
Misinformation
- Authors: Stefanos-Iordanis Papadopoulos, Christos Koutlis, Symeon Papadopoulos,
Panagiotis C. Petrantonakis
- Abstract summary: Multimodal misinformation detection (MMD) determines whether the combination of an image and its accompanying text could mislead or misinform.
We show that our proposed CLIP-based Named Entity Swapping can lead to MMD models that surpass other OOC and NEI Misinformers in terms of multimodal accuracy.
- Score: 11.696058634552147
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: With the expansion of social media and the increasing dissemination of
multimedia content, the spread of misinformation has become a major concern.
This necessitates effective strategies for multimodal misinformation detection
(MMD) that detect whether the combination of an image and its accompanying text
could mislead or misinform. Due to the data-intensive nature of deep neural
networks and the labor-intensive process of manual annotation, researchers have
been exploring various methods for automatically generating synthetic
multimodal misinformation - which we refer to as Synthetic Misinformers - in
order to train MMD models. However, limited evaluation on real-world
misinformation and a lack of comparisons with other Synthetic Misinformers
make it difficult to assess progress in the field. To address this, we perform a
comparative study on existing and new Synthetic Misinformers that involves (1)
out-of-context (OOC) image-caption pairs, (2) cross-modal named entity
inconsistency (NEI), as well as (3) hybrid approaches, and we evaluate them
against real-world misinformation using the COSMOS benchmark. The comparative
study showed that our proposed CLIP-based Named Entity Swapping can lead to MMD
models that surpass other OOC and NEI Misinformers in terms of multimodal
accuracy and that hybrid approaches can lead to even higher detection accuracy.
Nevertheless, after alleviating information leakage from the COSMOS evaluation
protocol, low Sensitivity scores indicate that the task is significantly more
challenging than previous studies suggested. Finally, our findings showed that
NEI-based Synthetic Misinformers tend to suffer from a unimodal bias, where
text-only MMDs can outperform multimodal ones.
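To make the NEI idea concrete, the sketch below shows one way a CLIP-guided named-entity swap could generate synthetic training pairs: a named entity in the original caption is replaced by a candidate of the same entity type, and CLIP image-text similarity is used to pick a replacement that remains plausible for the image. This is a hypothetical sketch, not the paper's implementation; it assumes spaCy for named entity recognition and the Hugging Face "openai/clip-vit-base-patch32" checkpoint for scoring, and the selection heuristic (keeping the most image-compatible variant) is illustrative only.

```python
# Hypothetical sketch of CLIP-guided named-entity swapping for creating
# synthetic NEI (named-entity inconsistency) image-caption pairs.
# Assumptions (not taken from the paper): spaCy NER, Hugging Face CLIP,
# and a "most image-compatible variant" selection heuristic.
import spacy
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

nlp = spacy.load("en_core_web_sm")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def swap_named_entity(image, caption, entity_pool):
    """Replace one named entity in `caption` with an alternative of the same type.

    `entity_pool` maps an entity label (e.g. "PERSON", "GPE") to candidate
    entity strings harvested from other captions in the corpus.
    Returns a falsified caption, or None if nothing could be swapped.
    """
    doc = nlp(caption)
    for ent in doc.ents:
        candidates = [c for c in entity_pool.get(ent.label_, []) if c != ent.text]
        if not candidates:
            continue
        # Build one falsified variant per candidate and score each against the image.
        variants = [caption.replace(ent.text, c) for c in candidates]
        inputs = proc(text=variants, images=image, return_tensors="pt",
                      padding=True, truncation=True)
        with torch.no_grad():
            sims = clip(**inputs).logits_per_image.squeeze(0)  # shape: (num_variants,)
        # Heuristic: keep the variant CLIP finds most compatible with the image,
        # so the swap stays plausible (hard for an MMD model) yet factually wrong.
        return variants[int(sims.argmax())]
    return None

# Usage (hypothetical data): pair the returned caption with the original image
# and label the pair as misinformation when building MMD training data.
# fake_caption = swap_named_entity(Image.open("photo.jpg"),
#                                  "Angela Merkel visits Paris.",
#                                  {"PERSON": ["Emmanuel Macron", "Boris Johnson"]})
```

A caption falsified this way can be paired with its original image and labeled as misleading, while the untouched image-caption pair serves as the truthful class when training an MMD classifier.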
Related papers
- LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models [55.903148392998965]
We introduce LOKI, a novel benchmark designed to evaluate the ability of LMMs to detect synthetic data across multiple modalities.
The benchmark includes coarse-grained judgment and multiple-choice questions, as well as fine-grained anomaly selection and explanation tasks.
We evaluate 22 open-source LMMs and 6 closed-source models on LOKI, highlighting their potential as synthetic data detectors and also revealing some limitations in the development of LMM capabilities.
arXiv Detail & Related papers (2024-10-13T05:26:36Z) - RADAR: Robust Two-stage Modality-incomplete Industrial Anomaly Detection [61.71770293720491]
We propose a novel two-stage Robust modAlity-incomplete fusing and Detecting frAmewoRk, abbreviated as RADAR.
Our bootstrapping philosophy is to enhance two stages in MIIAD, improving the robustness of the Multimodal Transformer.
Our experimental results demonstrate that the proposed RADAR significantly surpasses conventional MIAD methods in terms of effectiveness and robustness.
arXiv Detail & Related papers (2024-10-02T16:47:55Z) - GFE-Mamba: Mamba-based AD Multi-modal Progression Assessment via Generative Feature Extraction from MCI [5.355943545567233]
Alzheimer's Disease (AD) is an irreversible neurodegenerative disorder that often progresses from Mild Cognitive Impairment (MCI)
We introduce GFE-Mamba, a classifier based on Generative Feature Extraction (GFE)
It integrates data from assessment scales, MRI, and PET, enabling deeper multimodal fusion.
Our experimental results demonstrate that the GFE-Mamba model is effective in predicting the conversion from MCI to AD.
arXiv Detail & Related papers (2024-07-22T15:22:33Z) - Towards Precision Healthcare: Robust Fusion of Time Series and Image Data [8.579651833717763]
We introduce a new method that uses two separate encoders, one for each type of data, allowing the model to understand complex patterns in both visual and time-based information.
We also deal with imbalanced datasets and use an uncertainty loss function, yielding improved results.
Our experiments show that our method is effective in improving multimodal deep learning for clinical applications.
arXiv Detail & Related papers (2024-05-24T11:18:13Z) - Cross-head mutual Mean-Teaching for semi-supervised medical image
segmentation [6.738522094694818]
Semi-supervised medical image segmentation (SSMIS) has witnessed substantial advancements by leveraging limited labeled data and abundant unlabeled data.
Existing state-of-the-art (SOTA) methods encounter challenges in accurately predicting labels for the unlabeled data.
We propose a novel Cross-head mutual mean-teaching Network (CMMT-Net) that incorporates strong-weak data augmentation.
arXiv Detail & Related papers (2023-10-08T09:13:04Z) - Cross-Attention is Not Enough: Incongruity-Aware Dynamic Hierarchical
Fusion for Multimodal Affect Recognition [69.32305810128994]
Incongruity between modalities poses a challenge for multimodal fusion, especially in affect recognition.
We propose the Hierarchical Crossmodal Transformer with Dynamic Modality Gating (HCT-DMG), a lightweight incongruity-aware model.
HCT-DMG: 1) outperforms previous multimodal models with a reduced size of approximately 0.8M parameters; 2) recognizes hard samples where incongruity makes affect recognition difficult; 3) mitigates the incongruity at the latent level in crossmodal attention.
arXiv Detail & Related papers (2023-05-23T01:24:15Z) - MMG-Ego4D: Multi-Modal Generalization in Egocentric Action Recognition [73.80088682784587]
"Multimodal Generalization" (MMG) aims to study how systems can generalize when data from certain modalities is limited or even completely missing.
MMG consists of two novel scenarios designed to support security and efficiency considerations in real-world applications.
A new fusion module with modality dropout training, contrastive-based alignment training, and a novel cross-modal loss is proposed for better few-shot performance.
arXiv Detail & Related papers (2023-05-12T03:05:40Z) - VERITE: A Robust Benchmark for Multimodal Misinformation Detection
Accounting for Unimodal Bias [17.107961913114778]
Multimodal misinformation is a growing problem on social media platforms.
In this study, we investigate and identify the presence of unimodal bias in widely-used MMD benchmarks.
We introduce a new method -- termed Crossmodal HArd Synthetic MisAlignment (CHASMA) -- for generating realistic synthetic training data.
arXiv Detail & Related papers (2023-04-27T12:28:29Z) - Cross-Modality Neuroimage Synthesis: A Survey [71.27193056354741]
Multi-modality imaging improves disease diagnosis and reveals distinct deviations in tissues with anatomical properties.
Completely aligned and paired multi-modality neuroimaging data has proved effective in brain research.
An alternative solution is to explore unsupervised or weakly supervised learning methods to synthesize the absent neuroimaging data.
arXiv Detail & Related papers (2022-02-14T19:29:08Z) - Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal
Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)