Improving Generalization for Multimodal Fake News Detection
- URL: http://arxiv.org/abs/2305.18599v1
- Date: Mon, 29 May 2023 20:32:22 GMT
- Title: Improving Generalization for Multimodal Fake News Detection
- Authors: Sahar Tahmasebi, Sherzod Hakimov, Ralph Ewerth, Eric Müller-Budack
- Abstract summary: State-of-the-art approaches are usually trained on small datasets or on a limited set of specific topics.
We propose three models that adopt and fine-tune state-of-the-art multimodal transformers for multimodal fake news detection.
- Score: 8.595270610973586
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The increasing proliferation of misinformation and its alarming impact have
motivated both industry and academia to develop approaches for fake news
detection. However, state-of-the-art approaches are usually trained on small
datasets or on a limited set of specific topics. As a consequence, these models
lack generalization capabilities and are not applicable to real-world data. In
this paper, we propose three models that adopt and fine-tune state-of-the-art
multimodal transformers for multimodal fake news detection. We conduct an
in-depth analysis in which we manipulate the input data to explore model
performance in realistic social media use cases. Our study across multiple
models demonstrates that these systems suffer significant performance drops on
manipulated data. To reduce this bias and improve model generalization, we
suggest augmenting the training data to enable more meaningful experiments for
fake news detection on social media. The proposed data augmentation techniques
help models generalize better and yield improved state-of-the-art results.
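The abstract does not spell out the manipulation or augmentation procedure. A minimal sketch of one plausible variant, assuming augmentation works by pairing a post's text with the image of a different post to create mismatched (manipulated) training examples, might look like the Python below; the MultimodalSample structure, field names, and pairing logic are illustrative assumptions rather than the authors' exact method.

    import random
    from dataclasses import dataclass

    @dataclass
    class MultimodalSample:
        text: str
        image_path: str
        label: int  # 0 = pristine pair, 1 = fake/manipulated (assumed encoding)

    def augment_with_mismatched_pairs(samples, ratio=0.5, seed=42):
        # Create extra "manipulated" examples by pairing each selected text
        # with an image drawn from a different sample, so the model cannot
        # rely on one modality or on spurious text-image correlations alone.
        rng = random.Random(seed)
        augmented = list(samples)
        pristine = [s for s in samples if s.label == 0]
        for s in rng.sample(pristine, int(len(pristine) * ratio)):
            other = rng.choice([o for o in pristine if o is not s])
            augmented.append(MultimodalSample(text=s.text,
                                              image_path=other.image_path,
                                              label=1))
        rng.shuffle(augmented)
        return augmented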
Related papers
- How to Train Your Fact Verifier: Knowledge Transfer with Multimodal Open Models [95.44559524735308]
Large language or multimodal model based verification has been proposed to scale up online policing mechanisms for mitigating the spread of false and harmful content.
We test the limits of improving foundation model performance without continual updating through an initial study of knowledge transfer.
Our results on two recent multi-modal fact-checking benchmarks, Mocheg and Fakeddit, indicate that knowledge transfer strategies can improve Fakeddit performance over the state-of-the-art by up to 1.7% and Mocheg performance by up to 2.9%.
arXiv Detail & Related papers (2024-06-29T08:39:07Z)
- DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception [78.26734070960886]
Current perceptive models heavily depend on resource-intensive datasets.
We introduce perception-aware loss (P.A. loss) through segmentation, improving both quality and controllability.
Our method customizes data augmentation by extracting and utilizing perception-aware attribute (P.A. Attr) during generation.
arXiv Detail & Related papers (2024-03-20T04:58:03Z)
- Debiasing Multimodal Models via Causal Information Minimization [65.23982806840182]
We study bias arising from confounders in a causal graph for multimodal data.
Robust predictive features contain diverse information that helps a model generalize to out-of-distribution data.
We use these features as confounder representations and use them via methods motivated by causal theory to remove bias from models.
arXiv Detail & Related papers (2023-11-28T16:46:14Z)
- Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to only perform well on similar data, while underperforming on real world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z)
- Non-Invasive Fairness in Learning through the Lens of Data Drift [88.37640805363317]
We show how to improve the fairness of Machine Learning models without altering the data or the learning algorithm.
We use a simple but key insight: the divergence of trends between different populations, and, consequently, between a learned model and minority populations, is analogous to data drift.
We explore two strategies (model-splitting and reweighing) to resolve this drift, aiming to improve the overall conformance of models to the underlying data.
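The summary does not specify the reweighing scheme. A minimal sketch of the classic reweighing idea, per-sample weights w(g, y) = P(g) * P(y) / P(g, y) that make group membership and label statistically independent under the weighted distribution, is given below; the function name and flat-list inputs are assumptions for illustration.

    from collections import Counter

    def reweigh(groups, labels):
        # Weight each sample by P(group) * P(label) / P(group, label), so
        # over-represented (group, label) combinations are down-weighted
        # and under-represented ones are up-weighted.
        n = len(labels)
        p_group = Counter(groups)
        p_label = Counter(labels)
        p_joint = Counter(zip(groups, labels))
        return [
            (p_group[g] / n) * (p_label[y] / n) / (p_joint[(g, y)] / n)
            for g, y in zip(groups, labels)
        ]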
arXiv Detail & Related papers (2023-03-30T17:30:42Z)
- Interpretable Fake News Detection with Topic and Deep Variational Models [2.15242029196761]
We focus on fake news detection using interpretable features and methods.
We have developed a deep probabilistic model that integrates a dense representation of textual news.
Our model achieves comparable performance to state-of-the-art competing models.
arXiv Detail & Related papers (2022-09-04T05:31:00Z)
- Exploring Generalizability of Fine-Tuned Models for Fake News Detection [3.210653757360955]
The Covid-19 pandemic has caused a dramatic and parallel rise in dangerous misinformation, denoted an 'infodemic' by the CDC and WHO.
Misinformation tied to the Covid-19 infodemic changes continuously; this can lead to performance degradation of fine-tuned models due to concept drift.
In this paper, we explore the generalizability of pre-trained and fine-tuned fake news detectors across 9 fake news datasets.
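As an illustration of such a cross-dataset protocol, a leave-one-dataset-out loop might look like the sketch below; train_fn and eval_fn are placeholders for whatever fine-tuning and scoring procedure the paper actually uses.

    def cross_dataset_evaluation(datasets, train_fn, eval_fn):
        # Fine-tune on each dataset in turn and evaluate on every other
        # dataset, exposing performance drops caused by topic and concept
        # drift across collections.
        results = {}
        for train_name, train_data in datasets.items():
            model = train_fn(train_data)
            for test_name, test_data in datasets.items():
                if test_name != train_name:
                    results[(train_name, test_name)] = eval_fn(model, test_data)
        return results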
arXiv Detail & Related papers (2022-05-15T00:00:49Z)
- Unified Fake News Detection using Transfer Learning of Bidirectional Encoder Representation from Transformers model [0.0]
Most prior models were designed and validated on individual datasets separately.
This paper attempts to develop a unified model that combines publicly available datasets to detect fake news samples effectively.
arXiv Detail & Related papers (2022-02-03T23:23:26Z)
- Multimodal Emergent Fake News Detection via Meta Neural Process Networks [36.52739834391597]
We propose an end-to-end fake news detection framework named MetaFEND.
Specifically, the proposed model integrates meta-learning and neural process methods together.
Extensive experiments are conducted on multimedia datasets collected from Twitter and Weibo.
arXiv Detail & Related papers (2021-06-22T21:21:29Z)
- Hidden Biases in Unreliable News Detection Datasets [60.71991809782698]
We show that selection bias during data collection leads to undesired artifacts in the datasets.
We observed a significant drop (>10%) in accuracy for all models tested on a clean split with no train/test source overlap.
We suggest that future dataset creation include a simple model as a difficulty/bias probe and that future model development use a clean, non-overlapping site and date split.
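A clean split in this sense keeps every source (site) on exactly one side of the train/test boundary, so models cannot exploit source identity as a shortcut. A minimal sketch, assuming each article record carries a source field (the record format is an assumption, since the summary does not give one); a date-based split would partition on publication time analogously.

    import random

    def source_disjoint_split(articles, test_fraction=0.2, seed=0):
        # Assign whole sources to either train or test, so no site
        # contributes articles to both sides of the split.
        rng = random.Random(seed)
        sources = sorted({a["source"] for a in articles})
        rng.shuffle(sources)
        n_test = max(1, int(len(sources) * test_fraction))
        test_sources = set(sources[:n_test])
        train = [a for a in articles if a["source"] not in test_sources]
        test = [a for a in articles if a["source"] in test_sources]
        return train, test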
arXiv Detail & Related papers (2021-04-20T17:16:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.