Multimodal Fake News Detection via CLIP-Guided Learning
- URL: http://arxiv.org/abs/2205.14304v1
- Date: Sat, 28 May 2022 02:43:18 GMT
- Title: Multimodal Fake News Detection via CLIP-Guided Learning
- Authors: Yangming Zhou, Qichao Ying, Zhenxing Qian, Sheng Li and Xinpeng Zhang
- Abstract summary: This paper proposes an FND-CLIP framework, i.e., a multimodal Fake News Detection network based on Contrastive Language-Image Pretraining (CLIP).
Given a multimodal news item, we extract deep representations from the image and text using a ResNet-based encoder, a BERT-based encoder, and two pair-wise CLIP encoders.
The multimodal feature is a concatenation of the CLIP-generated features weighted by the standardized cross-modal similarity of the two modalities.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal fake news detection has attracted much research interest in
social forensics. Many existing approaches introduce tailored attention
mechanisms to guide the fusion of unimodal features. However, how the
similarity of these features is calculated and how it affects the
decision-making process in FND are still open questions. Moreover, the
potential of pretrained multimodal feature learning models in fake news
detection has not been well exploited. This paper proposes an FND-CLIP
framework, i.e., a multimodal Fake News Detection network based on Contrastive
Language-Image Pretraining (CLIP). Given a multimodal news item, we extract
deep representations from the image and text using a ResNet-based encoder, a
BERT-based encoder, and two pair-wise CLIP encoders. The multimodal feature is
a concatenation of the CLIP-generated features weighted by the standardized
cross-modal similarity of the two modalities. The extracted features are
further processed for redundancy reduction before being fed into the final
classifier. We introduce a modality-wise attention module to adaptively
reweight and aggregate the features. We have conducted extensive experiments
on typical fake news datasets. The results indicate that the proposed
framework is better at mining crucial features for fake news detection.
FND-CLIP achieves better performance than previous works, with 0.7%, 6.8%, and
1.3% improvements in overall accuracy on Weibo, Politifact, and Gossipcop,
respectively. In addition, we show that CLIP-based learning allows greater
flexibility in multimodal feature selection.
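To make the fusion described in the abstract concrete, below is a minimal PyTorch-style sketch of similarity-weighted CLIP feature fusion with modality-wise attention, written from the abstract alone. The projection layers, feature dimensions, and the mapping of cosine similarity into [0, 1] are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimilarityWeightedFusion(nn.Module):
    """Sketch of FND-CLIP-style fusion: unimodal ResNet/BERT features are
    concatenated with CLIP image/text features whose contribution is scaled
    by the standardized cross-modal similarity of the two CLIP embeddings."""

    def __init__(self, dim_text=768, dim_image=2048, dim_clip=512, dim_out=256):
        super().__init__()
        # Project every stream to a common width before fusion (the paper's
        # "redundancy reduction"; a single linear layer is an assumption).
        self.proj_text = nn.Linear(dim_text, dim_out)
        self.proj_image = nn.Linear(dim_image, dim_out)
        self.proj_clip_t = nn.Linear(dim_clip, dim_out)
        self.proj_clip_i = nn.Linear(dim_clip, dim_out)
        # Modality-wise attention: one scalar weight per stream.
        self.attn = nn.Sequential(nn.Linear(4 * dim_out, 4), nn.Softmax(dim=-1))
        self.classifier = nn.Linear(4 * dim_out, 2)  # real vs. fake

    def forward(self, bert_feat, resnet_feat, clip_text, clip_image):
        # Standardized cross-modal similarity: cosine similarity of the two
        # CLIP embeddings, mapped to [0, 1] (the paper's exact standardization
        # may differ; this is one plausible choice).
        sim = F.cosine_similarity(clip_text, clip_image, dim=-1)
        w = (sim + 1.0) / 2.0                                # [B] in [0, 1]

        t = self.proj_text(bert_feat)
        v = self.proj_image(resnet_feat)
        ct = self.proj_clip_t(clip_text) * w.unsqueeze(-1)   # CLIP streams are
        ci = self.proj_clip_i(clip_image) * w.unsqueeze(-1)  # similarity-weighted

        concat = torch.cat([t, v, ct, ci], dim=-1)           # [B, 4*dim_out]
        # Adaptively reweight and aggregate the four streams.
        a = self.attn(concat)                                # [B, 4]
        streams = torch.stack([t, v, ct, ci], dim=1)         # [B, 4, dim_out]
        fused = (a.unsqueeze(-1) * streams).flatten(1)       # [B, 4*dim_out]
        return self.classifier(fused)
```

In this sketch, the similarity weight gates only the CLIP streams, so when the image and text disagree (low cross-modal similarity) the classifier relies more heavily on the unimodal ResNet/BERT features.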
Related papers
- Multimodality Helps Few-Shot 3D Point Cloud Semantic Segmentation [61.91492500828508]
Few-shot 3D point cloud segmentation (FS-PCS) aims at generalizing models to segment novel categories with minimal support samples.
We introduce a cost-free multimodal FS-PCS setup, utilizing textual labels and the potentially available 2D image modality.
We propose a simple yet effective Test-time Adaptive Cross-modal Seg (TACC) technique to mitigate training bias.
arXiv Detail & Related papers (2024-10-29T19:28:41Z)
- Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and Localization [52.87635234206178]
This paper proposes a new framework, namely MoNFAP, specifically tailored for multi-face manipulation detection and localization.
The framework incorporates two novel modules: the Forgery-aware Unified Predictor (FUP) Module and the Mixture-of-Noises Module (MNM).
arXiv Detail & Related papers (2024-08-05T08:35:59Z)
- Cross-Modal Augmentation for Few-Shot Multimodal Fake News Detection [0.21990652930491858]
Few-shot learning is critical for detecting fake news in its early stages.
This paper presents a multimodal fake news detection model which augments multimodal features using unimodal features.
The proposed Cross-Modal Augmentation (CMA) method achieves SOTA results on three benchmark datasets.
arXiv Detail & Related papers (2024-07-16T09:32:11Z)
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
- Similarity-Aware Multimodal Prompt Learning for Fake News Detection [0.12396474483677114]
Multimodal fake news detection has outperformed text-only methods.
This paper proposes a Similarity-Aware Multimodal Prompt Learning (SAMPLE) framework.
In evaluation, SAMPLE surpasses the F1 scores and accuracies of previous works on two benchmark multimodal datasets.
arXiv Detail & Related papers (2023-04-09T08:10:05Z)
- Multi-modal Fake News Detection on Social Media via Multi-grained Information Fusion [21.042970740577648]
We present a Multi-grained Multi-modal Fusion Network (MMFN) for fake news detection.
Inspired by the multi-grained process of human assessment of news authenticity, we respectively employ two Transformer-based pre-trained models to encode token-level features from text and images.
The multi-modal module fuses fine-grained features, taking into account coarse-grained features encoded by the CLIP encoder.
arXiv Detail & Related papers (2023-04-03T09:13:59Z)
- Cross-modal Contrastive Learning for Multimodal Fake News Detection [10.760000041969139]
COOLANT is a cross-modal contrastive learning framework for multimodal fake news detection (a minimal sketch of this kind of contrastive objective appears after this list).
A cross-modal fusion module is developed to learn the cross-modality correlations.
An attention guidance module is implemented to help effectively and interpretably aggregate the aligned unimodal representations.
arXiv Detail & Related papers (2023-02-25T10:12:34Z)
- Multiverse: Multilingual Evidence for Fake News Detection [71.51905606492376]
Multiverse is a new feature based on multilingual evidence that can be used for fake news detection.
The hypothesis that cross-lingual evidence can serve as a feature for fake news detection is confirmed.
arXiv Detail & Related papers (2022-11-25T18:24:17Z)
- MaPLe: Multi-modal Prompt Learning [54.96069171726668]
We propose Multi-modal Prompt Learning (MaPLe) for both vision and language branches to improve alignment between the vision and language representations.
Compared with the state-of-the-art method Co-CoOp, MaPLe exhibits favorable performance and achieves an absolute gain of 3.45% on novel classes.
arXiv Detail & Related papers (2022-10-06T17:59:56Z)
- Multimodal Fake News Detection with Adaptive Unimodal Representation Aggregation [28.564442206829625]
AURA is a multimodal fake news detection network with adaptive unimodal representation aggregation.
We perform coarse-level fake news detection and cross-modal consistency learning based on the unimodal and multimodal representations.
Experiments on Weibo and Gossipcop show that AURA outperforms several state-of-the-art FND schemes.
arXiv Detail & Related papers (2022-06-12T14:06:55Z)
- Specificity-preserving RGB-D Saliency Detection [103.3722116992476]
We propose a specificity-preserving network (SP-Net) for RGB-D saliency detection.
Two modality-specific networks and a shared learning network are adopted to generate individual and shared saliency maps.
Experiments on six benchmark datasets demonstrate that our SP-Net outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2021-08-18T14:14:22Z)
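Several entries above, including FND-CLIP itself and COOLANT, build on CLIP-style cross-modal contrastive learning. As referenced from the COOLANT entry, here is a minimal sketch of the symmetric InfoNCE objective that underlies this family of methods; the temperature value and the use of in-batch negatives are illustrative defaults, not taken from any one paper.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    Matched pairs (the diagonal of the similarity matrix) are pulled
    together; all other pairings in the batch serve as negatives. This is
    the CLIP pretraining objective that cross-modal frameworks adapt to
    align news images with their accompanying text.
    """
    image_emb = F.normalize(image_emb, dim=-1)       # [B, D] unit vectors
    text_emb = F.normalize(text_emb, dim=-1)         # [B, D] unit vectors
    logits = image_emb @ text_emb.t() / temperature  # [B, B] scaled cosine sims
    targets = torch.arange(logits.size(0), device=logits.device)
    # Image-to-text and text-to-image directions, averaged.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2
```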