Cross-modal Contrastive Learning for Multimodal Fake News Detection
- URL: http://arxiv.org/abs/2302.14057v2
- Date: Fri, 11 Aug 2023 13:48:44 GMT
- Title: Cross-modal Contrastive Learning for Multimodal Fake News Detection
- Authors: Longzheng Wang, Chuang Zhang, Hongbo Xu, Yongxiu Xu, Xiaohan Xu, Siqi Wang
- Abstract summary: COOLANT is a cross-modal contrastive learning framework for multimodal fake news detection.
A cross-modal fusion module is developed to learn the cross-modality correlations.
An attention guidance module is implemented to help effectively and interpretably aggregate the aligned unimodal representations.
- Score: 10.760000041969139
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic detection of multimodal fake news has gained widespread attention
recently. Many existing approaches seek to fuse unimodal features to produce
multimodal news representations. However, the potential of powerful cross-modal
contrastive learning methods for fake news detection has not been well
exploited. Moreover, how to aggregate features from different modalities to
boost the performance of the decision-making process remains an open question.
To address these issues, we propose COOLANT, a cross-modal contrastive learning
framework for multimodal fake news detection, aiming to achieve more accurate
image-text alignment. To further improve the alignment precision, we leverage
an auxiliary task to soften the loss term of negative samples during the
contrast process. A cross-modal fusion module is developed to learn the
cross-modality correlations. An attention mechanism with an attention guidance
module is implemented to help effectively and interpretably aggregate the
aligned unimodal representations and the cross-modality correlations. Finally,
we evaluate COOLANT through a comparative study on two widely used
datasets, Twitter and Weibo. The experimental results demonstrate that our
COOLANT outperforms previous approaches by a large margin and achieves new
state-of-the-art results on the two datasets.
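As a rough illustration of the kind of objective described in the abstract, the sketch below implements an image-text contrastive (InfoNCE-style) loss in which negative pairs can be down-weighted, mimicking the idea of softening the loss term of negative samples. The tensor names, weighting scheme, and temperature value are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def soft_negative_contrastive_loss(img_emb, txt_emb, neg_weights=None, temperature=0.07):
    """InfoNCE-style image-text contrastive loss whose negative pairs can be
    softened (down-weighted), e.g. using scores from an auxiliary similarity task.

    img_emb, txt_emb: (batch, dim) embeddings of paired images and texts.
    neg_weights:      (batch, batch) weights in (0, 1]; 1.0 on the diagonal
                      (positives), smaller values for negatives that the
                      auxiliary task judges to be semantically close.
    """
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    sim = img @ txt.t() / temperature                 # (batch, batch) similarity logits

    if neg_weights is None:
        neg_weights = torch.ones_like(sim)

    weighted_exp = neg_weights * sim.exp()            # softened negatives count for less
    pos = sim.diag()                                  # matched image-text pairs
    loss_i2t = -(pos - weighted_exp.sum(dim=1).log()).mean()
    loss_t2i = -(pos - weighted_exp.sum(dim=0).log()).mean()
    return (loss_i2t + loss_t2i) / 2
```

In practice, the `neg_weights` matrix could be filled from an auxiliary task's similarity estimates, so that negatives resembling the anchor are penalized less during the contrast.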
Related papers
- Leveraging Weak Cross-Modal Guidance for Coherence Modelling via Iterative Learning [66.28872204574648]
Cross-modal coherence modeling is essential for helping intelligent systems organize and structure information.
Previous work on cross-modal coherence modeling attempted to leverage the order information from another modality to assist coherence recovery in the target modality.
This paper explores a new way to take advantage of cross-modal guidance without gold labels on coherency.
arXiv Detail & Related papers (2024-08-01T06:04:44Z)
- Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching [53.05954114863596]
We propose a brand-new Deep Boosting Learning (DBL) algorithm for image-text matching.
An anchor branch is first trained to provide insights into the data properties.
A target branch is concurrently tasked with more adaptive margin constraints to further enlarge the relative distance between matched and unmatched samples.
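A minimal sketch of this two-branch margin idea follows, under the simplifying assumption that both branches are trained with a hinge ranking loss and that the target branch simply receives a larger, boosted margin; all names and values are hypothetical.

```python
import torch.nn.functional as F

def hinge_ranking_loss(sim_pos, sim_neg, margin):
    """Push matched-pair similarities above unmatched ones by at least `margin`."""
    return F.relu(margin + sim_neg - sim_pos).mean()

def deep_boosting_style_losses(sim_pos, sim_neg, base_margin=0.2, boost=0.1):
    """Anchor branch trains with a fixed margin; the target branch is asked for a
    larger (boosted) margin so matched and unmatched samples are pushed further
    apart. In the paper the target margin is adapted from the anchor branch's
    behaviour; here it is simply base_margin + boost for illustration."""
    anchor_loss = hinge_ranking_loss(sim_pos, sim_neg, base_margin)
    target_loss = hinge_ranking_loss(sim_pos, sim_neg, base_margin + boost)
    return anchor_loss, target_loss
```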
arXiv Detail & Related papers (2024-04-28T08:44:28Z)
- From Text to Pixels: A Context-Aware Semantic Synergy Solution for Infrared and Visible Image Fusion [66.33467192279514]
We introduce a text-guided multi-modality image fusion method that leverages the high-level semantics from textual descriptions to integrate semantics from infrared and visible images.
Our method not only produces visually superior fusion results but also attains a higher detection mAP than existing methods, setting a new state of the art.
arXiv Detail & Related papers (2023-12-31T08:13:47Z)
- Detecting and Grounding Multi-Modal Media Manipulation and Beyond [93.08116982163804]
We highlight a new research problem for multi-modal fake media, namely Detecting and Grounding Multi-Modal Media Manipulation (DGM4).
DGM4 aims to not only detect the authenticity of multi-modal media, but also ground the manipulated content.
We propose a novel HierArchical Multi-modal Manipulation rEasoning tRansformer (HAMMER) to fully capture the fine-grained interaction between different modalities.
arXiv Detail & Related papers (2023-09-25T15:05:46Z)
- Similarity-Aware Multimodal Prompt Learning for Fake News Detection [0.12396474483677114]
Multimodal fake news detection has outperformed text-only methods.
This paper proposes a Similarity-Aware Multimodal Prompt Learning (SAMPLE) framework.
In evaluation, SAMPLE surpasses the F1 scores and accuracies of previous works on two benchmark multimodal datasets.
arXiv Detail & Related papers (2023-04-09T08:10:05Z)
- Multi-modal Fake News Detection on Social Media via Multi-grained Information Fusion [21.042970740577648]
We present a Multi-grained Multi-modal Fusion Network (MMFN) for fake news detection.
Inspired by the multi-grained process of human assessment of news authenticity, we employ two Transformer-based pre-trained models to encode token-level features from text and images, respectively.
The multi-modal module fuses fine-grained features, taking into account coarse-grained features encoded by the CLIP encoder.
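Below is a minimal sketch of this multi-grained fusion idea: pooled token-level features from the text and image Transformers are concatenated with the coarse-grained CLIP features before classification. The shapes, pooling, and concatenation scheme are assumptions for illustration rather than the MMFN architecture itself.

```python
import torch
import torch.nn as nn

class MultiGrainedFusion(nn.Module):
    """Fuses pooled fine-grained token features with coarse-grained CLIP features."""

    def __init__(self, token_dim=768, clip_dim=512, hidden=256, num_classes=2):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(2 * token_dim + 2 * clip_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, text_tokens, image_tokens, clip_text, clip_image):
        # text_tokens, image_tokens: (batch, seq, token_dim); clip_*: (batch, clip_dim)
        t = text_tokens.mean(dim=1)    # pooled fine-grained text feature
        v = image_tokens.mean(dim=1)   # pooled fine-grained image feature
        fused = torch.cat([t, v, clip_text, clip_image], dim=-1)
        return self.classifier(fused)  # real/fake logits
```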
arXiv Detail & Related papers (2023-04-03T09:13:59Z)
- Probing Visual-Audio Representation for Video Highlight Detection via Hard-Pairs Guided Contrastive Learning [23.472951216815765]
Key to effective video representations is cross-modal representation learning and fine-grained feature discrimination.
In this paper, we enrich intra-modality and cross-modality relations for representation modeling.
We enlarge the discriminative power of feature embedding with a hard-pairs guided contrastive learning scheme.
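A toy sketch of a hard-pairs guided contrastive loss is shown below: within a batch, the most similar non-matching pairs are treated as hard negatives and up-weighted in the denominator. The mining rule (top-k) and the weight value are assumptions, not the paper's scheme.

```python
import torch
import torch.nn.functional as F

def hard_pair_contrastive_loss(vis_emb, aud_emb, temperature=0.1, hard_weight=2.0, top_k=3):
    """Cross-modal contrastive loss in which the top-k most similar negatives
    per anchor (the 'hard pairs') get a larger weight in the denominator.
    Assumes the batch size is larger than top_k."""
    vis = F.normalize(vis_emb, dim=-1)
    aud = F.normalize(aud_emb, dim=-1)
    sim = vis @ aud.t() / temperature                               # (batch, batch)

    # Exclude positives (the diagonal), then pick the hardest negatives per row.
    eye = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    hard_idx = sim.masked_fill(eye, float('-inf')).topk(top_k, dim=1).indices

    weights = torch.ones_like(sim)
    weights.scatter_(1, hard_idx, hard_weight)                      # up-weight hard negatives

    log_prob = sim.diag() - (weights * sim.exp()).sum(dim=1).log()
    return -log_prob.mean()
```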
arXiv Detail & Related papers (2022-06-21T07:29:37Z)
- Multimodal Fake News Detection via CLIP-Guided Learning [26.093561485807832]
This paper proposes the FND-CLIP framework, a multimodal fake news detection network based on Contrastive Language-Image Pretraining (CLIP).
Given a piece of multimodal news, we extract deep representations from the image and text using a ResNet-based encoder, a BERT-based encoder, and two pair-wise CLIP encoders.
The multimodal feature is a concatenation of the CLIP-generated features weighted by the standardized cross-modal similarity of the two modalities.
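This weighting idea can be sketched roughly as follows: the CLIP image and text features are scaled by a batch-standardized cosine similarity before concatenation. The exact standardization and squashing choices here are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def similarity_weighted_clip_feature(clip_img, clip_txt, eps=1e-6):
    """Concatenate CLIP image/text features, each scaled by the standardized
    cross-modal cosine similarity of the pair."""
    img = F.normalize(clip_img, dim=-1)
    txt = F.normalize(clip_txt, dim=-1)
    sim = (img * txt).sum(dim=-1, keepdim=True)      # (batch, 1) cosine similarity
    sim = (sim - sim.mean()) / (sim.std() + eps)     # standardize over the batch
    weight = torch.sigmoid(sim)                      # squash to (0, 1)
    return torch.cat([weight * clip_img, weight * clip_txt], dim=-1)
```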
arXiv Detail & Related papers (2022-05-28T02:43:18Z)
- Multi-Modal Mutual Information Maximization: A Novel Approach for Unsupervised Deep Cross-Modal Hashing [73.29587731448345]
We propose a novel method, dubbed Cross-Modal Info-Max Hashing (CMIMH).
We learn informative representations that can preserve both intra- and inter-modal similarities.
The proposed method consistently outperforms other state-of-the-art cross-modal retrieval methods.
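A very rough sketch of a generic cross-modal hashing head is given below: each modality is projected to a tanh-relaxed binary code, paired codes are pulled together, and a quantization penalty keeps the codes near +/-1. These losses are generic stand-ins, not the mutual-information objective of the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HashingHead(nn.Module):
    """Maps a modality-specific feature to a tanh-relaxed binary code;
    binarize with .sign() at retrieval time."""

    def __init__(self, in_dim=512, code_bits=64):
        super().__init__()
        self.proj = nn.Linear(in_dim, code_bits)

    def forward(self, x):
        return torch.tanh(self.proj(x))

def cross_modal_hash_loss(img_code, txt_code, quant_weight=0.1):
    """Pull paired codes of the two modalities together and keep every bit
    close to +/-1 so the relaxation stays near a true binary code."""
    inter_modal = F.mse_loss(img_code, txt_code)
    quantization = ((img_code.abs() - 1.0) ** 2).mean() + ((txt_code.abs() - 1.0) ** 2).mean()
    return inter_modal + quant_weight * quantization
```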
arXiv Detail & Related papers (2021-12-13T08:58:03Z)
- Multimodal Categorization of Crisis Events in Social Media [81.07061295887172]
We present a new multimodal fusion method that leverages both images and texts as input.
In particular, we introduce a cross-attention module that can filter uninformative and misleading components from weak modalities.
We show that our method outperforms the unimodal approaches and strong multimodal baselines by a large margin on three crisis-related tasks.
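A compact sketch of such a cross-attention filter is shown below: text tokens attend over image tokens, and a learned gate can suppress uninformative or misleading image content before fusion. The dimensions and gating choice are assumptions rather than the paper's exact module.

```python
import torch
import torch.nn as nn

class CrossAttentionFilter(nn.Module):
    """Text tokens attend over image tokens; a learned gate can suppress
    uninformative or misleading image content before fusion."""

    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, text_tokens, image_tokens):
        # text_tokens: (batch, t_len, dim); image_tokens: (batch, i_len, dim)
        attended, _ = self.attn(query=text_tokens, key=image_tokens, value=image_tokens)
        gated = self.gate(attended) * attended            # down-weight misleading components
        return torch.cat([text_tokens, gated], dim=1)     # (batch, 2 * t_len, dim)
```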
arXiv Detail & Related papers (2020-04-10T06:31:30Z)