Triple Path Enhanced Neural Architecture Search for Multimodal Fake News Detection
- URL: http://arxiv.org/abs/2501.14455v2
- Date: Thu, 06 Feb 2025 03:29:43 GMT
- Title: Triple Path Enhanced Neural Architecture Search for Multimodal Fake News Detection
- Authors: Bo Xu, Qiujie Xie, Jiahui Zhou, Linlin Zong,
- Abstract summary: We propose a novel and flexible triple path enhanced neural architecture search model MUSE.<n>MUSE includes two dynamic paths for detecting partial-modality contained fake news and a static path for exploiting potential multimodal correlations.<n> Experimental results show that MUSE achieves stable performance improvement over the baselines.
- Score: 5.251333057002555
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Multimodal fake news detection has become one of the most crucial issues on social media platforms. Although existing methods have achieved advanced performance, two main challenges persist: (1) Under-performed multimodal news information fusion due to model architecture solidification, and (2) weak generalization ability on partial-modality contained fake news. To meet these challenges, we propose a novel and flexible triple path enhanced neural architecture search model MUSE. MUSE includes two dynamic paths for detecting partial-modality contained fake news and a static path for exploiting potential multimodal correlations. Experimental results show that MUSE achieves stable performance improvement over the baselines.
Related papers
- Deconfounded Reasoning for Multimodal Fake News Detection via Causal Intervention [16.607714608483164]
The rapid growth of social media has led to the widespread dissemination of fake news across multiple content forms.
Traditional unimodal detection methods fall short in addressing complex cross-modal manipulations.
We propose the Causal Intervention-based Multimodal Decon Detection framework.
arXiv Detail & Related papers (2025-04-12T09:57:43Z) - Exploring Modality Disruption in Multimodal Fake News Detection [16.607714608483164]
We propose a multimodal fake news detection framework, FND-MoE, to address the issue of modality disruption.
FND-MoE significantly outperforms state-of-the-art methods, with accuracy improvements of 3.45% and 3.71% on the respective datasets.
arXiv Detail & Related papers (2025-04-12T09:39:29Z) - MDF: A Dynamic Fusion Model for Multi-modal Fake News Detection [0.41942958779358674]
We propose a new dynamic fusion framework dubbed MDF for fake news detection.
Our model consists of two main components: (1) UEM as an uncertainty modeling module employing a multi-head attention mechanism to model intra-modal uncertainty; and (2) DFN is a dynamic fusion module based on D-S evidence theory for dynamically fusing the weights of two modalities, text and image.
arXiv Detail & Related papers (2024-06-28T09:24:52Z) - A Novel Energy based Model Mechanism for Multi-modal Aspect-Based
Sentiment Analysis [85.77557381023617]
We propose a novel framework called DQPSA for multi-modal sentiment analysis.
PDQ module uses the prompt as both a visual query and a language query to extract prompt-aware visual information.
EPE module models the boundaries pairing of the analysis target from the perspective of an Energy-based Model.
arXiv Detail & Related papers (2023-12-13T12:00:46Z) - Detecting and Grounding Multi-Modal Media Manipulation and Beyond [93.08116982163804]
We highlight a new research problem for multi-modal fake media, namely Detecting and Grounding Multi-Modal Media Manipulation (DGM4)
DGM4 aims to not only detect the authenticity of multi-modal media, but also ground the manipulated content.
We propose a novel HierArchical Multi-modal Manipulation rEasoning tRansformer (HAMMER) to fully capture the fine-grained interaction between different modalities.
arXiv Detail & Related papers (2023-09-25T15:05:46Z) - Inconsistent Matters: A Knowledge-guided Dual-consistency Network for
Multi-modal Rumor Detection [53.48346699224921]
A novel Knowledge-guided Dualconsistency Network is proposed to detect rumors with multimedia contents.
It uses two consistency detectionworks to capture the inconsistency at the cross-modal level and the content-knowledge level simultaneously.
It also enables robust multi-modal representation learning under different missing visual modality conditions.
arXiv Detail & Related papers (2023-06-03T15:32:20Z) - Multi-modal Fake News Detection on Social Media via Multi-grained
Information Fusion [21.042970740577648]
We present a Multi-grained Multi-modal Fusion Network (MMFN) for fake news detection.
Inspired by the multi-grained process of human assessment of news authenticity, we respectively employ two Transformer-based pre-trained models to encode token-level features from text and images.
The multi-modal module fuses fine-grained features, taking into account coarse-grained features encoded by the CLIP encoder.
arXiv Detail & Related papers (2023-04-03T09:13:59Z) - Cross-modal Contrastive Learning for Multimodal Fake News Detection [10.760000041969139]
COOLANT is a cross-modal contrastive learning framework for multimodal fake news detection.
A cross-modal fusion module is developed to learn the cross-modality correlations.
An attention guidance module is implemented to help effectively and interpretably aggregate the aligned unimodal representations.
arXiv Detail & Related papers (2023-02-25T10:12:34Z) - Improving Multimodal fusion via Mutual Dependency Maximisation [5.73995120847626]
Multimodal sentiment analysis is a trending area of research, and the multimodal fusion is one of its most active topic.
In this work, we investigate unexplored penalties and propose a set of new objectives that measure the dependency between modalities.
We demonstrate that our new penalties lead to a consistent improvement (up to $4.3$ on accuracy) across a large variety of state-of-the-art models.
arXiv Detail & Related papers (2021-08-31T06:26:26Z) - Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal
Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
Model takes two bimodal pairs as input due to known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z) - Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for
Gesture Recognition [89.0152015268929]
We propose the first neural architecture search (NAS)-based method for RGB-D gesture recognition.
The proposed method includes two key components: 1) enhanced temporal representation via the 3D Central Difference Convolution (3D-CDC) family, and optimized backbones for multi-modal-rate branches and lateral connections.
The resultant multi-rate network provides a new perspective to understand the relationship between RGB and depth modalities and their temporal dynamics.
arXiv Detail & Related papers (2020-08-21T10:45:09Z) - MISA: Modality-Invariant and -Specific Representations for Multimodal
Sentiment Analysis [48.776247141839875]
We propose a novel framework, MISA, which projects each modality to two distinct subspaces.
The first subspace is modality-invariant, where the representations across modalities learn their commonalities and reduce the modality gap.
Our experiments on popular sentiment analysis benchmarks, MOSI and MOSEI, demonstrate significant gains over state-of-the-art models.
arXiv Detail & Related papers (2020-05-07T15:13:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.