Triple Path Enhanced Neural Architecture Search for Multimodal Fake News Detection
- URL: http://arxiv.org/abs/2501.14455v2
- Date: Thu, 06 Feb 2025 03:29:43 GMT
- Title: Triple Path Enhanced Neural Architecture Search for Multimodal Fake News Detection
- Authors: Bo Xu, Qiujie Xie, Jiahui Zhou, Linlin Zong
- Abstract summary: We propose MUSE, a novel and flexible triple-path enhanced neural architecture search model.
MUSE includes two dynamic paths for detecting fake news that contains only partial modalities and a static path for exploiting potential multimodal correlations.
Experimental results show that MUSE achieves stable performance improvements over the baselines.
- Score: 5.251333057002555
- Abstract: Multimodal fake news detection has become one of the most crucial issues on social media platforms. Although existing methods have achieved advanced performance, two main challenges persist: (1) inadequate fusion of multimodal news information caused by rigid model architectures, and (2) weak generalization to fake news that contains only partial modalities. To meet these challenges, we propose MUSE, a novel and flexible triple-path enhanced neural architecture search model. MUSE includes two dynamic paths for detecting fake news that contains only partial modalities and a static path for exploiting potential multimodal correlations. Experimental results show that MUSE achieves stable performance improvements over the baselines.
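The abstract ships no code, but the triple-path idea can be made concrete. Below is a minimal PyTorch sketch, assuming pre-extracted text and image features; the path modules, dimensions, and the naive mean aggregation are illustrative assumptions, not the authors' MUSE architecture (which searches the fusion design via NAS).

```python
import torch
import torch.nn as nn

class TriplePathSketch(nn.Module):
    """Illustrative triple-path detector; NOT the authors' MUSE code.

    - dynamic text path: used when only text is available
    - dynamic image path: used when only the image is available
    - static fused path: exploits cross-modal correlations when both exist
    """
    def __init__(self, text_dim=768, img_dim=512, hidden=256):
        super().__init__()
        self.text_path = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
        self.img_path = nn.Sequential(nn.Linear(img_dim, hidden), nn.ReLU())
        self.fused_path = nn.Sequential(nn.Linear(text_dim + img_dim, hidden), nn.ReLU())
        self.classifier = nn.Linear(hidden, 2)  # real vs. fake

    def forward(self, text_feat=None, img_feat=None):
        paths = []
        if text_feat is not None:
            paths.append(self.text_path(text_feat))
        if img_feat is not None:
            paths.append(self.img_path(img_feat))
        if text_feat is not None and img_feat is not None:
            paths.append(self.fused_path(torch.cat([text_feat, img_feat], dim=-1)))
        h = torch.stack(paths).mean(dim=0)  # naive aggregation over active paths
        return self.classifier(h)

model = TriplePathSketch()
logits = model(text_feat=torch.randn(4, 768))  # partial-modality case: text only
```

Calling the model with a single modality exercises one dynamic path; passing both activates the static fused path as well.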
Related papers
- Modality Interactive Mixture-of-Experts for Fake News Detection [13.508494216511094]
We present Modality Interactive Mixture-of-Experts for Fake News Detection (MIMoE-FND), a novel hierarchical Mixture-of-Experts framework designed to enhance multimodal fake news detection.
We evaluate our approach on three real-world benchmarks spanning two languages, demonstrating its superior performance compared to state-of-the-art methods.
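For readers unfamiliar with the building block, this is a generic mixture-of-experts fusion layer in PyTorch: a gate softly routes each fused feature among experts. It sketches the MoE mechanism only; MIMoE-FND's hierarchy and modality-interaction gating are not reproduced.

```python
import torch
import torch.nn as nn

class MoEFusionSketch(nn.Module):
    """Generic mixture-of-experts fusion; the real MIMoE-FND design differs."""
    def __init__(self, dim=512, num_experts=4):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(num_experts)]
        )
        self.gate = nn.Linear(dim, num_experts)

    def forward(self, fused):  # fused: (batch, dim) joint text+image feature
        weights = torch.softmax(self.gate(fused), dim=-1)           # (batch, E)
        expert_out = torch.stack([e(fused) for e in self.experts])  # (E, batch, dim)
        return torch.einsum("be,ebd->bd", weights, expert_out)      # gated mixture

moe = MoEFusionSketch()
out = moe(torch.randn(8, 512))  # (8, 512)
```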
arXiv Detail & Related papers (2025-01-21T16:49:00Z) - MDF: A Dynamic Fusion Model for Multi-modal Fake News Detection [0.41942958779358674]
We propose a new dynamic fusion framework dubbed MDF for fake news detection.
Our model consists of two main components: (1) UEM, an uncertainty modeling module that employs a multi-head attention mechanism to model intra-modal uncertainty; and (2) DFN, a dynamic fusion module based on Dempster-Shafer (D-S) evidence theory that dynamically fuses the weights of the two modalities, text and image.
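The D-S fusion step can be illustrated with a toy, dependency-free example: Dempster's rule combining text and image evidence over the frame {real, fake}, with an 'either' mass standing in for uncertainty. The evidence values are made up; MDF's actual DFN operates on learned uncertainties.

```python
def dempster_combine(m1, m2):
    """Dempster's rule for two mass functions over the frame {real, fake}.

    Each dict maps a hypothesis to belief mass; 'either' is the universal
    set (uncertainty). A toy stand-in for MDF's DFN fusion step.
    """
    hyps = ["real", "fake", "either"]
    combined = {h: 0.0 for h in hyps}
    conflict = 0.0
    for a in hyps:
        for b in hyps:
            mass = m1[a] * m2[b]
            if a == b:
                combined[a] += mass
            elif "either" in (a, b):            # universal set intersects anything
                combined[a if b == "either" else b] += mass
            else:                               # real ∩ fake = ∅ -> conflict
                conflict += mass
    return {h: v / (1.0 - conflict) for h, v in combined.items()}

text_evidence = {"real": 0.6, "fake": 0.1, "either": 0.3}   # hypothetical outputs
image_evidence = {"real": 0.2, "fake": 0.5, "either": 0.3}
print(dempster_combine(text_evidence, image_evidence))
# conflict mass (0.32) is renormalized away; fused beliefs sum to 1
```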
arXiv Detail & Related papers (2024-06-28T09:24:52Z) - A Novel Energy based Model Mechanism for Multi-modal Aspect-Based
Sentiment Analysis [85.77557381023617]
We propose a novel framework called DQPSA for multi-modal sentiment analysis.
The PDQ module uses the prompt as both a visual query and a language query to extract prompt-aware visual information.
The EPE module models the boundary pairing of the analysis target from the perspective of an energy-based model.
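The "prompt as query" pattern is, at its core, plain cross-attention; a minimal PyTorch sketch follows, with all shapes and dimensions invented for illustration. DQPSA's actual PDQ module and its energy-based EPE head are more involved.

```python
import torch
import torch.nn as nn

# Rough sketch of "prompt as visual query": the prompt attends over visual
# tokens, yielding prompt-aware visual features. Not the DQPSA implementation.
attn = nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)

prompt = torch.randn(2, 5, 256)         # hypothetical prompt token embeddings
image_tokens = torch.randn(2, 49, 256)  # hypothetical visual patch features

prompt_aware_visual, _ = attn(query=prompt, key=image_tokens, value=image_tokens)
print(prompt_aware_visual.shape)  # torch.Size([2, 5, 256])
```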
arXiv Detail & Related papers (2023-12-13T12:00:46Z) - Detecting and Grounding Multi-Modal Media Manipulation and Beyond [93.08116982163804]
We highlight a new research problem for multi-modal fake media, namely Detecting and Grounding Multi-Modal Media Manipulation (DGM4).
DGM4 aims to not only detect the authenticity of multi-modal media, but also ground the manipulated content.
We propose a novel HierArchical Multi-modal Manipulation rEasoning tRansformer (HAMMER) to fully capture the fine-grained interaction between different modalities.
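At a high level, detection-plus-grounding amounts to multiple heads on a shared multimodal representation. The sketch below shows only that shape of solution, with hypothetical heads and dimensions; it is not the HAMMER transformer.

```python
import torch
import torch.nn as nn

class DetectAndGroundSketch(nn.Module):
    """Toy multi-head design in the spirit of the DGM4 task: one head scores
    authenticity, the others localize manipulation. Not the HAMMER model."""
    def __init__(self, dim=256):
        super().__init__()
        self.binary_head = nn.Linear(dim, 2)   # real vs. manipulated
        self.bbox_head = nn.Linear(dim, 4)     # manipulated image region
        self.token_head = nn.Linear(dim, 1)    # per-token manipulation score

    def forward(self, joint_cls, text_tokens):
        # joint_cls: (B, dim) fused [CLS]; text_tokens: (B, T, dim)
        return (self.binary_head(joint_cls),
                self.bbox_head(joint_cls).sigmoid(),      # normalized box coords
                self.token_head(text_tokens).squeeze(-1)) # grounding over tokens

m = DetectAndGroundSketch()
logits, box, token_scores = m(torch.randn(2, 256), torch.randn(2, 32, 256))
```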
arXiv Detail & Related papers (2023-09-25T15:05:46Z) - Inconsistent Matters: A Knowledge-guided Dual-consistency Network for
Multi-modal Rumor Detection [53.48346699224921]
A novel Knowledge-guided Dual-consistency Network is proposed to detect rumors with multimedia content.
It uses two consistency detection subnetworks to capture inconsistency at the cross-modal level and the content-knowledge level simultaneously.
It also enables robust multi-modal representation learning under different missing visual modality conditions.
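The underlying intuition is easy to sketch: score agreement at two levels and treat low agreement as a rumor signal. The toy below uses fixed cosine similarity on embeddings assumed to share a space; the paper's consistency subnetworks are learned.

```python
import torch
import torch.nn.functional as F

def consistency_scores(text_emb, image_emb, knowledge_emb):
    """Toy dual-consistency check: low cosine similarity at the cross-modal
    or content-knowledge level signals a possible rumor. All embeddings are
    assumed to live in a shared space already (an assumption of this sketch)."""
    cross_modal = F.cosine_similarity(text_emb, image_emb, dim=-1)
    content_knowledge = F.cosine_similarity(text_emb, knowledge_emb, dim=-1)
    return cross_modal, content_knowledge

t, i, k = torch.randn(3, 4, 128).unbind(0)  # fake text/image/knowledge batches
print(consistency_scores(t, i, k))
```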
arXiv Detail & Related papers (2023-06-03T15:32:20Z) - Multi-modal Fake News Detection on Social Media via Multi-grained
Information Fusion [21.042970740577648]
We present a Multi-grained Multi-modal Fusion Network (MMFN) for fake news detection.
Inspired by the multi-grained process of human assessment of news authenticity, we employ two Transformer-based pre-trained models to encode token-level features from text and images, respectively.
The multi-modal module fuses fine-grained features, taking into account coarse-grained features encoded by the CLIP encoder.
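A minimal sketch of the multi-grained idea, with every module and dimension assumed rather than taken from the paper: fine-grained tokens are fused by cross-attention, and a coarse-grained CLIP-style similarity gates how much the fused branch contributes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiGrainedFusionSketch(nn.Module):
    """Rough illustration of multi-grained fusion, not the MMFN code:
    token-level features are fused by cross-attention, while a coarse
    CLIP-style similarity weights trust in the joint representation."""
    def __init__(self, dim=256):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(dim, 2)

    def forward(self, text_tokens, img_tokens, clip_text, clip_img):
        fused, _ = self.cross_attn(text_tokens, img_tokens, img_tokens)
        coarse_sim = F.cosine_similarity(clip_text, clip_img, dim=-1)   # (B,)
        pooled = fused.mean(dim=1) * coarse_sim.unsqueeze(-1)  # gate by agreement
        return self.classifier(pooled)

m = MultiGrainedFusionSketch()
out = m(torch.randn(2, 16, 256), torch.randn(2, 49, 256),
        torch.randn(2, 512), torch.randn(2, 512))
```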
arXiv Detail & Related papers (2023-04-03T09:13:59Z) - Cross-modal Contrastive Learning for Multimodal Fake News Detection [10.760000041969139]
COOLANT is a cross-modal contrastive learning framework for multimodal fake news detection.
A cross-modal fusion module is developed to learn the cross-modality correlations.
An attention guidance module is implemented to help effectively and interpretably aggregate the aligned unimodal representations.
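Cross-modal contrastive learning commonly reduces to a symmetric InfoNCE over in-batch pairs; the sketch below shows that standard formulation, not COOLANT's exact objective or its attention guidance module.

```python
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(text_emb, image_emb, temperature=0.07):
    """CLIP-style symmetric InfoNCE: matched text-image pairs (the diagonal)
    are pulled together, mismatched in-batch pairs pushed apart."""
    text_emb = F.normalize(text_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)
    logits = text_emb @ image_emb.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(text_emb.size(0))          # diagonal = positive pairs
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

loss = cross_modal_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
```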
arXiv Detail & Related papers (2023-02-25T10:12:34Z) - Improving Multimodal fusion via Mutual Dependency Maximisation [5.73995120847626]
Multimodal sentiment analysis is a trending area of research, and multimodal fusion is one of its most active topics.
In this work, we investigate unexplored penalties and propose a set of new objectives that measure the dependency between modalities.
We demonstrate that our new penalties lead to a consistent improvement (up to 4.3 points in accuracy) across a large variety of state-of-the-art models.
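One concrete way to measure dependency between modalities is an InfoNCE-style mutual-information lower bound added to the task loss. The sketch below assumes that choice and an arbitrary penalty weight; it is not the paper's exact set of objectives.

```python
import torch
import torch.nn.functional as F

def infonce_dependency(x, y, temperature=0.1):
    """InfoNCE lower bound on mutual information between paired modality
    embeddings: one standard way to quantify cross-modal dependency."""
    x, y = F.normalize(x, dim=-1), F.normalize(y, dim=-1)
    logits = x @ y.t() / temperature
    targets = torch.arange(x.size(0))
    return -F.cross_entropy(logits, targets)  # higher = more dependent

# Training-time usage: maximize dependency alongside the task objective.
task_loss = torch.tensor(0.7)                  # placeholder task loss
penalty = infonce_dependency(torch.randn(8, 128), torch.randn(8, 128))
total_loss = task_loss - 0.1 * penalty         # 0.1 is an assumed weight
```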
arXiv Detail & Related papers (2021-08-31T06:26:26Z) - Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal
Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
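The pairwise-fusion idea can be sketched with two cross-attention blocks: text is paired separately with the visual and acoustic streams, since text tends to carry the most information. All modules and dimensions below are assumptions; BBFN's gated design is not reproduced.

```python
import torch
import torch.nn as nn

class BimodalPairFusionSketch(nn.Module):
    """Rough sketch of pairwise bimodal fusion: two text-anchored pairs are
    fused independently, then combined. Not the BBFN architecture."""
    def __init__(self, dim=128):
        super().__init__()
        self.tv_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.ta_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.head = nn.Linear(2 * dim, 1)  # sentiment score

    def forward(self, text, visual, acoustic):      # each: (B, T, dim)
        tv, _ = self.tv_attn(text, visual, visual)    # text queries visual
        ta, _ = self.ta_attn(text, acoustic, acoustic)  # text queries acoustic
        pooled = torch.cat([tv.mean(1), ta.mean(1)], dim=-1)
        return self.head(pooled)

m = BimodalPairFusionSketch()
score = m(torch.randn(2, 20, 128), torch.randn(2, 20, 128), torch.randn(2, 20, 128))
```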
arXiv Detail & Related papers (2021-07-28T23:33:42Z) - Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for
Gesture Recognition [89.0152015268929]
We propose the first neural architecture search (NAS)-based method for RGB-D gesture recognition.
The proposed method includes two key components: 1) enhanced temporal representation via the 3D Central Difference Convolution (3D-CDC) family, and 2) optimized backbones for multi-modal-rate branches and lateral connections.
The resultant multi-rate network provides a new perspective to understand the relationship between RGB and depth modalities and their temporal dynamics.
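Central difference convolution has a well-known decomposed implementation: a vanilla convolution minus theta times the kernel's spatial sum applied as a 1x1x1 convolution. The sketch follows that published CDC formulation; the paper's 3D-CDC family and searched backbones may differ in detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CDC3dSketch(nn.Module):
    """3D central difference convolution in decomposed form: vanilla conv
    minus theta * (kernel spatial sum applied pointwise). A sketch of the
    general CDC idea, not the paper's exact 3D-CDC variants."""
    def __init__(self, in_ch, out_ch, kernel_size=3, theta=0.7):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size, padding=kernel_size // 2)
        self.theta = theta

    def forward(self, x):
        out = self.conv(x)
        # Sum kernel weights over (t, h, w): an equivalent 1x1x1 kernel
        # implementing the central-difference correction term.
        kernel_sum = self.conv.weight.sum(dim=(2, 3, 4), keepdim=True)
        out_center = F.conv3d(x, kernel_sum, bias=None)
        return out - self.theta * out_center

layer = CDC3dSketch(3, 8)
y = layer(torch.randn(1, 3, 8, 32, 32))  # (batch, ch, time, H, W)
print(y.shape)  # torch.Size([1, 8, 8, 32, 32])
```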
arXiv Detail & Related papers (2020-08-21T10:45:09Z) - MISA: Modality-Invariant and -Specific Representations for Multimodal
Sentiment Analysis [48.776247141839875]
We propose a novel framework, MISA, which projects each modality to two distinct subspaces.
The first subspace is modality-invariant, where the representations across modalities learn their commonalities and reduce the modality gap.
Our experiments on popular sentiment analysis benchmarks, MOSI and MOSEI, demonstrate significant gains over state-of-the-art models.
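The two-subspace projection is simple to sketch for one modality: an invariant projector (shared in purpose across modalities) and a specific (private) projector, with a soft orthogonality penalty keeping them non-redundant. Dimensions and the exact loss below are assumptions; MISA additionally uses similarity (e.g., CMD) and reconstruction losses across modalities.

```python
import torch
import torch.nn as nn

class TwoSubspaceSketch(nn.Module):
    """Illustration of the MISA idea for one modality: project the same
    input into an invariant (shared) and a specific (private) subspace."""
    def __init__(self, in_dim=300, sub_dim=128):
        super().__init__()
        self.invariant = nn.Linear(in_dim, sub_dim)  # commonalities across modalities
        self.specific = nn.Linear(in_dim, sub_dim)   # modality-private factors

    def forward(self, x):
        return self.invariant(x), self.specific(x)

def orthogonality_penalty(h_inv, h_spec):
    """Soft difference loss: squared Frobenius norm of the cross-correlation,
    pushing the two subspaces to encode non-redundant information."""
    return (h_inv.t() @ h_spec).pow(2).mean()

enc = TwoSubspaceSketch()
h_inv, h_spec = enc(torch.randn(16, 300))
print(orthogonality_penalty(h_inv, h_spec))
```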
arXiv Detail & Related papers (2020-05-07T15:13:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.