MM-FusionNet: Context-Aware Dynamic Fusion for Multi-modal Fake News Detection with Large Vision-Language Models
- URL: http://arxiv.org/abs/2508.05684v1
- Date: Tue, 05 Aug 2025 21:27:13 GMT
- Title: MM-FusionNet: Context-Aware Dynamic Fusion for Multi-modal Fake News Detection with Large Vision-Language Models
- Authors: Junhao He, Tianyu Liu, Jingyuan Zhao, Benjamin Turner
- Abstract summary: The proliferation of multi-modal fake news on social media poses a significant threat to public trust and social stability. Traditional detection methods, primarily text-based, often fall short due to the deceptive interplay between misleading text and images. This paper introduces MM-FusionNet, an innovative framework leveraging LVLMs for robust multi-modal fake news detection.
- Score: 6.50724643327177
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The proliferation of multi-modal fake news on social media poses a significant threat to public trust and social stability. Traditional detection methods, primarily text-based, often fall short due to the deceptive interplay between misleading text and images. While Large Vision-Language Models (LVLMs) offer promising avenues for multi-modal understanding, effectively fusing diverse modal information, especially when their importance is imbalanced or contradictory, remains a critical challenge. This paper introduces MM-FusionNet, an innovative framework leveraging LVLMs for robust multi-modal fake news detection. Our core contribution is the Context-Aware Dynamic Fusion Module (CADFM), which employs bi-directional cross-modal attention and a novel dynamic modal gating network. This mechanism adaptively learns and assigns importance weights to textual and visual features based on their contextual relevance, enabling intelligent prioritization of information. Evaluated on the large-scale Multi-modal Fake News Dataset (LMFND) comprising 80,000 samples, MM-FusionNet achieves a state-of-the-art F1-score of 0.938, surpassing existing multi-modal baselines by approximately 0.5% and significantly outperforming single-modal approaches. Further analysis demonstrates the model's dynamic weighting capabilities, its robustness to modality perturbations, and performance remarkably close to human-level, underscoring its practical efficacy and interpretability for real-world fake news detection.
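The paper does not include code, but the abstract describes CADFM concretely enough to sketch its two ingredients: bi-directional cross-modal attention and a dynamic gating network that weights the text and image features per sample. The following is a minimal PyTorch sketch of that idea; the class name, dimensions, and pooling choices are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of a context-aware dynamic fusion block:
# bi-directional cross-modal attention followed by a gating network that assigns
# per-sample importance weights to text and image features.
import torch
import torch.nn as nn


class ContextAwareDynamicFusion(nn.Module):
    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        # Text queries attend to image tokens, and image queries attend to text tokens.
        self.txt2img_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.img2txt_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Gating network: maps the concatenated pooled features to two softmax
        # weights, one per modality.
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 2))
        self.classifier = nn.Linear(dim, 2)  # fake vs. real

    def forward(self, txt_tokens: torch.Tensor, img_tokens: torch.Tensor):
        # txt_tokens: (B, Lt, dim) from a text encoder; img_tokens: (B, Li, dim)
        # from a vision encoder (e.g. LVLM hidden states).
        txt_ctx, _ = self.txt2img_attn(txt_tokens, img_tokens, img_tokens)
        img_ctx, _ = self.img2txt_attn(img_tokens, txt_tokens, txt_tokens)
        txt_vec = txt_ctx.mean(dim=1)  # pooled, image-aware text feature
        img_vec = img_ctx.mean(dim=1)  # pooled, text-aware image feature
        weights = torch.softmax(self.gate(torch.cat([txt_vec, img_vec], dim=-1)), dim=-1)
        fused = weights[:, 0:1] * txt_vec + weights[:, 1:2] * img_vec
        return self.classifier(fused), weights  # weights expose modality importance
```

Returning the gate weights alongside the logits mirrors the interpretability claim in the abstract: the per-sample weights indicate which modality drove each decision.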
Related papers
- When Language Overrules: Revealing Text Dominance in Multimodal Large Language Models [10.106066580331584]
We conduct the first systematic investigation of text dominance across diverse data modalities, including images, videos, audio, time-series, and graphs. Our in-depth analysis identifies three underlying causes: attention dilution from severe token redundancy in non-textual modalities, the influence of fusion architecture design, and task formulations that implicitly favor textual inputs.
arXiv Detail & Related papers (2025-08-14T11:44:52Z)
- Multimodal Inverse Attention Network with Intrinsic Discriminant Feature Exploitation for Fake News Detection [6.377687638891252]
Multimodal fake news detection has garnered significant attention due to its profound implications for social security. We propose a novel framework that explores intrinsic discriminative features based on news content to advance fake news detection.
arXiv Detail & Related papers (2025-02-03T07:58:22Z)
- Modality Interactive Mixture-of-Experts for Fake News Detection [13.508494216511094]
We present Modality Interactive Mixture-of-Experts for Fake News Detection (MIMoE-FND), a novel hierarchical Mixture-of-Experts framework designed to enhance multimodal fake news detection. We evaluate our approach on three real-world benchmarks spanning two languages, demonstrating its superior performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2025-01-21T16:49:00Z)
- Asymmetric Reinforcing against Multi-modal Representation Bias [59.685072206359855]
We propose an Asymmetric Reinforcing method against Multimodal representation bias (ARM). ARM dynamically reinforces weak modalities while preserving the ability to represent dominant modalities through conditional mutual information, significantly improving multimodal learning and mitigating modality imbalance.
arXiv Detail & Related papers (2025-01-02T13:00:06Z)
- MDF: A Dynamic Fusion Model for Multi-modal Fake News Detection [0.41942958779358674]
We propose a new dynamic fusion framework dubbed MDF for fake news detection.
Our model consists of two main components: (1) UEM, an uncertainty modeling module that employs multi-head attention to model intra-modal uncertainty; and (2) DFN, a dynamic fusion module based on Dempster-Shafer (D-S) evidence theory that dynamically fuses the weights of the text and image modalities (a minimal D-S combination sketch appears after this list).
arXiv Detail & Related papers (2024-06-28T09:24:52Z)
- MultiTrust: A Comprehensive Benchmark Towards Trustworthy Multimodal Large Language Models [51.19622266249408]
MultiTrust is the first comprehensive and unified benchmark on the trustworthiness of MLLMs. Our benchmark employs a rigorous evaluation strategy that addresses both multimodal risks and cross-modal impacts. Extensive experiments with 21 modern MLLMs reveal some previously unexplored trustworthiness issues and risks.
arXiv Detail & Related papers (2024-06-11T08:38:13Z)
- MMICT: Boosting Multi-Modal Fine-Tuning with In-Context Examples [63.78384552789171]
This paper introduces Multi-Modal In-Context Tuning (MMICT), a novel multi-modal fine-tuning paradigm.
We propose the Multi-Modal Hub (M-Hub), a unified module that captures various multi-modal features according to different inputs and objectives.
Based on M-Hub, MMICT enables MM-LLMs to learn from in-context visual-guided textual features and subsequently generate outputs conditioned on the textual-guided visual features.
arXiv Detail & Related papers (2023-12-11T13:11:04Z)
- Deep Equilibrium Multimodal Fusion [88.04713412107947]
Multimodal fusion integrates the complementary information present in multiple modalities and has gained much attention recently.
We propose a novel deep equilibrium (DEQ) method towards multimodal fusion via seeking a fixed point of the dynamic multimodal fusion process.
Experiments on BRCA, MM-IMDB, CMU-MOSI, SUN RGB-D, and VQA-v2 demonstrate the superiority of our DEQ fusion.
arXiv Detail & Related papers (2023-06-29T03:02:20Z)
- Provable Dynamic Fusion for Low-Quality Multimodal Data [94.39538027450948]
Dynamic multimodal fusion emerges as a promising learning paradigm.
Despite its widespread use, theoretical justifications in this field are still notably lacking.
This paper provides a theoretical understanding of dynamic fusion under a popular multimodal fusion framework from the generalization perspective.
A novel multimodal fusion framework termed Quality-aware Multimodal Fusion (QMF) is proposed, which can improve the performance in terms of classification accuracy and model robustness.
arXiv Detail & Related papers (2023-06-03T08:32:35Z)
- Multi-modal Fake News Detection on Social Media via Multi-grained Information Fusion [21.042970740577648]
We present a Multi-grained Multi-modal Fusion Network (MMFN) for fake news detection.
Inspired by the multi-grained process by which humans assess news authenticity, we employ two Transformer-based pre-trained models to encode token-level features from text and images, respectively.
The multi-modal module fuses fine-grained features, taking into account coarse-grained features encoded by the CLIP encoder.
arXiv Detail & Related papers (2023-04-03T09:13:59Z)
- Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis [96.46952672172021]
The Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)
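Several of the listed papers, notably MDF above, fuse modalities with Dempster-Shafer evidence theory rather than a learned gate like CADFM. As a point of comparison, here is a minimal, self-contained sketch of Dempster's combination rule for two modality-level mass functions over the frame {fake, real}; the function name, variable names, and example numbers are illustrative only, not taken from any of the papers.

```python
# Illustrative sketch of Dempster's rule of combination for two modality-level
# mass functions over the frame {fake, real}. Mass on the full frame models a
# modality's uncertainty (e.g. as estimated by an uncertainty module such as UEM).
from itertools import product

FAKE = frozenset({"fake"})
REAL = frozenset({"real"})
THETA = frozenset({"fake", "real"})  # total ignorance / uncertainty


def ds_combine(m1: dict, m2: dict) -> dict:
    combined = {FAKE: 0.0, REAL: 0.0, THETA: 0.0}
    conflict = 0.0
    for (a, pa), (b, pb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] += pa * pb
        else:
            conflict += pa * pb  # mass on contradictory hypotheses
    if conflict >= 1.0:
        raise ValueError("Modalities are fully conflicting; Dempster's rule is undefined.")
    # Renormalize by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


# Example: text evidence leans fake, image evidence is mostly uncertain.
text_mass = {FAKE: 0.7, REAL: 0.1, THETA: 0.2}
image_mass = {FAKE: 0.3, REAL: 0.2, THETA: 0.5}
print(ds_combine(text_mass, image_mass))
```

The normalization by (1 - conflict) is what makes the fusion "dynamic": a modality that assigns most of its mass to the full frame contributes little to the final decision, which is the evidence-theoretic counterpart of the low gate weight a network like CADFM would learn for it.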