MDF: A Dynamic Fusion Model for Multi-modal Fake News Detection
- URL: http://arxiv.org/abs/2406.19776v1
- Date: Fri, 28 Jun 2024 09:24:52 GMT
- Title: MDF: A Dynamic Fusion Model for Multi-modal Fake News Detection
- Authors: Hongzhen Lv, Wenzhong Yang, Fuyuan Wei, Jiaren Peng, Haokun Geng,
- Abstract summary: We propose a new dynamic fusion framework dubbed MDF for fake news detection.
Our model consists of two main components: (1) UEM, an uncertainty modeling module that employs a multi-head attention mechanism to model intra-modal uncertainty; and (2) DFN, a dynamic fusion module based on D-S evidence theory that dynamically fuses the weights of the two modalities, text and image.
- Score: 0.41942958779358674
- Abstract: Fake news detection has received increasing attention from researchers in recent years, especially multi-modal fake news detection containing both text and images. However, many previous works feed the two modal features, text and image, into a binary classifier after simple concatenation or an attention mechanism; these features contain a large amount of noise inherent in the data, which in turn leads to intra- and inter-modal uncertainty. In addition, although many methods based on simply concatenating the two modalities have achieved prominent results, they hold fixed weights across modalities, so features with higher impact factors can be ignored. To alleviate these problems, we propose a new dynamic fusion framework dubbed MDF for fake news detection. To the best of our knowledge, it is the first attempt at a dynamic fusion framework in the field of fake news detection. Specifically, our model consists of two main components: (1) UEM, an uncertainty modeling module that employs a multi-head attention mechanism to model intra-modal uncertainty; and (2) DFN, a dynamic fusion module based on D-S evidence theory that dynamically fuses the weights of the two modalities, text and image. To obtain better results with the dynamic fusion framework, we use a GAT for inter-modal uncertainty and weight modeling before DFN. Extensive experiments on two benchmark datasets demonstrate the effectiveness and superior performance of the MDF framework. We also conducted a systematic ablation study to gain insight into our motivation and architectural design. Our model is publicly available at: https://github.com/CoisiniStar/MDF
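Since the DFN module is described as fusing the two modalities via D-S evidence theory, a minimal sketch of Dempster's rule of combination for two modality-level mass functions may clarify the mechanism. The three-element frame {real, fake, theta}, the mass values, and all names below are illustrative assumptions, not details taken from the paper:

```python
# Minimal sketch of Dempster-Shafer evidence fusion for two modalities.
# Mass functions live on the frame {real, fake} plus 'theta' (mass assigned
# to the whole frame, i.e. remaining uncertainty); the values below are
# hypothetical per-modality head outputs, not numbers from the MDF paper.

def ds_combine(m1, m2):
    """Combine two mass functions with Dempster's rule of combination."""
    # Conflict: mass assigned to contradictory singleton hypotheses.
    conflict = m1['real'] * m2['fake'] + m1['fake'] * m2['real']
    k = 1.0 - conflict
    if k == 0:
        raise ValueError("Total conflict: Dempster's rule is undefined")

    return {
        'real': (m1['real'] * m2['real']
                 + m1['real'] * m2['theta']
                 + m1['theta'] * m2['real']) / k,
        'fake': (m1['fake'] * m2['fake']
                 + m1['fake'] * m2['theta']
                 + m1['theta'] * m2['fake']) / k,
        'theta': (m1['theta'] * m2['theta']) / k,
    }

# Hypothetical evidence: the text head is fairly confident the item is fake,
# while the image head largely abstains (high theta).
text_mass = {'real': 0.15, 'fake': 0.70, 'theta': 0.15}
image_mass = {'real': 0.30, 'fake': 0.30, 'theta': 0.40}
print(ds_combine(text_mass, image_mass))
# {'real': ~0.201, 'fake': ~0.718, 'theta': ~0.081}
```

In this toy run the fused belief follows the confident text evidence while keeping a small residual uncertainty; per the abstract, MDF's DFN goes further and fuses the per-modality weights dynamically rather than fixing them.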
Related papers
- A Novel Energy based Model Mechanism for Multi-modal Aspect-Based Sentiment Analysis [85.77557381023617]
We propose a novel framework called DQPSA for multi-modal aspect-based sentiment analysis.
The PDQ module uses the prompt as both a visual query and a language query to extract prompt-aware visual information.
The EPE module models the boundary pairing of the analysis target from the perspective of an energy-based model.
arXiv Detail & Related papers (2023-12-13T12:00:46Z)
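As a rough illustration of the energy-based boundary pairing idea in the DQPSA entry above, the sketch below assigns every candidate (start, end) span a learned scalar energy and selects the minimum; the bilinear scorer, dimensions, and names are assumptions for illustration, not the actual EPE design:

```python
import torch
import torch.nn as nn

# Sketch: score every (start, end) boundary pair with a scalar energy,
# then pick the lowest-energy valid span. Purely illustrative.

class BoundaryEnergy(nn.Module):
    def __init__(self, hidden: int):
        super().__init__()
        self.score = nn.Bilinear(hidden, hidden, 1)  # E(start, end)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (seq_len, hidden) contextual token states
        n, h = tokens.shape
        starts = tokens.unsqueeze(1).expand(n, n, h).reshape(-1, h)
        ends = tokens.unsqueeze(0).expand(n, n, h).reshape(-1, h)
        return self.score(starts, ends).view(n, n)  # energy[i, j]

tokens = torch.randn(12, 64)  # hypothetical encoder outputs
energy = BoundaryEnergy(64)(tokens)
valid = torch.ones(12, 12).triu().bool()  # require end >= start
energy = energy.masked_fill(~valid, float("inf"))
start, end = divmod(energy.argmin().item(), 12)
print(start, end)  # the minimum-energy boundary pair
```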
- Detecting and Grounding Multi-Modal Media Manipulation and Beyond [93.08116982163804]
We highlight a new research problem for multi-modal fake media, namely Detecting and Grounding Multi-Modal Media Manipulation (DGM4).
DGM4 aims to not only detect the authenticity of multi-modal media, but also ground the manipulated content.
We propose a novel HierArchical Multi-modal Manipulation rEasoning tRansformer (HAMMER) to fully capture the fine-grained interaction between different modalities.
arXiv Detail & Related papers (2023-09-25T15:05:46Z)
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
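For the implicit manipulation query idea in the entry above, here is a generic sketch of learned queries aggregating global contextual cues within one modality via cross-attention; the module name, query count, and dimensions are hypothetical and not taken from the paper:

```python
import torch
import torch.nn as nn

# Sketch: a small set of learned query vectors attends over one modality's
# tokens to pool global context, a common stand-in for query-based
# aggregation such as IMQ. All names and sizes are assumed.

class LearnedQueryAggregator(nn.Module):
    def __init__(self, dim: int, n_queries: int = 4, n_heads: int = 4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_queries, dim))
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq, dim) features from a single modality
        q = self.queries.unsqueeze(0).expand(tokens.size(0), -1, -1)
        out, _ = self.attn(q, tokens, tokens)  # queries attend over tokens
        return out  # (batch, n_queries, dim) aggregated contextual cues

visual_tokens = torch.randn(2, 49, 256)  # hypothetical patch features
cues = LearnedQueryAggregator(256)(visual_tokens)
print(cues.shape)  # torch.Size([2, 4, 256])
```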
- Information Screening whilst Exploiting! Multimodal Relation Extraction with Feature Denoising and Multimodal Topic Modeling [96.75821232222201]
Existing research on multimodal relation extraction (MRE) faces two co-existing challenges, internal-information over-utilization and external-information under-exploitation.
We propose a novel framework that simultaneously implements the idea of internal-information screening and external-information exploiting.
arXiv Detail & Related papers (2023-05-19T14:56:57Z)
- Exploring Multimodal Sentiment Analysis via CBAM Attention and Double-layer BiLSTM Architecture [3.9850392954445875]
In our model, we use BERT + BiLSTM as a new feature extractor to capture long-distance dependencies in sentences.
To remove redundant information, a CNN and CBAM attention are added after concatenating the text and image features.
The experimental results show that our model achieves sound performance, comparable to advanced models.
arXiv Detail & Related papers (2023-03-26T12:34:01Z)
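The entry above concatenates text and image features and then applies CBAM attention; below is a minimal sketch of the channel-attention half of CBAM over such concatenated features, with all dimensions and the fusion step assumed for illustration (CBAM also includes a spatial-attention half not shown here):

```python
import torch
import torch.nn as nn

# Sketch of CBAM-style channel attention: average- and max-pooled channel
# descriptors pass through a shared MLP, and the summed result gates each
# channel of the fused feature map. Sizes are hypothetical.

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, length) feature map over the fused sequence
        avg = self.mlp(x.mean(dim=2))           # global average pooling
        mx = self.mlp(x.max(dim=2).values)      # global max pooling
        weights = torch.sigmoid(avg + mx)       # per-channel gates
        return x * weights.unsqueeze(2)

text_feats = torch.randn(4, 128, 32)   # hypothetical BiLSTM text features
image_feats = torch.randn(4, 128, 32)  # hypothetical CNN image features
fused = torch.cat([text_feats, image_feats], dim=1)  # splice along channels
out = ChannelAttention(256)(fused)
print(out.shape)  # torch.Size([4, 256, 32])
```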
- Cross-modal Contrastive Learning for Multimodal Fake News Detection [10.760000041969139]
COOLANT is a cross-modal contrastive learning framework for multimodal fake news detection.
A cross-modal fusion module is developed to learn the cross-modality correlations.
An attention guidance module is implemented to help effectively and interpretably aggregate the aligned unimodal representations.
arXiv Detail & Related papers (2023-02-25T10:12:34Z)
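A compact sketch of the cross-modal alignment idea behind the COOLANT entry above, written as a symmetric InfoNCE-style contrastive loss between paired text and image embeddings; COOLANT's actual loss, temperature, and projection heads are not specified here, so treat the formulation as a generic assumption:

```python
import torch
import torch.nn.functional as F

# Symmetric InfoNCE-style cross-modal contrastive loss: matched
# (text, image) pairs share an index in the batch, so the positives
# sit on the diagonal of the similarity matrix.

def cross_modal_contrastive_loss(text_emb, image_emb, temperature=0.07):
    text_emb = F.normalize(text_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)
    logits = text_emb @ image_emb.t() / temperature  # (batch, batch)
    targets = torch.arange(logits.size(0))           # diagonal positives
    loss_t2i = F.cross_entropy(logits, targets)      # text -> image
    loss_i2t = F.cross_entropy(logits.t(), targets)  # image -> text
    return 0.5 * (loss_t2i + loss_i2t)

loss = cross_modal_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
print(loss.item())
```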
- Team Triple-Check at Factify 2: Parameter-Efficient Large Foundation Models with Feature Representations for Multi-Modal Fact Verification [5.552606716659022]
Multi-modal fact verification has become an important but challenging issue on social media.
In this paper, we propose the Pre-CoFactv2 framework for modeling fine-grained text and input embeddings with lightweight parameters.
We show that Pre-CoFactv2 outperforms Pre-CoFact by a large margin and achieves new state-of-the-art results at the Factify challenge at AAAI 2023.
arXiv Detail & Related papers (2023-02-12T18:08:54Z)
- MMLatch: Bottom-up Top-down Fusion for Multimodal Sentiment Analysis [84.7287684402508]
Current deep learning approaches for multimodal fusion rely on bottom-up fusion of high and mid-level latent modality representations.
Models of human perception highlight the importance of top-down fusion, where high-level representations affect the way sensory inputs are perceived.
We propose a neural architecture that captures top-down cross-modal interactions, using a feedback mechanism in the forward pass during network training.
arXiv Detail & Related papers (2022-01-24T17:48:04Z)
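To make the top-down feedback idea in the MMLatch entry above concrete, here is a minimal sketch in which a high-level crossmodal summary produces a mask that re-weights low-level input features during the forward pass; module names and sizes are illustrative, not the exact MMLatch architecture:

```python
import torch
import torch.nn as nn

# Sketch of top-down feedback: a high-level crossmodal summary produces a
# sigmoid mask that modulates the low-level features before the bottom-up
# encoders consume them. All names and dimensions are assumed.

class TopDownGate(nn.Module):
    def __init__(self, high_dim: int, low_dim: int):
        super().__init__()
        self.to_mask = nn.Sequential(nn.Linear(high_dim, low_dim), nn.Sigmoid())

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        # low: (batch, seq, low_dim) raw features; high: (batch, high_dim)
        mask = self.to_mask(high).unsqueeze(1)  # (batch, 1, low_dim)
        return low * mask                       # feedback-modulated input

audio = torch.randn(4, 50, 74)   # hypothetical low-level audio features
summary = torch.randn(4, 256)    # high-level crossmodal state
gated = TopDownGate(256, 74)(audio, summary)
print(gated.shape)  # torch.Size([4, 50, 74])
```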
- Fusion with Hierarchical Graphs for Multimodal Emotion Recognition [7.147235324895931]
This paper proposes a novel hierarchical graph network (HFGCN) model that learns more informative multimodal representations.
Specifically, the proposed model fuses multimodal inputs using a two-stage graph construction approach and encodes the modality dependencies into the conversation representation.
Experiments show the effectiveness of the proposed model for more accurate AER, yielding state-of-the-art results on two public datasets.
arXiv Detail & Related papers (2021-09-15T08:21:01Z)
- Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis [96.46952672172021]
The Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)