MTPareto: A MultiModal Targeted Pareto Framework for Fake News Detection
- URL: http://arxiv.org/abs/2501.06764v2
- Date: Fri, 24 Jan 2025 09:23:18 GMT
- Title: MTPareto: A MultiModal Targeted Pareto Framework for Fake News Detection
- Authors: Kaiying Yan, Moyang Liu, Yukun Liu, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Xuefei Liu, Guanjun Li
- Abstract summary: Multimodal fake news detection is essential for maintaining the authenticity of Internet multimedia information.
To address this problem, we propose the MTPareto framework to optimize multimodal fusion.
Experiment results on FakeSV and FVC datasets show that the proposed framework outperforms baselines.
- Score: 34.09249215878179
- Abstract: Multimodal fake news detection is essential for maintaining the authenticity of Internet multimedia information. Significant differences in the form and content of multimodal information intensify optimization conflicts, hindering effective model training and reducing the effectiveness of existing fusion methods designed for bimodal data. To address this problem, we propose the MTPareto framework to optimize multimodal fusion, using a Targeted Pareto (TPareto) optimization algorithm for fusion-level-specific objective learning. Based on the designed hierarchical fusion network, the algorithm defines three fusion levels with corresponding losses and implements all-modal-oriented Pareto gradient integration for each. This approach achieves superior multimodal fusion by using the information obtained from intermediate fusion to benefit the entire training process. Experimental results on the FakeSV and FVC datasets show that the proposed framework outperforms baselines, with the TPareto optimization algorithm achieving accuracy improvements of 2.40% and 1.89%, respectively.
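As a concrete illustration of the Pareto gradient integration at the heart of TPareto, the sketch below shows the standard two-gradient min-norm (MGDA-style) combination that such Pareto updates are typically built on. The paper's reference code is not reproduced here; the stand-in losses, function names, and single-parameter setup are illustrative assumptions.
```python
import torch

def pareto_combine(g1: torch.Tensor, g2: torch.Tensor) -> torch.Tensor:
    """Min-norm convex combination of two gradients (two-task MGDA).

    Returns alpha*g1 + (1-alpha)*g2 where alpha minimizes
    ||alpha*g1 + (1-alpha)*g2||^2, a common building block for
    Pareto-style multi-objective updates.
    """
    diff = g1 - g2
    denom = diff.dot(diff).clamp_min(1e-12)
    alpha = ((g2 - g1).dot(g2) / denom).clamp(0.0, 1.0)
    return alpha * g1 + (1.0 - alpha) * g2

# Hypothetical usage: combine gradients of a fusion-level loss and a
# modality-specific loss w.r.t. shared fusion parameters `params`.
params = torch.nn.Parameter(torch.randn(16))
loss_fused = (params ** 2).sum()          # stand-in for a fusion-level loss
loss_modal = (params - 1.0).abs().sum()   # stand-in for a modality loss
g1 = torch.autograd.grad(loss_fused, params, retain_graph=True)[0]
g2 = torch.autograd.grad(loss_modal, params)[0]
update = pareto_combine(g1, g2)
with torch.no_grad():
    params -= 0.1 * update  # one Pareto-integrated gradient step
```
Clipping alpha to [0, 1] keeps the update a convex combination of the two gradients, so neither objective's descent direction is reversed.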
Related papers
- E2E-MFD: Towards End-to-End Synchronous Multimodal Fusion Detection [21.185032466325737]
We introduce E2E-MFD, a novel end-to-end algorithm for multimodal fusion detection.
E2E-MFD streamlines the process, achieving high performance with a single training phase.
Our extensive testing on multiple public datasets reveals E2E-MFD's superior capabilities.
arXiv Detail & Related papers (2024-03-14T12:12:17Z)
- From Text to Pixels: A Context-Aware Semantic Synergy Solution for Infrared and Visible Image Fusion [66.33467192279514]
We introduce a text-guided multi-modality image fusion method that leverages the high-level semantics from textual descriptions to integrate semantics from infrared and visible images.
Our method not only produces visually superior fusion results but also achieves a higher detection mAP than existing methods, reaching state-of-the-art performance.
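A minimal sketch of the general text-guided fusion idea: text features steering the combination of infrared and visible features via cross-attention. The architecture below (layer sizes, residual projection) is an illustrative assumption, not the paper's model.
```python
import torch
import torch.nn as nn

class TextGuidedFusion(nn.Module):
    """Toy text-guided image fusion: image tokens (infrared + visible)
    attend to text tokens via cross-attention. All details assumed."""
    def __init__(self, dim: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, ir_feats, vis_feats, text_feats):
        img = torch.cat([ir_feats, vis_feats], dim=1)        # (B, 2N, D)
        fused, _ = self.attn(query=img, key=text_feats, value=text_feats)
        return self.proj(fused + img)                        # residual fusion

m = TextGuidedFusion(dim=32)
out = m(torch.randn(2, 10, 32), torch.randn(2, 10, 32), torch.randn(2, 5, 32))
print(out.shape)  # torch.Size([2, 20, 32])
```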
arXiv Detail & Related papers (2023-12-31T08:13:47Z)
- MLF-DET: Multi-Level Fusion for Cross-Modal 3D Object Detection [54.52102265418295]
We propose a novel and effective Multi-Level Fusion network, named MLF-DET, for high-performance cross-modal 3D object DETection.
For the feature-level fusion, we present the Multi-scale Voxel Image fusion (MVI) module, which densely aligns multi-scale voxel features with image features.
For the decision-level fusion, we propose the lightweight Feature-cued Confidence Rectification (FCR) module, which exploits image semantics to rectify the confidence of detection candidates.
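A toy illustration of decision-level confidence rectification: an image-derived semantic score rescales the 3D detector's candidate confidences. The geometric-mean blending rule below is an assumption for illustration, not the paper's exact formulation.
```python
import torch

def rectify_confidence(det_conf: torch.Tensor,
                       img_sem_score: torch.Tensor,
                       eps: float = 1e-6) -> torch.Tensor:
    """Toy rectification: blend LiDAR detection confidence with an
    image-derived semantic score via a geometric mean (assumed rule)."""
    return torch.sqrt(det_conf.clamp_min(eps) * img_sem_score.clamp_min(eps))

det_conf = torch.tensor([0.9, 0.8, 0.7])      # candidate confidences (3D branch)
img_sem = torch.tensor([0.95, 0.2, 0.6])      # semantic agreement (image branch)
print(rectify_confidence(det_conf, img_sem))  # weak image support lowers scores
```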
arXiv Detail & Related papers (2023-07-18T11:26:02Z)
- Deep Equilibrium Multimodal Fusion [88.04713412107947]
Multimodal fusion integrates the complementary information present in multiple modalities and has gained much attention recently.
We propose a novel deep equilibrium (DEQ) method for multimodal fusion that seeks a fixed point of the dynamic multimodal fusion process.
Experiments on BRCA, MM-IMDB, CMU-MOSI, SUN RGB-D, and VQA-v2 demonstrate the superiority of our DEQ fusion.
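The DEQ idea treats the fused representation as a fixed point z* = f(z*, x1, x2) of a fusion layer. Below is a minimal sketch that solves the fixed point by damped forward iteration; the layer design and solver settings are illustrative assumptions (the actual method also trains via implicit differentiation).
```python
import torch
import torch.nn as nn

class FixedPointFusion(nn.Module):
    """Toy deep-equilibrium fusion: iterate z <- f(z, x1, x2) to a fixed point."""
    def __init__(self, dim: int):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(3 * dim, dim), nn.Tanh())

    def forward(self, x1, x2, iters: int = 50, tol: float = 1e-4):
        z = torch.zeros_like(x1)
        for _ in range(iters):
            z_next = self.f(torch.cat([z, x1, x2], dim=-1))
            if (z_next - z).norm() < tol:     # converged to a fixed point
                return z_next
            z = 0.5 * z + 0.5 * z_next        # damped update for stability
        return z

fusion = FixedPointFusion(dim=8)
z_star = fusion(torch.randn(2, 8), torch.randn(2, 8))
print(z_star.shape)  # torch.Size([2, 8])
```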
arXiv Detail & Related papers (2023-06-29T03:02:20Z)
- Provable Dynamic Fusion for Low-Quality Multimodal Data [94.39538027450948]
Dynamic multimodal fusion emerges as a promising learning paradigm.
Despite its widespread use, theoretical justifications in this field are still notably lacking.
This paper provides a theoretical understanding of when dynamic fusion helps, analyzing a widely used multimodal fusion framework from the generalization perspective.
A novel multimodal fusion framework termed Quality-aware Multimodal Fusion (QMF) is proposed, which improves classification accuracy and model robustness.
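A toy sketch of quality-aware late fusion: each modality's prediction is weighted by a per-sample confidence proxy. The max-softmax proxy below is an illustrative stand-in; QMF derives its quality weights differently.
```python
import torch
import torch.nn.functional as F

def quality_weighted_fusion(logits_a: torch.Tensor,
                            logits_b: torch.Tensor) -> torch.Tensor:
    """Toy quality-aware late fusion: weight each modality's logits by a
    per-sample confidence proxy (max softmax probability, an assumption)."""
    conf_a = F.softmax(logits_a, dim=-1).max(dim=-1, keepdim=True).values
    conf_b = F.softmax(logits_b, dim=-1).max(dim=-1, keepdim=True).values
    w = torch.cat([conf_a, conf_b], dim=-1)
    w = w / w.sum(dim=-1, keepdim=True)       # normalize weights per sample
    return w[..., :1] * logits_a + w[..., 1:] * logits_b

fused = quality_weighted_fusion(torch.randn(4, 3), torch.randn(4, 3))
print(fused.shape)  # torch.Size([4, 3])
```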
arXiv Detail & Related papers (2023-06-03T08:32:35Z)
- IMF: Interactive Multimodal Fusion Model for Link Prediction [13.766345726697404]
We introduce a novel Interactive Multimodal Fusion (IMF) model to integrate knowledge from different modalities.
Our approach has been demonstrated to be effective through empirical evaluations on several real-world datasets.
arXiv Detail & Related papers (2023-03-20T01:20:02Z)
- Late Fusion Multi-view Clustering via Global and Local Alignment Maximization [61.89218392703043]
Multi-view clustering (MVC) optimally integrates complementary information from different views to improve clustering performance.
Most existing approaches directly fuse multiple pre-specified similarities to learn an optimal similarity matrix for clustering.
We propose late fusion MVC via alignment maximization to address the limitations of this strategy.
arXiv Detail & Related papers (2022-08-02T01:49:31Z)
- Dynamic Multimodal Fusion [8.530680502975095]
Dynamic multimodal fusion (DynMM) is a new approach that adaptively fuses multimodal data and generates data-dependent forward paths during inference.
Results on various multimodal tasks demonstrate the efficiency and wide applicability of our approach.
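A minimal sketch of data-dependent fusion in the DynMM spirit: a gating network scores, per sample, a cheap unimodal path against a full fusion path. All architectural details below are illustrative assumptions.
```python
import torch
import torch.nn as nn

class DynamicFusion(nn.Module):
    """Toy data-dependent fusion: a gate picks, per sample, between a
    cheap unimodal path and a full fusion path (details assumed)."""
    def __init__(self, dim: int, n_classes: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, 2)            # scores the two paths
        self.uni = nn.Linear(dim, n_classes)         # text-only path
        self.multi = nn.Linear(2 * dim, n_classes)   # full fusion path

    def forward(self, x_text, x_image):
        pair = torch.cat([x_text, x_image], dim=-1)
        w = torch.softmax(self.gate(pair), dim=-1)   # soft path weights
        # (hard per-sample path selection at inference would skip computation)
        return w[:, :1] * self.uni(x_text) + w[:, 1:] * self.multi(pair)

model = DynamicFusion(dim=8, n_classes=2)
print(model(torch.randn(4, 8), torch.randn(4, 8)).shape)  # torch.Size([4, 2])
```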
arXiv Detail & Related papers (2022-03-31T21:35:13Z)
- Leveraging Trust for Joint Multi-Objective and Multi-Fidelity Optimization [0.0]
This paper investigates a novel approach to Bayesian multi-objective and multi-fidelity (MOMF) optimization.
We suggest the innovative use of a trust metric to support simultaneous optimization of multiple objectives and data sources.
Our methods offer broad applicability in solving simulation problems in fields such as plasma physics and fluid dynamics.
arXiv Detail & Related papers (2021-12-27T20:55:26Z)
- Deep Multimodal Fusion by Channel Exchanging [87.40768169300898]
This paper proposes a parameter-free multimodal fusion framework that dynamically exchanges channels between sub-networks of different modalities.
The validity of such an exchanging process is also guaranteed by sharing convolutional filters yet keeping separate BN layers across modalities, which, as an added benefit, allows our multimodal architecture to be almost as compact as a unimodal network.
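A toy sketch of the channel-exchanging idea: two modality streams share convolutional filters but keep separate BN layers, and channels whose BN scaling factor is small (deemed uninformative) are replaced by the other modality's channels. The threshold and layout below are illustrative assumptions.
```python
import torch
import torch.nn as nn

class ChannelExchange(nn.Module):
    """Toy channel exchanging: shared conv filters, separate BN per
    modality; weak channels (small |BN gamma|) are swapped across streams."""
    def __init__(self, channels: int, threshold: float = 2e-2):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)  # shared filters
        self.bn_a = nn.BatchNorm2d(channels)                     # separate BN
        self.bn_b = nn.BatchNorm2d(channels)
        self.threshold = threshold

    def forward(self, xa, xb):
        ya, yb = self.bn_a(self.conv(xa)), self.bn_b(self.conv(xb))
        swap_a = self.bn_a.weight.abs() < self.threshold  # weak channels in A
        swap_b = self.bn_b.weight.abs() < self.threshold
        out_a, out_b = ya.clone(), yb.clone()
        out_a[:, swap_a] = yb[:, swap_a]   # A borrows B's channels
        out_b[:, swap_b] = ya[:, swap_b]
        return out_a, out_b

layer = ChannelExchange(channels=16)
a, b = layer(torch.randn(2, 16, 8, 8), torch.randn(2, 16, 8, 8))
print(a.shape, b.shape)
```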
arXiv Detail & Related papers (2020-11-10T09:53:20Z)