Related papers: VMID: A Multimodal Fusion LLM Framework for Detecting and Identifying Misinformation of Short Videos

VMID: A Multimodal Fusion LLM Framework for Detecting and Identifying Misinformation of Short Videos

URL: http://arxiv.org/abs/2411.10032v1
Date: Fri, 15 Nov 2024 08:20:26 GMT
Title: VMID: A Multimodal Fusion LLM Framework for Detecting and Identifying Misinformation of Short Videos
Authors: Weihao Zhong, Yinhao Xiao, Minghui Xu, Xiuzhen Cheng,
Abstract summary: This paper presents a novel fake news detection method based on multimodal information, designed to identify misinformation through a multi-level analysis of video content. The proposed framework successfully integrates multimodal features within videos, significantly enhancing the accuracy and reliability of fake news detection.
Score: 14.551693267228345
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Short video platforms have become important channels for news dissemination, offering a highly engaging and immediate way for users to access current events and share information. However, these platforms have also emerged as significant conduits for the rapid spread of misinformation, as fake news and rumors can leverage the visual appeal and wide reach of short videos to circulate extensively among audiences. Existing fake news detection methods mainly rely on single-modal information, such as text or images, or apply only basic fusion techniques, limiting their ability to handle the complex, multi-layered information inherent in short videos. To address these limitations, this paper presents a novel fake news detection method based on multimodal information, designed to identify misinformation through a multi-level analysis of video content. This approach effectively utilizes different modal representations to generate a unified textual description, which is then fed into a large language model for comprehensive evaluation. The proposed framework successfully integrates multimodal features within videos, significantly enhancing the accuracy and reliability of fake news detection. Experimental results demonstrate that the proposed approach outperforms existing models in terms of accuracy, robustness, and utilization of multimodal information, achieving an accuracy of 90.93%, which is significantly higher than the best baseline model (SV-FEND) at 81.05%. Furthermore, case studies provide additional evidence of the effectiveness of the approach in accurately distinguishing between fake news, debunking content, and real incidents, highlighting its reliability and robustness in real-world applications.

Related papers

Exploring Modality Disruption in Multimodal Fake News Detection [16.607714608483164]
We propose a multimodal fake news detection framework, FND-MoE, to address the issue of modality disruption. FND-MoE significantly outperforms state-of-the-art methods, with accuracy improvements of 3.45% and 3.71% on the respective datasets.
arXiv Detail & Related papers (2025-04-12T09:39:29Z)
FMNV: A Dataset of Media-Published News Videos for Fake News Detection [10.36393083923778]
We construct FMNV, a novel dataset exclusively composed of news videos published by media organizations. We employ Large Language Models (LLMs) to automatically generate content by manipulating authentic media-published news videos. We propose FMNVD, a baseline model featuring a dual-stream architecture integrating CLIP and Faster R-CNN for video feature extraction.
arXiv Detail & Related papers (2025-04-10T12:16:32Z)
External Reliable Information-enhanced Multimodal Contrastive Learning for Fake News Detection [10.575512607941839]
ERIC-FND is an external reliable information-enhanced multimodal contrastive learning framework for fake news detection. Experiments are done on two commonly used datasets in different languages, X (Twitter) and Weibo.
arXiv Detail & Related papers (2025-03-05T02:07:38Z)
A Self-Learning Multimodal Approach for Fake News Detection [35.98977478616019]
We introduce a self-learning multimodal model for fake news classification. The model leverages contrastive learning, a robust method for feature extraction that operates without requiring labeled data. Our experimental results on a public dataset demonstrate that the proposed model outperforms several state-of-the-art classification approaches.
arXiv Detail & Related papers (2024-12-08T07:41:44Z)
Dynamic Analysis and Adaptive Discriminator for Fake News Detection [59.41431561403343]
We propose a Dynamic Analysis and Adaptive Discriminator (DAAD) approach for fake news detection. For knowledge-based methods, we introduce the Monte Carlo Tree Search algorithm to leverage the self-reflective capabilities of large language models. For semantic-based methods, we define four typical deceit patterns to reveal the mechanisms behind fake news creation.
arXiv Detail & Related papers (2024-08-20T14:13:54Z)
Detect, Investigate, Judge and Determine: A Knowledge-guided Framework for Few-shot Fake News Detection [50.079690200471454]
Few-Shot Fake News Detection (FS-FND) aims to distinguish inaccurate news from real ones in extremely low-resource scenarios. This task has garnered increased attention due to the widespread dissemination and harmful impact of fake news on social media. We propose a Dual-perspective Knowledge-guided Fake News Detection (DKFND) model, designed to enhance LLMs from both inside and outside perspectives.
arXiv Detail & Related papers (2024-07-12T03:15:01Z)
FineFake: A Knowledge-Enriched Dataset for Fine-Grained Multi-Domain Fake News Detection [54.37159298632628]
FineFake is a multi-domain knowledge-enhanced benchmark for fake news detection. FineFake encompasses 16,909 data samples spanning six semantic topics and eight platforms. The entire FineFake project is publicly accessible as an open-source repository.
arXiv Detail & Related papers (2024-03-30T14:39:09Z)
AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting Multiple Experts for Video Deepfake Detection [53.448283629898214]
The recent proliferation of hyper-realistic deepfake videos has drawn attention to the threat of audio and visual forgeries. Most previous work on detecting AI-generated fake videos only utilize visual modality or audio modality. We propose an Audio-Visual Transformer-based Ensemble Network (AVTENet) framework that considers both acoustic manipulation and visual manipulation.
arXiv Detail & Related papers (2023-10-19T19:01:26Z)
Detecting and Grounding Multi-Modal Media Manipulation and Beyond [93.08116982163804]
We highlight a new research problem for multi-modal fake media, namely Detecting and Grounding Multi-Modal Media Manipulation (DGM4) DGM4 aims to not only detect the authenticity of multi-modal media, but also ground the manipulated content. We propose a novel HierArchical Multi-modal Manipulation rEasoning tRansformer (HAMMER) to fully capture the fine-grained interaction between different modalities.
arXiv Detail & Related papers (2023-09-25T15:05:46Z)
Causal Video Summarizer for Video Exploration [74.27487067877047]
Causal Video Summarizer (CVS) is proposed to capture the interactive information between the video and query. Based on the evaluation of the existing multi-modal video summarization dataset, experimental results show that the proposed approach is effective.
arXiv Detail & Related papers (2023-07-04T22:52:16Z)
Multimodal Short Video Rumor Detection System Based on Contrastive Learning [3.4192832062683842]
Short video platforms in China have gradually evolved into fertile grounds for the proliferation of fake news. distinguishing short video rumors poses a significant challenge due to the substantial amount of information and shared features. Our research group proposes a methodology encompassing multimodal feature fusion and the integration of external knowledge.
arXiv Detail & Related papers (2023-04-17T16:07:00Z)
Similarity-Aware Multimodal Prompt Learning for Fake News Detection [0.12396474483677114]
multimodal fake news detection has outperformed text-only methods. This paper proposes a Similarity-Aware Multimodal Prompt Learning (SAMPLE) framework. For evaluation, SAMPLE surpasses the F1 and the accuracies of previous works on two benchmark multimodal datasets.
arXiv Detail & Related papers (2023-04-09T08:10:05Z)
Multi-modal Fake News Detection on Social Media via Multi-grained Information Fusion [21.042970740577648]
We present a Multi-grained Multi-modal Fusion Network (MMFN) for fake news detection. Inspired by the multi-grained process of human assessment of news authenticity, we respectively employ two Transformer-based pre-trained models to encode token-level features from text and images. The multi-modal module fuses fine-grained features, taking into account coarse-grained features encoded by the CLIP encoder.
arXiv Detail & Related papers (2023-04-03T09:13:59Z)
Multiverse: Multilingual Evidence for Fake News Detection [71.51905606492376]
Multiverse is a new feature based on multilingual evidence that can be used for fake news detection. The hypothesis of the usage of cross-lingual evidence as a feature for fake news detection is confirmed.
arXiv Detail & Related papers (2022-11-25T18:24:17Z)
Interpretable Fake News Detection with Topic and Deep Variational Models [2.15242029196761]
We focus on fake news detection using interpretable features and methods. We have developed a deep probabilistic model that integrates a dense representation of textual news. Our model achieves comparable performance to state-of-the-art competing models.
arXiv Detail & Related papers (2022-09-04T05:31:00Z)
A Multi-Policy Framework for Deep Learning-Based Fake News Detection [0.31498833540989407]
This work introduces Multi-Policy Statement Checker (MPSC), a framework that automates fake news detection. MPSC uses deep learning techniques to analyze a statement itself and its related news articles, predicting whether it is seemingly credible or suspicious.
arXiv Detail & Related papers (2022-06-01T21:25:21Z)
VMSMO: Learning to Generate Multimodal Summary for Video-based News Articles [63.32111010686954]
We propose the task of Video-based Multimodal Summarization with Multimodal Output (VMSMO) The main challenge in this task is to jointly model the temporal dependency of video with semantic meaning of article. We propose a Dual-Interaction-based Multimodal Summarizer (DIMS), consisting of a dual interaction module and multimodal generator.
arXiv Detail & Related papers (2020-10-12T02:19:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.