Official-NV: An LLM-Generated News Video Dataset for Multimodal Fake News Detection
- URL: http://arxiv.org/abs/2407.19493v3
- Date: Fri, 27 Dec 2024 10:34:15 GMT
- Title: Official-NV: An LLM-Generated News Video Dataset for Multimodal Fake News Detection
- Authors: Yihao Wang, Lizhi Chen, Zhong Qian, Peifeng Li
- Abstract summary: Multimodal fake news detection has recently garnered increased attention. We construct a dataset named Official-NV, comprising officially published news videos. We also propose a new baseline model called OFNVD, which captures key information from multimodal features.
- Score: 9.48705939124715
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: News media, especially video news media, have penetrated every aspect of daily life, which also brings the risk of fake news. Therefore, multimodal fake news detection has recently garnered increased attention. However, the existing datasets consist of user-uploaded videos and contain an excess of superfluous data, which introduces noise into the model training process. To address this issue, we construct a dataset named Official-NV, comprising officially published news videos. The crawled officially published videos are augmented through LLM-based generation and manual verification, thereby expanding the dataset. We also propose a new baseline model called OFNVD, which captures key information from multimodal features through a GLU attention mechanism and performs feature enhancement and modal aggregation via a cross-modal Transformer. Benchmarking the dataset and baselines demonstrates the effectiveness of our model in multimodal fake news detection.
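The abstract names a GLU attention mechanism for filtering key information from multimodal features, but gives no formulation here. The following is a minimal NumPy sketch of the GLU gating idea only; all weight names and dimensions are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def glu_gate(x, W, V, b, c):
    """Gated Linear Unit: one projection carries content, while a
    sigmoid-gated second projection decides how much of it passes through."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    return (x @ W + b) * sigmoid(x @ V + c)

rng = np.random.default_rng(0)
d = 8                         # feature dimension (illustrative)
x = rng.normal(size=(4, d))   # 4 feature vectors from one modality

W, V = rng.normal(size=(d, d)), rng.normal(size=(d, d))
b, c = np.zeros(d), np.zeros(d)

gated = glu_gate(x, W, V, b, c)
print(gated.shape)  # (4, 8)
```

In a full model the gated features from each modality would then be fed to a cross-modal Transformer for aggregation, as the abstract describes; that part is omitted here.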
Related papers
- Exploring Modality Disruption in Multimodal Fake News Detection [16.607714608483164]
We propose a multimodal fake news detection framework, FND-MoE, to address the issue of modality disruption.
FND-MoE significantly outperforms state-of-the-art methods, with accuracy improvements of 3.45% and 3.71% on the respective datasets.
arXiv Detail & Related papers (2025-04-12T09:39:29Z)
- FMNV: A Dataset of Media-Published News Videos for Fake News Detection [10.36393083923778]
We construct FMNV, a novel dataset exclusively composed of news videos published by media organizations.
We employ Large Language Models (LLMs) to automatically generate content by manipulating authentic media-published news videos.
We propose FMNVD, a baseline model featuring a dual-stream architecture integrating CLIP and Faster R-CNN for video feature extraction.
arXiv Detail & Related papers (2025-04-10T12:16:32Z)
- VMID: A Multimodal Fusion LLM Framework for Detecting and Identifying Misinformation of Short Videos [14.551693267228345]
This paper presents a novel fake news detection method based on multimodal information, designed to identify misinformation through a multi-level analysis of video content.
The proposed framework successfully integrates multimodal features within videos, significantly enhancing the accuracy and reliability of fake news detection.
arXiv Detail & Related papers (2024-11-15T08:20:26Z)
- Video Instruction Tuning With Synthetic Data [84.64519990333406]
We create a high-quality synthetic dataset specifically for video instruction-following, namely LLaVA-Video-178K.
This dataset includes key tasks such as detailed captioning, open-ended question-answering (QA), and multiple-choice QA.
By training on this dataset, in combination with existing visual instruction tuning data, we introduce LLaVA-Video, a new video LMM.
arXiv Detail & Related papers (2024-10-03T17:36:49Z)
- VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs [64.60035916955837]
VANE-Bench is a benchmark designed to assess the proficiency of Video-LMMs in detecting anomalies and inconsistencies in videos.
Our dataset comprises an array of videos synthetically generated using existing state-of-the-art text-to-video generation models.
We evaluate nine existing Video-LMMs, both open- and closed-source, on this benchmark and find that most models have difficulty identifying the subtle anomalies.
arXiv Detail & Related papers (2024-06-14T17:59:01Z)
- Multi-modal News Understanding with Professionally Labelled Videos (ReutersViLNews) [25.78619140103048]
We present a large-scale analysis on an in-house dataset collected by the Reuters News Agency, called Reuters Video-Language News (ReutersViLNews) dataset.
The dataset focuses on high-level video-language understanding with an emphasis on long-form news.
The results suggest that news-oriented videos are a substantial challenge for current video-language understanding algorithms.
arXiv Detail & Related papers (2024-01-23T00:42:04Z)
- Not all Fake News is Written: A Dataset and Analysis of Misleading Video Headlines [6.939987423356328]
We present a dataset consisting of videos paired with annotations of whether annotators believe each headline is representative of the video's contents.
After collecting and annotating this dataset, we analyze multimodal baselines for detecting misleading headlines.
Our annotation process also focuses on why annotators view a video as misleading, allowing us to better understand the interplay of annotators' background and the content of the videos.
arXiv Detail & Related papers (2023-10-20T23:47:01Z)
- AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting Multiple Experts for Video Deepfake Detection [53.448283629898214]
The recent proliferation of hyper-realistic deepfake videos has drawn attention to the threat of audio and visual forgeries.
Most previous work on detecting AI-generated fake videos utilizes only the visual or the audio modality.
We propose an Audio-Visual Transformer-based Ensemble Network (AVTENet) framework that considers both acoustic manipulation and visual manipulation.
arXiv Detail & Related papers (2023-10-19T19:01:26Z)
- MultiVENT: Multilingual Videos of Events with Aligned Natural Text [29.266266741468055]
MultiVENT is a dataset of multilingual, event-centric videos grounded in text documents across five target languages.
We analyze the state of online news videos and how they can be leveraged to build robust, factually accurate models.
arXiv Detail & Related papers (2023-07-06T17:29:34Z)
- Unsupervised Domain-agnostic Fake News Detection using Multi-modal Weak Signals [19.22829945777267]
This work proposes an effective framework for unsupervised fake news detection, which first embeds the knowledge available in four modalities in news records.
Also, we propose a novel technique to construct news datasets minimizing the latent biases in existing news datasets.
We trained the proposed unsupervised framework using LUND-COVID to exploit the potential of large datasets.
arXiv Detail & Related papers (2023-05-18T23:49:31Z)
- Multiverse: Multilingual Evidence for Fake News Detection [71.51905606492376]
Multiverse is a new feature based on multilingual evidence that can be used for fake news detection.
The hypothesis that cross-lingual evidence is a useful feature for fake news detection is confirmed.
arXiv Detail & Related papers (2022-11-25T18:24:17Z)
- User Preference-aware Fake News Detection [61.86175081368782]
Existing fake news detection algorithms focus on mining news content for deceptive signals.
We propose a new framework, UPFD, which simultaneously captures various signals from user preferences by joint content and graph modeling.
arXiv Detail & Related papers (2021-04-25T21:19:24Z)
- MDMMT: Multidomain Multimodal Transformer for Video Retrieval [63.872634680339644]
We present a new state-of-the-art on the text to video retrieval task on MSRVTT and LSMDC benchmarks.
We show that jointly training on different datasets improves test performance on each of them.
arXiv Detail & Related papers (2021-03-19T09:16:39Z)
- VMSMO: Learning to Generate Multimodal Summary for Video-based News Articles [63.32111010686954]
We propose the task of Video-based Multimodal Summarization with Multimodal Output (VMSMO).
The main challenge in this task is to jointly model the temporal dependency of the video with the semantic meaning of the article.
We propose a Dual-Interaction-based Multimodal Summarizer (DIMS), consisting of a dual interaction module and multimodal generator.
arXiv Detail & Related papers (2020-10-12T02:19:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences arising from its use.