Identifying Misinformation on YouTube through Transcript Contextual
Analysis with Transformer Models
- URL: http://arxiv.org/abs/2307.12155v1
- Date: Sat, 22 Jul 2023 19:59:16 GMT
- Title: Identifying Misinformation on YouTube through Transcript Contextual
Analysis with Transformer Models
- Authors: Christos Christodoulou, Nikos Salamanos, Pantelitsa Leonidou, Michail
Papadakis, Michael Sirivianos
- Abstract summary: We introduce a novel methodology for video classification, focusing on the veracity of the content.
We employ advanced machine learning techniques like transfer learning to solve the classification challenge.
We apply the trained models to three datasets: (a) YouTube Vaccine-misinformation related videos, (b) YouTube Pseudoscience videos, and (c) Fake-News dataset.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Misinformation on YouTube is a significant concern, necessitating robust
detection strategies. In this paper, we introduce a novel methodology for video
classification, focusing on the veracity of the content. We convert the
conventional video classification task into a text classification task by
leveraging the textual content derived from the video transcripts. We employ
advanced machine learning techniques like transfer learning to solve the
classification challenge. Our approach incorporates two forms of transfer
learning: (a) fine-tuning base transformer models such as BERT, RoBERTa, and
ELECTRA, and (b) few-shot learning using sentence-transformers MPNet and
RoBERTa-large. We apply the trained models to three datasets: (a) YouTube
Vaccine-misinformation related videos, (b) YouTube Pseudoscience videos, and
(c) Fake-News dataset (a collection of articles). Including the Fake-News
dataset extended the evaluation of our approach beyond YouTube videos. Using
these datasets, we evaluated how well the models distinguish valid information
from misinformation. The fine-tuned models yielded a Matthews Correlation
Coefficient > 0.81, accuracy > 0.90, and F1 score > 0.90 on two of the three datasets.
Interestingly, the few-shot models outperformed the fine-tuned ones by 20% in
both Accuracy and F1 score for the YouTube Pseudoscience dataset, highlighting
the potential utility of this approach -- especially in the context of limited
training data.
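The abstract does not include code, so the following is only a rough, pure-Python sketch of the two ideas it reports: few-shot classification over sentence embeddings (here assumed to be a nearest-centroid decision rule, since the abstract does not specify the scheme) and the Matthews Correlation Coefficient used in the evaluation. The 2-D toy vectors stand in for MPNet or RoBERTa-large sentence embeddings.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_centroid(support, query):
    # support: {label: [embedding, ...]} built from a handful of labeled
    # examples per class; query: embedding of the transcript to classify.
    # Returns the label whose class centroid is most similar to the query.
    best_label, best_sim = None, -2.0
    for label, vecs in support.items():
        dim = len(vecs[0])
        centroid = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
        sim = cosine(centroid, query)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label

def mcc(labels, preds):
    # Matthews Correlation Coefficient for binary labels
    # (1 = misinformation, 0 = valid information).
    tp = sum(y == 1 and p == 1 for y, p in zip(labels, preds))
    tn = sum(y == 0 and p == 0 for y, p in zip(labels, preds))
    fp = sum(y == 0 and p == 1 for y, p in zip(labels, preds))
    fn = sum(y == 1 and p == 0 for y, p in zip(labels, preds))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

In practice the embeddings would come from a sentence-transformers model (e.g. `model.encode(transcript)`); the toy vectors here only illustrate the decision rule and the reported metric, not the paper's actual implementation.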
Related papers
- Rethinking Video-Text Understanding: Retrieval from Counterfactually Augmented Data [19.210471935816273]
We propose a novel evaluation task for video-text understanding, namely retrieval from counterfactually augmented data (RCAD) and a new Feint6K dataset.
To succeed on our new evaluation task, models must derive a comprehensive understanding of the video from cross-frame reasoning.
Our approach successfully learns more discriminative action embeddings and improves results on Feint6K when applied to multiple video-text models.
arXiv Detail & Related papers (2024-07-18T01:55:48Z)
- Any-point Trajectory Modeling for Policy Learning [64.23861308947852]
We introduce Any-point Trajectory Modeling (ATM) to predict future trajectories of arbitrary points within a video frame.
ATM outperforms strong video pre-training baselines by 80% on average.
We show effective transfer learning of manipulation skills from human videos and videos from a different robot morphology.
arXiv Detail & Related papers (2023-12-28T23:34:43Z)
- CLearViD: Curriculum Learning for Video Description [3.5293199207536627]
Video description entails automatically generating coherent natural language sentences that narrate the content of a given video.
We introduce CLearViD, a transformer-based model for video description generation that leverages curriculum learning to accomplish this task.
The results on two datasets, namely ActivityNet Captions and YouCook2, show that CLearViD significantly outperforms existing state-of-the-art models in terms of both accuracy and diversity metrics.
arXiv Detail & Related papers (2023-11-08T06:20:32Z)
- MisRoBÆRTa: Transformers versus Misinformation [0.6091702876917281]
We propose a novel transformer-based deep neural ensemble architecture for misinformation detection.
MisRoBAERTa takes advantage of two transformers (BART & RoBERTa) to improve the classification performance.
For training and testing, we used a large real-world news articles dataset labeled with 10 classes.
arXiv Detail & Related papers (2023-04-16T12:14:38Z)
- Learning a Grammar Inducer from Massive Uncurated Instructional Videos [118.7279072358029]
Video-aided grammar induction aims to leverage video information for finding more accurate syntactic grammars for accompanying text.
We build a new model that can better learn video-span correlation without manually designed features.
Our model yields higher F1 scores than the previous state-of-the-art systems trained on in-domain data.
arXiv Detail & Related papers (2022-10-22T00:22:55Z)
- Revisiting Classifier: Transferring Vision-Language Models for Video Recognition [102.93524173258487]
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is an important topic in computer vision research.
In this study, we focus on transferring knowledge for video classification tasks.
We utilize a well-pretrained language model to generate good semantic targets for efficient transfer learning.
arXiv Detail & Related papers (2022-07-04T10:00:47Z)
- Bi-Calibration Networks for Weakly-Supervised Video Representation Learning [153.54638582696128]
We introduce a new design of mutual calibration between query and text to boost weakly-supervised video representation learning.
We present Bi-Calibration Networks (BCN), which couple two calibrations to learn the amendment from text to query and vice versa.
BCN learnt on 3M web videos obtains superior results under the linear model protocol on downstream tasks.
arXiv Detail & Related papers (2022-06-21T16:02:12Z)
- MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization [65.09758931804478]
Three different data sources are combined: weakly-supervised videos, crowd-labeled text-image pairs and text-video pairs.
A careful analysis of available pre-trained networks helps select those that provide the best prior knowledge.
arXiv Detail & Related papers (2022-03-14T13:15:09Z)
- Misinformation Detection on YouTube Using Video Captions [6.503828590815483]
This work proposes an approach that uses state-of-the-art NLP techniques to extract features from video captions (subtitles).
To evaluate our approach, we utilize a publicly accessible and labeled dataset for classifying videos as misinformation or not.
arXiv Detail & Related papers (2021-07-02T10:02:36Z)
- Few-Shot Video Object Detection [70.43402912344327]
We introduce Few-Shot Video Object Detection (FSVOD) with three important contributions.
FSVOD-500 comprises 500 classes with class-balanced videos in each category for few-shot learning.
Our TPN and TMN+ are jointly and end-to-end trained.
arXiv Detail & Related papers (2021-04-30T07:38:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.