A Novel BERT-based Classifier to Detect Political Leaning of YouTube Videos based on their Titles
- URL: http://arxiv.org/abs/2404.04261v1
- Date: Fri, 16 Feb 2024 14:44:30 GMT
- Title: A Novel BERT-based Classifier to Detect Political Leaning of YouTube Videos based on their Titles
- Authors: Nouar AlDahoul, Talal Rahwan, Yasir Zaki
- Abstract summary: A quarter of US adults regularly get their news from YouTube.
We propose a novel classifier to classify YouTube videos based on their titles into six categories, namely: Far Left, Left, Center, Anti-Woke, Right, and Far Right.
For the vast majority of cases, the predicted political leaning matches that of the news agency.
- Score: 1.6647208383676708
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A quarter of US adults regularly get their news from YouTube. Yet, despite the massive political content available on the platform, to date no classifier has been proposed to identify the political leaning of YouTube videos. To fill this gap, we propose a novel classifier based on BERT -- a language model from Google -- to classify YouTube videos merely based on their titles into six categories, namely: Far Left, Left, Center, Anti-Woke, Right, and Far Right. We used a public dataset of 10 million YouTube video titles (under various categories) to train and validate the proposed classifier. We compare the classifier against several alternatives that we trained on the same dataset, revealing that our classifier achieves the highest accuracy (75%) and the highest F1 score (77%). To further validate the classification performance, we collect videos from the YouTube channels of numerous prominent news agencies, such as Fox News and The New York Times, which have widely known political leanings, and apply our classifier to their video titles. For the vast majority of cases, the predicted political leaning matches that of the news agency.
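The abstract reports accuracy and F1 over six categories without saying how F1 is averaged across classes. As a minimal sketch, assuming macro averaging (an assumption, not something the abstract confirms), the two metrics could be computed as follows. The function names and the toy labels are hypothetical illustrations and are not taken from the paper or its dataset.

```python
from collections import Counter

# The six categories named in the abstract.
LABELS = ["Far Left", "Left", "Center", "Anti-Woke", "Right", "Far Right"]


def accuracy(y_true, y_pred):
    """Fraction of titles whose predicted label equals the reference label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred, labels=LABELS):
    """F1 per class, computed as 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # reference class gains a false negative
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # A class that never appears in either list scores 0.0 here.
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


# Hypothetical toy labels, made up for illustration only.
toy_true = ["Far Left", "Left", "Center", "Anti-Woke",
            "Right", "Far Right", "Left", "Right"]
toy_pred = ["Far Left", "Left", "Center", "Right",
            "Right", "Far Right", "Center", "Right"]

print(f"accuracy = {accuracy(toy_true, toy_pred):.3f}")
print(f"macro F1 = {macro_f1(toy_true, toy_pred):.3f}")
```

Macro averaging weights every category equally, so a rarely predicted class such as Anti-Woke in the toy data pulls the score down as much as a frequent one would. A weighted or micro average would behave differently, which is why the averaging scheme matters when comparing reported figures.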
Related papers
- Computational Assessment of Hyperpartisanship in News Titles [55.92100606666497]
We first adopt a human-guided machine learning framework to develop a new dataset for hyperpartisan news title detection.
Overall the Right media tends to use proportionally more hyperpartisan titles.
We identify three major topics including foreign issues, political systems, and societal issues that are suggestive of hyperpartisanship in news titles.
arXiv Detail & Related papers (2023-01-16T05:56:58Z)
- Unveiling the Hidden Agenda: Biases in News Reporting and Consumption [59.55900146668931]
We build a six-year dataset on the Italian vaccine debate and adopt a Bayesian latent space model to identify narrative and selection biases.
We found a nonlinear relationship between biases and engagement, with higher engagement for extreme positions.
Analysis of news consumption on Twitter reveals common audiences among news outlets with similar ideological positions.
arXiv Detail & Related papers (2023-01-14T18:58:42Z)
- Navigating an Ocean of Video Data: Deep Learning for Humpback Whale Classification in YouTube Videos [0.0]
We use a CNN-RNN architecture pretrained on the ImageNet dataset for classification of YouTube videos as relevant or irrelevant.
We achieve an average 85.7% accuracy, and 84.7% (irrelevant)/ 86.6% (relevant) F1 scores using five-fold cross validation.
We show that deep learning can be used as a time-efficient step to make social media a viable source of image and video data for biodiversity assessments.
arXiv Detail & Related papers (2022-12-01T19:19:46Z)
- Top Gear or Black Mirror: Inferring Political Leaning From Non-Political Content [8.435739379764408]
Polarization and echo chambers are often studied in the context of explicitly political events such as elections.
Political polarization in non-political contexts is often unknown.
Political leaning is known to correlate with many lifestyle choices, leading to stereotypes such as the "latte-drinking liberal".
arXiv Detail & Related papers (2022-08-11T06:41:23Z)
- Misinformation Detection on YouTube Using Video Captions [6.503828590815483]
This work proposes an approach that uses state-of-the-art NLP techniques to extract features from video captions (subtitles).
To evaluate our approach, we utilize a publicly accessible and labeled dataset for classifying videos as misinformation or not.
arXiv Detail & Related papers (2021-07-02T10:02:36Z)
- Examining the consumption of radical content on YouTube [1.2820564400223966]
Recently, YouTube's scale has fueled concerns that YouTube users are being radicalized via a combination of biased recommendations and ostensibly apolitical anti-woke channels.
Here we test this hypothesis using a representative panel of more than 300,000 Americans and their individual-level browsing behavior.
We find no evidence that engagement with far-right content is caused by YouTube recommendations systematically, nor do we find clear evidence that anti-woke channels serve as a gateway to the far right.
arXiv Detail & Related papers (2020-11-25T16:00:20Z)
- Cross-Domain Learning for Classifying Propaganda in Online Contents [67.10699378370752]
We present an approach to leverage cross-domain learning, based on labeled documents and sentences from news and tweets, as well as political speeches with a clear difference in their degrees of being propagandistic.
Our experiments demonstrate the usefulness of this approach, and identify difficulties and limitations in various configurations of sources and targets for the transfer step.
arXiv Detail & Related papers (2020-11-13T10:19:13Z)
- Understanding YouTube Communities via Subscription-based Channel Embeddings [0.0]
This paper presents new methods to discover and classify YouTube channels.
The methods use a self-supervised learning approach that leverages the public subscription pages of commenters.
We create a new dataset to analyze the amount of traffic going to different political content.
arXiv Detail & Related papers (2020-10-19T22:00:04Z)
- LTIatCMU at SemEval-2020 Task 11: Incorporating Multi-Level Features for Multi-Granular Propaganda Span Identification [70.1903083747775]
This paper describes our submission for the task of Propaganda Span Identification in news articles.
We introduce a BERT-BiLSTM based span-level propaganda classification model that identifies which token spans within the sentence are indicative of propaganda.
arXiv Detail & Related papers (2020-08-11T16:14:47Z)
- Political audience diversity and news reliability in algorithmic ranking [54.23273310155137]
We propose using the political diversity of a website's audience as a quality signal.
Using news source reliability ratings from domain experts and web browsing data from a diverse sample of 6,890 U.S. citizens, we first show that websites with more extreme and less politically diverse audiences have lower journalistic standards.
arXiv Detail & Related papers (2020-07-16T02:13:55Z)
- Generalized Few-Shot Video Classification with Video Retrieval and Feature Generation [132.82884193921535]
We argue that previous methods underestimate the importance of video feature learning and propose a two-stage approach.
We show that this simple baseline approach outperforms prior few-shot video classification methods by over 20 points on existing benchmarks.
We present two novel approaches that yield further improvement.
arXiv Detail & Related papers (2020-07-09T13:05:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.