Understanding YouTube Communities via Subscription-based Channel Embeddings
- URL: http://arxiv.org/abs/2010.09892v1
- Date: Mon, 19 Oct 2020 22:00:04 GMT
- Title: Understanding YouTube Communities via Subscription-based Channel Embeddings
- Authors: Sam Clark and Anna Zaitsev
- Abstract summary: This paper presents new methods to discover and classify YouTube channels.
The methods use a self-supervised learning approach that leverages the public subscription pages of commenters.
We create a new dataset to analyze the amount of traffic going to different political content.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: YouTube is an important source of news and entertainment worldwide, but the
scale makes it challenging to study the ideas and topics being discussed on the
platform. This paper presents new methods to discover and classify YouTube
channels which enable the analysis of communities and categories on the
platform using orders of magnitude more channels than have been used in
previous studies. Instead of using channel and video data as features for
classification as other researchers have, these methods use a self-supervised
learning approach that leverages the public subscription pages of commenters.
We test the classification method on the task of predicting the political lean
of YouTube news channels and find that it outperforms the previous best model
on the task. Further experiments also show that there are important advantages
to using commenter subscriptions to discover channels. The subscription data,
along with an iterative approach, is applied to discover, to the best of our
knowledge, the most comprehensive set of English-language socio-political
YouTube channels analyzed to date. We experiment with predicting more
fine-grained political tags for channels using a previously annotated dataset
and find that our model performs better than the average individual human
reviewer for most of the top tags. This fine-grained political tag model is
then applied to the newly discovered English-language socio-political channels
to create a new dataset for analyzing the amount of traffic going to different
political content. The data shows that some tags, such as "Partisan Right" and
"Conspiracy", are significantly underrepresented when looking only at the most
popular socio-political channels. Through the use of our methods, we are able
to get a much more accurate picture of the size of these communities on
YouTube.
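The abstract does not say how subscription pages are turned into channel embeddings. As a hedged illustration only, the sketch below assumes each commenter's public subscription page has already been scraped into a list of channel IDs, and swaps in a classic count-based technique (positive pointwise mutual information over channel co-subscriptions, followed by a truncated SVD). This is not necessarily the authors' model. A k-nearest-neighbour vote over a few labeled channels then stands in for the political-lean classifier. The function names (`build_cooccurrence`, `ppmi`, `embed`, `knn_predict`) and the `dim` and `k` parameters are hypothetical.

```python
# Hypothetical sketch: PPMI + truncated SVD over commenter co-subscriptions is a
# stand-in technique, not necessarily the embedding model used in the paper.
import numpy as np
from collections import Counter


def build_cooccurrence(subscriptions, channels):
    """Count, for each channel pair, how many commenters subscribe to both."""
    index = {c: i for i, c in enumerate(channels)}
    counts = np.zeros((len(channels), len(channels)))
    for subs in subscriptions:  # one list of channel IDs per commenter
        ids = sorted({index[c] for c in subs if c in index})
        for a in ids:
            for b in ids:
                if a != b:
                    counts[a, b] += 1
    return counts


def ppmi(counts):
    """Positive pointwise mutual information; undefined or negative cells become 0."""
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0
    return np.maximum(pmi, 0.0)


def embed(subscriptions, channels, dim=2):
    """Truncated SVD of the PPMI matrix; rows are L2-normalised so dot product = cosine."""
    u, s, _ = np.linalg.svd(ppmi(build_cooccurrence(subscriptions, channels)))
    vecs = u[:, :dim] * np.sqrt(s[:dim])
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # isolated channels keep an all-zero vector
    return vecs / norms


def knn_predict(vecs, channels, labels, query, k=3):
    """Majority label among the k labeled channels most similar to `query`."""
    index = {c: i for i, c in enumerate(channels)}
    labeled = list(labels)
    sims = vecs[[index[c] for c in labeled]] @ vecs[index[query]]
    top = np.argsort(-sims)[:k]
    votes = Counter(labels[labeled[i]] for i in top)
    return votes.most_common(1)[0][0]
```

On toy data with two disjoint groups of commenters, channels that share subscribers end up with near-identical unit vectors, so an unlabeled channel inherits the majority label of its nearest labeled neighbours. A realistic version would need sparse matrices and a much larger `dim`.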
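The abstract also mentions an iterative approach for discovering channels from subscription data but gives no details. A minimal sketch, assuming a snowball-style loop: start from seed channels, gather their commenters, count how many distinct sampled commenters subscribe to each unseen channel, and keep candidates that pass a support threshold and a relevance check. `commenters_of`, `subscriptions_of`, `is_relevant`, `min_support` and `max_rounds` are hypothetical stand-ins for a scraper and for a classifier such as the socio-political model the abstract describes.

```python
# Hypothetical sketch of snowball channel discovery; the callables are assumed
# stand-ins for scraping code and a trained relevance classifier.
from collections import Counter
from typing import Callable, Iterable, Set


def discover_channels(
    seeds: Set[str],
    commenters_of: Callable[[str], Iterable[str]],
    subscriptions_of: Callable[[str], Iterable[str]],
    is_relevant: Callable[[str], bool],
    min_support: int = 2,
    max_rounds: int = 5,
) -> Set[str]:
    """Expand seeds -> commenters -> their subscriptions -> new channels."""
    known = set(seeds)
    frontier = set(seeds)
    for _ in range(max_rounds):
        # Distinct commenters on the channels added in the previous round.
        users = {u for channel in frontier for u in commenters_of(channel)}
        # Support = number of those commenters subscribed to each unseen channel.
        support = Counter()
        for user in users:
            for channel in set(subscriptions_of(user)):
                if channel not in known:
                    support[channel] += 1
        # Keep well-supported candidates that the relevance check accepts.
        new = {c for c, n in support.items() if n >= min_support and is_relevant(c)}
        if not new:
            break  # no new channels found: stop early
        known |= new
        frontier = new
    return known
```

Requiring several distinct commenters per candidate is one way to keep a single unusual subscriber from pulling unrelated channels into the set; the threshold and round limit here are illustrative, not values from the paper.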
Related papers
- Detours for Navigating Instructional Videos [58.1645668396789]
We propose VidDetours, a video-language approach that learns to retrieve the targeted temporal segments from a large repository of how-to's.
We show our model's significant improvements over best available methods for video retrieval and question answering, with recall rates exceeding the state of the art by 35%.
Date: 2024-01-03T16:38:56Z
- Tube2Vec: Social and Semantic Embeddings of YouTube Channels [11.321096553990824]
We create embeddings that capture social sharing behavior, video metadata, and YouTube's video recommendations.
We evaluate these embeddings using crowdsourcing and existing datasets.
We share embeddings capturing the social and semantic dimensions of 44,000 YouTube channels for the benefit of future research.
Date: 2023-06-29T20:43:57Z
- Micro-video Tagging via Jointly Modeling Social Influence and Tag Relation [56.23157334014773]
85.7% of micro-videos lack annotation.
Existing methods mostly focus on analyzing video content, neglecting users' social influence and tag relation.
We formulate micro-video tagging as a link prediction problem in a constructed heterogeneous network.
Date: 2023-03-15T02:13:34Z
- Characterizing Alternative Monetization Strategies on YouTube [31.029850908268013]
One of the key emerging roles of the YouTube platform is providing creators the ability to generate revenue from their content.
In this work, we focus on studying and characterizing alternative monetization strategies.
We find that external monetization is expansive and increasingly prevalent, used in 18% of all videos, with 61% of channels using one such strategy at least once.
Date: 2022-03-18T19:48:49Z
- Classifying YouTube Comments Based on Sentiment and Type of Sentence [0.0]
We address the challenge of text extraction and classification from YouTube comments using well-known statistical measures and machine learning models.
The results show that our approach, which incorporates conventional methods, performs well on the classification task, validating its potential to help content creators increase viewer engagement on their channel.
Date: 2021-10-31T18:08:10Z
- Scaling New Peaks: A Viewership-centric Approach to Automated Content Curation [4.38301148531795]
We propose a viewership-driven, automated method that accommodates a range of segment identification goals.
Using satellite television viewership data as a source of ground truth for viewer interest, we apply statistical anomaly detection on a timeline of viewership metrics to identify 'seed' segments of high viewer interest.
We present two case studies, on the United States Democratic Presidential Debate on 19th December 2019, and Wimbledon Women's Final 2019.
Date: 2021-08-09T17:17:29Z
- YouNiverse: Large-Scale Channel and Video Metadata from English-Speaking YouTube [15.03145814947425]
YouNiverse is a large collection of channel and video metadata from English-language YouTube.
It comprises metadata from over 136k channels and 72.9M videos published between May 2005 and October 2019.
The dataset also contains a table specifying which videos a set of 449M anonymous users commented on.
Date: 2020-12-18T17:46:47Z
- Cross-Domain Learning for Classifying Propaganda in Online Contents [67.10699378370752]
We present an approach to leverage cross-domain learning, based on labeled documents and sentences from news and tweets, as well as political speeches with a clear difference in their degrees of being propagandistic.
Our experiments demonstrate the usefulness of this approach, and identify difficulties and limitations in various configurations of sources and targets for the transfer step.
Date: 2020-11-13T10:19:13Z
- Political audience diversity and news reliability in algorithmic ranking [54.23273310155137]
We propose using the political diversity of a website's audience as a quality signal.
Using news source reliability ratings from domain experts and web browsing data from a diverse sample of 6,890 U.S. citizens, we first show that websites with more extreme and less politically diverse audiences have lower journalistic standards.
Date: 2020-07-16T02:13:55Z
- Generalized Few-Shot Video Classification with Video Retrieval and Feature Generation [132.82884193921535]
We argue that previous methods underestimate the importance of video feature learning and propose a two-stage approach.
We show that this simple baseline approach outperforms prior few-shot video classification methods by over 20 points on existing benchmarks.
We present two novel approaches that yield further improvement.
Date: 2020-07-09T13:05:32Z
- Mi YouTube es Su YouTube? Analyzing the Cultures using YouTube Thumbnails of Popular Videos [98.87558262467257]
This study explores cultural preferences among countries using the thumbnails of YouTube trending videos.
Experimental results indicate that users from similar cultures share interests in watching similar videos on YouTube.
Date: 2020-01-27T20:15:57Z
This list is automatically generated from the titles and abstracts of the papers on this site.