YouNiverse: Large-Scale Channel and Video Metadata from English-Speaking
YouTube
- URL: http://arxiv.org/abs/2012.10378v2
- Date: Thu, 8 Apr 2021 14:23:39 GMT
- Authors: Manoel Horta Ribeiro, Robert West
- Abstract summary: YouNiverse is a large collection of channel and video metadata from English-language YouTube.
It comprises metadata from over 136k channels and 72.9M videos published between May 2005 and October 2019.
The dataset also contains a table specifying which videos a set of 449M anonymous users commented on.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: YouTube plays a key role in entertaining and informing people around the
globe. However, studying the platform is difficult due to the lack of randomly
sampled data and of systematic ways to query the platform's colossal catalog.
In this paper, we present YouNiverse, a large collection of channel and video
metadata from English-language YouTube. YouNiverse comprises metadata from over
136k channels and 72.9M videos published between May 2005 and October 2019, as
well as channel-level time-series data with weekly subscriber and view counts.
Leveraging channel ranks from socialblade.com, an online service that provides
information about YouTube, we are able to assess and enhance the
representativeness of the sample of channels. Additionally, the dataset
contains a table specifying which videos a set of 449M anonymous users
commented on. YouNiverse, publicly available at
https://doi.org/10.5281/zenodo.4650046, will empower the community to do
research with and about YouTube.
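The abstract describes three linked components: channel metadata, weekly channel-level time-series, and a table of which videos anonymous users commented on. As a minimal sketch of how such tables relate, the snippet below builds hypothetical mini-tables and joins them with pandas; all column names and values are illustrative assumptions, not the dataset's actual schema.

```python
import pandas as pd

# Hypothetical mini-tables mirroring the three components described in the
# abstract. Column names are assumptions for illustration only.
channels = pd.DataFrame({
    "channel_id": ["UC_a", "UC_b"],
    "category": ["Music", "Gaming"],
    "join_date": ["2006-03-01", "2012-07-15"],
})

timeseries = pd.DataFrame({
    "channel_id": ["UC_a", "UC_a", "UC_b"],
    "week": ["2019-09-30", "2019-10-07", "2019-10-07"],
    "subscribers": [1_200, 1_250, 98_000],
    "views": [50_000, 52_000, 4_100_000],
})

# Which videos each anonymous user commented on (user-video edge table).
comments = pd.DataFrame({
    "user_id": [1, 1, 2],
    "video_id": ["vid1", "vid2", "vid1"],
})

# Example query: latest weekly subscriber count per channel, joined with
# the channel metadata.
latest = (timeseries.sort_values("week")
                    .groupby("channel_id", as_index=False)
                    .last())
summary = channels.merge(latest[["channel_id", "subscribers"]],
                         on="channel_id")
print(summary)
```

The same join pattern extends to the comment table, e.g. counting distinct commenters per channel after mapping `video_id` to `channel_id` through the video metadata.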
Related papers
- HOTVCOM: Generating Buzzworthy Comments for Videos [49.39846630199698]
This study introduces HotVCom, the largest Chinese video hot-comment dataset, comprising 94k diverse videos and 137 million comments.
We also present the ComHeat framework, which synergistically integrates visual, auditory, and textual data to generate influential hot-comments on the Chinese video dataset.
arXiv Detail & Related papers (2024-09-23T16:45:13Z)
- Detours for Navigating Instructional Videos [58.1645668396789]
We propose VidDetours, a video-language approach that learns to retrieve the targeted temporal segments from a large repository of how-to's.
We show our model's significant improvements over best available methods for video retrieval and question answering, with recall rates exceeding the state of the art by 35%.
arXiv Detail & Related papers (2024-01-03T16:38:56Z)
- Tube2Vec: Social and Semantic Embeddings of YouTube Channels [11.321096553990824]
We create embeddings that capture social sharing behavior, video metadata, and YouTube's video recommendations.
We evaluate these embeddings using crowdsourcing and existing datasets.
We share embeddings capturing the social and semantic dimensions of 44,000 YouTube channels for the benefit of future research.
arXiv Detail & Related papers (2023-06-29T20:43:57Z)
- TGDataset: a Collection of Over One Hundred Thousand Telegram Channels [69.22187804798162]
This paper presents the TGDataset, a new dataset that includes 120,979 Telegram channels and over 400 million messages.
We analyze the languages spoken within our dataset and the topics covered by English-language channels.
In addition to the raw dataset, we released the scripts we used to analyze the dataset and the list of channels belonging to the network of a new conspiracy theory called Sabmyk.
arXiv Detail & Related papers (2023-03-09T15:42:38Z)
- YouTubers Not madeForKids: Detecting Channels Sharing Inappropriate Videos Targeting Children [3.936965297430477]
We study YouTube channels found to post suitable or disturbing videos targeting kids in the past.
We identify a clear discrepancy between what YouTube assumes and flags as inappropriate content and channels, vs. what is found to be disturbing content that is still available on the platform.
arXiv Detail & Related papers (2022-05-27T10:34:15Z)
- Subscriptions and external links help drive resentful users to alternative and extremist YouTube videos [7.945705756085774]
We show that exposure to alternative and extremist channel videos on YouTube is heavily concentrated among a small group of people with high prior levels of gender and racial resentment.
Our findings suggest YouTube's algorithms were not sending people down "rabbit holes" during our observation window in 2020.
However, the platform continues to play a key role in facilitating exposure to content from alternative and extremist channels among dedicated audiences.
arXiv Detail & Related papers (2022-04-22T20:22:06Z)
- Subjective and Objective Analysis of Streamed Gaming Videos [60.32100758447269]
We study subjective and objective Video Quality Assessment (VQA) models on gaming videos.
We created a novel gaming video resource, called the LIVE-YouTube Gaming video quality (LIVE-YT-Gaming) database, comprised of 600 real gaming videos.
We conducted a subjective human study on this data, yielding 18,600 human quality ratings recorded by 61 human subjects.
arXiv Detail & Related papers (2022-03-24T03:02:57Z)
- The MeLa BitChute Dataset [0.0]
We present a near-complete dataset of over 3M videos from 61K channels over 2.5 years (June 2019 to December 2021) from the social video hosting platform BitChute.
We include a variety of video-level metadata, including comments, channel descriptions, and views for each video.
arXiv Detail & Related papers (2022-02-10T23:12:28Z)
- Understanding YouTube Communities via Subscription-based Channel Embeddings [0.0]
This paper presents new methods to discover and classify YouTube channels.
The methods use a self-supervised learning approach that leverages the public subscription pages of commenters.
We create a new dataset to analyze the amount of traffic going to different political content.
arXiv Detail & Related papers (2020-10-19T22:00:04Z)
- What is More Likely to Happen Next? Video-and-Language Future Event Prediction [111.93601253692165]
Given a video with aligned dialogue, people can often infer what is more likely to happen next.
In this work, we explore whether AI models are able to learn to make such multimodal commonsense next-event predictions.
We collect a new dataset, named Video-and-Language Event Prediction, with 28,726 future event prediction examples.
arXiv Detail & Related papers (2020-10-15T19:56:47Z)
- Mi YouTube es Su YouTube? Analyzing the Cultures using YouTube Thumbnails of Popular Videos [98.87558262467257]
This study explores culture preferences among countries using the thumbnails of YouTube trending videos.
Experimental results indicate that users from similar cultures share interests in watching similar videos on YouTube.
arXiv Detail & Related papers (2020-01-27T20:15:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.