Tube2Vec: Social and Semantic Embeddings of YouTube Channels
- URL: http://arxiv.org/abs/2306.17298v1
- Date: Thu, 29 Jun 2023 20:43:57 GMT
- Title: Tube2Vec: Social and Semantic Embeddings of YouTube Channels
- Authors: Léopaul Boesinger, Manoel Horta Ribeiro, Veniamin Veselovsky, Robert West
- Abstract summary: We create embeddings that capture social sharing behavior, video metadata, and YouTube's video recommendations.
We evaluate these embeddings using crowdsourcing and existing datasets.
We share embeddings capturing the social and semantic dimensions of 44,000 YouTube channels for the benefit of future research.
- Score: 11.321096553990824
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Research using YouTube data often explores social and semantic dimensions of
channels and videos. Typically, analyses rely on laborious manual annotation of
content and content creators, often found by low-recall methods such as keyword
search. Here, we explore an alternative approach, using latent representations
(embeddings) obtained via machine learning. Using a large dataset of YouTube
links shared on Reddit, we create embeddings that capture social sharing
behavior, video metadata (title, description, etc.), and YouTube's video
recommendations. We evaluate these embeddings using crowdsourcing and existing
datasets, finding that recommendation embeddings excel at capturing both social
and semantic dimensions, although social-sharing embeddings better correlate
with existing partisan scores. We share embeddings capturing the social and
semantic dimensions of 44,000 YouTube channels for the benefit of future
research on YouTube: https://github.com/epfl-dlab/youtube-embeddings.
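As a rough illustration of how channel embeddings like these might be queried, the sketch below ranks channels by cosine similarity to a query channel. The channel names and three-dimensional vectors are made up for the example, and the helper names (`cosine_similarity`, `nearest_channels`) are hypothetical; the sketch does not reflect the format or loading code of the released embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def nearest_channels(embeddings, query, k=2):
    """Return the k channels most similar to `query`, excluding itself."""
    query_vec = embeddings[query]
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings (assumption: invented values, not from the released dataset).
toy_embeddings = {
    "news_a": [0.9, 0.1, 0.0],
    "news_b": [0.8, 0.2, 0.1],
    "gaming_a": [0.1, 0.9, 0.2],
    "gaming_b": [0.0, 0.8, 0.3],
}

neighbors = nearest_channels(toy_embeddings, "news_a", k=2)
for name, score in neighbors:
    print(f"{name}: {score:.3f}")
```

With real embeddings, the same nearest-neighbor lookup could serve as a higher-recall alternative to keyword search when looking for channels similar to a known seed channel.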
Related papers
- Detecting Suspicious Commenter Mob Behaviors on YouTube Using Graph2Vec [1.1371889042789218]
This paper presents a social network analysis-based methodology for detecting suspicious commenter mob-like behaviors among YouTube channels.
The method aims to characterize channels based on the level of such behavior and identify common patterns across them.
The analysis revealed significant similarities among the channels, shedding light on the prevalence of suspicious commenter behavior.
arXiv Detail & Related papers (2023-11-09T23:49:29Z)
- Micro-video Tagging via Jointly Modeling Social Influence and Tag Relation [56.23157334014773]
85.7% of micro-videos lack annotation.
Existing methods mostly focus on analyzing video content, neglecting users' social influence and tag relation.
We formulate micro-video tagging as a link prediction problem in a constructed heterogeneous network.
arXiv Detail & Related papers (2023-03-15T02:13:34Z)
- How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios [73.24092762346095]
We introduce two large-scale datasets with over 60,000 videos annotated for emotional response and subjective wellbeing.
The Video Cognitive Empathy dataset contains annotations for distributions of fine-grained emotional responses, allowing models to gain a detailed understanding of affective states.
The Video to Valence dataset contains annotations of relative pleasantness between videos, which enables predicting a continuous spectrum of wellbeing.
arXiv Detail & Related papers (2022-10-18T17:58:25Z)
- A Feature-space Multimodal Data Augmentation Technique for Text-video Retrieval [16.548016892117083]
Text-video retrieval methods have received increased attention over the past few years.
Data augmentation techniques were introduced to increase the performance on unseen test examples.
We propose a multimodal data augmentation technique which works in the feature space and creates new videos and captions by mixing semantically similar samples.
arXiv Detail & Related papers (2022-08-03T14:05:20Z)
- Video Manipulations Beyond Faces: A Dataset with Human-Machine Analysis [60.13902294276283]
We present VideoSham, a dataset consisting of 826 videos (413 real and 413 manipulated).
Many of the existing deepfake datasets focus exclusively on two types of facial manipulations -- swapping with a different subject's face or altering the existing face.
Our analysis shows that state-of-the-art manipulation detection algorithms only work for a few specific attacks and do not scale well on VideoSham.
arXiv Detail & Related papers (2022-07-26T17:39:04Z)
- Classifying YouTube Comments Based on Sentiment and Type of Sentence [0.0]
We address the challenge of text extraction and classification from YouTube comments using well-known statistical measures and machine learning models.
The results show that our approach, which incorporates conventional methods, performs well on the classification task, validating its potential to help content creators increase viewer engagement on their channels.
arXiv Detail & Related papers (2021-10-31T18:08:10Z)
- VPN: Video Provenance Network for Robust Content Attribution [72.12494245048504]
We present VPN - a content attribution method for recovering provenance information from videos shared online.
We learn a robust search embedding for matching such video, using full-length or truncated video queries.
Once matched against a trusted database of video clips, associated information on the provenance of the clip is presented to the user.
arXiv Detail & Related papers (2021-09-21T09:07:05Z)
- Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling [98.41300980759577]
A canonical approach to video-and-language learning dictates a neural model to learn from offline-extracted dense video features.
We propose a generic framework ClipBERT that enables affordable end-to-end learning for video-and-language tasks.
Experiments on text-to-video retrieval and video question answering on six datasets demonstrate that ClipBERT outperforms existing methods.
arXiv Detail & Related papers (2021-02-11T18:50:16Z)
- YouNiverse: Large-Scale Channel and Video Metadata from English-Speaking YouTube [15.03145814947425]
YouNiverse is a large collection of channel and video metadata from English-language YouTube.
It comprises metadata from over 136k channels and 72.9M videos published between May 2005 and October 2019.
The dataset also contains a table specifying which videos a set of 449M anonymous users commented on.
arXiv Detail & Related papers (2020-12-18T17:46:47Z)
- Understanding YouTube Communities via Subscription-based Channel Embeddings [0.0]
This paper presents new methods to discover and classify YouTube channels.
The methods use a self-supervised learning approach that leverages the public subscription pages of commenters.
We create a new dataset to analyze the amount of traffic going to different political content.
arXiv Detail & Related papers (2020-10-19T22:00:04Z)
- Mi YouTube es Su YouTube? Analyzing the Cultures using YouTube Thumbnails of Popular Videos [98.87558262467257]
This study explores cultural preferences among countries using the thumbnails of YouTube trending videos.
Experimental results indicate that users from similar cultures share an interest in watching similar videos on YouTube.
arXiv Detail & Related papers (2020-01-27T20:15:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.