The MeLa BitChute Dataset
- URL: http://arxiv.org/abs/2202.05364v1
- Date: Thu, 10 Feb 2022 23:12:28 GMT
- Title: The MeLa BitChute Dataset
- Authors: Milo Trujillo, Maurício Gruppi, Cody Buntain, Benjamin D. Horne
- Abstract summary: We present a near-complete dataset of over 3M videos from 61K channels over 2.5 years (June 2019 to December 2021) from the social video hosting platform BitChute.
We include a variety of video-level metadata, including comments, channel descriptions, and views for each video.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper we present a near-complete dataset of over 3M videos from 61K channels over 2.5 years (June 2019 to December 2021) from the social video hosting platform BitChute, a commonly used alternative to YouTube. Additionally, we include a variety of video-level metadata, including comments, channel descriptions, and views for each video. The MeLa-BitChute dataset can be found at: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/KRD1VS.
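Since the dataset is hosted on Harvard Dataverse, its file listing can be retrieved programmatically. The sketch below is a minimal example, assuming the standard Dataverse native API (`/api/datasets/:persistentId/`), which returns JSON metadata for the dataset identified by the DOI above; the helper names are illustrative, not part of the paper.

```python
import json
import urllib.parse
import urllib.request

# Persistent identifier of the MeLa-BitChute dataset on Harvard Dataverse.
DOI = "doi:10.7910/DVN/KRD1VS"
BASE = "https://dataverse.harvard.edu/api/datasets/:persistentId/"

def dataset_metadata_url(doi: str) -> str:
    """Build the Dataverse native-API URL that returns a dataset's metadata."""
    return BASE + "?" + urllib.parse.urlencode({"persistentId": doi}, safe=":/")

def list_files(doi: str) -> list[str]:
    """Fetch dataset metadata and return file names in the latest version.

    Performs a network request; requires internet access when called.
    """
    with urllib.request.urlopen(dataset_metadata_url(doi)) as resp:
        data = json.load(resp)
    files = data["data"]["latestVersion"]["files"]
    return [f["dataFile"]["filename"] for f in files]

if __name__ == "__main__":
    # Print the API URL; call list_files(DOI) to enumerate the dataset files.
    print(dataset_metadata_url(DOI))
```

The same endpoint also reports per-file `id` values that can be passed to the Dataverse file-download API to fetch individual tables.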
Related papers
- A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and Other Sources about the 2024 Outbreak of Measles [0.0]
This paper presents a dataset containing data on 4011 videos about the ongoing outbreak of measles, published on 264 websites between January 1, 2024, and May 31, 2024.
YouTube and TikTok account for 48.6% and 15.2% of the videos, respectively.
For each of these videos, the URL of the video, title of the post, description of the post, and the date of publication of the video are presented as separate attributes in the dataset.
arXiv Detail & Related papers (2024-06-11T20:14:22Z) - AutoShot: A Short Video Dataset and State-of-the-Art Shot Boundary Detection [70.99025467739715]
We release a new public Short video sHot bOundary deTection dataset, named SHOT.
SHOT consists of 853 complete short videos and 11,606 shot annotations, with 2,716 high quality shot boundary annotations in 200 test videos.
Our proposed approach, named AutoShot, achieves higher F1 scores than previous state-of-the-art approaches.
arXiv Detail & Related papers (2023-04-12T19:01:21Z) - FSVVD: A Dataset of Full Scene Volumetric Video [2.9151420469958533]
In this paper, we focus on the current most widely used data format, point cloud, and for the first time release a full-scene volumetric video dataset.
A comprehensive description and analysis of the dataset are provided, along with its potential uses.
arXiv Detail & Related papers (2023-03-07T02:31:08Z) - AIM 2022 Challenge on Super-Resolution of Compressed Image and Video: Dataset, Methods and Results [110.91485363392167]
This paper reviews the Challenge on Super-Resolution of Compressed Image and Video at AIM 2022.
The proposed methods and solutions gauge the state-of-the-art of super-resolution on compressed image and video.
arXiv Detail & Related papers (2022-08-23T20:32:38Z) - A Psycho-linguistic Analysis of BitChute [0.0]
This paper describes psycho-linguistic metadata for the videos, comments, and channels in the dataset using LIWC22.
We provide basic analysis and comparison of the language on BitChute to other social media platforms.
arXiv Detail & Related papers (2022-04-17T20:10:02Z) - Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer [66.56167074658697]
We present a method that builds on 3D-VQGAN and transformers to generate videos with thousands of frames.
Our evaluation shows that our model trained on 16-frame video clips can generate diverse, coherent, and high-quality long videos.
We also showcase conditional extensions of our approach for generating meaningful long videos by incorporating temporal information with text and audio.
arXiv Detail & Related papers (2022-04-07T17:59:02Z) - Video and Text Matching with Conditioned Embeddings [81.81028089100727]
We present a method for matching a text sentence from a given corpus to a given video clip and vice versa.
In this work, we encode the dataset in a way that takes the query's relevant information into account.
We show that our conditioned representation can be transferred to video-guided machine translation, where we improved the current results on VATEX.
arXiv Detail & Related papers (2021-10-21T17:31:50Z) - VPN: Video Provenance Network for Robust Content Attribution [72.12494245048504]
We present VPN - a content attribution method for recovering provenance information from videos shared online.
We learn a robust search embedding for matching such videos, using full-length or truncated video queries.
Once matched against a trusted database of video clips, associated information on the provenance of the clip is presented to the user.
arXiv Detail & Related papers (2021-09-21T09:07:05Z) - Misinformation Detection on YouTube Using Video Captions [6.503828590815483]
This work proposes an approach that uses state-of-the-art NLP techniques to extract features from video captions (subtitles).
To evaluate our approach, we utilize a publicly accessible and labeled dataset for classifying videos as misinformation or not.
arXiv Detail & Related papers (2021-07-02T10:02:36Z) - YouNiverse: Large-Scale Channel and Video Metadata from English-Speaking YouTube [15.03145814947425]
YouNiverse is a large collection of channel and video metadata from English-language YouTube.
It comprises metadata from over 136k channels and 72.9M videos published between May 2005 and October 2019.
The dataset also contains a table specifying which videos a set of 449M anonymous users commented on.
arXiv Detail & Related papers (2020-12-18T17:46:47Z) - TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval [111.93601253692165]
TV show Retrieval (TVR) is a new multimodal retrieval dataset.
TVR requires systems to understand both videos and their associated subtitle (dialogue) texts.
The dataset contains 109K queries collected on 21.8K videos from 6 TV shows of diverse genres.
arXiv Detail & Related papers (2020-01-24T17:09:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.