A Short Note on the Kinetics-700-2020 Human Action Dataset
- URL: http://arxiv.org/abs/2010.10864v1
- Date: Wed, 21 Oct 2020 09:47:09 GMT
- Title: A Short Note on the Kinetics-700-2020 Human Action Dataset
- Authors: Lucas Smaira (DeepMind), João Carreira (DeepMind), Eric Noland
(DeepMind), Ellen Clancy (DeepMind), Amy Wu (DeepMind), Andrew Zisserman
(DeepMind)
- Abstract summary: We describe the 2020 edition of the DeepMind Kinetics human action dataset.
In this new version, there are at least 700 video clips from different YouTube videos for each of the 700 classes.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We describe the 2020 edition of the DeepMind Kinetics human action dataset,
which replenishes and extends the Kinetics-700 dataset. In this new version,
there are at least 700 video clips from different YouTube videos for each of
the 700 classes. This paper details the changes introduced for this new release
of the dataset and includes a comprehensive set of statistics as well as
baseline results using the I3D network.
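The headline property of the release is that every one of the 700 classes has at least 700 clips, each taken from a different YouTube video. Below is a minimal sketch of how one might check that per-class count from an annotation file; the column names (label, youtube_id, time_start, time_end, split) and the filename are assumptions based on how earlier Kinetics releases were distributed as CSVs, not details from this paper.
```python
# Minimal sketch (not from the paper): verify the "at least 700 clips per class,
# each from a different YouTube video" property of a Kinetics-700-2020 split.
# Assumes a CSV with columns: label, youtube_id, time_start, time_end, split.
import csv
from collections import defaultdict

def videos_per_class(csv_path):
    """Count distinct YouTube videos per action class in a Kinetics-style CSV."""
    videos = defaultdict(set)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            videos[row["label"]].add(row["youtube_id"])
    return {label: len(ids) for label, ids in videos.items()}

if __name__ == "__main__":
    # "kinetics700_2020_train.csv" is a hypothetical filename for the train split.
    counts = videos_per_class("kinetics700_2020_train.csv")
    print(f"{len(counts)} classes; smallest class has {min(counts.values())} videos")
```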
Related papers
- Nymeria: A Massive Collection of Multimodal Egocentric Daily Motion in the Wild [66.34146236875822]
The Nymeria dataset is a large-scale, diverse, richly annotated human motion dataset collected in the wild with multiple multimodal egocentric devices.
It contains 1200 recordings of 300 hours of daily activities from 264 participants across 50 locations, travelling a total of 399 km.
The motion-language descriptions provide 310.5K sentences in 8.64M words from a vocabulary size of 6545.
arXiv Detail & Related papers (2024-06-14T10:23:53Z)
- CinePile: A Long Video Question Answering Dataset and Benchmark [55.30860239555001]
We present a novel dataset and benchmark, CinePile, specifically designed for authentic long-form video understanding.
Our comprehensive dataset comprises 305,000 multiple-choice questions (MCQs), covering various visual and multimodal aspects.
We fine-tuned open-source Video-LLMs on the training split and evaluated both open-source and proprietary video-centric LLMs on the test split of our dataset.
arXiv Detail & Related papers (2024-05-14T17:59:02Z)
- VEATIC: Video-based Emotion and Affect Tracking in Context Dataset [34.77364955121413]
We introduce a new large-scale dataset, the Video-based Emotion and Affect Tracking in Context (VEATIC) dataset.
VEATIC has 124 video clips from Hollywood movies, documentaries, and home videos with continuous valence and arousal ratings of each frame via real-time annotation.
Along with the dataset, we propose a new computer vision task to infer the affect of the selected character via both context and character information in each video frame.
arXiv Detail & Related papers (2023-09-13T06:31:35Z)
- AutoShot: A Short Video Dataset and State-of-the-Art Shot Boundary Detection [70.99025467739715]
We release a new public Short video sHot bOundary deTection dataset, named SHOT.
SHOT consists of 853 complete short videos and 11,606 shot annotations, with 2,716 high quality shot boundary annotations in 200 test videos.
Our proposed approach, named AutoShot, achieves higher F1 scores than previous state-of-the-art approaches.
arXiv Detail & Related papers (2023-04-12T19:01:21Z)
- Revisiting 3D ResNets for Video Recognition [18.91688307058961]
This note studies effective training and scaling strategies for video recognition models.
We propose a simple scaling strategy for 3D ResNets, in combination with improved training strategies and minor architectural changes.
arXiv Detail & Related papers (2021-09-03T18:27:52Z)
- Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions [75.77044856100349]
We present the Spoken Moments dataset of 500k spoken captions each attributed to a unique short video depicting a broad range of different events.
We show that our AMM approach consistently improves our results and that models trained on our Spoken Moments dataset generalize better than those trained on other video-caption datasets.
arXiv Detail & Related papers (2021-05-10T16:30:46Z)
- Quo Vadis, Skeleton Action Recognition? [11.389618872289647]
We study current and upcoming frontiers across the landscape of skeleton-based human action recognition.
To study skeleton-action recognition in the wild, we introduce Skeletics-152, a curated subset of RGB videos sourced from Kinetics-700.
We extend our study to include out-of-context actions by introducing Skeleton-Mimetics and Metaphorics datasets.
arXiv Detail & Related papers (2020-07-04T11:02:21Z)
- Rescaling Egocentric Vision [48.57283024015145]
This paper introduces the pipeline to extend the largest dataset in egocentric vision, EPIC-KITCHENS.
The effort culminates in EPIC-KITCHENS-100, a collection of 100 hours, 20M frames, 90K actions in 700 variable-length videos.
Compared to its previous version, EPIC-KITCHENS-100 has been annotated using a novel pipeline that allows denser (54% more actions per minute) and more complete annotation of fine-grained actions (+128% more action segments).
arXiv Detail & Related papers (2020-06-23T18:28:04Z)
- The AVA-Kinetics Localized Human Actions Video Dataset [124.41706958756049]
This paper describes the AVA-Kinetics localized human actions video dataset.
The dataset is collected by annotating videos from the Kinetics-700 dataset using the AVA annotation protocol.
The dataset contains over 230k clips annotated with the 80 AVA action classes for each of the humans in key-frames.
arXiv Detail & Related papers (2020-05-01T04:17:14Z)
- Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs? [18.95620388632382]
In the early era of deep neural networks, 2D CNNs were better than 3D CNNs for video recognition.
Recent studies revealed that, when trained on a large-scale video dataset, 3D CNNs can outperform 2D CNNs.
arXiv Detail & Related papers (2020-04-10T09:44:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.