EEV: A Large-Scale Dataset for Studying Evoked Expressions from Video
- URL: http://arxiv.org/abs/2001.05488v2
- Date: Mon, 22 Feb 2021 18:33:20 GMT
- Title: EEV: A Large-Scale Dataset for Studying Evoked Expressions from Video
- Authors: Jennifer J. Sun, Ting Liu, Alan S. Cowen, Florian Schroff, Hartwig
Adam, Gautam Prasad
- Abstract summary: The Evoked Expressions from Videos (EEV) dataset is a large-scale dataset for studying viewer responses to videos.
Each video is annotated at 6 Hz with 15 continuous evoked expression labels, corresponding to the facial expressions of viewers who reacted to the video.
In total, there are 36.7 million annotations of viewer facial reactions to 23,574 videos (1,700 hours).
- Score: 23.95850953376425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Videos can evoke a range of affective responses in viewers. The ability to
predict evoked affect from a video, before viewers watch the video, can help in
content creation and video recommendation. We introduce the Evoked Expressions
from Videos (EEV) dataset, a large-scale dataset for studying viewer responses
to videos. Each video is annotated at 6 Hz with 15 continuous evoked expression
labels, corresponding to the facial expression of viewers who reacted to the
video. We use an expression recognition model within our data collection
framework to achieve scalability. In total, there are 36.7 million annotations
of viewer facial reactions to 23,574 videos (1,700 hours). We use a publicly
available video corpus to obtain a diverse set of video content. We establish
baseline performance on the EEV dataset using an existing multimodal recurrent
model. Transfer learning experiments show an improvement in performance on the
LIRIS-ACCEDE video dataset when pre-trained on EEV. We hope that the size and
diversity of the EEV dataset will encourage further explorations in video
understanding and affective computing. A subset of EEV is released at
https://github.com/google-research-datasets/eev.
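The abstract gives enough detail to sketch how one might work with the released subset: rows of per-timestamp annotations at 6 Hz, 15 continuous expression scores per row, and a recurrent model mapping frame features to those scores. The snippet below is a minimal sketch under stated assumptions, not the authors' pipeline: the file name, the exact expression column names, and the 2048-d frame features are illustrative; only the 6 Hz rate, the 15-label format, and the annotation-count arithmetic come from the abstract.

```python
# Minimal sketch (assumptions flagged): load a per-timestamp annotation table from the
# released EEV subset and define a small GRU baseline that maps frame features to the
# 15 continuous expression scores at 6 Hz. File names, column names, and feature
# dimensionality are illustrative, not the authors' exact setup.
import pandas as pd
import torch
import torch.nn as nn

# Assumed label set; consult the EEV repository for the released column names.
EXPRESSIONS = [
    "amusement", "anger", "awe", "concentration", "confusion",
    "contempt", "contentment", "disappointment", "doubt", "elation",
    "interest", "pain", "sadness", "surprise", "triumph",
]

# Hypothetical CSV layout: one row per (video, timestamp) with 15 expression columns.
df = pd.read_csv("eev_train.csv")  # hypothetical filename
labels = torch.tensor(df[EXPRESSIONS].values, dtype=torch.float32)  # per-timestep targets

# Sanity check on the scale reported in the abstract:
# 1,700 hours * 3,600 s/h * 6 annotations/s = 36,720,000 ~= 36.7 million annotations.
print(1_700 * 3_600 * 6)


class RecurrentBaseline(nn.Module):
    """A small GRU head over precomputed frame features (not the paper's exact model)."""

    def __init__(self, feat_dim=2048, hidden=512, num_labels=len(EXPRESSIONS)):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_labels)

    def forward(self, frame_feats):                      # (batch, time, feat_dim)
        hidden_states, _ = self.gru(frame_feats)
        return torch.sigmoid(self.head(hidden_states))   # (batch, time, 15) in [0, 1]
```

In practice one would group rows by video ID to form per-video sequences before feeding them to the recurrent model.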
Related papers
- OVR: A Dataset for Open Vocabulary Temporal Repetition Counting in Videos [58.5538620720541]
The dataset, OVR, contains annotations for over 72K videos.
OVR is almost an order of magnitude larger than previous datasets for video repetition.
We propose a baseline transformer-based counting model, OVRCounter, that can count repetitions in videos up to 320 frames long.
arXiv Detail & Related papers (2024-07-24T08:22:49Z)
- DeVAn: Dense Video Annotation for Video-Language Models [68.70692422636313]
We present a novel human annotated dataset for evaluating the ability for visual-language models to generate descriptions for real-world video clips.
The dataset contains 8.5K YouTube video clips of 20-60 seconds in duration and covers a wide range of topics and interests.
arXiv Detail & Related papers (2023-10-08T08:02:43Z)
- InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation [90.71796406228265]
InternVid is a large-scale video-centric multimodal dataset that enables learning powerful and transferable video-text representations.
The InternVid dataset contains over 7 million videos lasting nearly 760K hours, yielding 234M video clips accompanied by detailed descriptions of total 4.1B words.
arXiv Detail & Related papers (2023-07-13T17:58:32Z)
- How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios [73.24092762346095]
We introduce two large-scale datasets with over 60,000 videos annotated for emotional response and subjective wellbeing.
The Video Cognitive Empathy dataset contains annotations for distributions of fine-grained emotional responses, allowing models to gain a detailed understanding of affective states.
The Video to Valence dataset contains annotations of relative pleasantness between videos, which enables predicting a continuous spectrum of wellbeing.
arXiv Detail & Related papers (2022-10-18T17:58:25Z)
- Learning to Answer Visual Questions from Web Videos [89.71617065426146]
We propose to avoid manual annotation and generate a large-scale training dataset for video question answering.
We leverage a question generation transformer trained on text data and use it to generate question-answer pairs from transcribed video narrations.
For a detailed evaluation we introduce iVQA, a new VideoQA dataset with reduced language bias and high-quality manual annotations.
arXiv Detail & Related papers (2022-05-10T16:34:26Z)
- VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation [124.02278735049235]
VALUE benchmark aims to cover a broad range of video genres, video lengths, data volumes, and task difficulty levels.
We evaluate various baseline methods with and without large-scale VidL pre-training.
The significant gap between our best model and human performance calls for further study of advanced VidL models.
arXiv Detail & Related papers (2021-06-08T18:34:21Z)
- ERA: A Dataset and Deep Learning Benchmark for Event Recognition in Aerial Videos [28.598710179447803]
We introduce a novel problem of event recognition in unconstrained aerial videos in the remote sensing community.
We present a large-scale, human-annotated dataset named ERA (Event Recognition in Aerial videos).
The ERA dataset is designed to have a significant intra-class variation and inter-class similarity.
arXiv Detail & Related papers (2020-01-30T15:25:54Z)