Human in Events: A Large-Scale Benchmark for Human-centric Video
Analysis in Complex Events
- URL: http://arxiv.org/abs/2005.04490v6
- Date: Thu, 13 Jul 2023 13:23:05 GMT
- Title: Human in Events: A Large-Scale Benchmark for Human-centric Video
Analysis in Complex Events
- Authors: Weiyao Lin, Huabin Liu, Shizhan Liu, Yuxi Li, Rui Qian, Tao Wang, Ning
Xu, Hongkai Xiong, Guo-Jun Qi, Nicu Sebe
- Abstract summary: We present a new large-scale dataset with comprehensive annotations, named Human-in-Events or HiEve.
It contains a record number of poses (>1M), the largest number of action instances (>56k) under complex events, and one of the largest collections of long-duration trajectories.
Based on its diverse annotation, we present two simple baselines for action recognition and pose estimation.
- Score: 106.19047816743988
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Along with the development of modern smart cities, human-centric video
analysis has been encountering the challenge of analyzing diverse and complex
events in real scenes. A complex event involves dense crowds, anomalous
individuals, or collective behaviors. However, limited by the scale and
coverage of existing video datasets, few human analysis approaches have
reported their performance on such complex events. To this end, we present a
new large-scale dataset with comprehensive annotations, named Human-in-Events
or HiEve (Human-centric video analysis in complex Events), for the
understanding of human motions, poses, and actions in a variety of realistic
events, especially in crowd & complex events. It contains a record number of
poses (>1M), the largest number of action instances (>56k) under complex
events, as well as one of the largest collections of long-duration trajectories
(with an average trajectory length of >480 frames). Based on its diverse
annotations, we present two simple baselines for action recognition and pose
estimation, respectively. They leverage cross-label information during training
to enhance feature learning in the corresponding visual tasks (a minimal sketch
of this cross-label idea follows the abstract). Experiments show that they can
boost the performance of existing action recognition and pose estimation
pipelines. More importantly, they demonstrate that the wide-ranging annotations
in HiEve can improve various video tasks.
Furthermore, we conduct extensive experiments to benchmark recent video
analysis approaches together with our baseline methods, demonstrating that HiEve is
a challenging dataset for human-centric video analysis. We expect that the
dataset will advance the development of cutting-edge techniques in
human-centric analysis and the understanding of complex events. The dataset is
available at http://humaninevents.org
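The abstract states only that the two baselines leverage cross-label information during training; the paper's actual architectures are not reproduced here. The following is a minimal PyTorch sketch of one plausible reading of that idea: a shared backbone feeds both a pose-heatmap head and an action-classification head, and a joint loss lets each task's labels shape the shared features. The class name, keypoint and action counts, and loss weights are illustrative assumptions, not values taken from HiEve.

```python
# Hypothetical sketch of cross-label training (not the authors' implementation):
# one backbone, two task heads, one joint loss over both annotation types.
import torch
import torch.nn as nn
import torchvision.models as models

NUM_KEYPOINTS = 14   # assumption: number of annotated keypoints per person
NUM_ACTIONS = 14     # assumption: number of action categories

class CrossLabelBaseline(nn.Module):
    def __init__(self):
        super().__init__()
        resnet = models.resnet18(weights=None)
        # Drop the avgpool/fc layers to keep a 512-channel feature map.
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])
        # Pose head: one heatmap per keypoint at 1/16 input resolution.
        self.pose_head = nn.Sequential(
            nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, NUM_KEYPOINTS, 1),
        )
        # Action head: classifies the whole person crop.
        self.action_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(512, NUM_ACTIONS)
        )

    def forward(self, person_crop):
        feat = self.backbone(person_crop)
        return self.pose_head(feat), self.action_head(feat)

def joint_loss(pose_pred, pose_gt, action_logits, action_gt,
               w_pose=1.0, w_act=1.0):
    """Combine both supervisions so the shared features see both label types.

    pose_gt: target heatmaps at the pose head's output resolution.
    action_gt: integer action class indices.
    """
    pose_loss = nn.functional.mse_loss(pose_pred, pose_gt)
    act_loss = nn.functional.cross_entropy(action_logits, action_gt)
    return w_pose * pose_loss + w_act * act_loss
```

In use, one would compute `pose_pred, action_logits = model(person_crop)` for crops that carry both annotation types and optimize `joint_loss` end to end; the weights `w_pose` and `w_act` control how strongly each label set influences the shared features.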
Related papers
- Grounding Partially-Defined Events in Multimodal Data [61.0063273919745]
We introduce a multimodal formulation for partially-defined events and cast the extraction of these events as a three-stage span retrieval task.
We propose a benchmark for this task, MultiVENT-G, which consists of 14.5 hours of densely annotated current-event videos and 1,168 text documents, containing 22.8K labeled event-centric entities.
Results illustrate the challenges that abstract event understanding poses and demonstrate promise in event-centric video-language systems.
arXiv Detail & Related papers (2024-10-07T17:59:48Z)
- A Survey of Video Datasets for Grounded Event Understanding [34.11140286628736]
Multimodal AI systems must be capable of well-rounded common-sense reasoning akin to human visual understanding.
We survey 105 video datasets that require event understanding capability.
arXiv Detail & Related papers (2024-06-14T00:36:55Z)
- CinePile: A Long Video Question Answering Dataset and Benchmark [55.30860239555001]
We present a novel dataset and benchmark, CinePile, specifically designed for authentic long-form video understanding.
Our comprehensive dataset comprises 305,000 multiple-choice questions (MCQs), covering various visual and multimodal aspects.
We fine-tuned open-source Video-LLMs on the training split and evaluated both open-source and proprietary video-centric LLMs on the test split of our dataset.
arXiv Detail & Related papers (2024-05-14T17:59:02Z)
- SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos [43.536874272236986]
We propose a new video visual relation detection task: video human-human interaction detection.
SportsHHI contains 34 high-level interaction classes from basketball and volleyball.
We conduct extensive experiments to reveal the key factors for a successful human-human interaction detector.
arXiv Detail & Related papers (2024-04-06T09:13:03Z)
- Human-centric Scene Understanding for 3D Large-scale Scenarios [52.12727427303162]
We present a large-scale multi-modal dataset for human-centric scene understanding, dubbed HuCenLife.
Our HuCenLife can benefit many 3D perception tasks, such as segmentation, detection, action recognition, etc.
arXiv Detail & Related papers (2023-07-26T08:40:46Z)
- JRDB-Act: A Large-scale Multi-modal Dataset for Spatio-temporal Action, Social Group and Activity Detection [54.696819174421584]
We introduce JRDB-Act, a multi-modal dataset that reflects a real distribution of human daily life actions in a university campus environment.
JRDB-Act has been densely annotated with atomic actions and comprises over 2.8M action labels.
JRDB-Act comes with social group identification annotations conducive to the task of grouping individuals based on their interactions in the scene.
arXiv Detail & Related papers (2021-06-16T14:43:46Z)
- Toward Accurate Person-level Action Recognition in Videos of Crowded Scenes [131.9067467127761]
We focus on improving action recognition by fully utilizing scene information and collecting new data.
Specifically, we adopt a strong human detector to detect the spatial location of each person in every frame.
We then apply action recognition models to learn the temporal information from video frames on both the HIE dataset and new data with diverse scenes from the internet (a minimal sketch of this detect-then-recognize pattern follows this list).
arXiv Detail & Related papers (2020-10-16T13:08:50Z)
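The last entry above describes a two-stage recipe: detect each person per frame, then run action recognition on the detected people. The sketch below illustrates that general detect-then-recognize pattern with stock torchvision models; it is not the cited paper's method (which additionally learns temporal information across frames), and the person-class id, action count, score threshold, and crop size are assumptions for illustration.

```python
# Hypothetical detect-then-recognize sketch: a stock person detector localizes
# people in a frame, and a placeholder classifier labels each person crop.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor

NUM_ACTIONS = 14   # assumption: number of action categories
PERSON_LABEL = 1   # COCO class id for "person" in torchvision detectors

detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
# Placeholder for a trained action model over person crops.
action_model = torchvision.models.resnet18(weights=None)
action_model.fc = torch.nn.Linear(action_model.fc.in_features, NUM_ACTIONS)
action_model.eval()

@torch.no_grad()
def recognize_actions(frame, score_thresh=0.8):
    """Detect people in one frame, then classify each person crop."""
    img = to_tensor(frame)                       # (3, H, W), values in [0, 1]
    detections = detector([img])[0]
    results = []
    for box, label, score in zip(detections["boxes"],
                                 detections["labels"],
                                 detections["scores"]):
        if label.item() != PERSON_LABEL or score.item() < score_thresh:
            continue
        x1, y1, x2, y2 = [int(v) for v in box.tolist()]
        if x2 <= x1 or y2 <= y1:
            continue                             # skip degenerate boxes
        crop = img[:, y1:y2, x1:x2].unsqueeze(0)
        crop = torch.nn.functional.interpolate(
            crop, size=(224, 224), mode="bilinear", align_corners=False)
        action_id = action_model(crop).argmax(dim=1).item()
        results.append(((x1, y1, x2, y2), action_id))
    return results
```

A full video pipeline would additionally track the detected boxes across frames and feed short clips, rather than single-frame crops, to a temporal action model.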