Fugu-MT Paper Translation (Summary): Human in Events: A Large-Scale Benchmark for Human-centric Video Analysis in Complex Events

Paper summary: Human in Events: A Large-Scale Benchmark for Human-centric Video Analysis in Complex Events

arxiv url: http://arxiv.org/abs/2005.04490v6
Date: Thu, 13 Jul 2023 13:23:05 GMT
Status: Translation complete
Last updated in system: 2023-07-14 18:03:37.413081
Title: Human in Events: A Large-Scale Benchmark for Human-centric Video Analysis in Complex Events
Authors: Weiyao Lin, Huabin Liu, Shizhan Liu, Yuxi Li, Rui Qian, Tao Wang, Ning Xu, Hongkai Xiong, Guo-Jun Qi, Nicu Sebe
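Abstract summary: We propose a new large-scale dataset with comprehensive annotations, named Human-in-Events (HiEve). It contains a record number of poses (>1M), the largest number of action instances under complex events (>56k), and one of the largest numbers of long-lasting trajectories (average trajectory length >480 frames). Based on its diverse annotations, we present two simple baselines, one for action recognition and one for pose estimation.
Reference score (in-house attention score): 106.19047816743988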
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Along with the development of modern smart cities, human-centric video analysis has been encountering the challenge of analyzing diverse and complex events in real scenes. A complex event relates to dense crowds, anomalous individuals, or collective behaviors. However, limited by the scale and coverage of existing video datasets, few human analysis approaches have reported their performance on such complex events. To this end, we present a new large-scale dataset with comprehensive annotations, named Human-in-Events or HiEve (Human-centric video analysis in complex Events), for the understanding of human motions, poses, and actions in a variety of realistic events, especially in crowd & complex events. It contains a record number of poses (>1M), the largest number of action instances (>56k) under complex events, as well as one of the largest numbers of long-lasting trajectories (with an average trajectory length of >480 frames). Based on its diverse annotations, we present two simple baselines for action recognition and pose estimation, respectively. They leverage cross-label information during training to enhance the feature learning in corresponding visual tasks. Experiments show that they could boost the performance of existing action recognition and pose estimation pipelines. More importantly, they demonstrate that the wide-ranging annotations in HiEve can improve various video tasks. Furthermore, we conduct extensive experiments to benchmark recent video analysis approaches together with our baseline methods, demonstrating that HiEve is a challenging dataset for human-centric video analysis. We expect that the dataset will advance the development of cutting-edge techniques in human-centric analysis and the understanding of complex events. The dataset is available at http://humaninevents.org


Related papers