MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized
Sports Actions
- URL: http://arxiv.org/abs/2105.07404v1
- Date: Sun, 16 May 2021 10:40:30 GMT
- Title: MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized
Sports Actions
- Authors: Yixuan Li, Lei Chen, Runyu He, Zhenzhi Wang, Gangshan Wu, Limin Wang
- Abstract summary: This paper aims to present a new multi-person dataset of spatio-temporally localized sports actions, coined as MultiSports.
We build the dataset of MultiSports v1.0 by selecting 4 sports classes, collecting around 3200 video clips, and annotating around 37790 action instances with 907k bounding boxes.
- Score: 39.27858380391081
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spatio-temporal action detection is an important and challenging problem in
video understanding. The existing action detection benchmarks are limited in
aspects of small numbers of instances in a trimmed video or relatively
low-level atomic actions. This paper aims to present a new multi-person dataset
of spatio-temporal localized sports actions, coined as MultiSports. We first
analyze the important ingredients of constructing a realistic and challenging
dataset for spatio-temporal action detection by proposing three criteria: (1)
motion dependent identification, (2) with well-defined boundaries, (3)
relatively high-level classes. Based on these guidelines, we build the dataset
of MultiSports v1.0 by selecting 4 sports classes, collecting around 3200
video clips, and annotating around 37790 action instances with 907k bounding
boxes. Our datasets are characterized by the important properties of strong
diversity, detailed annotation, and high quality. Our MultiSports, with its
realistic setting and dense annotations, exposes the intrinsic challenge of
action localization. To benchmark this, we adapt several representative methods
to our dataset and give an in-depth analysis on the difficulty of action
localization in our dataset. We hope our MultiSports can serve as a standard
benchmark for spatio-temporal action detection in the future. Our dataset
website is at https://deeperaction.github.io/multisports/.
Related papers
- Deep learning for action spotting in association football videos [64.10841325879996]
The SoccerNet initiative organizes yearly challenges, during which participants from all around the world compete to achieve state-of-the-art performances.
This paper traces the history of action spotting in sports, from the creation of the task back in 2018, to the role it plays today in research and the sports industry.
arXiv Detail & Related papers (2024-10-02T07:56:15Z)
- Benchmarking Badminton Action Recognition with a New Fine-Grained Dataset [16.407837909069073]
We introduce the VideoBadminton dataset derived from high-quality badminton footage.
The introduction of VideoBadminton could not only serve for badminton action recognition but also provide a dataset for recognizing fine-grained actions.
arXiv Detail & Related papers (2024-03-19T02:52:06Z)
- Towards Active Learning for Action Spotting in Association Football Videos [59.84375958757395]
Analyzing football videos is challenging and requires identifying subtle and diverse spatio-temporal patterns.
Current algorithms face significant challenges when learning from limited annotated data.
We propose an active learning framework that selects the most informative video samples to be annotated next.
arXiv Detail & Related papers (2023-04-09T11:50:41Z)
- Sports Video Analysis on Large-Scale Data [10.24207108909385]
This paper investigates the modeling of automated machine description on sports video.
We propose a novel large-scale NBA dataset for Sports Video Analysis (NSVA) with a focus on captioning.
arXiv Detail & Related papers (2022-08-09T16:59:24Z)
- P2ANet: A Dataset and Benchmark for Dense Action Detection from Table Tennis Match Broadcasting Videos [64.57435509822416]
This work consists of 2,721 video clips collected from the broadcasting videos of professional table tennis matches in World Table Tennis Championships and Olympiads.
We formulate two sets of action detection problems -- action localization and action recognition.
The results confirm that P2ANet is still a challenging task and can be used as a special benchmark for dense action detection from videos.
arXiv Detail & Related papers (2022-07-26T08:34:17Z)
- Video Action Detection: Analysing Limitations and Challenges [70.01260415234127]
We analyze existing datasets on video action detection and discuss their limitations.
We perform a biasness study which analyzes a key property differentiating videos from static images: the temporal aspect.
Such extreme experiments show the existence of biases which have managed to creep into existing methods in spite of careful modeling.
arXiv Detail & Related papers (2022-04-17T00:42:14Z)
- A Unified Taxonomy and Multimodal Dataset for Events in Invasion Games [3.7111751305143654]
We present a universal taxonomy that covers a wide range of low and high-level events for invasion games.
We release two multi-modal datasets comprising video and positional data with gold-standard annotations to foster research in fine-grained and ball-centered event spotting.
arXiv Detail & Related papers (2021-08-25T10:09:28Z)
- Hybrid Dynamic-static Context-aware Attention Network for Action Assessment in Long Videos [96.45804577283563]
We present a novel hybrid dynAmic-static Context-aware attenTION NETwork (ACTION-NET) for action assessment in long videos.
We not only learn the video dynamic information but also focus on the static postures of the detected athletes in specific frames.
We combine the features of the two streams to regress the final video score, supervised by ground-truth scores given by experts.
arXiv Detail & Related papers (2020-08-13T15:51:42Z)
- Event detection in coarsely annotated sports videos via parallel multi receptive field 1D convolutions [14.30009544149561]
In problems such as sports video analytics, it is difficult to obtain accurate frame level annotations and exact event duration.
We propose the task of event detection in coarsely annotated videos.
We introduce a multi-tower temporal convolutional network architecture for the proposed task.
arXiv Detail & Related papers (2020-04-13T19:51:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.