A Unified Taxonomy and Multimodal Dataset for Events in Invasion Games
- URL: http://arxiv.org/abs/2108.11149v2
- Date: Thu, 26 Aug 2021 11:18:50 GMT
- Title: A Unified Taxonomy and Multimodal Dataset for Events in Invasion Games
- Authors: Henrik Biermann, Jonas Theiner, Manuel Bassek, Dominik Raabe, Daniel
Memmert, Ralph Ewerth
- Abstract summary: We present a universal taxonomy that covers a wide range of low- and high-level events for invasion games.
We release two multi-modal datasets comprising video and positional data with gold-standard annotations to foster research in fine-grained and ball-centered event spotting.
- Score: 3.7111751305143654
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The automatic detection of events in complex sports games like soccer and
handball using positional or video data is of great interest in research and
industry. One requirement is a fundamental understanding of underlying
concepts, i.e., events that occur on the pitch. Previous work often deals only
with so-called low-level events based on well-defined rules such as free kicks,
free throws, or goals. High-level events, such as passes, are less frequently
approached due to a lack of consistent definitions. This introduces a level of
ambiguity that necessitates careful validation of event annotations.
Yet, this validation step is usually neglected as the majority of studies adopt
annotations from commercial providers on private datasets of unknown quality
and focus on soccer only. To address these issues, we (1) present a universal
taxonomy that covers a wide range of low- and high-level events for invasion
games and is exemplarily refined to soccer and handball, and (2) release two
multi-modal datasets comprising video and positional data with gold-standard
annotations to foster research in fine-grained and ball-centered event
spotting. Experiments on human performance demonstrate the robustness of the
proposed taxonomy, and that disagreements and ambiguities in the annotation
increase with the complexity of the event. An I3D model for video
classification is adopted for event spotting and reveals the potential for
benchmarking. Datasets are available at: https://github.com/mm4spa/eigd
Related papers
- Generating Event-oriented Attribution for Movies via Two-Stage Prefix-Enhanced Multimodal LLM [47.786978666537436]
We propose a Two-Stage Prefix-Enhanced MLLM (TSPE) approach for event attribution in movie videos.
In the local stage, we introduce an interaction-aware prefix that guides the model to focus on the relevant multimodal information within a single clip.
In the global stage, we strengthen the connections between associated events using an inferential knowledge graph.
arXiv Detail & Related papers (2024-09-14T08:30:59Z)
- Improving Event Definition Following For Zero-Shot Event Detection [66.27883872707523]
Existing approaches on zero-shot event detection usually train models on datasets annotated with known event types.
We aim to improve zero-shot event detection by training models to better follow event definitions.
arXiv Detail & Related papers (2024-03-05T01:46:50Z)
- A Graph-Based Method for Soccer Action Spotting Using Unsupervised Player Classification [75.93186954061943]
Action spotting involves understanding the dynamics of the game, the complexity of events, and the variation of video sequences.
In this work, we focus on the former by (a) identifying and representing the players, referees, and goalkeepers as nodes in a graph, and by (b) modeling their temporal interactions as sequences of graphs.
For the player identification task, our method obtains an overall performance of 57.83% average-mAP by combining it with other modalities.
arXiv Detail & Related papers (2022-11-22T15:23:53Z)
- Unifying Event Detection and Captioning as Sequence Generation via Pre-Training [53.613265415703815]
We propose a unified pre-training and fine-tuning framework to enhance the inter-task association between event detection and captioning.
Our model outperforms the state-of-the-art methods, and can be further boosted when pre-trained on extra large-scale video-text data.
arXiv Detail & Related papers (2022-07-18T14:18:13Z)
- PILED: An Identify-and-Localize Framework for Few-Shot Event Detection [79.66042333016478]
In our study, we employ cloze prompts to elicit event-related knowledge from pretrained language models.
We minimize the number of type-specific parameters, enabling our model to quickly adapt to event detection tasks for new types.
arXiv Detail & Related papers (2022-02-15T18:01:39Z)
- MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions [39.27858380391081]
This paper aims to present a new multi-person dataset of atomic-temporal actions, coined as MultiSports.
We build the dataset of MultiSports v1.0 by selecting 4 sports classes, collecting around 3200 video clips, and annotating around 37790 action instances with 907k bounding boxes.
arXiv Detail & Related papers (2021-05-16T10:40:30Z)
- RMS-Net: Regression and Masking for Soccer Event Spotting [52.742046866220484]
We devise a lightweight and modular network for action spotting, which can simultaneously predict the event label and its temporal offset.
When tested on the SoccerNet dataset and using standard features, our full proposal exceeds the current state of the art by 3 Average-mAP points.
arXiv Detail & Related papers (2021-02-15T16:04:18Z)
- Automatic Pass Annotation from Soccer Video Streams Based on Object Detection and LSTM [6.87782863484826]
PassNet is a method to recognize the most frequent events in soccer, i.e., passes, from video streams.
Our experiments show a significant improvement in the accuracy of pass detection.
PassNet is the first step towards an automated event annotation system.
arXiv Detail & Related papers (2020-07-13T16:14:41Z)
- Event detection in coarsely annotated sports videos via parallel multi receptive field 1D convolutions [14.30009544149561]
In problems such as sports video analytics, it is difficult to obtain accurate frame level annotations and exact event duration.
We propose the task of event detection in coarsely annotated videos.
We introduce a multi-tower temporal convolutional network architecture for the proposed task.
arXiv Detail & Related papers (2020-04-13T19:51:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.