Survey of Action Recognition, Spotting and Spatio-Temporal Localization
in Soccer -- Current Trends and Research Perspectives
- URL: http://arxiv.org/abs/2309.12067v1
- Date: Thu, 21 Sep 2023 13:36:57 GMT
- Title: Survey of Action Recognition, Spotting and Spatio-Temporal Localization
in Soccer -- Current Trends and Research Perspectives
- Authors: Karolina Seweryn, Anna Wróblewska, Szymon Łukasik
- Abstract summary: Action scene understanding in soccer is a challenging task due to the complex and dynamic nature of the game.
The article reviews recent state-of-the-art methods that leverage deep learning techniques and traditional methods.
It focuses on multimodal methods, which integrate information from multiple sources, such as video and audio data, and on those that represent one source in various ways.
- Score: 0.7673339435080445
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Action scene understanding in soccer is a challenging task due to the complex
and dynamic nature of the game, as well as the interactions between players.
This article provides a comprehensive overview of this task divided into action
recognition, spotting, and spatio-temporal action localization, with a
particular emphasis on the modalities used and multimodal methods. We explore
the publicly available data sources and metrics used to evaluate models'
performance. The article reviews recent state-of-the-art methods that leverage
deep learning techniques and traditional methods. We focus on multimodal
methods, which integrate information from multiple sources, such as video and
audio data, and also those that represent one source in various ways. The
advantages and limitations of methods are discussed, along with their potential
for improving the accuracy and robustness of models. Finally, the article
highlights some of the open research questions and future directions in the
field of soccer action recognition, including the potential for multimodal
methods to advance this field. Overall, this survey provides a valuable
resource for researchers interested in the field of action scene understanding
in soccer.
Related papers
- Survey on Knowledge Distillation for Large Language Models: Methods, Evaluation, and Application [21.555902498178387]
Large Language Models (LLMs) have showcased exceptional capabilities in various domains, attracting significant interest from both academia and industry.
The endeavor to compress language models while maintaining their accuracy has become a focal point of research.
Knowledge distillation has emerged as an effective technique to enhance inference speed without greatly compromising performance.
arXiv Detail & Related papers (2024-07-02T02:14:42Z)
- OSL-ActionSpotting: A Unified Library for Action Spotting in Sports Videos [56.393522913188704]
We introduce OSL-ActionSpotting, a Python library that unifies different action spotting algorithms to streamline research and applications in sports video analytics.
We successfully integrated three cornerstone action spotting methods into OSL-ActionSpotting, achieving performance metrics that match those of the original, disparate codebases.
arXiv Detail & Related papers (2024-07-01T13:17:37Z) - Masked Modeling for Self-supervised Representation Learning on Vision
and Beyond [69.64364187449773]
Masked modeling has emerged as a distinctive approach that involves predicting parts of the original data that are proportionally masked during training.
We elaborate on the details of techniques within masked modeling, including diverse masking strategies, recovering targets, network architectures, and more.
We conclude by discussing the limitations of current techniques and point out several potential avenues for advancing masked modeling research.
arXiv Detail & Related papers (2023-12-31T12:03:21Z) - Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object
Detection [72.36017150922504]
We propose a multi-modal contextual knowledge distillation framework, MMC-Det, to transfer the learned contextual knowledge from a teacher fusion transformer to a student detector.
The diverse multi-modal masked language modeling is realized by an object divergence constraint upon traditional multi-modal masked language modeling (MLM).
arXiv Detail & Related papers (2023-08-30T08:33:13Z)
- A Multi-stage deep architecture for summary generation of soccer videos [11.41978608521222]
We propose a method to generate the summary of a soccer match exploiting both the audio and the event metadata.
The results show that our method can detect the actions of the match, identify which of these actions should belong to the summary and then propose multiple candidate summaries.
arXiv Detail & Related papers (2022-05-02T07:26:35Z)
- Continuous Human Action Recognition for Human-Machine Interaction: A Review [39.593687054839265]
Recognising actions within an input video is a challenging but necessary task for applications that require real-time human-machine interaction.
We provide an overview of the feature extraction and learning strategies that are used in most state-of-the-art methods.
We investigate the application of such models to real-world scenarios and discuss several limitations and key research directions.
arXiv Detail & Related papers (2022-02-26T09:25:44Z)
- An overview of event extraction and its applications [1.8047694351309205]
This study provides a comprehensive overview of the state-of-the-art event extraction methods and their applications from text.
A distinguishing trait of this survey is that it gives an overview of moderate complexity, without going into too much detail on particular approaches.
arXiv Detail & Related papers (2021-11-05T01:37:47Z)
- Temporally-Aware Feature Pooling for Action Spotting in Soccer Broadcasts [86.56462654572813]
We focus our analysis on action spotting in soccer broadcasts, which consists of temporally localizing the main actions in a soccer game.
We propose a novel feature pooling method based on NetVLAD, dubbed NetVLAD++, that embeds temporally-aware knowledge.
We train and evaluate our methodology on the recent large-scale dataset SoccerNet-v2, reaching 53.4% Average-mAP for action spotting.
arXiv Detail & Related papers (2021-04-14T11:09:03Z)
- From Handcrafted to Deep Features for Pedestrian Detection: A Survey [148.35460817092908]
Pedestrian detection is an important but challenging problem in computer vision.
Over the past decade, significant improvement has been witnessed with the help of handcrafted features and deep features.
In addition to single-spectral pedestrian detection, we also review multi-spectral pedestrian detection.
arXiv Detail & Related papers (2020-10-01T14:51:10Z)
- Intra- and Inter-Action Understanding via Temporal Action Parsing [118.32912239230272]
We construct a new dataset developed on sport videos with manual annotations of sub-actions, and conduct a study on temporal action parsing on top.
Our study shows that a sport activity usually consists of multiple sub-actions and that the awareness of such temporal structures is beneficial to action recognition.
We also investigate a number of temporal parsing methods, and thereon devise an improved method that is capable of mining sub-actions from training data without knowing their labels.
arXiv Detail & Related papers (2020-05-20T17:45:18Z)