Improved Soccer Action Spotting using both Audio and Video Streams
- URL: http://arxiv.org/abs/2011.04258v1
- Date: Mon, 9 Nov 2020 09:12:44 GMT
- Title: Improved Soccer Action Spotting using both Audio and Video Streams
- Authors: Bastien Vanderplaetse, Stéphane Dupont
- Abstract summary: We propose a study on combining audio and video information at different stages of deep neural network architectures.
We used the SoccerNet benchmark dataset, which contains annotated events for 500 soccer game videos from the Big Five European leagues.
We observed an average absolute improvement of the mean Average Precision (mAP) metric of 7.43% for the action classification task and of 4.19% for the action spotting task.
- Score: 3.4376560669160394
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a study on multi-modal (audio and video) action spotting and classification in soccer videos. Action spotting and classification are the tasks that consist in finding the temporal anchors of events in a video and determining which events they are. This is an important application of general activity understanding. Here, we propose an experimental study on combining audio and video information at different stages of deep neural network architectures. We used the SoccerNet benchmark dataset, which contains annotated events for 500 soccer game videos from the Big Five European leagues. Through this work, we evaluated several ways to integrate the audio stream into video-only architectures. We observed an average absolute improvement of the mean Average Precision (mAP) metric of 7.43% for the action classification task and of 4.19% for the action spotting task.
Related papers
- Deep learning for action spotting in association football videos [64.10841325879996]
The SoccerNet initiative organizes yearly challenges, during which participants from all around the world compete to achieve state-of-the-art performances.
This paper traces the history of action spotting in sports, from the creation of the task back in 2018, to the role it plays today in research and the sports industry.
arXiv Detail & Related papers (2024-10-02T07:56:15Z)
- Towards Active Learning for Action Spotting in Association Football Videos [59.84375958757395]
Analyzing football videos is challenging and requires identifying subtle and diverse spatio-temporal patterns.
Current algorithms face significant challenges when learning from limited annotated data.
We propose an active learning framework that selects the most informative video samples to be annotated next.
arXiv Detail & Related papers (2023-04-09T11:50:41Z)
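As one concrete instance of selecting "the most informative video samples", the sketch below scores an unlabeled pool by predictive entropy and picks the most uncertain clips; the acquisition criterion and interfaces are generic assumptions, not necessarily the paper's method.

```python
# Generic entropy-based uncertainty sampling (illustrative; the paper's
# actual acquisition function may differ).
import torch

def entropy(probs, eps=1e-12):
    """Shannon entropy of per-sample class distributions."""
    return -(probs * (probs + eps).log()).sum(dim=-1)

def select_for_annotation(model, unlabeled_feats, budget=10):
    """Pick the `budget` samples the model is least certain about."""
    model.eval()
    with torch.no_grad():
        logits = model(unlabeled_feats)            # (N, num_classes)
        scores = entropy(logits.softmax(dim=-1))   # higher = more uncertain
    return scores.topk(budget).indices             # indices to send to annotators

model = torch.nn.Linear(512, 3)   # stand-in classifier over clip features
pool = torch.randn(100, 512)      # features of unlabeled clips
print(select_for_annotation(model, pool, budget=5))
```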
- A Graph-Based Method for Soccer Action Spotting Using Unsupervised Player Classification [75.93186954061943]
Action spotting involves understanding the dynamics of the game, the complexity of events, and the variation of video sequences.
In this work, we focus on the former by (a) identifying and representing the players, referees, and goalkeepers as nodes in a graph, and by (b) modeling their temporal interactions as sequences of graphs.
For the player identification task, our method obtains an overall performance of 57.83% average-mAP by combining it with other modalities.
arXiv Detail & Related papers (2022-11-22T15:23:53Z)
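To make the graph formulation above concrete, here is a minimal sketch that builds one graph per frame, with detected players as nodes and proximity-based edges, so that a clip becomes a sequence of graphs. The node features, the distance threshold, and the plain-tensor graph encoding are illustrative assumptions, not the paper's exact design.

```python
# Illustrative construction of a per-frame player graph: nodes are detected
# players (2-D pitch positions plus a feature vector), and edges connect
# players closer than a distance threshold.
import torch

def frame_to_graph(positions, features, radius=10.0):
    """positions: (P, 2) pitch coordinates; features: (P, D) node features.
    Returns node features and an edge_index of shape (2, E)."""
    dists = torch.cdist(positions, positions)  # (P, P) pairwise distances
    adj = (dists < radius) & ~torch.eye(len(positions), dtype=torch.bool)
    edge_index = adj.nonzero().t()             # (2, E), COO format
    return features, edge_index

# A clip becomes a sequence of such graphs, one per sampled frame.
positions = torch.rand(22, 2) * 100            # 22 players on a 100 m pitch
features = torch.randn(22, 16)                 # e.g. appearance/role embeddings
x, edge_index = frame_to_graph(positions, features)
print(edge_index.shape)
```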
- P2ANet: A Dataset and Benchmark for Dense Action Detection from Table Tennis Match Broadcasting Videos [64.57435509822416]
The dataset consists of 2,721 video clips collected from the broadcasting videos of professional table tennis matches in World Table Tennis Championships and Olympiads.
We formulate two sets of action detection problems: action localization and action recognition.
The results confirm that P2ANet remains a challenging task and can be used as a special benchmark for dense action detection from videos.
arXiv Detail & Related papers (2022-07-26T08:34:17Z)
- Feature Combination Meets Attention: Baidu Soccer Embeddings and Transformer based Temporal Detection [3.7709686875144337]
We present a two-stage paradigm to detect what and when events happen in soccer broadcast videos.
Specifically, we fine-tune multiple action recognition models on soccer data to extract high-level semantic features.
This approach achieved state-of-the-art performance in both tasks, i.e., action spotting and replay grounding, in the SoccerNet-v2 Challenge.
arXiv Detail & Related papers (2021-06-28T08:00:21Z)
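Read as a sketch, the second stage of the paradigm above could look like the following: per-clip features from several fine-tuned backbones are stacked over time and contextualized by a small Transformer encoder that emits per-step class scores. The dimensions, the single-encoder design, and the 17-class head (SoccerNet-v2's action classes) are assumptions for illustration.

```python
# Sketch of a Transformer-based temporal detection head over concatenated
# features from multiple backbones (hyperparameters are illustrative).
import torch
import torch.nn as nn

class TemporalDetector(nn.Module):
    def __init__(self, feat_dim=2048, d_model=256, num_classes=17):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=3)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, feats):                # feats: (B, T, feat_dim)
        h = self.encoder(self.proj(feats))   # contextualize over time
        return self.head(h)                  # per-step class logits (B, T, C)

feats = torch.randn(2, 60, 2048)  # e.g. 60 one-second steps of stacked backbone features
print(TemporalDetector()(feats).shape)  # torch.Size([2, 60, 17])
```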
- Temporally-Aware Feature Pooling for Action Spotting in Soccer Broadcasts [86.56462654572813]
We focus our analysis on action spotting in soccer broadcasts, which consists in temporally localizing the main actions of a soccer game.
We propose a novel feature pooling method based on NetVLAD, dubbed NetVLAD++, that embeds temporally-aware knowledge.
We train and evaluate our methodology on the recent large-scale dataset SoccerNet-v2, reaching 53.4% Average-mAP for action spotting.
arXiv Detail & Related papers (2021-04-14T11:09:03Z)
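The sketch below condenses the temporally-aware idea described above: frames before and after the candidate instant are pooled by two separate NetVLAD modules whose outputs are concatenated. The NetVLAD layer here is deliberately simplified, and all sizes are illustrative.

```python
# Condensed sketch of temporally-aware pooling in the spirit of NetVLAD++:
# past and future context around the candidate instant get separate pools.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NetVLAD(nn.Module):
    """Simplified NetVLAD: soft-assign frames to clusters, sum residuals."""
    def __init__(self, dim=512, clusters=64):
        super().__init__()
        self.centroids = nn.Parameter(torch.randn(clusters, dim))
        self.assign = nn.Linear(dim, clusters)

    def forward(self, x):                        # x: (B, T, D)
        a = self.assign(x).softmax(dim=-1)       # soft assignments (B, T, K)
        resid = x.unsqueeze(2) - self.centroids  # residuals (B, T, K, D)
        vlad = (a.unsqueeze(-1) * resid).sum(1)  # aggregate over time (B, K, D)
        return F.normalize(vlad.flatten(1), dim=-1)

class TemporallyAwarePooling(nn.Module):
    def __init__(self, dim=512, clusters=64, num_classes=17):
        super().__init__()
        self.before, self.after = NetVLAD(dim, clusters), NetVLAD(dim, clusters)
        self.head = nn.Linear(2 * clusters * dim, num_classes)

    def forward(self, x):                        # x: (B, T, D), instant at T // 2
        mid = x.shape[1] // 2
        pooled = torch.cat([self.before(x[:, :mid]), self.after(x[:, mid:])], dim=-1)
        return self.head(pooled)

x = torch.randn(4, 30, 512)               # 30 frames of 512-d features
print(TemporallyAwarePooling()(x).shape)  # torch.Size([4, 17])
```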
- Hybrid Dynamic-static Context-aware Attention Network for Action Assessment in Long Videos [96.45804577283563]
We present a novel hybrid dynAmic-static Context-aware attenTION NETwork (ACTION-NET) for action assessment in long videos.
We learn not only the video dynamic information but also focus on the static postures of the detected athletes in specific frames.
We combine the features of the two streams to regress the final video score, supervised by ground-truth scores given by experts.
arXiv Detail & Related papers (2020-08-13T15:51:42Z)
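A minimal sketch of the two-stream idea above, assuming pre-pooled features: one branch encodes dynamic clip features, the other static posture features, and their fusion is regressed to a scalar score. The encoders and dimensions are placeholders, not ACTION-NET's actual components.

```python
# Illustrative two-stream score regression: dynamic (video) and static
# (posture) features are fused and regressed to a quality score supervised
# by expert annotations. Sizes are placeholders.
import torch
import torch.nn as nn

class TwoStreamScorer(nn.Module):
    def __init__(self, dyn_dim=1024, static_dim=512):
        super().__init__()
        self.dyn_enc = nn.Sequential(nn.Linear(dyn_dim, 256), nn.ReLU())
        self.static_enc = nn.Sequential(nn.Linear(static_dim, 256), nn.ReLU())
        self.regressor = nn.Linear(512, 1)  # fused features -> scalar score

    def forward(self, dyn_feat, static_feat):
        fused = torch.cat([self.dyn_enc(dyn_feat), self.static_enc(static_feat)], dim=-1)
        return self.regressor(fused).squeeze(-1)

dyn = torch.randn(4, 1024)     # pooled dynamic clip features
static = torch.randn(4, 512)   # pooled static posture features
scores = TwoStreamScorer()(dyn, static)
loss = nn.functional.mse_loss(scores, torch.rand(4))  # ground-truth expert scores
```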
- Event detection in coarsely annotated sports videos via parallel multi receptive field 1D convolutions [14.30009544149561]
In problems such as sports video analytics, it is difficult to obtain accurate frame-level annotations and exact event durations.
We propose the task of event detection in coarsely annotated videos.
We introduce a multi-tower temporal convolutional network architecture for the proposed task.
arXiv Detail & Related papers (2020-04-13T19:51:25Z)
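The multi-tower design above can be sketched as parallel 1-D convolution towers with different kernel sizes (hence different receptive fields) applied to the same frame-feature sequence and merged for per-frame event classification; the kernel sizes and channel counts below are assumptions, not the paper's configuration.

```python
# Sketch of parallel 1-D convolution towers with different receptive fields,
# merged for per-frame event classification (hyperparameters are illustrative).
import torch
import torch.nn as nn

class MultiTowerTCN(nn.Module):
    def __init__(self, in_dim=512, channels=64, kernel_sizes=(3, 7, 15), num_classes=4):
        super().__init__()
        self.towers = nn.ModuleList([
            nn.Conv1d(in_dim, channels, k, padding=k // 2)  # same-length output
            for k in kernel_sizes
        ])
        self.head = nn.Conv1d(channels * len(kernel_sizes), num_classes, 1)

    def forward(self, x):                    # x: (B, T, in_dim) frame features
        x = x.transpose(1, 2)                # -> (B, in_dim, T) for Conv1d
        h = torch.cat([torch.relu(t(x)) for t in self.towers], dim=1)
        return self.head(h).transpose(1, 2)  # per-frame logits (B, T, C)

x = torch.randn(2, 120, 512)     # 120 frames of 512-d features
print(MultiTowerTCN()(x).shape)  # torch.Size([2, 120, 4])
```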