Temporally-Aware Feature Pooling for Action Spotting in Soccer Broadcasts
- URL: http://arxiv.org/abs/2104.06779v1
- Date: Wed, 14 Apr 2021 11:09:03 GMT
- Title: Temporally-Aware Feature Pooling for Action Spotting in Soccer Broadcasts
- Authors: Silvio Giancola, Bernard Ghanem
- Abstract summary: We focus our analysis on action spotting in soccer broadcasts, which consists of temporally localizing the main actions in a soccer game.
We propose a novel feature pooling method based on NetVLAD, dubbed NetVLAD++, that embeds temporally-aware knowledge.
We train and evaluate our methodology on the recent large-scale dataset SoccerNet-v2, reaching 53.4% Average-mAP for action spotting.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Toward the goal of automatic production for sports broadcasts, a paramount
task consists of understanding the high-level semantic information of the game
in play. For instance, recognizing and localizing the main actions of the game
would allow producers to adapt and automate the broadcast production,
focusing on the important details of the game and maximizing spectator
engagement. In this paper, we focus our analysis on action spotting in soccer
broadcasts, which consists of temporally localizing the main actions in a soccer
game. To that end, we propose a novel feature pooling method based on NetVLAD,
dubbed NetVLAD++, that embeds temporally-aware knowledge. Different from
previous pooling methods that consider the temporal context as a single set to
pool from, we split the context into before and after an action occurs. We argue
that considering the contextual information around the action spot as a single
entity leads to sub-optimal learning for the pooling module. With NetVLAD++,
we disentangle the context from the past and future frames and learn specific
vocabularies of semantics for each subset, avoiding blending and blurring those
vocabularies in time. Injecting such prior knowledge creates more informative
pooling modules and more discriminative pooled features, leading to a better
understanding of the actions. We train and evaluate our methodology on the
recent large-scale dataset SoccerNet-v2, reaching 53.4% Average-mAP for action
spotting, a +12.7% improvement w.r.t. the current state-of-the-art.
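The split-context idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (the real model learns the cluster centers and soft assignments end-to-end from frame features); the function names, dimensions, and the fixed softmax temperature are ours:

```python
import numpy as np

def netvlad_pool(features, centers, alpha=1.0):
    """Soft-assign each frame feature to K clusters and pool residuals.

    features: (T, D) frame features; centers: (K, D) cluster centers.
    Returns an L2-normalized (K * D,) VLAD descriptor.
    """
    # Soft assignment from negative squared distances to the centers.
    d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)  # (T, K)
    logits = -alpha * d2
    assign = np.exp(logits - logits.max(axis=1, keepdims=True))
    assign /= assign.sum(axis=1, keepdims=True)
    # Aggregate assignment-weighted residuals per cluster.
    residuals = features[:, None, :] - centers[None, :, :]         # (T, K, D)
    vlad = (assign[:, :, None] * residuals).sum(axis=0)            # (K, D)
    vlad /= np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12    # intra-normalization
    vlad = vlad.ravel()
    return vlad / (np.linalg.norm(vlad) + 1e-12)

def netvlad_pp(window, centers_before, centers_after):
    """NetVLAD++-style pooling: separate vocabularies for past and future.

    window: (T, D) features temporally centered on the candidate action spot.
    Frames before the spot and frames after it are pooled with their own
    cluster vocabularies, then concatenated.
    """
    half = window.shape[0] // 2
    pooled_before = netvlad_pool(window[:half], centers_before)
    pooled_after = netvlad_pool(window[half:], centers_after)
    return np.concatenate([pooled_before, pooled_after])

rng = np.random.default_rng(0)
frames = rng.standard_normal((20, 16))    # 20 frames, 16-dim features
c_before = rng.standard_normal((8, 16))   # K=8 clusters for the "before" subset
c_after = rng.standard_normal((8, 16))    # K=8 clusters for the "after" subset
desc = netvlad_pp(frames, c_before, c_after)
print(desc.shape)                         # (256,), i.e. 2 * K * D
```

Pooling the two halves with a single shared vocabulary would collapse the example back to plain NetVLAD; keeping the vocabularies separate is the prior the paper injects.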
Related papers
- Towards Active Learning for Action Spotting in Association Football Videos
Analyzing football videos is challenging and requires identifying subtle and diverse temporal patterns.
Current algorithms face significant challenges when learning from limited annotated data.
We propose an active learning framework that selects the most informative video samples to be annotated next.
arXiv Detail & Related papers (2023-04-09T11:50:41Z)
- A Graph-Based Method for Soccer Action Spotting Using Unsupervised Player Classification
Action spotting involves understanding the dynamics of the game, the complexity of events, and the variation of video sequences.
In this work, we focus on the former by (a) identifying and representing the players, referees, and goalkeepers as nodes in a graph, and by (b) modeling their temporal interactions as sequences of graphs.
For the player identification task, our method obtains an overall performance of 57.83% average-mAP by combining it with other modalities.
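The graph construction described above (people as nodes, temporal interactions as a sequence of per-frame graphs) can be sketched generically. This is a toy NumPy illustration with made-up field coordinates and a hypothetical distance threshold, not the paper's actual pipeline:

```python
import numpy as np

def frame_graph(positions, radius=10.0):
    """Build one graph per frame: detected people are nodes, and edges
    connect pairs closer than `radius` on the field.

    positions: (N, 2) field coordinates of N detected people (players,
    referees, goalkeepers). Returns an (N, N) binary adjacency matrix.
    """
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    adj = (dist < radius).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return adj

def video_as_graph_sequence(tracks, radius=10.0):
    """Model temporal interactions as a sequence of per-frame graphs."""
    return [frame_graph(frame, radius) for frame in tracks]

rng = np.random.default_rng(1)
tracks = rng.uniform(0, 100, size=(30, 22, 2))  # 30 frames, 22 people, (x, y)
graphs = video_as_graph_sequence(tracks)
print(len(graphs), graphs[0].shape)             # 30 (22, 22)
```

A graph neural network would then consume this sequence of adjacency matrices together with per-node features (e.g. player class and position) to spot actions.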
arXiv Detail & Related papers (2022-11-22T15:23:53Z)
- Boundary-aware Self-supervised Learning for Video Scene Segmentation
Video scene segmentation is a task of temporally localizing scene boundaries in a video.
We introduce three novel boundary-aware pretext tasks: Shot-Scene Matching, Contextual Group Matching and Pseudo-boundary Prediction.
We achieve the new state-of-the-art on the MovieNet-SSeg benchmark.
arXiv Detail & Related papers (2022-01-14T02:14:07Z)
- Weakly Supervised Temporal Action Localization Through Learning Explicit Subspaces for Action and Context
Weakly-supervised temporal action localization (WS-TAL) methods learn to localize the temporal starts and ends of action instances in a video under only video-level supervision.
We introduce a framework that learns two feature subspaces respectively for actions and their context.
The proposed approach outperforms state-of-the-art WS-TAL methods on three benchmarks.
arXiv Detail & Related papers (2021-03-30T08:26:53Z)
- Improved Soccer Action Spotting using both Audio and Video Streams
We propose a study on combining audio and video information at different stages of deep neural network architectures.
We used the SoccerNet benchmark dataset, which contains annotated events for 500 soccer game videos from the Big Five European leagues.
We observed an average absolute improvement in the mean Average Precision (mAP) metric of 7.43% for the action classification task and of 4.19% for the action spotting task.
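Combining the modalities "at different stages" can be sketched schematically. This is a generic NumPy illustration of early versus late fusion with made-up feature dimensions, not the paper's architecture:

```python
import numpy as np

def early_fusion(video_feat, audio_feat):
    """Early fusion: concatenate modality features before the classifier."""
    return np.concatenate([video_feat, audio_feat], axis=-1)

def late_fusion(video_scores, audio_scores, w=0.5):
    """Late fusion: blend per-class scores predicted by each modality."""
    return w * video_scores + (1.0 - w) * audio_scores

video_feat = np.ones(512)   # e.g. a pooled video descriptor
audio_feat = np.ones(128)   # e.g. a pooled audio descriptor
fused = early_fusion(video_feat, audio_feat)
print(fused.shape)          # (640,)
```

Early fusion lets the classifier learn cross-modal interactions, while late fusion keeps the two streams independent until the final scores; intermediate-stage fusion sits between the two.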
arXiv Detail & Related papers (2020-11-09T09:12:44Z)
- Hybrid Dynamic-static Context-aware Attention Network for Action Assessment in Long Videos
We present a novel hybrid dynAmic-static Context-aware attenTION NETwork (ACTION-NET) for action assessment in long videos.
We learn not only the dynamic information of the video but also attend to the static postures of the detected athletes in specific frames.
We combine the features of the two streams to regress the final video score, supervised by ground-truth scores given by experts.
arXiv Detail & Related papers (2020-08-13T15:51:42Z)
- Intra- and Inter-Action Understanding via Temporal Action Parsing
We construct a new dataset developed on sport videos with manual annotations of sub-actions, and conduct a study on temporal action parsing on top.
Our study shows that a sport activity usually consists of multiple sub-actions and that the awareness of such temporal structures is beneficial to action recognition.
We also investigate a number of temporal parsing methods, and thereon devise an improved method that is capable of mining sub-actions from training data without knowing their labels.
arXiv Detail & Related papers (2020-05-20T17:45:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.