Temporally-Aware Feature Pooling for Action Spotting in Soccer Broadcasts
- URL: http://arxiv.org/abs/2104.06779v1
- Date: Wed, 14 Apr 2021 11:09:03 GMT
- Title: Temporally-Aware Feature Pooling for Action Spotting in Soccer Broadcasts
- Authors: Silvio Giancola, Bernard Ghanem
- Abstract summary: We focus our analysis on action spotting in soccer broadcasts, which consists of temporally localizing the main actions in a soccer game.
We propose a novel feature pooling method based on NetVLAD, dubbed NetVLAD++, that embeds temporally-aware knowledge.
We train and evaluate our methodology on the recent large-scale dataset SoccerNet-v2, reaching 53.4% Average-mAP for action spotting.
- Score: 86.56462654572813
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Toward the goal of automatic production for sports broadcasts, a paramount
task consists of understanding the high-level semantic information of the game
in play. For instance, recognizing and localizing the main actions of the game
would allow producers to adapt and automate the broadcast production,
focusing on the important details of the game and maximizing spectator
engagement. In this paper, we focus our analysis on action spotting in soccer
broadcasts, which consists of temporally localizing the main actions in a soccer
game. To that end, we propose a novel feature pooling method based on NetVLAD,
dubbed NetVLAD++, that embeds temporally-aware knowledge. Different from
previous pooling methods that consider the temporal context as a single set to
pool from, we split the context before and after an action occurs. We argue
that considering the contextual information around the action spot as a single
entity leads to sub-optimal learning for the pooling module. With NetVLAD++,
we disentangle the context from the past and future frames and learn a specific
semantic vocabulary for each subset, avoiding blending and blurring these
vocabularies in time. Injecting such prior knowledge creates more informative
pooling modules and more discriminative pooled features, leading to a better
understanding of the actions. We train and evaluate our methodology on the
recent large-scale dataset SoccerNet-v2, reaching 53.4% Average-mAP for action
spotting, a +12.7% improvement w.r.t. the current state-of-the-art.
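Read directly from the abstract, the split-context idea amounts to two NetVLAD heads with separate learned vocabularies, one pooling the frames before the candidate spot and one the frames after, whose outputs are concatenated and classified. The PyTorch sketch below is a minimal illustration of that reading, not the authors' implementation; the feature dimension, cluster count, and a classifier over the 17 SoccerNet-v2 action classes plus a background class are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class NetVLAD(nn.Module):
    # Soft-assignment VLAD pooling over an unordered set of frame features.
    def __init__(self, feat_dim, num_clusters):
        super().__init__()
        self.centers = nn.Parameter(0.1 * torch.randn(num_clusters, feat_dim))
        self.assign = nn.Linear(feat_dim, num_clusters)  # soft-assignment logits

    def forward(self, x):                      # x: (batch, frames, feat_dim)
        a = F.softmax(self.assign(x), dim=-1)  # (B, T, K) cluster assignments
        residuals = x.unsqueeze(2) - self.centers        # (B, T, K, D)
        vlad = (a.unsqueeze(-1) * residuals).sum(dim=1)  # (B, K, D)
        vlad = F.normalize(vlad, dim=-1)                 # intra-normalization
        return F.normalize(vlad.flatten(1), dim=-1)      # (B, K * D)

class NetVLADPlusPlus(nn.Module):
    # Two separate vocabularies: one for past context, one for future context.
    def __init__(self, feat_dim=512, num_clusters=64, num_classes=17):
        super().__init__()
        self.pool_before = NetVLAD(feat_dim, num_clusters)
        self.pool_after = NetVLAD(feat_dim, num_clusters)
        # +1 output for the background (no-action) class; sizes are assumptions.
        self.classifier = nn.Linear(2 * num_clusters * feat_dim, num_classes + 1)

    def forward(self, x):        # x: window of frame features centered on the spot
        mid = x.shape[1] // 2
        before = self.pool_before(x[:, :mid])  # semantics leading up to the action
        after = self.pool_after(x[:, mid:])    # semantics following the action
        return self.classifier(torch.cat([before, after], dim=1))

Splitting before pooling is the point of the design: each head can specialize its cluster vocabulary to what precedes or follows an action (e.g., build-up play versus celebrations) instead of averaging both regimes into a single codebook.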
Related papers
- Deep learning for action spotting in association football videos [64.10841325879996]
The SoccerNet initiative organizes yearly challenges, during which participants from all around the world compete to achieve state-of-the-art performances.
This paper traces the history of action spotting in sports, from the creation of the task back in 2018, to the role it plays today in research and the sports industry.
arXiv Detail & Related papers (2024-10-02T07:56:15Z)
- Towards Active Learning for Action Spotting in Association Football Videos [59.84375958757395]
Analyzing football videos is challenging and requires identifying subtle and diverse spatio-temporal patterns.
Current algorithms face significant challenges when learning from limited annotated data.
We propose an active learning framework that selects the most informative video samples to be annotated next; a minimal selection sketch follows this entry.
arXiv Detail & Related papers (2023-04-09T11:50:41Z)
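The summary names the selection step without specifying the criterion, so the helper below uses predictive entropy as a hedged, generic stand-in for informativeness; it is hypothetical, not the paper's method.

import torch

def select_clips_for_annotation(probs, budget):
    # probs: (N, C) class probabilities from the current spotting model over
    # N unlabeled clips; entropy stands in for the paper's actual criterion.
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)  # (N,)
    return entropy.topk(budget).indices  # clips to send to annotators next

# Each round: train on the labeled clips, score the unlabeled pool,
# annotate the top picks, and repeat until the budget is exhausted.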
- A Graph-Based Method for Soccer Action Spotting Using Unsupervised Player Classification [75.93186954061943]
Action spotting involves understanding the dynamics of the game, the complexity of events, and the variation of video sequences.
In this work, we focus on the former by (a) identifying and representing the players, referees, and goalkeepers as nodes in a graph, and by (b) modeling their temporal interactions as sequences of graphs (a toy construction follows this entry).
For the player identification task, our method obtains an overall performance of 57.83% average-mAP by combining it with other modalities.
arXiv Detail & Related papers (2022-11-22T15:23:53Z)
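To make items (a) and (b) concrete: detected people become nodes in a per-frame graph, nearby pairs become edges, and a clip is a sequence of such graphs. The sketch below is a hypothetical illustration; the radius, coordinates, and node features are assumptions, not the paper's exact design.

import torch

def frame_graph(positions, radius=10.0):
    # positions: (N, 2) pitch coordinates of detected people (players,
    # referees, goalkeepers); node features could also carry the role
    # embeddings produced by the unsupervised player classifier.
    dist = torch.cdist(positions, positions)               # (N, N) distances
    src, dst = ((dist < radius) & (dist > 0)).nonzero(as_tuple=True)
    edge_index = torch.stack([src, dst])                   # (2, E) COO edges
    return positions, edge_index

# A clip becomes a sequence of graphs, one per sampled frame, which a
# temporal graph network can consume for action spotting.
clip_graphs = [frame_graph(torch.rand(22, 2) * 100) for _ in range(16)]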
- Weakly Supervised Temporal Action Localization Through Learning Explicit Subspaces for Action and Context [151.23835595907596]
Weakly-supervised temporal action localization (WS-TAL) methods learn to localize the temporal starts and ends of action instances in a video under video-level supervision only.
We introduce a framework that learns two feature subspaces respectively for actions and their context.
The proposed approach outperforms state-of-the-art WS-TAL methods on three benchmarks.
arXiv Detail & Related papers (2021-03-30T08:26:53Z)
- Improved Soccer Action Spotting using both Audio and Video Streams [3.4376560669160394]
We propose a study on combining audio and video information at different stages of deep neural network architectures.
We used the SoccerNet benchmark dataset, which contains annotated events for 500 soccer game videos from the Big Five European leagues.
We observed an average absolute improvement of the mean Average Precision (mAP) metric of 7.43% for the action classification task and of 4.19% for the action spotting task.
arXiv Detail & Related papers (2020-11-09T09:12:44Z)
- Intra- and Inter-Action Understanding via Temporal Action Parsing [118.32912239230272]
We construct a new dataset of sports videos with manual annotations of sub-actions, and conduct a study of temporal action parsing on top of it.
Our study shows that a sport activity usually consists of multiple sub-actions and that the awareness of such temporal structures is beneficial to action recognition.
We also investigate a number of temporal parsing methods, and thereon devise an improved method that is capable of mining sub-actions from training data without knowing their labels.
arXiv Detail & Related papers (2020-05-20T17:45:18Z)