Temporally-Aware Feature Pooling for Action Spotting in Soccer Broadcasts
- URL: http://arxiv.org/abs/2104.06779v1
- Date: Wed, 14 Apr 2021 11:09:03 GMT
- Title: Temporally-Aware Feature Pooling for Action Spotting in Soccer Broadcasts
- Authors: Silvio Giancola, Bernard Ghanem
- Abstract summary: We focus our analysis on action spotting in soccer broadcasts, which consists of temporally localizing the main actions in a soccer game.
We propose a novel feature pooling method based on NetVLAD, dubbed NetVLAD++, that embeds temporally-aware knowledge.
We train and evaluate our methodology on the recent large-scale dataset SoccerNet-v2, reaching 53.4% Average-mAP for action spotting.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Toward the goal of automatic production for sports broadcasts, a paramount
task consists of understanding the high-level semantic information of the game
in play. For instance, recognizing and localizing the main actions of the game
would allow producers to adapt and automate the broadcast production,
focusing on the important details of the game and maximizing spectator
engagement. In this paper, we focus our analysis on action spotting in soccer
broadcasts, which consists of temporally localizing the main actions in a soccer
game. To that end, we propose a novel feature pooling method based on NetVLAD,
dubbed NetVLAD++, that embeds temporally-aware knowledge. Different from
previous pooling methods that consider the temporal context as a single set to
pool from, we split the context into before and after an action occurs. We argue
that considering the contextual information around the action spot as a single
entity leads to sub-optimal learning for the pooling module. With NetVLAD++,
we disentangle the context from the past and future frames and learn specific
vocabularies of semantics for each subset, avoiding blending and blurring those
vocabularies in time. Injecting such prior knowledge creates more informative
pooling modules and more discriminative pooled features, leading to a better
understanding of the actions. We train and evaluate our methodology on the
recent large-scale dataset SoccerNet-v2, reaching 53.4% Average-mAP for action
spotting, a +12.7% improvement w.r.t. the current state-of-the-art.
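The split-context idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (the real model learns the cluster centers and soft assignments end-to-end from frame features); the function names, dimensions, and the fixed softmax temperature are ours:

```python
import numpy as np

def netvlad_pool(features, centers, alpha=1.0):
    """Soft-assign each frame feature to K clusters and pool residuals.

    features: (T, D) frame features; centers: (K, D) cluster centers.
    Returns an L2-normalized (K * D,) VLAD descriptor.
    """
    # Soft assignment from negative squared distances to the centers.
    d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)  # (T, K)
    logits = -alpha * d2
    assign = np.exp(logits - logits.max(axis=1, keepdims=True))
    assign /= assign.sum(axis=1, keepdims=True)
    # Aggregate assignment-weighted residuals per cluster.
    residuals = features[:, None, :] - centers[None, :, :]         # (T, K, D)
    vlad = (assign[:, :, None] * residuals).sum(axis=0)            # (K, D)
    vlad /= np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12    # intra-normalization
    vlad = vlad.ravel()
    return vlad / (np.linalg.norm(vlad) + 1e-12)

def netvlad_pp(window, centers_before, centers_after):
    """NetVLAD++-style pooling: separate vocabularies for past and future.

    window: (T, D) features temporally centered on the candidate action spot.
    Frames before the spot and frames after it are pooled with their own
    cluster vocabularies, then concatenated.
    """
    half = window.shape[0] // 2
    pooled_before = netvlad_pool(window[:half], centers_before)
    pooled_after = netvlad_pool(window[half:], centers_after)
    return np.concatenate([pooled_before, pooled_after])

rng = np.random.default_rng(0)
frames = rng.standard_normal((20, 16))    # 20 frames, 16-dim features
c_before = rng.standard_normal((8, 16))   # K=8 clusters for the "before" subset
c_after = rng.standard_normal((8, 16))    # K=8 clusters for the "after" subset
desc = netvlad_pp(frames, c_before, c_after)
print(desc.shape)                         # (256,), i.e. 2 * K * D
```

Pooling the two halves with a single shared vocabulary would collapse the example back to plain NetVLAD; keeping the vocabularies separate is the prior the paper injects.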
Related papers
- Towards Active Learning for Action Spotting in Association Football Videos
Analyzing football videos is challenging and requires identifying subtle and diverse temporal patterns.
Current algorithms face significant challenges when learning from limited annotated data.
We propose an active learning framework that selects the most informative video samples to be annotated next.
arXiv Detail & Related papers (2023-04-09T11:50:41Z)
- A Graph-Based Method for Soccer Action Spotting Using Unsupervised Player Classification
Action spotting involves understanding the dynamics of the game, the complexity of events, and the variation of video sequences.
In this work, we focus on the former by (a) identifying and representing the players, referees, and goalkeepers as nodes in a graph, and by (b) modeling their temporal interactions as sequences of graphs.
For the player identification task, our method obtains an overall performance of 57.83% average-mAP by combining it with other modalities.
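The graph construction described above (people as nodes, temporal interactions as a sequence of per-frame graphs) can be sketched generically. This is a toy NumPy illustration with made-up field coordinates and a hypothetical distance threshold, not the paper's actual pipeline:

```python
import numpy as np

def frame_graph(positions, radius=10.0):
    """Build one graph per frame: detected people are nodes, and edges
    connect pairs closer than `radius` on the field.

    positions: (N, 2) field coordinates of N detected people (players,
    referees, goalkeepers). Returns an (N, N) binary adjacency matrix.
    """
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    adj = (dist < radius).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return adj

def video_as_graph_sequence(tracks, radius=10.0):
    """Model temporal interactions as a sequence of per-frame graphs."""
    return [frame_graph(frame, radius) for frame in tracks]

rng = np.random.default_rng(1)
tracks = rng.uniform(0, 100, size=(30, 22, 2))  # 30 frames, 22 people, (x, y)
graphs = video_as_graph_sequence(tracks)
print(len(graphs), graphs[0].shape)             # 30 (22, 22)
```

A graph neural network would then consume this sequence of adjacency matrices together with per-node features (e.g. player class and position) to spot actions.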
arXiv Detail & Related papers (2022-11-22T15:23:53Z)
- Boundary-aware Self-supervised Learning for Video Scene Segmentation
Video scene segmentation is a task of temporally localizing scene boundaries in a video.
We introduce three novel boundary-aware pretext tasks: Shot-Scene Matching, Contextual Group Matching and Pseudo-boundary Prediction.
We achieve the new state-of-the-art on the MovieNet-SSeg benchmark.
arXiv Detail & Related papers (2022-01-14T02:14:07Z)
- Weakly Supervised Temporal Action Localization Through Learning Explicit Subspaces for Action and Context
Weakly-supervised temporal action localization (WS-TAL) methods learn to localize the temporal starts and ends of action instances in a video under only video-level supervision.
We introduce a framework that learns two feature subspaces respectively for actions and their context.
The proposed approach outperforms state-of-the-art WS-TAL methods on three benchmarks.
arXiv Detail & Related papers (2021-03-30T08:26:53Z)
- Improved Soccer Action Spotting using both Audio and Video Streams
We propose a study on combining audio and video information at different stages of deep neural network architectures.
We used the SoccerNet benchmark dataset, which contains annotated events for 500 soccer game videos from the Big Five European leagues.
We observed an average absolute improvement in the mean Average Precision (mAP) metric of 7.43% for the action classification task and of 4.19% for the action spotting task.
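Combining the modalities "at different stages" can be sketched schematically. This is a generic NumPy illustration of early versus late fusion with made-up feature dimensions, not the paper's architecture:

```python
import numpy as np

def early_fusion(video_feat, audio_feat):
    """Early fusion: concatenate modality features before the classifier."""
    return np.concatenate([video_feat, audio_feat], axis=-1)

def late_fusion(video_scores, audio_scores, w=0.5):
    """Late fusion: blend per-class scores predicted by each modality."""
    return w * video_scores + (1.0 - w) * audio_scores

video_feat = np.ones(512)   # e.g. a pooled video descriptor
audio_feat = np.ones(128)   # e.g. a pooled audio descriptor
fused = early_fusion(video_feat, audio_feat)
print(fused.shape)          # (640,)
```

Early fusion lets the classifier learn cross-modal interactions, while late fusion keeps the two streams independent until the final scores; intermediate-stage fusion sits between the two.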
arXiv Detail & Related papers (2020-11-09T09:12:44Z)
- Hybrid Dynamic-static Context-aware Attention Network for Action Assessment in Long Videos
We present a novel hybrid dynAmic-static Context-aware attenTION NETwork (ACTION-NET) for action assessment in long videos.
We learn not only the dynamic information of the video but also attend to the static postures of the detected athletes in specific frames.
We combine the features of the two streams to regress the final video score, supervised by ground-truth scores given by experts.
arXiv Detail & Related papers (2020-08-13T15:51:42Z)
- Intra- and Inter-Action Understanding via Temporal Action Parsing
We construct a new dataset developed on sport videos with manual annotations of sub-actions, and conduct a study on temporal action parsing on top.
Our study shows that a sport activity usually consists of multiple sub-actions and that the awareness of such temporal structures is beneficial to action recognition.
We also investigate a number of temporal parsing methods, and thereon devise an improved method that is capable of mining sub-actions from training data without knowing their labels.
arXiv Detail & Related papers (2020-05-20T17:45:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.