Improved Soccer Action Spotting using both Audio and Video Streams
- URL: http://arxiv.org/abs/2011.04258v1
- Date: Mon, 9 Nov 2020 09:12:44 GMT
- Title: Improved Soccer Action Spotting using both Audio and Video Streams
- Authors: Bastien Vanderplaetse, Stéphane Dupont
- Abstract summary: We propose a study on combining audio and video information at different stages of deep neural network architectures.
We used the SoccerNet benchmark dataset, which contains annotated events for 500 soccer game videos from the Big Five European leagues.
We observed an average absolute improvement of the mean Average Precision (mAP) metric of 7.43% for the action classification task and of 4.19% for the action spotting task.
- Score: 3.4376560669160394
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a study on multi-modal (audio and video) action spotting and classification in soccer videos. Action spotting and classification are the tasks that consist in finding the temporal anchors of events in a video and determining which events they are. This is an important application of general activity understanding. Here, we propose an experimental study on combining audio and video information at different stages of deep neural network architectures. We used the SoccerNet benchmark dataset, which contains annotated events for 500 soccer game videos from the Big Five European leagues. Through this work, we evaluated several ways to integrate the audio stream into video-only architectures. We observed an average absolute improvement of the mean Average Precision (mAP) metric of 7.43% for the action classification task and of 4.19% for the action spotting task.
Related papers
- Deep learning for action spotting in association football videos [64.10841325879996]
The SoccerNet initiative organizes yearly challenges, during which participants from all around the world compete to achieve state-of-the-art performances.
This paper traces the history of action spotting in sports, from the creation of the task back in 2018, to the role it plays today in research and the sports industry.
arXiv Detail & Related papers (2024-10-02T07:56:15Z)
- Towards Active Learning for Action Spotting in Association Football Videos [59.84375958757395]
Analyzing football videos is challenging and requires identifying subtle and diverse spatio-temporal patterns.
Current algorithms face significant challenges when learning from limited annotated data.
We propose an active learning framework that selects the most informative video samples to be annotated next.
arXiv Detail & Related papers (2023-04-09T11:50:41Z)
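As one concrete instance of selecting "the most informative video samples", the sketch below scores an unlabeled pool by predictive entropy and picks the most uncertain clips; the acquisition criterion and interfaces are generic assumptions, not necessarily the paper's method.

```python
# Generic entropy-based uncertainty sampling (illustrative; the paper's
# actual acquisition function may differ).
import torch

def entropy(probs, eps=1e-12):
    """Shannon entropy of per-sample class distributions."""
    return -(probs * (probs + eps).log()).sum(dim=-1)

def select_for_annotation(model, unlabeled_feats, budget=10):
    """Pick the `budget` samples the model is least certain about."""
    model.eval()
    with torch.no_grad():
        logits = model(unlabeled_feats)            # (N, num_classes)
        scores = entropy(logits.softmax(dim=-1))   # higher = more uncertain
    return scores.topk(budget).indices             # indices to send to annotators

model = torch.nn.Linear(512, 3)   # stand-in classifier over clip features
pool = torch.randn(100, 512)      # features of unlabeled clips
print(select_for_annotation(model, pool, budget=5))
```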
- A Graph-Based Method for Soccer Action Spotting Using Unsupervised Player Classification [75.93186954061943]
Action spotting involves understanding the dynamics of the game, the complexity of events, and the variation of video sequences.
In this work, we focus on the former by (a) identifying and representing the players, referees, and goalkeepers as nodes in a graph, and by (b) modeling their temporal interactions as sequences of graphs.
For the player identification task, our method obtains an overall performance of 57.83% average-mAP by combining it with other modalities.
arXiv Detail & Related papers (2022-11-22T15:23:53Z)
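To make the graph formulation above concrete, here is a minimal sketch that builds one graph per frame, with detected players as nodes and proximity-based edges, so that a clip becomes a sequence of graphs. The node features, the distance threshold, and the plain-tensor graph encoding are illustrative assumptions, not the paper's exact design.

```python
# Illustrative construction of a per-frame player graph: nodes are detected
# players (2-D pitch positions plus a feature vector), and edges connect
# players closer than a distance threshold.
import torch

def frame_to_graph(positions, features, radius=10.0):
    """positions: (P, 2) pitch coordinates; features: (P, D) node features.
    Returns node features and an edge_index of shape (2, E)."""
    dists = torch.cdist(positions, positions)  # (P, P) pairwise distances
    adj = (dists < radius) & ~torch.eye(len(positions), dtype=torch.bool)
    edge_index = adj.nonzero().t()             # (2, E), COO format
    return features, edge_index

# A clip becomes a sequence of such graphs, one per sampled frame.
positions = torch.rand(22, 2) * 100            # 22 players on a 100 m pitch
features = torch.randn(22, 16)                 # e.g. appearance/role embeddings
x, edge_index = frame_to_graph(positions, features)
print(edge_index.shape)
```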
- P2ANet: A Dataset and Benchmark for Dense Action Detection from Table Tennis Match Broadcasting Videos [64.57435509822416]
The dataset consists of 2,721 video clips collected from the broadcasting videos of professional table tennis matches in World Table Tennis Championships and Olympiads.
We formulate two sets of action detection problems: action localization and action recognition.
The results confirm that P2ANet remains a challenging task and can be used as a special benchmark for dense action detection from videos.
arXiv Detail & Related papers (2022-07-26T08:34:17Z)
- Feature Combination Meets Attention: Baidu Soccer Embeddings and Transformer based Temporal Detection [3.7709686875144337]
We present a two-stage paradigm to detect what and when events happen in soccer broadcast videos.
Specifically, we fine-tune multiple action recognition models on soccer data to extract high-level semantic features.
This approach achieved state-of-the-art performance in both tasks, i.e., action spotting and replay grounding, in the SoccerNet-v2 Challenge.
arXiv Detail & Related papers (2021-06-28T08:00:21Z)
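Read as a sketch, the second stage of the paradigm above could look like the following: per-clip features from several fine-tuned backbones are stacked over time and contextualized by a small Transformer encoder that emits per-step class scores. The dimensions, the single-encoder design, and the 17-class head (SoccerNet-v2's action classes) are assumptions for illustration.

```python
# Sketch of a Transformer-based temporal detection head over concatenated
# features from multiple backbones (hyperparameters are illustrative).
import torch
import torch.nn as nn

class TemporalDetector(nn.Module):
    def __init__(self, feat_dim=2048, d_model=256, num_classes=17):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=3)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, feats):                # feats: (B, T, feat_dim)
        h = self.encoder(self.proj(feats))   # contextualize over time
        return self.head(h)                  # per-step class logits (B, T, C)

feats = torch.randn(2, 60, 2048)  # e.g. 60 one-second steps of stacked backbone features
print(TemporalDetector()(feats).shape)  # torch.Size([2, 60, 17])
```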
- Temporally-Aware Feature Pooling for Action Spotting in Soccer Broadcasts [86.56462654572813]
We focus our analysis on action spotting in soccer broadcasts, which consists in temporally localizing the main actions of a soccer game.
We propose a novel feature pooling method based on NetVLAD, dubbed NetVLAD++, that embeds temporally-aware knowledge.
We train and evaluate our methodology on the recent large-scale dataset SoccerNet-v2, reaching 53.4% Average-mAP for action spotting.
arXiv Detail & Related papers (2021-04-14T11:09:03Z)
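The sketch below condenses the temporally-aware idea described above: frames before and after the candidate instant are pooled by two separate NetVLAD modules whose outputs are concatenated. The NetVLAD layer here is deliberately simplified, and all sizes are illustrative.

```python
# Condensed sketch of temporally-aware pooling in the spirit of NetVLAD++:
# past and future context around the candidate instant get separate pools.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NetVLAD(nn.Module):
    """Simplified NetVLAD: soft-assign frames to clusters, sum residuals."""
    def __init__(self, dim=512, clusters=64):
        super().__init__()
        self.centroids = nn.Parameter(torch.randn(clusters, dim))
        self.assign = nn.Linear(dim, clusters)

    def forward(self, x):                        # x: (B, T, D)
        a = self.assign(x).softmax(dim=-1)       # soft assignments (B, T, K)
        resid = x.unsqueeze(2) - self.centroids  # residuals (B, T, K, D)
        vlad = (a.unsqueeze(-1) * resid).sum(1)  # aggregate over time (B, K, D)
        return F.normalize(vlad.flatten(1), dim=-1)

class TemporallyAwarePooling(nn.Module):
    def __init__(self, dim=512, clusters=64, num_classes=17):
        super().__init__()
        self.before, self.after = NetVLAD(dim, clusters), NetVLAD(dim, clusters)
        self.head = nn.Linear(2 * clusters * dim, num_classes)

    def forward(self, x):                        # x: (B, T, D), instant at T // 2
        mid = x.shape[1] // 2
        pooled = torch.cat([self.before(x[:, :mid]), self.after(x[:, mid:])], dim=-1)
        return self.head(pooled)

x = torch.randn(4, 30, 512)               # 30 frames of 512-d features
print(TemporallyAwarePooling()(x).shape)  # torch.Size([4, 17])
```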
- Hybrid Dynamic-static Context-aware Attention Network for Action Assessment in Long Videos [96.45804577283563]
We present a novel hybrid dynAmic-static Context-aware attenTION NETwork (ACTION-NET) for action assessment in long videos.
We learn not only the video dynamic information but also focus on the static postures of the detected athletes in specific frames.
We combine the features of the two streams to regress the final video score, supervised by ground-truth scores given by experts.
arXiv Detail & Related papers (2020-08-13T15:51:42Z)
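A minimal sketch of the two-stream idea above, assuming pre-pooled features: one branch encodes dynamic clip features, the other static posture features, and their fusion is regressed to a scalar score. The encoders and dimensions are placeholders, not ACTION-NET's actual components.

```python
# Illustrative two-stream score regression: dynamic (video) and static
# (posture) features are fused and regressed to a quality score supervised
# by expert annotations. Sizes are placeholders.
import torch
import torch.nn as nn

class TwoStreamScorer(nn.Module):
    def __init__(self, dyn_dim=1024, static_dim=512):
        super().__init__()
        self.dyn_enc = nn.Sequential(nn.Linear(dyn_dim, 256), nn.ReLU())
        self.static_enc = nn.Sequential(nn.Linear(static_dim, 256), nn.ReLU())
        self.regressor = nn.Linear(512, 1)  # fused features -> scalar score

    def forward(self, dyn_feat, static_feat):
        fused = torch.cat([self.dyn_enc(dyn_feat), self.static_enc(static_feat)], dim=-1)
        return self.regressor(fused).squeeze(-1)

dyn = torch.randn(4, 1024)     # pooled dynamic clip features
static = torch.randn(4, 512)   # pooled static posture features
scores = TwoStreamScorer()(dyn, static)
loss = nn.functional.mse_loss(scores, torch.rand(4))  # ground-truth expert scores
```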
- Event detection in coarsely annotated sports videos via parallel multi receptive field 1D convolutions [14.30009544149561]
In problems such as sports video analytics, it is difficult to obtain accurate frame-level annotations and exact event durations.
We propose the task of event detection in coarsely annotated videos.
We introduce a multi-tower temporal convolutional network architecture for the proposed task.
arXiv Detail & Related papers (2020-04-13T19:51:25Z)
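The multi-tower design above can be sketched as parallel 1-D convolution towers with different kernel sizes (hence different receptive fields) applied to the same frame-feature sequence and merged for per-frame event classification; the kernel sizes and channel counts below are assumptions, not the paper's configuration.

```python
# Sketch of parallel 1-D convolution towers with different receptive fields,
# merged for per-frame event classification (hyperparameters are illustrative).
import torch
import torch.nn as nn

class MultiTowerTCN(nn.Module):
    def __init__(self, in_dim=512, channels=64, kernel_sizes=(3, 7, 15), num_classes=4):
        super().__init__()
        self.towers = nn.ModuleList([
            nn.Conv1d(in_dim, channels, k, padding=k // 2)  # same-length output
            for k in kernel_sizes
        ])
        self.head = nn.Conv1d(channels * len(kernel_sizes), num_classes, 1)

    def forward(self, x):                    # x: (B, T, in_dim) frame features
        x = x.transpose(1, 2)                # -> (B, in_dim, T) for Conv1d
        h = torch.cat([torch.relu(t(x)) for t in self.towers], dim=1)
        return self.head(h).transpose(1, 2)  # per-frame logits (B, T, C)

x = torch.randn(2, 120, 512)     # 120 frames of 512-d features
print(MultiTowerTCN()(x).shape)  # torch.Size([2, 120, 4])
```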