Towards Universal Soccer Video Understanding
- URL: http://arxiv.org/abs/2412.01820v2
- Date: Wed, 04 Dec 2024 06:38:22 GMT
- Title: Towards Universal Soccer Video Understanding
- Authors: Jiayuan Rao, Haoning Wu, Hao Jiang, Ya Zhang, Yanfeng Wang, Weidi Xie
- Abstract summary: This paper aims to develop a comprehensive framework for soccer understanding.
We introduce SoccerReplay-1988, the largest multi-modal soccer dataset to date, featuring videos and detailed annotations from 1,988 complete matches.
We present the first visual-language foundation model in the soccer domain, MatchVision, which leverages spatiotemporal information across soccer videos and excels in various downstream tasks.
- Score: 58.889409980618396
- License:
- Abstract: As a globally celebrated sport, soccer has attracted widespread interest from fans all over the world. This paper aims to develop a comprehensive multi-modal framework for soccer video understanding. Specifically, we make the following contributions in this paper: (i) we introduce SoccerReplay-1988, the largest multi-modal soccer dataset to date, featuring videos and detailed annotations from 1,988 complete matches, with an automated annotation pipeline; (ii) we present the first visual-language foundation model in the soccer domain, MatchVision, which leverages spatiotemporal information across soccer videos and excels in various downstream tasks; (iii) we conduct extensive experiments and ablation studies on event classification, commentary generation, and multi-view foul recognition. MatchVision demonstrates state-of-the-art performance on all of them, substantially outperforming existing models, which highlights the superiority of our proposed data and model. We believe that this work will offer a standard paradigm for sports understanding research.
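To make the downstream setup more concrete, here is a minimal, hypothetical sketch of how event classification could be built on top of per-frame visual features with a temporal encoder; it is not the authors' MatchVision implementation, and the module layout, feature dimension, and `num_events` value are assumptions for illustration.

```python
import torch
import torch.nn as nn

class EventClassifier(nn.Module):
    """Hypothetical sketch: frame features -> temporal encoder -> event logits.

    This is NOT the MatchVision architecture; it only illustrates the general
    pattern of pooling spatiotemporal features for event classification.
    """

    def __init__(self, feat_dim=768, num_events=24, num_layers=2, num_heads=8):
        super().__init__()
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=num_heads, batch_first=True
        )
        self.temporal_encoder = nn.TransformerEncoder(encoder_layer, num_layers)
        self.head = nn.Linear(feat_dim, num_events)

    def forward(self, frame_features):
        # frame_features: (batch, num_frames, feat_dim), e.g. from a frozen
        # per-frame visual backbone such as a ViT (assumption).
        x = self.temporal_encoder(frame_features)
        clip_embedding = x.mean(dim=1)      # temporal average pooling
        return self.head(clip_embedding)    # (batch, num_events) logits

# Usage: classify a 30-frame clip into one of `num_events` event types.
model = EventClassifier()
logits = model(torch.randn(2, 30, 768))
print(logits.shape)  # torch.Size([2, 24])
```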
Related papers
- SMGDiff: Soccer Motion Generation using diffusion probabilistic models [44.54275548434197]
Soccer is a globally renowned sport with significant applications in video games and VR/AR.
In this paper, we introduce SMGDiff, a novel two-stage framework for generating real-time and user-controllable soccer motions.
Our key idea is to integrate real-time character control with a powerful diffusion-based generative model, ensuring high-quality and diverse output motion.
arXiv Detail & Related papers (2024-11-25T09:25:53Z)
- Deep learning for action spotting in association football videos [64.10841325879996]
The SoccerNet initiative organizes yearly challenges, during which participants from all around the world compete to achieve state-of-the-art performances.
This paper traces the history of action spotting in sports, from the creation of the task back in 2018, to the role it plays today in research and the sports industry.
arXiv Detail & Related papers (2024-10-02T07:56:15Z)
- Deep Understanding of Soccer Match Videos [20.783415560412003]
Soccer is one of the most popular sports worldwide, with live broadcasts frequently available for major matches.
Our system can detect key objects such as soccer balls, players and referees.
It also tracks the movements of players and the ball, recognizes player numbers, classifies scenes, and identifies highlights such as goal kicks.
arXiv Detail & Related papers (2024-07-11T05:54:13Z)
- MatchTime: Towards Automatic Soccer Game Commentary Generation [52.431010585268865]
We consider constructing an automatic soccer game commentary model to improve the audiences' viewing experience.
First, observing the prevalent video-text misalignment in existing datasets, we manually annotate timestamps for 49 matches.
Second, we propose a multi-modal temporal alignment pipeline to automatically correct and filter the existing dataset at scale.
Third, based on our curated dataset, we train an automatic commentary generation model, named MatchVoice.
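As a rough illustration of what a multi-modal temporal alignment step can look like, the sketch below assigns each commentary sentence to the video timestamp whose frame embedding it matches best by cosine similarity; this is a simplified, assumption-based example, not the MatchTime pipeline, and the embedding sources and `fps` parameter are hypothetical.

```python
import numpy as np

def align_commentary(text_embs: np.ndarray, frame_embs: np.ndarray, fps: float = 1.0):
    """Hypothetical alignment sketch: assign each commentary sentence to the
    frame whose embedding has the highest cosine similarity.

    text_embs:  (num_sentences, dim) embeddings of commentary sentences
    frame_embs: (num_frames, dim) embeddings of frames sampled at `fps`
    Returns a list of (sentence_index, timestamp_seconds, similarity).
    """
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    f = frame_embs / np.linalg.norm(frame_embs, axis=1, keepdims=True)
    sim = t @ f.T                                # (num_sentences, num_frames)
    best = sim.argmax(axis=1)
    return [(i, int(best[i]) / fps, float(sim[i, best[i]])) for i in range(len(best))]

# Low-similarity matches could then be filtered out to clean a noisy dataset.
aligned = align_commentary(np.random.rand(5, 512), np.random.rand(300, 512))
```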
arXiv Detail & Related papers (2024-06-26T17:57:25Z)
- A Multi-stage deep architecture for summary generation of soccer videos [11.41978608521222]
We propose a method to generate the summary of a soccer match exploiting both the audio and the event metadata.
The results show that our method can detect the actions of the match, identify which of these actions should belong to the summary and then propose multiple candidate summaries.
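A toy version of such a selection step might score each logged event by combining a per-type importance weight with the peak audio energy around it, then keep the top-scoring events; the weights, field names, and window size below are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

# Illustrative importance weights per event type (assumption, not from the paper).
EVENT_WEIGHTS = {"goal": 1.0, "penalty": 0.9, "red card": 0.8, "shot": 0.5, "corner": 0.2}

def select_summary_events(events, audio_energy, sr=1, window=10, top_k=8):
    """events: list of dicts like {"type": "goal", "time": 312} (seconds).
    audio_energy: 1-D array of per-second audio energy for the broadcast.
    Returns the top_k events ranked by type weight + local audio peak."""
    scored = []
    for ev in events:
        t = ev["time"]
        lo, hi = max(0, (t - window) * sr), min(len(audio_energy), (t + window) * sr)
        crowd = float(audio_energy[lo:hi].max()) if hi > lo else 0.0
        score = EVENT_WEIGHTS.get(ev["type"], 0.1) + crowd
        scored.append((score, ev))
    scored.sort(key=lambda x: x[0], reverse=True)
    return [ev for _, ev in scored[:top_k]]

# Example: pick highlights from two logged events over a 100-minute broadcast.
events = [{"type": "goal", "time": 312}, {"type": "corner", "time": 845}]
summary = select_summary_events(events, np.random.rand(6000))
```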
arXiv Detail & Related papers (2022-05-02T07:26:35Z)
- SoccerNet-Tracking: Multiple Object Tracking Dataset and Benchmark in Soccer Videos [62.686484228479095]
We propose a novel dataset for multiple object tracking composed of 200 sequences of 30s each.
The dataset is fully annotated with bounding boxes and tracklet IDs.
Our analysis shows that multiple player, referee and ball tracking in soccer videos is far from being solved.
arXiv Detail & Related papers (2022-04-14T12:22:12Z)
- Temporally-Aware Feature Pooling for Action Spotting in Soccer Broadcasts [86.56462654572813]
We focus our analysis on action spotting in soccer broadcasts, which consists of temporally localizing the main actions in a soccer game.
We propose a novel feature pooling method based on NetVLAD, dubbed NetVLAD++, that embeds temporally-aware knowledge.
We train and evaluate our methodology on the recent large-scale dataset SoccerNet-v2, reaching 53.4% Average-mAP for action spotting.
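For intuition, the following is a heavily simplified sketch of the temporally-aware pooling idea: frame features in a window around a candidate timestamp are split into "before" and "after" halves and pooled separately before classification. Real NetVLAD++ uses learned NetVLAD cluster pooling rather than the plain averaging used here, and the dimensions and class count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TemporallyAwarePooling(nn.Module):
    """Simplified sketch of the NetVLAD++ idea: pool the frames before and
    after the candidate timestamp separately, then concatenate. Plain
    averaging stands in for learned NetVLAD pooling to keep the example short."""

    def __init__(self, feat_dim=512, num_classes=17):
        super().__init__()
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, window_feats):
        # window_feats: (batch, num_frames, feat_dim), centred on the candidate.
        half = window_feats.shape[1] // 2
        before = window_feats[:, :half].mean(dim=1)   # context before the action
        after = window_feats[:, half:].mean(dim=1)    # context after the action
        return self.classifier(torch.cat([before, after], dim=-1))

pooling = TemporallyAwarePooling()
logits = pooling(torch.randn(4, 30, 512))  # e.g. a 15s window at 2 fps
print(logits.shape)  # torch.Size([4, 17])
```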
arXiv Detail & Related papers (2021-04-14T11:09:03Z)
- SoccerNet-v2: A Dataset and Benchmarks for Holistic Understanding of Broadcast Soccer Videos [71.72665910128975]
SoccerNet-v2 is a novel large-scale corpus of manual annotations for the SoccerNet video dataset.
We release around 300k annotations within SoccerNet's 500 untrimmed broadcast soccer videos.
We extend current tasks in the realm of soccer to include action spotting and camera shot segmentation with boundary detection.
arXiv Detail & Related papers (2020-11-26T16:10:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.