MatchTime: Towards Automatic Soccer Game Commentary Generation
- URL: http://arxiv.org/abs/2406.18530v1
- Date: Wed, 26 Jun 2024 17:57:25 GMT
- Title: MatchTime: Towards Automatic Soccer Game Commentary Generation
- Authors: Jiayuan Rao, Haoning Wu, Chang Liu, Yanfeng Wang, Weidi Xie
- Abstract summary: We consider constructing an automatic soccer game commentary model to improve the audiences' viewing experience.
First, observing the prevalent video-text misalignment in existing datasets, we manually annotate timestamps for 49 matches.
Second, we propose a multi-modal temporal alignment pipeline to automatically correct and filter the existing dataset at scale.
Third, based on our curated dataset, we train an automatic commentary generation model, named MatchVoice.
- Score: 52.431010585268865
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Soccer is a globally popular sport with a vast audience. In this paper, we consider constructing an automatic soccer game commentary model to improve the audiences' viewing experience. In general, we make the following contributions: First, observing the prevalent video-text misalignment in existing datasets, we manually annotate timestamps for 49 matches, establishing a more robust benchmark for soccer game commentary generation, termed SN-Caption-test-align; Second, we propose a multi-modal temporal alignment pipeline to automatically correct and filter the existing dataset at scale, creating a higher-quality soccer game commentary dataset for training, denoted as MatchTime; Third, based on our curated dataset, we train an automatic commentary generation model, named MatchVoice. Extensive experiments and ablation studies demonstrate the effectiveness of our alignment pipeline, and training the model on the curated dataset achieves state-of-the-art performance for commentary generation, showing that better alignment can lead to significant performance improvements in downstream tasks.
 
      
Related papers
- TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation [13.835968474349034]
 TimeSoccer is the first end-to-end soccer MLLM for Single-anchor Dense Video Captioning (SDVC) in full-match soccer videos.
TimeSoccer jointly predicts timestamps and generates captions in a single pass, enabling global context modeling.
MoFA-Select is a training-free, motion-aware frame compression module that adaptively selects representative frames.
 arXiv  Detail & Related papers  (2025-04-24T08:27:42Z)
- Towards Universal Soccer Video Understanding [58.889409980618396]
 This paper aims to develop a comprehensive multi-modal framework for soccer understanding.
We introduce SoccerReplay-1988, the largest multi-modal soccer dataset to date, featuring videos and detailed annotations from 1,988 complete matches.
We present an advanced soccer-specific visual encoder, MatchVision, which leverages spatiotemporal information across soccer videos and excels in various downstream tasks.
 arXiv  Detail & Related papers  (2024-12-02T18:58:04Z)
- A Simple and Effective Temporal Grounding Pipeline for Basketball Broadcast Footage [0.0]
 We present a reliable temporal grounding pipeline for video-to-analytic alignment of basketball broadcast footage.
Our method aligns a pre-labeled corpus of play-by-play annotations containing dense event annotations to video frames, enabling quick retrieval of labeled video segments.
 arXiv  Detail & Related papers  (2024-10-30T17:27:44Z)
- Going for GOAL: A Resource for Grounded Football Commentaries [66.10040637644697]
 We present GrOunded footbAlL commentaries (GOAL), a novel dataset of football (or 'soccer') highlights videos with transcribed live commentaries in English.
We provide state-of-the-art baselines for the following tasks: frame reordering, moment retrieval, live commentary retrieval and play-by-play live commentary generation.
Results show that SOTA models perform reasonably well in most tasks.
 arXiv  Detail & Related papers  (2022-11-08T20:04:27Z)
- SoccerNet-Tracking: Multiple Object Tracking Dataset and Benchmark in Soccer Videos [62.686484228479095]
 We propose a novel dataset for multiple object tracking composed of 200 sequences of 30s each.
The dataset is fully annotated with bounding boxes and tracklet IDs.
Our analysis shows that multiple player, referee and ball tracking in soccer videos is far from being solved.
 arXiv  Detail & Related papers  (2022-04-14T12:22:12Z)
- Align and Prompt: Video-and-Language Pre-training with Entity Prompts [111.23364631136339]
 Video-and-language pre-training has shown promising improvements on various downstream tasks.
We propose Align and Prompt: an efficient and effective video-and-language pre-training framework with better cross-modal alignment.
Our code and pre-trained models will be released.
 arXiv  Detail & Related papers  (2021-12-17T15:55:53Z)
- Extraction of Positional Player Data from Broadcast Soccer Videos [3.7437974317872]
 We propose a pipeline for the fully-automated extraction of positional data from broadcast video recordings of soccer matches.
The system integrates all necessary sub-tasks like sports field registration, player detection, or team assignment.
A comprehensive experimental evaluation is presented for the individual modules as well as the entire pipeline.
 arXiv  Detail & Related papers  (2021-10-21T12:49:56Z)
- Temporally-Aware Feature Pooling for Action Spotting in Soccer Broadcasts [86.56462654572813]
 We focus our analysis on action spotting in soccer broadcast, which consists in temporally localizing the main actions in a soccer game.
We propose a novel feature pooling method based on NetVLAD, dubbed NetVLAD++, that embeds temporally-aware knowledge.
We train and evaluate our methodology on the recent large-scale dataset SoccerNet-v2, reaching 53.4% Average-mAP for action spotting.
 arXiv  Detail & Related papers  (2021-04-14T11:09:03Z)
- SoccerNet-v2: A Dataset and Benchmarks for Holistic Understanding of Broadcast Soccer Videos [71.72665910128975]
 SoccerNet-v2 is a novel large-scale corpus of manual annotations for the SoccerNet video dataset.
We release around 300k annotations within SoccerNet's 500 untrimmed broadcast soccer videos.
We extend current tasks in the realm of soccer to include action spotting and camera shot segmentation with boundary detection.
 arXiv  Detail & Related papers  (2020-11-26T16:10:16Z)
- Automatic Pass Annotation from Soccer VideoStreams Based on Object Detection and LSTM [6.87782863484826]
 PassNet is a method to recognize the most frequent events in soccer, i.e., passes, from video streams.
Our experiments show good results and a significant improvement in the accuracy of pass detection.
PassNet is the first step towards an automated event annotation system.
 arXiv  Detail & Related papers  (2020-07-13T16:14:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     