Going for GOAL: A Resource for Grounded Football Commentaries
- URL: http://arxiv.org/abs/2211.04534v1
- Date: Tue, 8 Nov 2022 20:04:27 GMT
- Title: Going for GOAL: A Resource for Grounded Football Commentaries
- Authors: Alessandro Suglia, José Lopes, Emanuele Bastianelli, Andrea Vanzo,
Shubham Agarwal, Malvina Nikandrou, Lu Yu, Ioannis Konstas, Verena Rieser
- Abstract summary: We present GrOunded footbAlL commentaries (GOAL), a novel dataset of football (or `soccer') highlights videos with transcribed live commentaries in English.
We provide state-of-the-art baselines for the following tasks: frame reordering, moment retrieval, live commentary retrieval and play-by-play live commentary generation.
Results show that SOTA models perform reasonably well in most tasks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent video+language datasets cover domains where the interaction is highly
structured, such as instructional videos, or where the interaction is scripted,
such as TV shows. Both properties can lead models to exploit spurious cues
rather than learning to ground language. In this paper, we
present GrOunded footbAlL commentaries (GOAL), a novel dataset of football (or
`soccer') highlights videos with transcribed live commentaries in English. As
the course of a game is unpredictable, so are commentaries, which makes them a
unique resource to investigate dynamic language grounding. We also provide
state-of-the-art baselines for the following tasks: frame reordering, moment
retrieval, live commentary retrieval and play-by-play live commentary
generation. Results show that SOTA models perform reasonably well in most
tasks. We discuss the implications of these results and suggest new tasks for
which GOAL can be used. Our codebase is available at:
https://gitlab.com/grounded-sport-convai/goal-baselines.
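To make the retrieval-style tasks concrete, the following is a minimal sketch of live commentary retrieval framed as cross-modal similarity ranking. It assumes clip and commentary embeddings already produced by some video+text encoder; all names are hypothetical and are not taken from the GOAL codebase.

    import numpy as np

    def retrieve_commentary(clip_embedding, commentary_embeddings, k=5):
        # Rank candidate commentary lines by cosine similarity to one clip.
        # clip_embedding: (d,) vector; commentary_embeddings: (n, d) matrix.
        clip = clip_embedding / np.linalg.norm(clip_embedding)
        cands = commentary_embeddings / np.linalg.norm(
            commentary_embeddings, axis=1, keepdims=True)
        scores = cands @ clip                 # cosine similarity per candidate
        return np.argsort(-scores)[:k]        # indices of the top-k lines

    # Random vectors stand in for a real video+text encoder.
    rng = np.random.default_rng(0)
    print(retrieve_commentary(rng.normal(size=512), rng.normal(size=(100, 512))))

The same ranking interface, with the roles of video and text swapped, covers moment retrieval; frame reordering and play-by-play generation call for sequence models instead.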
Related papers
- MatchTime: Towards Automatic Soccer Game Commentary Generation
We consider constructing an automatic soccer game commentary model to improve the audiences' viewing experience.
First, observing the prevalent video-text misalignment in existing datasets, we manually annotate timestamps for 49 matches.
Second, we propose a multi-modal temporal alignment pipeline to automatically correct and filter the existing dataset at scale.
Third, based on our curated dataset, we train an automatic commentary generation model, named MatchVoice.
arXiv Detail & Related papers (2024-06-26T17:57:25Z)
- GOAL: A Challenging Knowledge-grounded Video Captioning Benchmark for Real-time Soccer Commentary Generation
We present GOAL, a benchmark of over 8.9k soccer video clips, 22k sentences, and 42k knowledge triples, proposing a challenging new task setting: Knowledge-grounded Video Captioning (KGVC).
Our data and code are available at https://github.com/THU-KEG/goal.
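As an illustration of what a knowledge-grounded captioning example pairs together, here is a hypothetical record structure; the field names are assumptions, not the schema used in the THU-KEG repository.

    from dataclasses import dataclass, field

    @dataclass
    class KGVCExample:
        # One illustrative knowledge-grounded video captioning record.
        clip_id: str                 # identifier of the soccer video clip
        caption: str                 # target commentary sentence
        triples: list = field(default_factory=list)  # (subject, relation, object)

    example = KGVCExample(
        clip_id="clip_0001",
        caption="The forward curls the free kick into the top corner.",
        triples=[("player_a", "plays_for", "team_x"),
                 ("player_a", "position", "forward")],
    )
    print(example.caption, example.triples)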
arXiv Detail & Related papers (2023-03-26T08:43:36Z)
- Commentary Generation from Data Records of Multiplayer Strategy Esports Game
We build large-scale datasets that pair structured data and commentaries from a popular esports game, League of Legends.
We then evaluate Transformer-based models to generate game commentaries from structured data records.
We will release our dataset to boost potential research in the data-to-text generation community.
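Data-to-text models of this kind typically receive the structured record flattened into a token sequence. A minimal sketch of such linearization, with placeholder keys rather than the paper's actual schema:

    def linearize_record(record):
        # Flatten one structured game-event record into a text sequence,
        # the usual input to a Transformer-based data-to-text model.
        return " | ".join(f"{k}: {v}" for k, v in record.items())

    event = {"time": "12:34", "team": "blue", "event": "dragon_kill",
             "gold_lead": "+1200"}
    print(linearize_record(event))
    # A pretrained seq2seq model (e.g. T5) would map this string to a
    # commentary sentence; the fine-tuning loop is omitted here.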
arXiv Detail & Related papers (2022-12-21T11:23:31Z)
- GOAL: Towards Benchmarking Few-Shot Sports Game Summarization
We release GOAL, the first English sports game summarization dataset.
There are 103 commentary-news pairs in GOAL, where the average lengths of commentaries and news are 2724.9 and 476.3 words, respectively.
arXiv Detail & Related papers (2022-07-18T14:29:18Z)
- Unsupervised Temporal Video Grounding with Deep Semantic Clustering
Temporal video grounding aims to localize a target segment in a video according to a given sentence query.
In this paper, we explore whether a video grounding model can be learned without any paired annotations.
Since there is no paired supervision, we propose a novel Deep Semantic Clustering Network (DSCNet) to leverage all semantic information from the whole query set.
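To make the task definition concrete, below is a generic sliding-window sketch of temporal grounding: pool each candidate window of frame features, score it against the sentence embedding, and keep the best span. It illustrates the task interface only and does not reproduce DSCNet's unsupervised clustering method; all names are hypothetical.

    import numpy as np

    def ground_query(frame_feats, query_feat, window=16, stride=8):
        # Return the (start, end) frame span best matching the query.
        q = query_feat / np.linalg.norm(query_feat)
        best, best_span = -np.inf, (0, window)
        for s in range(0, len(frame_feats) - window + 1, stride):
            seg = frame_feats[s:s + window].mean(axis=0)   # pool the window
            score = seg @ q / np.linalg.norm(seg)          # cosine similarity
            if score > best:
                best, best_span = score, (s, s + window)
        return best_span

    rng = np.random.default_rng(1)
    print(ground_query(rng.normal(size=(128, 256)), rng.normal(size=256)))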
arXiv Detail & Related papers (2022-01-14T05:16:33Z)
- SoccerNet-v2: A Dataset and Benchmarks for Holistic Understanding of Broadcast Soccer Videos
SoccerNet-v2 is a novel large-scale corpus of manual annotations for the SoccerNet video dataset.
We release around 300k annotations within SoccerNet's 500 untrimmed broadcast soccer videos.
We extend current tasks in the realm of soccer to include action spotting and camera shot segmentation with boundary detection.
arXiv Detail & Related papers (2020-11-26T16:10:16Z)
- Watch and Learn: Mapping Language and Noisy Real-world Videos with Self-supervision
We teach machines to understand visuals and natural language by learning the mapping between sentences and noisy video snippets without explicit annotations.
For training and evaluation, we contribute a new dataset `ApartmenTour' that contains a large number of online videos and subtitles.
arXiv Detail & Related papers (2020-11-19T03:43:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.