Extended OpenTT Games Dataset: A table tennis dataset for fine-grained shot type and point outcome
- URL: http://arxiv.org/abs/2512.19327v1
- Date: Mon, 22 Dec 2025 12:25:50 GMT
- Title: Extended OpenTT Games Dataset: A table tennis dataset for fine-grained shot type and point outcome
- Authors: Moamal Fadhil Abdul, Jonas Bruun Hubrechts, Thomas Martini Jørgensen, Emil Hovad
- Abstract summary: OpenTTGames is a set of recordings from the side of the table with official labels for ball bounces, for moments when the ball is above the net, and for net hits. Our extension adds stroke-type labels to the events and a per-player taxonomy so models can move beyond event spotting. Our annotations are released under the same CC BY-NC-SA 4.0 license as OpenTTGames, allowing free non-commercial use, modification, and redistribution.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Automatically detecting and classifying strokes in table tennis video can streamline training workflows, enrich broadcast overlays, and enable fine-grained performance analytics. For this to be possible, annotated video data of table tennis is needed. We extend the public OpenTTGames dataset with highly detailed, frame-accurate shot-type annotations (forehand and backhand, with subtypes), player posture labels (body lean and leg stance), and rally outcome tags at the end of each point. OpenTTGames consists of recordings filmed from the side of the table, with official labels for ball bounces, for moments when the ball is above the net, and for net hits. The original dataset already provides ball coordinates near events, each labeled "bounce", "net", or "empty_event", as well as semantic masks (humans, table, scoreboard). Our extension adds stroke-type labels to these events and a per-player taxonomy so models can move beyond event spotting toward tactical understanding (e.g., whether a stroke is likely to win the point or set up an advantage). We provide a compact coding scheme and a code-assisted labeling procedure to support reproducible annotations and baselines for fine-grained stroke understanding in racket sports. This fills a practical gap in the community, where many prior video resources are either not publicly released or carry restrictive or unclear licenses that hinder reuse and benchmarking. Our annotations are released under the same CC BY-NC-SA 4.0 license as OpenTTGames, allowing free non-commercial use, modification, and redistribution with appropriate attribution.
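To make the label structure concrete, the sketch below shows one plausible way to represent a single annotated event, combining the original OpenTTGames fields (event type and ball coordinates) with the extended labels (stroke type, posture, and rally outcome). Only the event values "bounce", "net", and "empty_event" and the broad label categories (forehand/backhand, body lean, leg stance, point outcome) come from the abstract; all field names and example subtype values are illustrative assumptions, not the released coding scheme.

```python
# Hypothetical per-event annotation record for the extended OpenTTGames labels.
# Field names and values beyond "bounce"/"net"/"empty_event" are assumptions
# for illustration; consult the released coding scheme for the canonical encoding.
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class EventType(Enum):
    # Event labels present in the original OpenTTGames dataset.
    BOUNCE = "bounce"
    NET = "net"
    EMPTY = "empty_event"


@dataclass
class StrokeAnnotation:
    frame: int                            # frame-accurate index of the event
    event: EventType                      # original OpenTTGames event label
    ball_xy: Optional[Tuple[int, int]]    # ball pixel coordinates near the event
    player: Optional[str] = None          # striking player, e.g. "left"/"right" (assumed)
    stroke: Optional[str] = None          # shot type, e.g. "forehand_topspin" (assumed subtype name)
    body_lean: Optional[str] = None       # posture: body lean, e.g. "forward" (assumed)
    leg_stance: Optional[str] = None      # posture: leg stance, e.g. "neutral" (assumed)
    rally_outcome: Optional[str] = None   # point-end tag, e.g. "winner"/"error" (assumed)


# Example: a forehand stroke by the left player, followed by a bounce.
example = StrokeAnnotation(
    frame=1842,
    event=EventType.BOUNCE,
    ball_xy=(624, 310),
    player="left",
    stroke="forehand_topspin",
    body_lean="forward",
    leg_stance="neutral",
    rally_outcome=None,  # rally still in progress at this event
)
print(example.stroke, example.event.value)
```

A record along these lines keeps the original event-spotting labels and the new stroke-level labels in one row, so stroke classification and outcome prediction baselines could be trained from the same table.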
Related papers
- A Simple and Effective Temporal Grounding Pipeline for Basketball Broadcast Footage [0.0]
We present a reliable temporal grounding pipeline for video-to-analytic alignment of basketball broadcast footage.
Our method aligns a pre-labeled corpus of dense play-by-play event annotations to video frames, enabling quick retrieval of labeled video segments.
arXiv Detail & Related papers (2024-10-30T17:27:44Z)
- Knowledge Guided Entity-aware Video Captioning and A Basketball Benchmark [49.54265459763042]
We construct a multimodal basketball game knowledge graph (KG_NBA_2022) to provide additional knowledge beyond videos.
Then, a dataset containing 9 types of fine-grained shooting events and knowledge of 286 players is constructed based on KG_NBA_2022.
We develop a knowledge guided entity-aware video captioning network (KEANet) based on a candidate player list in encoder-decoder form for basketball live text broadcast.
arXiv Detail & Related papers (2024-01-25T02:08:37Z)
- UniVTG: Towards Unified Video-Language Temporal Grounding [52.56732639951834]
Video Temporal Grounding (VTG) aims to ground target clips from videos according to custom language queries.
We propose to Unify the diverse VTG labels and tasks, dubbed UniVTG, along three directions.
Thanks to the unified framework, we are able to unlock temporal grounding pretraining from large-scale diverse labels.
arXiv Detail & Related papers (2023-07-31T14:34:49Z)
- Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos [57.830865926459914]
We propose a vision-language learning framework for untrimmed videos, which automatically detects informative events.
Instead of coarse-level video-language alignments, we present two dual pretext tasks to encourage fine-grained segment-level alignments.
Our framework is easily extensible to tasks covering visually-grounded language understanding and generation.
arXiv Detail & Related papers (2023-03-11T11:00:16Z)
- Event Detection in Football using Graph Convolutional Networks [0.0]
We show how to model the players and the ball in each frame of the video sequence as a graph.
We present the results for graph convolutional layers and pooling methods that can be used to model the temporal context present around each action.
arXiv Detail & Related papers (2023-01-24T14:52:54Z)
- P2ANet: A Dataset and Benchmark for Dense Action Detection from Table Tennis Match Broadcasting Videos [64.57435509822416]
The dataset consists of 2,721 video clips collected from broadcast videos of professional table tennis matches in the World Table Tennis Championships and Olympiads.
We formulate two sets of action detection problems: action localization and action recognition.
The results confirm that this remains a challenging task and that the dataset can serve as a special benchmark for dense action detection from videos.
arXiv Detail & Related papers (2022-07-26T08:34:17Z)
- A Unified Taxonomy and Multimodal Dataset for Events in Invasion Games [3.7111751305143654]
We present a universal taxonomy that covers a wide range of low- and high-level events for invasion games.
We release two multi-modal datasets comprising video and positional data with gold-standard annotations to foster research in fine-grained and ball-centered event spotting.
arXiv Detail & Related papers (2021-08-25T10:09:28Z)
- Reducing the Annotation Effort for Video Object Segmentation Datasets [50.893073670389164]
Densely labeling every frame with pixel masks does not scale to large datasets.
We use a deep convolutional network to automatically create pseudo-labels on a pixel level from much cheaper bounding box annotations.
We obtain the new TAO-VOS benchmark, which we make publicly available at www.vision.rwth-aachen.de/page/taovos.
arXiv Detail & Related papers (2020-11-02T17:34:45Z)
- Labelling unlabelled videos from scratch with multi-modal self-supervision [82.60652426371936]
Unsupervised labelling of a video dataset does not come for free from strong feature encoders.
We propose a novel clustering method that allows pseudo-labelling of a video dataset without any human annotations.
An extensive analysis shows that the resulting clusters have high semantic overlap to ground truth human labels.
arXiv Detail & Related papers (2020-06-24T12:28:17Z)
- TTNet: Real-time temporal and spatial video analysis of table tennis [5.156484100374058]
We present a neural network aimed at real-time processing of high-resolution table tennis videos.
This approach provides core information for reasoning about score updates by an auto-referee system.
We publish OpenTTGames, a multi-task dataset of table tennis game videos recorded at 120 fps and labeled with events.
arXiv Detail & Related papers (2020-04-21T11:57:51Z)