TTNet: Real-time temporal and spatial video analysis of table tennis
- URL: http://arxiv.org/abs/2004.09927v1
- Date: Tue, 21 Apr 2020 11:57:51 GMT
- Title: TTNet: Real-time temporal and spatial video analysis of table tennis
- Authors: Roman Voeikov, Nikolay Falaleev and Ruslan Baikulov
- Abstract summary: We present a neural network aimed at real-time processing of high-resolution table tennis videos.
This approach gives core information for reasoning score updates by an auto-referee system.
We publish a multi-task dataset, OpenTTGames, with videos of table tennis games at 120 fps labeled with events.
- Score: 5.156484100374058
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a neural network TTNet aimed at real-time processing of
high-resolution table tennis videos, providing both temporal (event spotting)
and spatial (ball detection and semantic segmentation) data. This approach
gives core information for reasoning score updates by an auto-referee system.
We also publish a multi-task dataset, OpenTTGames, with videos of table tennis
games at 120 fps labeled with events, semantic segmentation masks, and ball
coordinates for evaluation of multi-task approaches, primarily oriented toward
spotting quick events and tracking small objects. TTNet demonstrated 97.0%
accuracy in game event spotting along with an RMSE of 2 pixels in ball
detection at 97.5% accuracy on the test part of the presented dataset.
The proposed network allows the processing of downscaled full HD videos with
inference time below 6 ms per input tensor on a machine with a single
consumer-grade GPU. Thus, we are contributing to the development of real-time
multi-task deep learning applications and presenting an approach that is
potentially capable of substituting manual data collection by sports scouts,
providing support for referees' decision-making, and gathering extra
information about the game process.
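The headline figures above (97.0% event-spotting accuracy, 2-pixel ball-detection RMSE) follow standard metric definitions. As a minimal illustrative sketch of how such numbers are typically computed (these helper functions and the toy values are assumptions for illustration, not the paper's evaluation code):

```python
import math

def ball_rmse(pred, gt):
    """Root-mean-square error (in pixels) between predicted and
    ground-truth ball coordinates, averaged across frames."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("coordinate lists must be non-empty and equal length")
    sq = [(px - gx) ** 2 + (py - gy) ** 2
          for (px, py), (gx, gy) in zip(pred, gt)]
    return math.sqrt(sum(sq) / len(sq))

def spotting_accuracy(pred_events, gt_events):
    """Fraction of frames where the predicted event label matches
    the ground-truth label."""
    correct = sum(p == g for p, g in zip(pred_events, gt_events))
    return correct / len(gt_events)

# Toy example with made-up values:
pred_xy = [(100.0, 50.0), (102.0, 52.0), (110.0, 60.0)]
gt_xy   = [(101.0, 50.0), (102.0, 53.0), (112.0, 60.0)]
print(round(ball_rmse(pred_xy, gt_xy), 3))   # → 1.414
print(round(spotting_accuracy(["bounce", "none", "net"],
                              ["bounce", "none", "none"]), 3))  # → 0.667
```

In practice the event-spotting metric is usually computed within a tolerance window around the annotated frame rather than per frame, but the per-frame version above conveys the idea.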
Related papers
- Perception Test: A Diagnostic Benchmark for Multimodal Video Models [78.64546291816117]
We propose a novel multimodal video benchmark to evaluate the perception and reasoning skills of pre-trained multimodal models.
The Perception Test focuses on skills (Memory, Abstraction, Physics, Semantics) and types of reasoning (descriptive, explanatory, predictive, counterfactual) across video, audio, and text modalities.
The benchmark probes pre-trained models for their transfer capabilities, in a zero-shot / few-shot or limited finetuning regime.
arXiv Detail & Related papers (2023-05-23T07:54:37Z) - Towards Active Learning for Action Spotting in Association Football Videos [59.84375958757395]
Analyzing football videos is challenging and requires identifying subtle and diverse temporal patterns.
Current algorithms face significant challenges when learning from limited annotated data.
We propose an active learning framework that selects the most informative video samples to be annotated next.
arXiv Detail & Related papers (2023-04-09T11:50:41Z) - Table Tennis Stroke Detection and Recognition Using Ball Trajectory Data [5.735035463793008]
A single camera setup positioned in the umpire's view has been employed to procure a dataset consisting of six stroke classes executed by four professional table tennis players.
Ball tracking using YOLOv4, a traditional object detection model, and TrackNetv2, a temporal heatmap-based model, has been implemented on our dataset.
A mathematical approach developed to extract temporal boundaries of strokes using the ball trajectory data yielded a total of 2023 valid strokes.
The temporal convolutional network thus developed performed stroke recognition on completely unseen data with an accuracy of 87.155%.
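The summary does not spell out the "mathematical approach" used to extract stroke boundaries from the ball trajectory. As a hypothetical sketch only (the sign-change heuristic, the function name, and the `min_gap` parameter are all assumptions, not the paper's method), a candidate hit can be approximated as a reversal in the ball's horizontal direction of travel:

```python
def hit_frames(xs, min_gap=3):
    """Candidate stroke (hit) frames: indices where the ball's
    horizontal direction of travel reverses, i.e. where the sign of the
    frame-to-frame x-displacement flips. min_gap suppresses duplicate
    detections caused by jittery coordinates near a reversal."""
    hits, last = [], -min_gap
    for i in range(1, len(xs) - 1):
        before = xs[i] - xs[i - 1]   # displacement into frame i
        after = xs[i + 1] - xs[i]    # displacement out of frame i
        if before * after < 0 and i - last >= min_gap:
            hits.append(i)
            last = i
    return hits

# Toy trajectory: ball travels right, is struck at frame 4,
# travels left, and is struck again at frame 9.
xs = [0, 10, 20, 30, 40, 32, 24, 16, 8, 0, 9, 18]
print(hit_frames(xs))  # → [4, 9]
```

A real pipeline would also smooth the trajectory and handle bounces (which reverse vertical, not horizontal, motion), but the direction-reversal idea is the core of segmenting rallies into strokes from coordinates alone.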
arXiv Detail & Related papers (2023-02-19T19:13:24Z) - P2ANet: A Dataset and Benchmark for Dense Action Detection from Table Tennis Match Broadcasting Videos [64.57435509822416]
This work consists of 2,721 video clips collected from the broadcasting videos of professional table tennis matches in World Table Tennis Championships and Olympiads.
We formulate two sets of action detection problems -- action localization and action recognition.
The results confirm that P2ANet remains a challenging task and can serve as a special benchmark for dense action detection from videos.
arXiv Detail & Related papers (2022-07-26T08:34:17Z) - SoccerNet-Tracking: Multiple Object Tracking Dataset and Benchmark in Soccer Videos [62.686484228479095]
We propose a novel dataset for multiple object tracking composed of 200 sequences of 30s each.
The dataset is fully annotated with bounding boxes and tracklet IDs.
Our analysis shows that multiple player, referee and ball tracking in soccer videos is far from being solved.
arXiv Detail & Related papers (2022-04-14T12:22:12Z) - Table Tennis Stroke Recognition Using Two-Dimensional Human Pose Estimation [0.0]
We introduce a novel method for collecting table tennis video data and perform stroke detection and classification.
A diverse dataset containing video data of 11 basic strokes obtained from 14 professional table tennis players has been collected.
A temporal convolutional neural network model developed using 2D pose estimation performs multiclass classification of these 11 table tennis strokes.
arXiv Detail & Related papers (2021-04-20T11:32:43Z) - SoccerNet-v2: A Dataset and Benchmarks for Holistic Understanding of Broadcast Soccer Videos [71.72665910128975]
SoccerNet-v2 is a novel large-scale corpus of manual annotations for the SoccerNet video dataset.
We release around 300k annotations within SoccerNet's 500 untrimmed broadcast soccer videos.
We extend current tasks in the realm of soccer to include action spotting and camera shot segmentation with boundary detection.
arXiv Detail & Related papers (2020-11-26T16:10:16Z) - Fast Video Object Segmentation With Temporal Aggregation Network and Dynamic Template Matching [67.02962970820505]
We introduce "tracking-by-detection" into Video Object Segmentation (VOS).
We propose a new temporal aggregation network and a novel dynamic time-evolving template matching mechanism to achieve significantly improved performance.
We achieve new state-of-the-art performance on the DAVIS benchmark in both speed and accuracy, without complicated bells and whistles: 0.14 seconds per frame and a J&F measure of 75.9%.
arXiv Detail & Related papers (2020-07-11T05:44:16Z) - Event detection in coarsely annotated sports videos via parallel multi receptive field 1D convolutions [14.30009544149561]
In problems such as sports video analytics, it is difficult to obtain accurate frame level annotations and exact event duration.
We propose the task of event detection in coarsely annotated videos.
We introduce a multi-tower temporal convolutional network architecture for the proposed task.
arXiv Detail & Related papers (2020-04-13T19:51:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.