Related papers: Pose2Trajectory: Using Transformers on Body Pose to Predict Tennis Player's Trajectory

URL: http://arxiv.org/abs/2411.04501v1
Date: Thu, 07 Nov 2024 07:50:58 GMT
Title: Pose2Trajectory: Using Transformers on Body Pose to Predict Tennis Player's Trajectory
Authors: Ali K. AlShami, Terrance Boult, Jugal Kalita
Abstract summary: We propose Pose2Trajectory, which predicts a tennis player's future trajectory as a sequence derived from their body joints' data and ball position. We use an encoder-decoder Transformer architecture trained on the joints and trajectory information of the players with ball positions. We generate a high-quality dataset from multiple videos to assist tennis player movement prediction using object detection and human pose estimation methods.
Score: 6.349503549199403
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Tracking the trajectory of tennis players can help camera operators in production. Predicting future movement enables cameras to automatically track and predict a player's future trajectory without human intervention. Predicting future human movement in the context of complex physical tasks is also intellectually satisfying. Swift advancements in sports analytics and the wide availability of videos for tennis have inspired us to propose a novel method called Pose2Trajectory, which predicts a tennis player's future trajectory as a sequence derived from their body joints' data and ball position. Demonstrating impressive accuracy, our approach capitalizes on body joint information to provide a comprehensive understanding of the human body's geometry and motion, thereby enhancing the prediction of the player's trajectory. We use an encoder-decoder Transformer architecture trained on the joints and trajectory information of the players with ball positions. The predicted sequence can provide information to help close-up cameras keep tracking the tennis player, following centroid coordinates. We generate a high-quality dataset from multiple videos to assist tennis player movement prediction using object detection and human pose estimation methods. It contains bounding boxes and joint information for tennis players and ball positions in singles tennis games. Our method shows promising results in predicting the tennis player's movement trajectory with different sequence prediction lengths using the joints and trajectory information with the ball position.
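
The abstract describes per-frame inputs (body joints, bounding boxes, ball position) and a predicted sequence of centroid coordinates. The sketch below shows one way such data might be arranged into source and target sequences for a sequence-to-sequence model. It is not the authors' code: the `(x_min, y_min, x_max, y_max)` box layout, the feature ordering, and the names `box_centroid`, `frame_features`, and `make_windows` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
# Assumed box layout (not stated in the abstract): (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def box_centroid(box: Box) -> Point:
    """Return the centre of a bounding box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def frame_features(joints: Sequence[Point], box: Box, ball: Point) -> List[float]:
    """Flatten one frame into [joint x/y pairs..., centroid x/y, ball x/y]."""
    features: List[float] = []
    for x, y in joints:
        features.extend((x, y))
    features.extend(box_centroid(box))
    features.extend(ball)
    return features


def make_windows(
    frames: Sequence[List[float]],
    boxes: Sequence[Box],
    obs_len: int,
    pred_len: int,
) -> List[Tuple[List[List[float]], List[Point]]]:
    """Pair each run of obs_len feature frames with the next pred_len centroids."""
    samples = []
    total = obs_len + pred_len
    for start in range(len(frames) - total + 1):
        source = list(frames[start:start + obs_len])
        target = [box_centroid(b) for b in boxes[start + obs_len:start + total]]
        samples.append((source, target))
    return samples


# Toy clip: 5 frames, 2 joints per frame, player drifting right by 1 px per frame.
toy_boxes = [(float(t), 0.0, float(t) + 2.0, 4.0) for t in range(5)]
toy_frames = [
    frame_features([(t + 1.0, 1.0), (t + 1.0, 3.0)], box, (10.0, 5.0))
    for t, box in enumerate(toy_boxes)
]
toy_samples = make_windows(toy_frames, toy_boxes, obs_len=3, pred_len=1)
```

Each sample pairs an observed window of feature vectors with the centroids that follow it, which is the shape an encoder-decoder model would consume; varying `pred_len` corresponds to the different sequence prediction lengths mentioned in the abstract.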

Related papers

Action Anticipation from SoccerNet Football Video Broadcasts [84.87912817065506]
We introduce the task of action anticipation for football broadcast videos. We predict future actions in unobserved future frames within a five- or ten-second anticipation window. Our work will enable applications in automated broadcasting, tactical analysis, and player decision-making.
arXiv Detail & Related papers (2025-04-16T12:24:33Z)
TT3D: Table Tennis 3D Reconstruction [11.84899291358663]
We propose a novel approach for reconstructing precise 3D ball trajectories from online table tennis match recordings. Our method leverages the underlying physics of the ball's motion to identify the bounce state that minimizes the reprojection error of the ball's flying trajectory. A key advantage of our approach is its ability to infer ball spin without relying on human pose estimation or racket tracking.
arXiv Detail & Related papers (2025-04-14T09:37:47Z)
Game State and Spatio-temporal Action Detection in Soccer using Graph Neural Networks and 3D Convolutional Networks [1.4249472316161877]
Soccer analytics rely on two data sources: the player positions on the pitch and the sequences of events they perform. We propose a spatio-temporal action detection approach that combines visual and game state analytics via Graph Neural Networks trained end-to-end with state-of-the-art 3D CNNs.
arXiv Detail & Related papers (2025-02-21T13:41:38Z)
Forecasting Future Videos from Novel Views via Disentangled 3D Scene Representation [54.60804602905519]
Prior methods learn an entangled representation, aiming to model layered scene geometry, motion forecasting and novel view synthesis together. Our approach instead disentangles scene geometry from scene motion by lifting the 2D scene to 3D point clouds. To model future 3D scene motion, we propose a disentangled two-stage approach that initially forecasts ego-motion and subsequently the residual motion of dynamic objects.
arXiv Detail & Related papers (2024-07-31T08:54:50Z)
FootBots: A Transformer-based Architecture for Motion Prediction in Soccer [28.32714256545306]
FootBots is an encoder-decoder transformer-based architecture addressing motion prediction and conditioned motion prediction. FootBots captures temporal and social dynamics using set attention blocks and multi-attention block decoder. Empirical results on real soccer data demonstrate that FootBots outperforms baselines in motion prediction.
arXiv Detail & Related papers (2024-06-28T11:49:59Z)
Predicting Long-horizon Futures by Conditioning on Geometry and Time [49.86180975196375]
We explore the task of generating future sensor observations conditioned on the past. We leverage the large-scale pretraining of image diffusion models which can handle multi-modality. We create a benchmark for video prediction on a diverse set of videos spanning indoor and outdoor scenes.
arXiv Detail & Related papers (2024-04-17T16:56:31Z)
Social-Transmotion: Promptable Human Trajectory Prediction [65.80068316170613]
Social-Transmotion is a generic Transformer-based model that exploits diverse and numerous visual cues to predict human behavior. Our approach is validated on multiple datasets, including JTA, JRDB, Pedestrians and Cyclists in Road Traffic, and ETH-UCY.
arXiv Detail & Related papers (2023-12-26T18:56:49Z)
Ball Trajectory Inference from Multi-Agent Sports Contexts Using Set Transformer and Hierarchical Bi-LSTM [18.884300680050316]
This paper proposes an inference framework of ball trajectory from player trajectories as a cost-efficient alternative to ball tracking. The experimental results show that our model provides natural and accurate trajectories as well as admissible player ball possession at the same time. We suggest several practical applications of our framework including missing trajectory imputation, semi-automated pass annotation, automated zoom-in for match broadcasting, and calculating possession-wise running performance metrics.
arXiv Detail & Related papers (2023-06-14T02:19:59Z)
Who You Play Affects How You Play: Predicting Sports Performance Using Graph Attention Networks With Temporal Convolution [29.478765505215538]
This study presents a novel deep learning method, called GATv2-GCN, for predicting player performance in sports. We use a graph attention network to capture the attention that each player pays to each other, allowing for more accurate modeling. We evaluate the performance of our model using real-world sports data, demonstrating its effectiveness in predicting player performance.
arXiv Detail & Related papers (2023-03-29T14:48:51Z)
Graph Neural Networks to Predict Sports Outcomes [0.0]
We introduce a sport-agnostic graph-based representation of game states. We then use our proposed graph representation as input to graph neural networks to predict sports outcomes.
arXiv Detail & Related papers (2022-07-28T14:45:02Z)
P2ANet: A Dataset and Benchmark for Dense Action Detection from Table Tennis Match Broadcasting Videos [64.57435509822416]
This work consists of 2,721 video clips collected from the broadcasting videos of professional table tennis matches in World Table Tennis Championships and Olympiads. We formulate two sets of action detection problems: action localization and action recognition. The results confirm that P2ANet is still a challenging task and can be used as a special benchmark for dense action detection from videos.
arXiv Detail & Related papers (2022-07-26T08:34:17Z)
Table Tennis Stroke Recognition Using Two-Dimensional Human Pose Estimation [0.0]
We introduce a novel method for collecting table tennis video data and perform stroke detection and classification. A diverse dataset containing video data of 11 basic strokes obtained from 14 professional table tennis players has been collected. A temporal convolutional neural network model developed using 2D pose estimation performs multiclass classification of these 11 table tennis strokes.
arXiv Detail & Related papers (2021-04-20T11:32:43Z)
Contact and Human Dynamics from Monocular Video [73.47466545178396]
Existing deep models predict 2D and 3D kinematic poses from video that are approximately accurate, but contain visible errors. We present a physics-based method for inferring 3D human motion from video sequences that takes initial 2D and 3D pose estimates as input.
arXiv Detail & Related papers (2020-07-22T21:09:11Z)
AutoTrajectory: Label-free Trajectory Extraction and Prediction from Videos using Dynamic Points [92.91569287889203]
We present a novel, label-free algorithm, AutoTrajectory, for trajectory extraction and prediction. To better capture the moving objects in videos, we introduce dynamic points. We aggregate dynamic points to instance points, which stand for moving objects such as pedestrians in videos.
arXiv Detail & Related papers (2020-07-11T08:43:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.