KASportsFormer: Kinematic Anatomy Enhanced Transformer for 3D Human Pose Estimation on Short Sports Scene Video
- URL: http://arxiv.org/abs/2507.20763v1
- Date: Mon, 28 Jul 2025 12:17:40 GMT
- Title: KASportsFormer: Kinematic Anatomy Enhanced Transformer for 3D Human Pose Estimation on Short Sports Scene Video
- Authors: Zhuoer Yin, Calvin Yeung, Tomohiro Suzuki, Ryota Tanaka, Keisuke Fujii
- Abstract summary: We introduce KASportsFormer, a novel transformer-based 3D pose estimation framework for sports. Our proposed method achieves state-of-the-art results on the SportsPose and WorldPose datasets, with MPJPE errors of 58.0mm and 34.3mm, respectively.
- Score: 4.653030985708889
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent transformer-based approaches have demonstrated impressive performance on real-world 3D human pose estimation problems. Although these approaches achieve strong results on benchmark datasets, they tend to fall short in sports scenarios, where human movements are more complex than daily-life actions and are further hindered by motion blur, occlusions, and domain shifts. Moreover, because critical motions in a sports game often finish in an instant (e.g., shooting), the ability to focus on momentary actions has become a crucial factor in sports analysis, and current methods struggle with such instantaneous scenarios. To overcome these limitations, we introduce KASportsFormer, a novel transformer-based 3D pose estimation framework for sports that incorporates a kinematic anatomy-informed feature representation and integration module, in which the inherent kinematic motion information is extracted by the Bone Extractor (BoneExt) and Limb Fuser (LimbFus) modules and encoded in a multimodal manner. This improves the capability to comprehend sports poses in short videos. We evaluate our method on two representative sports scene datasets: SportsPose and WorldPose. Experimental results show that our proposed method achieves state-of-the-art MPJPE errors of 58.0mm and 34.3mm, respectively. Our code and models are available at: https://github.com/jw0r1n/KASportsFormer
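The abstract reports results in MPJPE and builds its representation on bone-level kinematics. As a rough illustration only, the sketch below computes bone vectors from joint coordinates (child joint minus parent joint, the usual bone representation; the actual BoneExt/LimbFus modules are learned and more involved) and the standard MPJPE metric. The 17-joint Human3.6M-style skeleton used here is an assumption, not taken from the paper.

```python
import numpy as np

# Assumed Human3.6M-style 17-joint skeleton as (parent, child) index pairs;
# the joint set used by KASportsFormer may differ.
BONES = [(0, 1), (1, 2), (2, 3),           # right leg
         (0, 4), (4, 5), (5, 6),           # left leg
         (0, 7), (7, 8), (8, 9), (9, 10),  # spine and head
         (8, 11), (11, 12), (12, 13),      # left arm
         (8, 14), (14, 15), (15, 16)]      # right arm

def bone_vectors(joints):
    """Joint coordinates (J, 3) -> bone vectors (B, 3): child minus parent."""
    parents = np.array([p for p, _ in BONES])
    children = np.array([c for _, c in BONES])
    return joints[children] - joints[parents]

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error: mean Euclidean distance between
    predicted and ground-truth joints after aligning both at the root."""
    pred = pred - pred[..., :1, :]  # make poses root-relative
    gt = gt - gt[..., :1, :]
    return np.linalg.norm(pred - gt, axis=-1).mean()
```

A reported MPJPE of 58.0mm thus means the average per-joint Euclidean error over all evaluated frames is 58 millimeters.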
Related papers
- AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability [4.991985467382602]
We introduce the AthleticsPose dataset, featuring "real" motions captured from 23 athletes performing various athletics events on an athletic field.
Our results show that the model trained on AthleticsPose significantly outperforms a baseline model trained on an imitated sports motion dataset.
In case studies of kinematic indicators, the model demonstrated the potential to capture individual differences in knee angles but struggled with higher-speed metrics.
arXiv Detail & Related papers (2025-07-17T08:43:23Z)
- Object-centric 3D Motion Field for Robot Learning from Human Videos [56.9436352861611]
We propose to use an object-centric 3D motion field to represent actions for robot learning from human videos.
We present a novel framework for extracting this representation from videos for zero-shot control.
Experiments show that our method reduces 3D motion estimation error by over 50% compared to the latest method.
arXiv Detail & Related papers (2025-06-04T17:59:06Z)
- Multi-person Physics-based Pose Estimation for Combat Sports [0.689728655482787]
We propose a novel framework for accurate 3D human pose estimation in combat sports using sparse multi-camera setups.
Our method integrates robust multi-view 2D pose tracking via a transformer-based top-down approach.
We further enhance pose realism and robustness by introducing a multi-person physics-based trajectory optimization step.
arXiv Detail & Related papers (2025-04-11T00:08:14Z)
- AthletePose3D: A Benchmark Dataset for 3D Human Pose Estimation and Kinematic Validation in Athletic Movements [4.653030985708889]
AthletePose3D is a novel dataset designed to capture high-speed, high-acceleration athletic movements.
We evaluate state-of-the-art (SOTA) monocular 2D and 3D pose estimation models on the dataset.
arXiv Detail & Related papers (2025-03-10T16:16:02Z)
- A Plug-and-Play Physical Motion Restoration Approach for In-the-Wild High-Difficulty Motions [56.709280823844374]
We introduce a mask-based motion correction module (MCM) that leverages motion context and video mask to repair flawed motions.
We also propose a physics-based motion transfer module (PTM), which employs a pretrain-and-adapt approach for motion imitation.
Our approach is designed as a plug-and-play module to physically refine the video motion capture results, including high-difficulty in-the-wild motions.
arXiv Detail & Related papers (2024-12-23T08:26:00Z)
- SportsSloMo: A New Benchmark and Baselines for Human-centric Video Frame Interpolation [11.198172694893927]
SportsSloMo is a benchmark consisting of more than 130K video clips and 1M video frames of high-resolution (≥720p) slow-motion sports videos crawled from YouTube.
We re-train several state-of-the-art methods on our benchmark, and the results show a decrease in their accuracy compared to other datasets.
We introduce two loss terms considering the human-aware priors, where we add auxiliary supervision to panoptic segmentation and human keypoints detection.
arXiv Detail & Related papers (2023-08-31T17:23:50Z)
- Physics-based Motion Retargeting from Sparse Inputs [73.94570049637717]
Commercial AR/VR products consist only of a headset and controllers, providing very limited sensor data of the user's pose.
We introduce a method to retarget motions in real-time from sparse human sensor data to characters of various morphologies.
We show that the avatar poses often match the user surprisingly well, despite having no sensor information of the lower body available.
arXiv Detail & Related papers (2023-07-04T21:57:05Z)
- SportsPose -- A Dynamic 3D sports pose dataset [0.0]
SportsPose is a large-scale 3D human pose dataset consisting of highly dynamic sports movements.
SportsPose provides a diverse and comprehensive set of 3D poses that reflect the complex and dynamic nature of sports movements.
arXiv Detail & Related papers (2023-04-04T15:15:25Z)
- Learning to Segment Rigid Motions from Two Frames [72.14906744113125]
We propose a modular network, motivated by a geometric analysis of what independent object motions can be recovered from an egomotion field.
It takes two consecutive frames as input and predicts segmentation masks for the background and multiple rigidly moving objects, which are then parameterized by 3D rigid transformations.
Our method achieves state-of-the-art performance for rigid motion segmentation on KITTI and Sintel.
arXiv Detail & Related papers (2021-01-11T04:20:30Z)
- Human Mesh Recovery from Multiple Shots [85.18244937708356]
We propose a framework for improved 3D reconstruction and mining of long sequences with pseudo ground truth 3D human mesh.
We show that the resulting data is beneficial in the training of various human mesh recovery models.
The tools we develop open the door to processing and analyzing 3D content from a large library of edited media.
arXiv Detail & Related papers (2020-12-17T18:58:02Z)
- Contact and Human Dynamics from Monocular Video [73.47466545178396]
Existing deep models predict 2D and 3D kinematic poses from video that are approximately accurate, but contain visible errors.
We present a physics-based method for inferring 3D human motion from video sequences that takes initial 2D and 3D pose estimates as input.
arXiv Detail & Related papers (2020-07-22T21:09:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.